Discussion:
[cairo] cairo_xcb_surface_create() segfaults on second call with different xcb info
Ryan Flannery
2018-11-08 03:48:20 UTC
Permalink
Hi,

I'm running into a segfault with cairo_xcb_surface_create() in an
application I'm working on. My application can (now) create an X
display (using cairo and other APIs), do some work, then destroy that
UI, and then possibly create another. Whenever I try to create a
second display, I get a crash in cairo_xcb_surface_create() reliably.

I have sample code below that I've stripped-down and reproduces the
segfault reliably on OpenBSD current (and the recent 6.4 release), as
well as a recent-ish release of Arch linux, all with latest versions
of cairo and X libs.

I'm guessing I'm either missing some tear-down/cleanup between calls
that I'm not seeing from the documentation and google'ing, or there's
some state/etc within the cairo API I'm missing.

Anyone seen something similar or have any idea why this might be caused?

Sample code below. The "test()" function is where I create an x window
(and related bits), followed by a cairo surface and cairo_t, and then
simply sleep for a second and tear everything down. The second call to
test() will reliably segfault on cairo_xcb_surface_create().

Cheers,
-Ryan

/*
* Test demonstrating how a second call to cairo_xcb_surface_create() bombs.
* Note I've stripped down most of the setup/etc to be as minimal as possible.
* Error checking on all xcb/cairo calls previous show no errors when present.
*/
#include <stdio.h>
#include <unistd.h>

#include <xcb/xcb.h>
#include <cairo/cairo-xcb.h>

xcb_visualtype_t*
get_root_visual(xcb_screen_t *screen)
{
xcb_depth_iterator_t i = xcb_screen_allowed_depths_iterator(screen);
for (; i.rem; xcb_depth_next(&i)) {
if (i.data->depth != 32)
continue;

xcb_visualtype_iterator_t vi;
vi = xcb_depth_visuals_iterator(i.data);
for (; vi.rem; xcb_visualtype_next(&vi)) {
return vi.data;
}
}

return NULL;
}

void
test()
{
xcb_connection_t *x = xcb_connect(NULL, NULL);
xcb_drawable_t w = xcb_generate_id(x);
xcb_screen_t *s = xcb_setup_roots_iterator(xcb_get_setup(x)).data;
xcb_visualtype_t *v = get_root_visual(s);

xcb_create_window(
x,
32, /* force 32 bit */
w,
s->root,
0, 0, 0, 0, /* x,y,w,h */
0, /* border width */
XCB_WINDOW_CLASS_INPUT_OUTPUT,
s->root_visual,
0, NULL);

cairo_surface_t *cs = cairo_xcb_surface_create(
x,
w,
v,
1000,
1000);
cairo_t *c = cairo_create(cs);

sleep(1);

cairo_destroy(c);
cairo_surface_destroy(cs);
xcb_destroy_window(x, w);
xcb_disconnect(x);
}

int
main()
{
printf("first call...\n");
test();
printf("second call...\n");
test(); /* segfaults on cairo_xcb_surface_create() */

return 0;
}
--
cairo mailing list
***@cairographics.org
https://lists.cairographic
Uli Schlachter
2018-11-08 20:36:17 UTC
Permalink
Hi Ryan,
Post by Ryan Flannery
I'm guessing I'm either missing some tear-down/cleanup between calls
that I'm not seeing from the documentation and google'ing, or there's
some state/etc within the cairo API I'm missing.
Anyone seen something similar or have any idea why this might be caused?
Cairo has to get quite some information from the X11 server. Which
extensions are supported? In which version? Does shared memory work?
Stuff like this.

Obviously, querying this all the time would be slow. Thus, cairo caches
this information. Namely, there is an instance of cairo_device_t. This
instance is kept around even when all surfaces using this device are
destroyed.

When a new cairo surface is created, it needs to get an instance of
cairo_device_t for all of this information. To identify an
already-existing device, the xcb_connection_t* pointer is used. Same
pointer means same XCB connection. Thus, when you call xcb_disconnect(),
the cache now contains a dangling pointer. The next call to
xcb_connect() might very well allocate an xcb_connection_t* with the
same pointer. Thus, you now get a cache hit even though there is a new
XCB connection. Bad things happen afterwards.

(But right now I do not know where you crash comes from)
Post by Ryan Flannery
Sample code below. The "test()" function is where I create an x window
(and related bits), followed by a cairo surface and cairo_t, and then
simply sleep for a second and tear everything down. The second call to
test() will reliably segfault on cairo_xcb_surface_create().
Fun fact: Does not crash here on Debian testing.
Dies in a failed assertion when run under xtrace (sometimes known as
x11trace).

[...]
Post by Ryan Flannery
cairo_destroy(c);
Add this here:

cairo_device_finish(cairo_surface_get_device(cs));
Post by Ryan Flannery
cairo_surface_destroy(cs);
xcb_destroy_window(x, w);
xcb_disconnect(x);
}
If you want to keep the device around for later (i.e. have multiple
surface using the same device), you can save a pointer via:

cairo_device_t *device = cairo_device_reference(....);

Now, you have to later call cairo_device_destroy() when you no longer
need the reference, but you get a pointer to the cairo_device_t
independent of a cairo xcb surface.

Oh and: You have to finish the device before you call xcb_disconnect().
Finishing the device can still cause X11 requests to be sent (i.e. to
send out pending drawing that was not yet done).

Also: Finishing the device automatically finishes all cairo_surface_t
instances for this XCB devices.

Cheers,
Uli

P.S.: Someone ran into the same problem just a few days ago and asked in
IRC. There, the symptom was X11 errors, because cairo was trying to use
XIDs which were already freed. Must be season for this kind of problem. :-)
--
"Why make things difficult, when it is possible to make them cryptic
and totally illogical, with just a little bit more effort?" -- A. P. J.
--
cairo mailing list
***@cairographics.org
https:
Ryan Flannery
2018-11-10 05:36:12 UTC
Permalink
Hi Uli,

Thanks much for your reply and very thorough explanation!
Post by Uli Schlachter
Hi Ryan,
Post by Ryan Flannery
I'm guessing I'm either missing some tear-down/cleanup between calls
that I'm not seeing from the documentation and google'ing, or there's
some state/etc within the cairo API I'm missing.
Anyone seen something similar or have any idea why this might be caused?
Cairo has to get quite some information from the X11 server. Which
extensions are supported? In which version? Does shared memory work?
Stuff like this.
Obviously, querying this all the time would be slow. Thus, cairo caches
this information. Namely, there is an instance of cairo_device_t. This
instance is kept around even when all surfaces using this device are
destroyed.
When a new cairo surface is created, it needs to get an instance of
cairo_device_t for all of this information. To identify an
already-existing device, the xcb_connection_t* pointer is used. Same
pointer means same XCB connection. Thus, when you call xcb_disconnect(),
the cache now contains a dangling pointer. The next call to
xcb_connect() might very well allocate an xcb_connection_t* with the
same pointer. Thus, you now get a cache hit even though there is a new
XCB connection. Bad things happen afterwards.
(But right now I do not know where you crash comes from)
THANK YOU! This explains it. There is some state cairo in tracking,
namely that pointer value (not the contents, but the pointer value
itself) for a sort-of caching (it sounds like).

With this, I believe I can work-around it in my application and keep
the current design. Specially, I can keep the "UI portions"
independent of each other (desirable) provided I guarantee the
xcb_connection_t pointer passed is a different one, for each run.
That's easy enough on my side.
Post by Uli Schlachter
Post by Ryan Flannery
Sample code below. The "test()" function is where I create an x window
(and related bits), followed by a cairo surface and cairo_t, and then
simply sleep for a second and tear everything down. The second call to
test() will reliably segfault on cairo_xcb_surface_create().
Fun fact: Does not crash here on Debian testing.
Dies in a failed assertion when run under xtrace (sometimes known as
x11trace).
[...]
Post by Ryan Flannery
cairo_destroy(c);
cairo_device_finish(cairo_surface_get_device(cs));
Post by Ryan Flannery
cairo_surface_destroy(cs);
xcb_destroy_window(x, w);
xcb_disconnect(x);
}
This sounds great.
I think I can trivially modify my code with this to fix the issue and
still keep the separate UI runs independent of each other.
Post by Uli Schlachter
If you want to keep the device around for later (i.e. have multiple
cairo_device_t *device = cairo_device_reference(....);
Now, you have to later call cairo_device_destroy() when you no longer
need the reference, but you get a pointer to the cairo_device_t
independent of a cairo xcb surface.
Oh and: You have to finish the device before you call xcb_disconnect().
Finishing the device can still cause X11 requests to be sent (i.e. to
send out pending drawing that was not yet done).
Also: Finishing the device automatically finishes all cairo_surface_t
instances for this XCB devices.
Ok - this raises a question for me about the device state between
surface creation.
Do you mean one of the following:

1. Calling cairo_device_finish(cairo_surface_get_device(cs)) when
destroying a xcb/cairo setup should, effectively, remove the state
tracked in cairo_xcb_surface_create() (and thus allow a subsequent
call with the same pointer value, but different contents, succeed)?

-or-

2. That there's still state there and I should track that
cairo_device_t* reference on my side, even if I call the
cairo_device_finish() bits?

I could probably answer that on my own if I had my dev laptop with me,
but I'm sadly away from that for 2 days and still so curious on this
:D

Thanks so much again for your explanation!! I'm confident I can get
this working on my side once I'm back.

Cheers,
-Ryan
--
cairo mailing list
***@cairographics.org
https://lists.cairog
Uli Schlachter
2018-11-10 09:11:36 UTC
Permalink
On 10.11.18 06:36, Ryan Flannery wrote:
[...]
Post by Ryan Flannery
Post by Uli Schlachter
Also: Finishing the device automatically finishes all cairo_surface_t
instances for this XCB devices.
Ok - this raises a question for me about the device state between
surface creation.
1. Calling cairo_device_finish(cairo_surface_get_device(cs)) when
destroying a xcb/cairo setup should, effectively, remove the state
tracked in cairo_xcb_surface_create() (and thus allow a subsequent
call with the same pointer value, but different contents, succeed)?
-or-
2. That there's still state there and I should track that
cairo_device_t* reference on my side, even if I call the
cairo_device_finish() bits?
I think I mean (1). Finishing the cairo_device_t instance tries to get
rid of everything that uses the underlying xcb_connection_t*. It does
not do this completely, but the difference should not matter to you.

The reason is reference counting: Finishing the cairo_device_t finishes
everything using this cairo_device_t / xcb_connection_t. "Finishes" here
means "makes unusable". However, it does not actually free everything,
since other pieces of code could still have a reference that keeps the
cairo_device_t alive (e.g. through cairo_device_reference,
cairo_surface_reference, or just cairo_xcb_surface_create). All this
"state" is cannot be freed yet, because otherwise the code having that
reference would end up with a dangling pointer. But, any attempt of
using one of these objects generates a CAIRO_STATUS_SURFACE_FINISHED or
CAIRO_STATUS_DEVICE_FINISHED error status.

The important bit for you is that finishing the connection removes the
cairo_device_t from cairo's internal cache, so any following attempts at
using the same xcb_connection_t will create a new cairo_device_t instance.
Post by Ryan Flannery
I could probably answer that on my own if I had my dev laptop with me,
If you want to dig into some code:
src/cairo-xcb-connection.c has a _cairo_xcb_device_backend struct that
defines the operations on a cairo_xcb_connection_t (which is basically a
non-public subtype of cairo_device_t).

_cairo_xcb_connection_get() is used to "turn" a xcb_connection_t into a
cairo_xcb_connection_t. This first checks a global list called
"connections" for a matching entry and otherwise creates a new instance
of a cairo_xcb_connection_t.

_device_finish() does the work in cairo_device_finish(). It gets rid of
open fonts and a cairo-specific representation of an X11 screen (which
contains a list of all cairo-xcb surfaces using this one screen; thus
this also cairo_surface_finish()es all surfaces using this device).

Later, _device_destroy() is run when the reference count of the
cairo_device_t actually drops to zero. This does not do anything with
the xcb_connection_t, but instead does all the freeing that is
necessary. E.g., some internal mutexes are destroyed. This also free()s
the cairo_xcb_connection_t itself.

Hope this helps,
Uli
--
"Why make things difficult, when it is possible to make them cryptic
and totally illogical, with just a little bit more effort?" -- A. P. J.
--
cairo mailing list
***@cairographics.org
https://lists.cairographics.org/mailma
Loading...