Discussion:
[cairo] [PATCH 3/3] win32: Allow gdi operations for argb32 surfaces (allowed by surface flags)
Vasily Galkin
2018-04-28 19:27:01 UTC
This ends the patch series that speeds up CAIRO_OPERATOR_SOURCE
when it is used to copy data
to an argb32 cairo surface corresponding to a win32 dc
from a "backbuffer": a DibSection-based cairo surface
created with cairo_surface_create_similar().

This final patch allows the gdi compositor to be used on argb32 surfaces.
In practice, only copying (by BitBlt) is allowed with gdi on argb32 display
surfaces, since the other operations are filtered out by the surface flags
checked in their implementations.
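
The gate that this patch widens can be modelled in portable C as below. This is a sketch, not the cairo source: surface_model_t, model_check_blit and the FORMAT_*/CAN_BITBLT constants are hypothetical stand-ins, and since the hunk does not show what check_blit does after the flag test, the model simply returns the result of that test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the cairo-internal types and constants. */
typedef enum { FORMAT_ARGB32, FORMAT_RGB24, FORMAT_A8 } format_t;
enum { CAN_BITBLT = 1u << 0 };   /* bit position chosen for the sketch */

typedef struct {
    format_t format;
    unsigned flags;
    bool     fallback;           /* stands in for dst->fallback in the hunk */
} surface_model_t;

/* Follows the order of the tests visible in the patched check_blit(). */
bool model_check_blit(const surface_model_t *dst)
{
    assert(dst != NULL);

    if (dst->fallback)
        return false;

    /* Before the patch only RGB24 passed here; ARGB32 is now accepted too. */
    if (dst->format != FORMAT_RGB24 && dst->format != FORMAT_ARGB32)
        return false;

    /* Assumption: the real function may do more after this flag test. */
    return (dst->flags & CAN_BITBLT) != 0;
}
```

With this model an ARGB32 destination that carries CAN_BITBLT passes the gate, while any other format is still rejected.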

But copying pixels is the only operation used in the common scenario
"prepare an offscreen image and put it on screen", so this is important for
presenting argb32 windows with cairo directly
or with gtk+gdk (which nowadays always creates argb32 windows).

Before this patch, a pixel copy worked by:
1. mapping the image to memory (by copying data from the window dc to system memory,
which is very slow on Windows, possibly due to gpu or interprocess access)
2. copying the new data over that image.
3. copying the updated image from system memory back to the window dc.
After this patch there is only one step:

2+3. Copying the new data over the window dc.

Completely eliminating step 1 gives a huge speedup and allows
argb32 cairo drawing to be as fast as typical dibsection-buffered gdi drawing.
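
The difference between the two paths can be sketched with plain memory buffers. This is only a model under stated assumptions: present_via_readback and present_direct are hypothetical names, both "window" and "backbuffer" are ordinary arrays here, and memcpy says nothing about the real cost of reading back from a window dc, which is the part the message describes as slow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Old path: read the window back, overwrite the copy, write it back.
 * Returns the number of bytes moved, or 0 if the temporary buffer
 * could not be allocated. */
size_t present_via_readback(uint32_t *window, const uint32_t *backbuffer,
                            size_t pixels)
{
    size_t bytes = pixels * sizeof(uint32_t);
    uint32_t *mapped;

    assert(window != NULL && backbuffer != NULL);

    mapped = malloc(bytes);
    if (mapped == NULL)
        return 0;

    memcpy(mapped, window, bytes);      /* step 1: window -> system memory */
    memcpy(mapped, backbuffer, bytes);  /* step 2: new data over that image */
    memcpy(window, mapped, bytes);      /* step 3: system memory -> window */

    free(mapped);
    return 3 * bytes;
}

/* New path: steps 2 and 3 merged into a single copy. */
size_t present_direct(uint32_t *window, const uint32_t *backbuffer,
                      size_t pixels)
{
    size_t bytes = pixels * sizeof(uint32_t);

    assert(window != NULL && backbuffer != NULL);

    memcpy(window, backbuffer, bytes);
    return bytes;
}
```

Both functions leave the window with the same contents when the whole area is overwritten; the old path just moves three times as many bytes in this model.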

There is a quick&dirty cairo-vs-gdi perf test made for this patch set:
https://gitlab.gnome.org/galkinvv/cairo/snippets/109
The results show a severalfold improvement:

Before speedup

Painting 5000 32bits-per-pixel single-color frames of size 1056x1056 for profiling
GDI entire pipeline : 4.123983 GB/s, 5408.053900 ms, 924.546998 FPS
GDI entire drawing : 4.156272 GB/s, 5366.039400 ms
cairo entire pipeline : 0.835951 GB/s, 26679.463300 ms, 187.410067 FPS
cairo entire drawing : 0.838992 GB/s, 26582.750800 ms
cairo fill inmem : 16.130683 GB/s, 1382.627100 ms
cairo to window : 1.102623 GB/s, 20226.963700 ms

After speedup (repeated runs show a 5-10% inaccuracy, so these results shouldn't be used as a source for comparing raw gdi vs cairo)

Painting 5000 32bits-per-pixel single-color frames of size 1056x1056 for profiling
GDI entire pipeline : 4.139421 GB/s, 5387.883400 ms, 928.008204 FPS
GDI entire drawing : 4.165124 GB/s, 5354.635400 ms
cairo entire pipeline : 4.029344 GB/s, 5535.075100 ms, 903.330110 FPS
cairo entire drawing : 4.063073 GB/s, 5489.126000 ms
cairo fill inmem : 22.665569 GB/s, 983.991200 ms
cairo to window : 5.049950 GB/s, 4416.423700 ms
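
The GB/s and FPS columns follow from the frame count, frame size and elapsed time. The helpers below are a sketch with hypothetical names, assuming 4 bytes per pixel and 1 GB = 10^9 bytes, which is the convention that reproduces the figures above.

```c
#include <assert.h>

/* Bytes in one 32-bits-per-pixel frame. */
double frame_bytes(int width, int height)
{
    return (double)width * (double)height * 4.0;
}

/* Throughput in GB/s, with 1 GB taken as 1e9 bytes (an assumption). */
double throughput_gb_per_s(int frames, int width, int height,
                           double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    return frames * frame_bytes(width, height) / (elapsed_ms / 1000.0) / 1e9;
}

/* Frames per second for the same run. */
double frames_per_second(int frames, double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    return frames / (elapsed_ms / 1000.0);
}
```

For the first "GDI entire pipeline" row, throughput_gb_per_s(5000, 1056, 1056, 5408.0539) gives about 4.124 and frames_per_second(5000, 5408.0539) gives about 924.5. Dividing the two "cairo entire pipeline" times (26679.4633 ms and 5535.0751 ms) gives a ratio of about 4.8.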

An end-user visible speedup is present too; it relates to the following bug:

https://gitlab.gnome.org/GNOME/meld/issues/133

The cairo speedup allows more simultaneous meld windows
without eating 100% of a cpu core's time on spinner rendering.

gtk's speedup is near 1.7x, not as large as the pure cairo ~7-8x in the results above.
It looks like gtk has some problems with caching cairo surfaces
and recreates them every frame with an initial black fill.
---
src/win32/cairo-win32-gdi-compositor.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/win32/cairo-win32-gdi-compositor.c b/src/win32/cairo-win32-gdi-compositor.c
index 0873391..4a09a70 100644
--- a/src/win32/cairo-win32-gdi-compositor.c
+++ b/src/win32/cairo-win32-gdi-compositor.c
@@ -488,7 +488,8 @@ static cairo_bool_t check_blit (cairo_composite_rectangles_t *composite)
if (dst->fallback)
return FALSE;

- if (dst->win32.format != CAIRO_FORMAT_RGB24)
+ if (dst->win32.format != CAIRO_FORMAT_RGB24
+ && dst->win32.format != CAIRO_FORMAT_ARGB32)
return FALSE;

if (dst->win32.flags & CAIRO_WIN32_SURFACE_CAN_BITBLT)
--
2.9.3
--
cairo mailing list
***@cairographics.org
https://lists.cairograp
Vasily Galkin
2018-04-28 19:27:00 UTC
This belongs to a patch series that speeds up CAIRO_OPERATOR_SOURCE
when it is used to copy data
to an argb32 cairo surface corresponding to a win32 dc
from a "backbuffer": a DibSection-based cairo surface
created with cairo_surface_create_similar().

This patch introduces checks that ensure that no solid-brush gdi operations
are done on argb32 surfaces.
This is needed to enable gdi compositor usage for win32 argb32 surfaces.
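
The guards added by this patch can be modelled in portable C as follows. The brush guard corresponds to the check added to fill_boxes and the glyph path, and the copy guard to the format test added to copy_boxes. The names model_brush_guard, model_copy_guard, format_t and status_t are hypothetical stand-ins; only the bit position of CAN_RGB_BRUSH is taken from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the cairo-internal types. */
typedef enum { FORMAT_ARGB32, FORMAT_RGB24 } format_t;
typedef enum { STATUS_SUCCESS, STATUS_UNSUPPORTED } status_t;
enum { CAN_RGB_BRUSH = 1u << 9 };   /* same bit as in the series */

/* Solid-brush operations are refused unless the surface flags allow them. */
status_t model_brush_guard(unsigned dst_flags)
{
    if ((dst_flags & CAN_RGB_BRUSH) == 0)
        return STATUS_UNSUPPORTED;
    return STATUS_SUCCESS;
}

/* Copies need matching formats; the one exception is argb32 -> rgb24,
 * where the alpha channel is simply dropped. */
status_t model_copy_guard(format_t src, format_t dst)
{
    bool alpha_drop;

    /* The sketch only models the two formats discussed in the thread. */
    assert(src == FORMAT_ARGB32 || src == FORMAT_RGB24);
    assert(dst == FORMAT_ARGB32 || dst == FORMAT_RGB24);

    alpha_drop = (src == FORMAT_ARGB32 && dst == FORMAT_RGB24);
    if (src != dst && !alpha_drop)
        return STATUS_UNSUPPORTED;
    return STATUS_SUCCESS;
}
```

Returning the unsupported status rather than failing lets the caller try a different path for that operation.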

To make these checks work, the flag-calculating function now fills the
argb32 flags correctly, disabling STRETCHBLT, STRETCHDIB and RGB_BRUSH.
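
The resulting flag calculation can be sketched without the Windows headers. In this model dc_caps_t replaces the GetDeviceCaps queries, the flag bit positions other than CAN_RGB_BRUSH are illustrative, and the extra display-only handling that the real function keeps is left out; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { FORMAT_ARGB32, FORMAT_RGB24, FORMAT_OTHER } format_t;

enum {
    CAN_BITBLT     = 1u << 0,   /* illustrative bit positions */
    CAN_ALPHABLEND = 1u << 1,
    CAN_STRETCHBLT = 1u << 2,
    CAN_STRETCHDIB = 1u << 3,
    IS_DISPLAY     = 1u << 4,
    CAN_RGB_BRUSH  = 1u << 9    /* as defined in the series */
};

/* Stand-in for the answers that GetDeviceCaps would give. */
typedef struct {
    bool bitblt, stretchblt, stretchdib;  /* RASTERCAPS bits */
    bool shade_blend;                     /* SHADEBLENDCAPS != SB_NONE */
    bool is_display;                      /* TECHNOLOGY == DT_RASDISPLAY */
} dc_caps_t;

uint32_t model_flags_for_dc(const dc_caps_t *caps, format_t format)
{
    uint32_t flags = 0;

    assert(caps != NULL);

    if (format == FORMAT_RGB24 || format == FORMAT_ARGB32) {
        if (caps->bitblt)
            flags |= CAN_BITBLT;
        if (!caps->is_display && caps->shade_blend)
            flags |= CAN_ALPHABLEND;

        /* Only rgb24 gets the operations that would reset alpha. */
        if (format == FORMAT_RGB24) {
            flags |= CAN_RGB_BRUSH;
            if (caps->stretchblt)
                flags |= CAN_STRETCHBLT;
            if (caps->stretchdib)
                flags |= CAN_STRETCHDIB;
        }
    }

    if (caps->is_display)
        flags |= IS_DISPLAY;

    return flags;
}
```

For a display dc that reports every capability, the model yields only CAN_BITBLT and IS_DISPLAY for argb32, and the argb32 flags are always a subset of the rgb24 flags.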

_cairo_win32_flags_for_dc is refactored to make the rgb24 vs argb32 distinction
more readable. All logic and flags for rgb24 surfaces are kept,
with CAIRO_WIN32_SURFACE_CAN_RGB_BRUSH as the only addition.

The logic of forbidding AlphaBlend on display surfaces
is kept as is, without investigation.
---
src/win32/cairo-win32-device.c | 36 ++++++++++++++++++++++------------
src/win32/cairo-win32-gdi-compositor.c | 18 ++++++++++++++++-
2 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/src/win32/cairo-win32-device.c b/src/win32/cairo-win32-device.c
index 309f16c..a2c9508 100644
--- a/src/win32/cairo-win32-device.c
+++ b/src/win32/cairo-win32-device.c
@@ -156,17 +156,31 @@ unsigned
_cairo_win32_flags_for_dc (HDC dc, cairo_format_t format)
{
uint32_t flags = 0;
- int cap;
+ cairo_bool_t is_display = GetDeviceCaps(dc, TECHNOLOGY) == DT_RASDISPLAY;
+
+ if (format == CAIRO_FORMAT_RGB24 || format == CAIRO_FORMAT_ARGB32)
+ {
+ int cap = GetDeviceCaps(dc, RASTERCAPS);
+ if (cap & RC_BITBLT)
+ flags |= CAIRO_WIN32_SURFACE_CAN_BITBLT;
+ if (!is_display && GetDeviceCaps(dc, SHADEBLENDCAPS) != SB_NONE)
+ flags |= CAIRO_WIN32_SURFACE_CAN_ALPHABLEND;

- cap = GetDeviceCaps(dc, RASTERCAPS);
- if (cap & RC_BITBLT)
- flags |= CAIRO_WIN32_SURFACE_CAN_BITBLT;
- if (cap & RC_STRETCHBLT)
- flags |= CAIRO_WIN32_SURFACE_CAN_STRETCHBLT;
- if (cap & RC_STRETCHDIB)
- flags |= CAIRO_WIN32_SURFACE_CAN_STRETCHDIB;
+ /* ARGB32 available operations is a strict subset of RGB24 available
+ * operations. It's because the same gdi functions can be used but most
+ * of them always reset alpha channel to 0 which is bad for ARGB32.
+ */
+ if (format == CAIRO_FORMAT_RGB24)
+ {
+ flags |= CAIRO_WIN32_SURFACE_CAN_RGB_BRUSH;
+ if (cap & RC_STRETCHBLT)
+ flags |= CAIRO_WIN32_SURFACE_CAN_STRETCHBLT;
+ if (cap & RC_STRETCHDIB)
+ flags |= CAIRO_WIN32_SURFACE_CAN_STRETCHDIB;
+ }
+ }

- if (GetDeviceCaps(dc, TECHNOLOGY) == DT_RASDISPLAY) {
+ if (is_display) {
flags |= CAIRO_WIN32_SURFACE_IS_DISPLAY;

/* These will always be possible, but the actual GetDeviceCaps
@@ -181,10 +195,6 @@ _cairo_win32_flags_for_dc (HDC dc, cairo_format_t format)
flags |= CAIRO_WIN32_SURFACE_CAN_STRETCHBLT;
flags |= CAIRO_WIN32_SURFACE_CAN_STRETCHDIB;
#endif
- } else {
- cap = GetDeviceCaps(dc, SHADEBLENDCAPS);
- if (cap != SB_NONE)
- flags |= CAIRO_WIN32_SURFACE_CAN_ALPHABLEND;
}

return flags;
diff --git a/src/win32/cairo-win32-gdi-compositor.c b/src/win32/cairo-win32-gdi-compositor.c
index 2858aff..0873391 100644
--- a/src/win32/cairo-win32-gdi-compositor.c
+++ b/src/win32/cairo-win32-gdi-compositor.c
@@ -179,6 +179,9 @@ fill_boxes (cairo_win32_display_surface_t *dst,

TRACE ((stderr, "%s\n", __FUNCTION__));

+ if ((dst->win32.flags & CAIRO_WIN32_SURFACE_CAN_RGB_BRUSH) == 0)
+ return CAIRO_INT_STATUS_UNSUPPORTED;
+
fb.dc = dst->win32.dc;
fb.brush = CreateSolidBrush (color_to_rgb(color));
if (!fb.brush)
@@ -213,6 +216,7 @@ copy_boxes (cairo_win32_display_surface_t *dst,
struct copy_box cb;
cairo_surface_t *surface;
cairo_status_t status;
+ cairo_win32_surface_t *src;

TRACE ((stderr, "%s\n", __FUNCTION__));

@@ -230,8 +234,16 @@ copy_boxes (cairo_win32_display_surface_t *dst,
&cb.tx, &cb.ty))
return CAIRO_INT_STATUS_UNSUPPORTED;

+ src = to_win32_surface(surface);
+
+ if (src->format != dst->win32.format &&
+ !(src->format == CAIRO_FORMAT_ARGB32 && dst->win32.format == CAIRO_FORMAT_RGB24))
+ {
+ /* forbid copy different surfaces unless it is from argb32 to rgb (alpha-drop) */
+ return CAIRO_INT_STATUS_UNSUPPORTED;
+ }
cb.dst = dst->win32.dc;
- cb.src = to_win32_surface(surface)->dc;
+ cb.src = src->dc;

/* First check that the data is entirely within the image */
if (! _cairo_boxes_for_each_box (boxes, source_contains_box, &cb))
@@ -614,6 +626,10 @@ _cairo_win32_gdi_compositor_glyphs (const cairo_compositor_t *compositor,
cairo_win32_display_surface_t *dst = to_win32_display_surface (composite->surface);

TRACE ((stderr, "%s\n", __FUNCTION__));
+
+ if ((dst->win32.flags & CAIRO_WIN32_SURFACE_CAN_RGB_BRUSH) == 0)
+ return CAIRO_INT_STATUS_UNSUPPORTED;
+
status = _cairo_win32_display_surface_set_clip(dst, composite->clip);
if (status)
return status;
--
2.9.3
LRN
2018-04-29 13:46:37 UTC
Post by Vasily Galkin
gtk's speedup is near 1.7x,
2.3-2.5x for me
Post by Vasily Galkin
not such huge as pure cairo ~7-8x on results above
It looks that gtk has some problems in caching cairo surfaces
and recreates them every frame with initial black fill.
Not exactly. Here's FPS data for GTK-3.22 fishbowl with old and new cairo:

old and busted:
Layered mode - 35.2 fps
Normal mode - 21.2 fps
Normal mode (optimized double-buffering) - 21.2 fps
Normal mode (generic double-buffering) - 20.0 fps

new hotness:
Layered mode - 35.0 fps
Normal mode - 21.2 fps
Normal mode (optimized double-buffering) - 53 fps (x2.5 faster than normal)
Normal mode (generic double-buffering) - 49 fps (x2.3 faster than normal)

Layered mode doesn't blit anything, so has no improvements (the 0.2 change is
likely due to measurement error).

Normal mode doesn't seem to be covered by the "draw everything into a buffer
then blit once" case that you've optimized for, so it also has no visible
improvements.

Double-buffered mode with GDK built-in double-buffering (where GDK creates a
new double-buffer on every redraw) is x2.3 faster, and optimized backend
double-buffering (where DB surface is not re-created on every redraw) is x2.5
faster.

Note that in either case GDK will erase the painted region (well, in case of
generic DB it likely re-creates the new DB surface in a clear state) before
drawing anything, which is required for correct alpha-transparency - otherwise
semi-transparent regions will "stack up" on every redraw. If i deliberately
disable that eraser code, optimized double-buffering fps increases to 55 fps
(i.e. very little), but alpha-transparent regions are screwed.
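
The "stacking" effect can be reproduced on a single alpha value. The sketch below assumes the content is painted with an OVER-style blend (cairo's default operator); over_alpha and alpha_after_redraws are hypothetical helper names, not GDK or cairo functions.

```c
#include <assert.h>

/* Alpha channel of "src OVER dst". */
double over_alpha(double src_a, double dst_a)
{
    assert(src_a >= 0.0 && src_a <= 1.0);
    assert(dst_a >= 0.0 && dst_a <= 1.0);
    return src_a + dst_a * (1.0 - src_a);
}

/* Alpha of one pixel after repainting the same semi-transparent content
 * `redraws` times, with or without erasing the pixel before each repaint. */
double alpha_after_redraws(double src_a, int redraws, int clear_first)
{
    double dst_a = 0.0;
    int i;

    for (i = 0; i < redraws; i++) {
        if (clear_first)
            dst_a = 0.0;
        dst_a = over_alpha(src_a, dst_a);
    }
    return dst_a;
}
```

With 50% alpha content, erasing first keeps the pixel at 0.5 on every redraw, while skipping the erase gives 0.5, 0.75, 0.875 and so on, drifting towards opaque.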

FPS values are for the GTK fishbowl benchmark window maximized on my 4K
desktop, so, taking into account the taskbar, that makes it 3591x2160, and the
fishbowl widget itself is a bit smaller vertically.

Anyway, i'm pretty sure that GTK is drawing as best as it can, and there's no
x7 speedup anywhere in sight (that said, i was also pretty sure that cairo was
drawing as best as it can; shows what i know...).

As for the x1.7 vs x2.3, it could be attributed to you [presumably] having
smaller test windows, in which case the time spent on actual blitting is
smaller, and thus the speedup, only affecting that time, doesn't have as much
impact.
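
This argument is Amdahl's law applied to a frame: only the blitting share of the frame time gets faster. A minimal sketch, where overall_speedup is a hypothetical helper and the fractions used in the note below are illustrative, not measured:

```c
#include <assert.h>

/* Overall speedup when only `blit_fraction` of the frame time is
 * accelerated by a factor of `blit_speedup`. */
double overall_speedup(double blit_fraction, double blit_speedup)
{
    assert(blit_fraction >= 0.0 && blit_fraction <= 1.0);
    assert(blit_speedup > 0.0);
    return 1.0 / ((1.0 - blit_fraction) + blit_fraction / blit_speedup);
}
```

If blitting became 4.8 times faster, a frame that spent 50% of its time blitting would speed up by about 1.66x overall, and one that spent 70% by about 2.24x, so a larger window with a larger blitting share sees the bigger gain.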
Bryce Harrington
2018-06-01 01:32:35 UTC
This is the beginning of a patch series that speeds up CAIRO_OPERATOR_SOURCE
when it is used to copy data
to an argb32 cairo surface corresponding to a win32 dc
from a "backbuffer": a DibSection-based cairo surface
created with cairo_surface_create_similar().
This initial patch presents only private header changes
without changing any implementation logic.
The big problem with argb32 surfaces and gdi is gdi's inability to
correctly set the alpha channel in all operations except BitBlt and AlphaBlend.
So the CAIRO_WIN32_SURFACE_CAN_RGB_BRUSH flag introduced in this commit
will be used to mark surfaces that handle such brushes correctly -
essentially all surface types except argb32.
_cairo_win32_flags_for_dc receives a new argument
that will be used in the flag calculation.
---
src/win32/cairo-win32-device.c | 2 +-
src/win32/cairo-win32-display-surface.c | 4 ++--
src/win32/cairo-win32-printing-surface.c | 2 +-
src/win32/cairo-win32-private.h | 5 ++++-
4 files changed, 8 insertions(+), 5 deletions(-)
I'm not able to test on win32 myself, but the code changes look
straightforward enough, and feedback from testers seems quite positive.

I've done some copyediting of the commit message and comments but
otherwise applied the patches as proposed and landed to trunk:

To ssh://git.freedesktop.org/git/cairo
85fe4de..c6e12d3 master -> master

Bryce
diff --git a/src/win32/cairo-win32-device.c b/src/win32/cairo-win32-device.c
index c60c494..309f16c 100644
--- a/src/win32/cairo-win32-device.c
+++ b/src/win32/cairo-win32-device.c
@@ -153,7 +153,7 @@ _cairo_win32_device_get (void)
}
unsigned
-_cairo_win32_flags_for_dc (HDC dc)
+_cairo_win32_flags_for_dc (HDC dc, cairo_format_t format)
{
uint32_t flags = 0;
int cap;
diff --git a/src/win32/cairo-win32-display-surface.c b/src/win32/cairo-win32-display-surface.c
index 92d1f6c..0d737f5 100644
--- a/src/win32/cairo-win32-display-surface.c
+++ b/src/win32/cairo-win32-display-surface.c
@@ -258,7 +258,7 @@ _create_dc_and_bitmap (cairo_win32_display_surface_t *surface,
}
}
- surface->win32.flags = _cairo_win32_flags_for_dc (surface->win32.dc);
+ surface->win32.flags = _cairo_win32_flags_for_dc (surface->win32.dc, format);
return CAIRO_STATUS_SUCCESS;
@@ -973,7 +973,7 @@ cairo_win32_surface_create_with_format (HDC hdc, cairo_format_t format)
surface->is_dib = FALSE;
surface->saved_dc_bitmap = NULL;
- surface->win32.flags = _cairo_win32_flags_for_dc (surface->win32.dc);
+ surface->win32.flags = _cairo_win32_flags_for_dc (surface->win32.dc, format);
device = _cairo_win32_device_get ();
diff --git a/src/win32/cairo-win32-printing-surface.c b/src/win32/cairo-win32-printing-surface.c
index 7f374a0..8496077 100644
--- a/src/win32/cairo-win32-printing-surface.c
+++ b/src/win32/cairo-win32-printing-surface.c
@@ -2159,7 +2159,7 @@ cairo_win32_printing_surface_create (HDC hdc)
return _cairo_surface_create_in_error (_cairo_error (CAIRO_STATUS_NO_MEMORY));
}
- surface->win32.flags = _cairo_win32_flags_for_dc (surface->win32.dc);
+ surface->win32.flags = _cairo_win32_flags_for_dc (surface->win32.dc, CAIRO_FORMAT_RGB24);
surface->win32.flags |= CAIRO_WIN32_SURFACE_FOR_PRINTING;
_cairo_win32_printing_surface_init_ps_mode (surface);
diff --git a/src/win32/cairo-win32-private.h b/src/win32/cairo-win32-private.h
index 6fdf96f..79e1e0f 100644
--- a/src/win32/cairo-win32-private.h
+++ b/src/win32/cairo-win32-private.h
@@ -81,6 +81,9 @@ enum {
/* Whether we can use the CHECKJPEGFORMAT escape function */
CAIRO_WIN32_SURFACE_CAN_CHECK_PNG = (1<<8),
+
+ /* Whether we can use gdi drawing with solid rgb brush with this surface */
+ CAIRO_WIN32_SURFACE_CAN_RGB_BRUSH = (1<<9),
};
typedef struct _cairo_win32_surface {
@@ -186,7 +189,7 @@ _cairo_win32_surface_get_extents (void *abstract_surface,
cairo_rectangle_int_t *rectangle);
uint32_t
-_cairo_win32_flags_for_dc (HDC dc);
+_cairo_win32_flags_for_dc (HDC dc, cairo_format_t format);
cairo_int_status_t
_cairo_win32_surface_emit_glyphs (cairo_win32_surface_t *dst,
--
2.9.3