ask the driver for supported modifiers for a given format.
v2: move to __DRIimageExtension v16.
v3: fail if the supplied format is not supported by driver.
v4: purge PIPE_CAP_QUERY_DMABUF_ATTRIBS.
v5:
- move to __DRIimageExtension v15, pass external_only to the driver.
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> (v4)
Cc: Lucas Stach <l.stach@pengutronix.de>
format modifiers tokens are driver specific, and hence, need to come
in from the driver. this allows drivers to be queried for supported
format modifiers for EGL_EXT_image_dma_buf_import_modifiers.
v2: rebase to master.
v3: drivers must return false on query failure.
v4: use pscreen->is_format_supported instead of adding a separate
format query handle, remove PIPE_CAP_QUERY_DMABUF_ATTRIBS.
(Lucas Stach)
v5: add external_only parameter.
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
ask the driver for supported dmabuf formats
v2: rebase to master.
v3: return false on failure.
v4: use pscreen->is_format_supported instead of adding a new query.
(Lucas Stach)
v5: stylefix to conform to formatting rules (Brian Paul). add fourcc list
here instead of using struct image_format from v4.
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> (v4)
Cc: Lucas Stach <l.stach@pengutronix.de>
support importing dmabufs into DRIimage while taking format modifiers
in account, as per DRIimage extension version 15.
v2: initialize winsys modifier to DRM_FORMAT_MOD_INVALID (Daniel Stone)
v3: do not bump DRIimageExtension version. split out winsys changes.
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
adds a pscreen->resource_create_with_modifiers() to create textures
with modifier.
v2:
- stylefixes (Emil Velikov)
- don't return selected modifier from resource_create_with_modifiers. we can
use the winsys_handle to get this.
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> (v1)
Cc: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
return the modifier selected by the driver when creating this image.
v2: since we can use winsys_handle->modifier to serve these, remove
DRIimage->modifier from v1.
use DRM_API_HANDLE_TYPE_KMS instead of DRM_API_HANDLE_TYPE_FD to avoid
ownership transfer. (Lucas)
Suggested-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
we use this to import resources with format modifiers, and to support
per-resource modifier queries.
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Remove c++14 generic lambda to keep compiler requirement at c++11.
No regressions on piglit or vtk test suites.
Tested-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
CC: mesa-stable@lists.freedesktop.org
LLVMAddEarlyCSEMemSSAPass() is defined in LLVM 4.0.
Fixes: 257b538 ("radeonsi: do EarlyCSEMemSSA LLVM pass)
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The workaround causes a massive performance decrease on 1-SE parts.
(Cape Verde, Hainan, Oland)
The performance regression is already part of 17.0 and 17.1.
v2: check tess_uses_prim_id
Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This and the previous clip_regs commit decrease IB sizes and the number of
si_update_shaders invocations as follows:
IB size si_update_shaders calls
Borderlands 2 -10% -27%
Deus Ex: MD -5% -11%
Talos Principle -8% -30%
v2: always dirty cb_render_state in set_framebuffer_state
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Fix build error.
CC i915_surface.lo
i915_surface.c:108:63: error: too few arguments to function call, expected 4, have 3
util_blitter_default_src_texture(&src_templ, src, src_level);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
../../../../src/gallium/auxiliary/util/u_blitter.h:271:1: note: 'util_blitter_default_src_texture' declared here
void util_blitter_default_src_texture(struct blitter_context *blitter,
^
Fixes: a893c91697 ("gallium/u_blitter: use 2D_ARRAY for cubemap blits if possible")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101340
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
As TS is also allowed on sampler resources, we need to make sure to resolve
to self when binding the resource as a texture, to avoid stale content
being sampled.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
A resolve to self is only necessary if the resource is fast cleared, so
there is never a need to do so if there is no TS allocated.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Stolen from VC4. As we don't do any fancy reallocation tricks yet, it's
possible to upgrade also coherent mappings and shared resources.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
There is no need to special case compressed resources, as they are already
marked as linear on allocation. With that out of the way, there is room to
cut down on the number of if clauses used.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reduces bandwidth usage of transfers which discard the buffer contents,
as well as skipping unnecessary command stream flushes and CPU/GPU
synchronization.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
This gets rid of quite a bit of CPU/GPU sync on frequent vertex buffer
uploads and I haven't seen any of the issues mentioned in the comment,
so this one seems stale.
Ignore the flag if there exists a temporary resource, as those ones are
never busy.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
cpu_prep() already does all the required waiting, so the only thing that
needs to be done is flushing the commandstream, if a GPU write is pending.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
v2: Add a func pointer to radeon_winsys to support radeon later.
Change-Id: I614ea71424f9e5c97e4ae68654315d28c89eaa5f
Signed-off-by: Samuel Li <Samuel.Li@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
These will need to be in place to avoid regressions when
removing these includes from the u_dynarray
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
When a stencil buffer is part of the framebuffer state, it is
decompressed but because it's bindless, all draw calls set
stencil_dirty_level_mask to 1.
v2: Marek - set the flags outside the loop
- also clear and set framebuffer.do_update_surf_dirtiness there
- do it in the DB->CB copy path too
v3: Marek - save and restore the do_update_surf_dirtiness flag
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
so that LLVM IR looks like CSE has been run on it. It's also recommended
by the instruction combining pass.
This also fixes:
- GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 (crash)
- piglit/spec/arb_shader_ballot/execution/fs-readFirstInvocation-uint-loop (fail)
The code size decrease is positive, the register usage isn't. There is
a decrease in VGPR spilling for Tomb Raider, but increase in DiRT Showdown
and GRID Autosport.
EarlyCSEMemSSA has a -0.01% change in code size compared EarlyCSE.
SGPRS: 1935420 -> 1938076 (0.14 %)
VGPRS: 1645504 -> 1645988 (0.03 %)
Spilled SGPRs: 2493 -> 2651 (6.34 %)
Spilled VGPRs: 107 -> 115 (7.48 %)
Private memory VGPRs: 1332 -> 1332 (0.00 %)
Scratch size: 1512 -> 1516 (0.26 %) dwords per thread
Code Size: 61981592 -> 61890012 (-0.15 %) bytes
Max Waves: 371847 -> 371798 (-0.01 %)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We can use a union in si_shader_key::mono.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Heaven LDS usage for LS+HS is below. The masks are "outputs_written"
for LS and HS. Note that 32K is the maximum size.
Before:
heaven_x64: ls=1f1 tcs=1f1, lds=32K
heaven_x64: ls=31 tcs=31, lds=24K
heaven_x64: ls=71 tcs=71, lds=28K
After:
heaven_x64: ls=3f tcs=3f, lds=24K
heaven_x64: ls=7 tcs=7, lds=13K
heaven_x64: ls=f tcs=f, lds=17K
All other apps have a similar decrease in LDS usage, because
the "outputs_written" masks are similar. Also, most apps don't write
POSITION in these shader stages, so there is room for improvement.
(tight per-component input/output packing might help even more)
It's unknown whether this improves performance.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If the XRGB view is sampling from an ARGB svga format, change
PIPE_SWIZZLE_W to PIPE_SWIZZLE_1 for all channels.
Previously we unconditionally set PIPE_SWIZZLE_1 on the alpha channel which
could be both insufficient and incorrect.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
When deciding to create a view with or without an alpha channel we need to
look at the SVGA3D format and not the PIPE format.
This fixes the glx-tfp piglit test for dri3/xa.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Gallium RGB textures may be backed by imported ARGB svga3d surfaces. In those
and similar cases we need to set the alpha value to 1 when sampling.
Fixes piglit glx::glx-tfp
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
For the purpose of surface sharing, treat SVGA3D_R5G6B5 and
SVGA3D_B5G6R5_UNORM as identical formats.
This fixes the following piglit tests with dri3/xa:
glx@glx-visuals-depth -pixmap
glx@glx-visuals-stencil -pixmap
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Deepak Singh Rawat <drawat@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Allow gallium drivers to turn off GLX_EXT_buffer_age and
GLX_OML_sync_control if needed, using driconf.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There will be situations where we want to control, for example, the
GLX behaviour based on applications and drivers. So allow DRI users access
to the driver options.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
LS is merged into TCS. If there is no TCS, LS is merged into fixed-func
TCS. The problem is the fixed-func TCS was ignored by scratch update
functions, so LS didn't have the scratch buffer set up.
Note that Mesa 17.1 doesn't have merged shaders.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>