I noticed the inefficiency in NIR-to-TGSI output while trying to debug a
failure handling some arrays in r600. While this makes reading CTS
shaders easier, the effect in the real world is pretty limited. From
softpipe shader-db:
total instructions in shared programs: 2929840 -> 2929836 (<.01%)
instructions in affected programs: 118 -> 114 (-3.39%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14321>
This allows the exported fds to be mapped for writing. My use case is
for virtio-gpu blob resources where the fds are mapped rw and mappings
are added to the guests using KVM_SET_USER_MEMORY_REGION.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14699>
With upstream Mesa, PIPE_CAP_IMAGE_STORE_FORMATTED needs to be set to enable
the ARB_shader_image_load_store extension. This re-enables GL43 support on
GL43-capable svga devices.
Fixes: 3b81d2d30d ('mesa/st: do not expose ARB_shader_image_load_store if not fully implemented')
Tested with glretrace
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14688>
I missed updating this code to check res->aux.clear_color_unknown when
I added it a while back. While we're here, also refactor this code into
a helper function - I'll want to use it in another place shortly.
Fixes: e83da2d8e3 ("iris: Don't try to CPU read imported clear color BOs")
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
This splits iris_blorp_exec() into separate functions for executing on
the render command streamer and the blitter command streamer. A future
patch could add a separate iris_blorp_exec_compute() path that skips a
bunch of render-specific work.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
This makes blits, copies, and (non-fast) clears set the appropriate
BLORP_BATCH_USE_{COMPUTE,BLITTER} flag if their batch is either
IRIS_BATCH_COMPUTE or IRIS_BATCH_BLITTER. We ignore the other
operations for now as those don't support compute or blit yet.
Of course, there is no code to attempt to launch BLORP operations on
either the compute or blitter batches yet, but that will come in time.
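
For illustration only, the flag selection described above could look roughly
like the sketch below. All names here (sketch_batch_name,
SKETCH_BLORP_USE_*) are hypothetical stand-ins, not the real iris or BLORP
definitions, whose names, values, and ordering may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's batch names. */
enum sketch_batch_name {
   SKETCH_BATCH_RENDER,
   SKETCH_BATCH_COMPUTE,
   SKETCH_BATCH_BLITTER,
};

/* Hypothetical stand-ins for the BLORP batch flags. */
#define SKETCH_BLORP_USE_COMPUTE (1u << 0)
#define SKETCH_BLORP_USE_BLITTER (1u << 1)

/* Pick the BLORP flag matching the engine the batch targets.
 * Render batches need no extra flag. */
static uint32_t
sketch_blorp_flags_for_batch(enum sketch_batch_name name)
{
   assert(name <= SKETCH_BATCH_BLITTER);

   switch (name) {
   case SKETCH_BATCH_COMPUTE:
      return SKETCH_BLORP_USE_COMPUTE;
   case SKETCH_BATCH_BLITTER:
      return SKETCH_BLORP_USE_BLITTER;
   default:
      return 0;
   }
}
```

Operations that do not support compute or blit yet would simply pass 0 and
skip this helper.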
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
We removed all the hardware blitter support from i965 years ago because
the blitter was not worth using (limited functionality, bad performance,
extra synchronization, and worse). However, on Tigerlake there are new
blitter commands that are actually fast and allow us to do proper
asynchronous copies while 3D is busy doing other work.
So, reintroduce the blitter. We'll want to use it.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
This introduces a new blorp_copy() path using the new XY_BLOCK_COPY_BLT
blitter command introduced on Tigerlake. Unlike the blitter commands of
old, this one is actually fast and worth using. Although it doesn't use
shaders like the rest of BLORP, we still can use some surface-munging
code from there, and BLORP also provides a nice place to put this which
is shared among the drivers.
To use the new path, set BLORP_BATCH_USE_BLITTER (much like Jordan's
recent BLORP_BATCH_USE_COMPUTE bit) and target the batch at the copy
engine.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
This will be used as a performance hint for XY_BLOCK_COPY_BLT to
indicate whether the source/destination surfaces are (likely) in
device-local memory or system memory. We don't need to be precise
here - it's okay to set the fields to LOCAL even if a buffer has
been evicted out to system memory.
We should set this from Vulkan too, but I haven't yet. There isn't
a convenient anv_bo field like there is in iris...
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
This should only be set on XeHP. It implies that CCS works based on
the virtual addresses involved and a flat memory carve-out, rather than
treating CCS like a surface or using auxiliary maps.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
The logic in st_atom_shader.c leads me to believe this was supposed
to work, but the implementation was never finished. This fixes
compatibility tess tests on d3d12.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14662>
Indentation fail. This should happen once per instruction, not once per
destination. In theory, this is a minor performance win; in practice,
it's simply less wrong.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14575>
The spec says only polygons, not points/lines, should be culled when
culling is enabled. The hardware does not make this distinction, so we
have to.
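
A minimal sketch of the distinction, assuming a simplified primitive enum.
The sketch_* names are hypothetical and are not the Gallium or driver
definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified primitive list for illustration. */
enum sketch_prim {
   SKETCH_POINTS,
   SKETCH_LINES,
   SKETCH_LINE_STRIP,
   SKETCH_TRIANGLES,
   SKETCH_TRIANGLE_STRIP,
   SKETCH_TRIANGLE_FAN,
   SKETCH_PRIM_COUNT,
};

/* Only triangle-based primitives rasterize as polygons. */
static bool
sketch_prim_is_polygon(enum sketch_prim prim)
{
   assert(prim < SKETCH_PRIM_COUNT);
   return prim >= SKETCH_TRIANGLES;
}

/* The cull state actually sent to hardware: the API cull setting is
 * honoured for polygons and forced off for points and lines. */
static bool
sketch_effective_cull(bool api_cull_enabled, enum sketch_prim prim)
{
   return api_cull_enabled && sketch_prim_is_polygon(prim);
}
```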
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reported-by: Icecream95 <ixn@disroot.org>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14575>
Use a Gallium helper that papers over the differences between primitive
types, as required by hardware operation.
[Cc'd to mesa-stable for use in the next commit.]
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14575>
We are going to need to extend the cache key to add state that affects
the program stateobj, but not necessarily the shader itself (i.e.
ir3_shader_key wouldn't be the correct place to add it).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14643>
Lowered clip planes should respect the enabled/disabled GL_CLIP_PLANEn
(aka GL_CLIP_DISTANCEn), which means updating the rast state as well.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14643>
This lets us support indirect access to UBOs easily. The existing
constant special case disappears too, since the peephole optimizer can
inline the constant later. (note: this is too conservative since we can
go up to 16-bit immediates...)
Unfortunately, nir_opt_algebraic can't seem to optimize expressions like
"((a << 3) + 4) >> 2" to "(a << 1) + 1" which would be necessary for
reasonable perf out of this...
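
For reference, here is a small sketch of the two forms on 32-bit unsigned
values (function names are made up for illustration). They agree whenever
"a << 3" does not overflow, but differ once it wraps. That may be one reason
a general-purpose rewrite is not straightforward without range information,
though this is a guess and not something verified against nir_opt_algebraic.

```c
#include <assert.h>
#include <stdint.h>

/* The expression as emitted. */
static uint32_t
offset_unoptimized(uint32_t a)
{
   return ((a << 3) + 4) >> 2;
}

/* The desired simplification: (8a + 4) / 4 == 2a + 1. */
static uint32_t
offset_simplified(uint32_t a)
{
   return (a << 1) + 1;
}

/* Check that both forms agree for every a below the given limit. */
static void
check_identity_below(uint32_t limit)
{
   for (uint32_t a = 0; a < limit; a++)
      assert(offset_unoptimized(a) == offset_simplified(a));
}
```

With a = 0x20000000 the shift wraps to zero, so the two functions return
different values.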
Fixes:
dEQP-GLES2.functional.shaders.indexing.uniform_array.float_dynamic_loop_read_fragment
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14581>