There's no good reason we want to read RGBA for SNORM RB textures. Let's
correct the preferred read-format here.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28120>
Change wrap_surface to always take the ownership of the wrapped surface and
always check for an allocated value to make sure to not crash here.
Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28025>
The `stride` and `offset` attributes are meaningful for the "virtual"
register files (VGRFs, UNIFORMs and ATTRs). Accumulator is an ARF so
validation should check `hstride` (part of the <V,W,H> triple) and `subnr`
instead.
Fixes: 12d7aaf2b8 ("intel/compiler: add more validation for acc register usage")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28059>
This ensure the region triple <V,W,H> is set correctly, in this case the
desired region is a sequential like <8,8,1>. Without the helper the
sequence we get is <0,1,0> -- which the generator currently partially
adjusts when emitting code, but is not sufficient when doing validation
earlier.
The code generated code is slightly modified. From crucible test
func.shader.subtractSaturate.uint in the fragment shader for SIMD8, the
diff looks like
```
mov(8) acc0<1>UD g21<8,8,1>UD { align1 1Q $0.dst };
-add.sat(8) g22<1>UD -acc0<0,1,0>UD g16<8,8,1>UD { align1 1Q @1 $0.dst };
+add.sat(8) g22<1>UD -acc0<8,8,1>UD g16<8,8,1>UD { align1 1Q @1 $0.dst };
```
Note that without the patch generator adjusted the hstride for acc0 used
as destination (see brw_set_dest), but kept the src region as is. For
the source, it is not clear to me why the <0,1,0> would work correctly
here since it is a scalar, but using <8,8,1> it is correct.
Fixes: 58907568ec ("intel/fs: Add SHADER_OPCODE_[IU]SUB_SAT pseudo-ops")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28059>
ARB_shader_image_load_store requires that drivers can compile shaders
containing these tokens, but it doesn't require that they can execute
them. thus, deleting the multisample component is fine since these
shaders will never be executed
affects/fixes:
KHR-GL46.gl_spirv.spirv_validation_capabilities_test
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28078>
This register has changed a little bit for LNL.
While this fixes sparse with TR-TT, it is worth remembering that LNL
is using sparse with vm_bind by default.
v2: Use the proper value instead of hardcoding 0xF (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27316>
bi_emit_blend_op has a hardcoded assumption that 4 components were used
for blending. This causes validation errors in some situations where
fewer components were actually used, because the number of staging
registers did not match the number of registers in the actual input.
Fix this by extending the source to 4 components.
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28057>
The new panthor support needs some additional ioctl() calls. Added some
defaults for these to drm-shim, so that we can use drm-shim again for
testing shaders.
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28132>
It was correct for the parameters that the driver was using, but incorrect
for other parameters.
1. The address computation must multiply the workgroup size (wave size)
by num_mem_ops to fix the case when num_dwords_per_thread > 4.
2. nir_load_ssbo shouldn't set the number of components to 4 when
num_dwords_per_thread < 4.
Fixes: 6584088cd5 - radeonsi: "create_dma_compute" shader in nir
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28119>
Setting vindex != NULL (even if it's 0) selects a struct.buffer.load opcode,
which causes LLVM to look for "index * stride + offset" in voffset and
moves "index" to vindex (i.e. not 0 anymore), but the bounds checking
(OOB_SELECT) is set to ignore vindex. Setting vindex = NULL selects
a raw.buffer.load opcode.
Fixes: 6b573c00c9 - ac/nir: use ac_build_buffer_load() for SSBO load operations
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10794
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28119>
Xe KMD landed on drm-next, uAPI is now stable and we can remove
the build time parameter to enable support to it but platforms
older than Lunar lake will have experimental support with Xe KMD.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20418>
For something as basic as read_invocation(x, 0), we were emitting:
mov(8) vgrf67:D, 0d
find_live_channel(8) vgrf236:UD, NoMask
broadcast(8) vgrf237:D, vgrf67:D, vgrf236+0.0<0>:UD NoMask
broadcast(8) vgrf235+0.0:W, vgrf197+0.0:W, vgrf237+0.0<0>:D NoMask
mov(8) vgrf234+0.0:W, vgrf235+0.0<0>:W
This is way overcomplicated - if the invocation is a constant, we can
simply emit a single MOV which reads the desired channel index. Not
only that, but it's difficult to clean up:
1. If this expression appears multiple times, CSE will find all the
redundant emit_uniformize(invocation) and get rid of the duplicate
(find_live_channel+broadcast) on future instructions.
2. Copy propagation will put the 0d directly in the first broadcast.
3. Dead code elimination will get rid of the vgrf67 temp holding 0.
4. Algebraic will replace the first broadcast(x, 0) with a MOV.
5. Copy propagation will put the 0d directly in the second broadcast.
6. Dead code elimination will get rid of the vgrf237 temp.
7. Algebraic will replace the second broadcast(x, 0) with a MOV.
8. Copy propagation will finally combine the two MOVs
That's at least 7-8 optimization passes and several loops through the
same passes just to clean up something we can do trivially.
Cuts 25% of the of the optimizer steps in pipeline 22200210259a2c9c
of fossil-db/google-meet-clvk/BgBlur.1f58fdf742c27594.1 (31 to 23).
Shortens compilation time of the google-meet-clvk/Relight pipeline by
-2.87717% +/- 0.509162% (n=150).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28097>
I tried this when I was working on MR !7698, and it didn't have much
affect back then. Maybe I've added more stuff to my fossil-db?
Gfx12 platforms (Tiger Lake and DG2) are unaffected because the POW
instruction was removed.
shader-db:
Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20301933 -> 20301900 (<.01%)
instructions in affected programs: 9077 -> 9044 (-0.36%)
helped: 33 / HURT: 0
total cycles in shared programs: 842797624 -> 842799471 (<.01%)
cycles in affected programs: 1361911 -> 1363758 (0.14%)
helped: 35 / HURT: 111
LOST: 0
GAINED: 9
fossil-db:
Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 165510222 -> 165510163 (-0.00%)
Cycles: 15125195835 -> 15125194484 (-0.00%); split: -0.00%, +0.00%
Spill count: 45204 -> 45196 (-0.02%)
Fill count: 74157 -> 74149 (-0.01%)
Totals from 65 (0.01% of 656118) affected shaders:
Instrs: 57426 -> 57367 (-0.10%)
Cycles: 1667918 -> 1666567 (-0.08%); split: -0.11%, +0.03%
Spill count: 137 -> 129 (-5.84%)
Fill count: 515 -> 507 (-1.55%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>
The majority of cases that would have been affected by this actually
had both sources as integer constants. The earlier commit "intel/rt:
Don't directly generate umul_32x16" allowed those to be constant
folded.
v2: Move the a*-1 block to be near the existing a*-1 block.
No shader-db changes on any Intel platform.
fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165510246 -> 165510222 (-0.00%)
Cycles: 15125198238 -> 15125195835 (-0.00%); split: -0.00%, +0.00%
Totals from 46 (0.01% of 656118) affected shaders:
Instrs: 36010 -> 35986 (-0.07%)
Cycles: 2613658 -> 2611255 (-0.09%); split: -0.17%, +0.07%
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>
The DW source must be first on all platforms since Gfx7. On previous
platforms it's the other way around.
Unsurprisingly, no shader-db or fossil-db changes. This change is
necessary for the next commit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>
src/intel/compiler/brw_lower_logical_sends.cpp: In member function ‘bool fs_visitor::lower_logical_sends()’:
src/intel/compiler/brw_lower_logical_sends.cpp:3170:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
3170 | if (devinfo->has_lsc) {
| ^~
src/intel/compiler/brw_lower_logical_sends.cpp:3174:7: note: here
3174 | case SHADER_OPCODE_DWORD_SCATTERED_READ_LOGICAL:
| ^~~~
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>
this should yield more consistent results and avoid weird cases where
various formats are queried for things they don't support and won't use
Fixes: 9a412c10b7 ("zink: set all usage flags when querying sparse features")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28115>
only certain formats are required to have the storage bit, so be more
tolerant of failure in the case where drivers actually check flags
and reject storage usage when it's actually unsupported
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28115>
The spec requires a compare on 32-bit but the hardware actually compare 64-bit.
As such, we are required to copy the value to a temporary buffer before
the compare.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: 8c25cd307a ("nvk: EXT_conditional_rendering")
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28106>