This expression accidentally performs a 32-bit sign-extension when
processing the second half of the expression (the low 16 bits).
Consider -7W, which is represented as 0xfff9fff9 in our encoding (the
16-bit word is replicated to both halves of the 32-bit dword).
Tigerlake's compaction stores the low 11-bits of an immediate as-is,
and replicates the 12th bit. So here, compacted_imm will be 0xff9.
( (int)(0xff9 << 20) >> 4) |
((short)(0xff9 << 4) >> 4))
0xfff90000 | (0xff90 >> 4)
0xfff90000 | 0xfffffff9 ...oops...
0xfffffff9
By casting the second line of the expression to unsigned short, we
prevent the sign-extension when it combines both parts, so we get:
0xfff90000 | 0x0000fff9
0xfff9fff9
Fixes: 12d3b11908 ("intel/compiler: Add instruction compaction support on Gen12")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16833>
For cases with lots of very small primitives, this may improve
performance because we're not executing those dead channels all the
time.
Shader-db reports no instruction or cycle-count changes. However, by
hacking up the driver to report when this optimization triggers, it
appears to affect about 10% of shader-db.
v2 (Kenneth Graunke): Always enable VMask prior to XeHP for now,
because using VMask on those platforms allows us to perform the
eliminate_find_live_channel() optimization. However, XeHP doesn't
seem to have packed fragment shader dispatch, so we lose that
optimization regardless, and there's no reason not to avoid vmask.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1054>
Originally, we had virtual opcodes for scratch access, and let the
generator count spills/fills separately from other sends. Later, we
started using the generic SHADER_OPCODE_SEND for spills/fills on some
generations of hardware, and simply detected stateless messages there.
But then we started using stateless messages for other things:
- anv uses stateless messages for the buffer device address feature.
- nir_opt_large_constants generates stateless messages.
- XeHP curbe setup can generate stateless messages.
So counting stateless messages is not accurate. Instead, we move the
spill/fill accounting to the register allocator, as it generates such
things, as well as the load/store_scratch intrinsic handling, as those
are basically spill/fills, just at a higher level.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16691>
If we saw a HALT instruction, we would forget to invalidate our analysis
pass information before returning progress.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16677>
This is set to true for all drivers that have a GLSL level
of support lower than 4.00. This matches the rule for setting the
GLSL IR option EmitNoIndirectSampler.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16543>
With this option enabled range of input values for fsin and fcos is
limited to [-2*pi : 2*pi] by calculating the reminder after 2*pi modulo
division. This helps to improve calculation precision for large input
arguments on Intel.
-v2: Add limit_trig_input_range option to prog_key to update shader
cache (Lionel)
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16388>
This pattern pops up a bunch and the semantics of nir_channels() aren't
very convenient much of the time. Let's add a nir_trim_vector() which
matches nir_pad_vector().
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16309>
This code just got duplicated a lot. There is still more, but the
remaining instances do a bit more than just removing other functions.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16348>
Ken performed some tests with shader-db to evaluate the effects
```
Across all 145,848 shaders generated, the results were:
Total bytes compacted before: 3,326,224
Total bytes compacted after: 60,963,280
```
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15399>
This controls the whole lowering of "make tex ops with implicit
derivatives on non-implicit-derivative stages be tex ops with an explicit
lod of 0 instead", but it's really hard to describe that in a git commit
summary.
All existing callers get it added except:
- nir_to_tgsi which didn't want it.
- nouveau, which didn't want it (fixes regressions in shadowcube and
shadow2darray with NIR, since the shading languages don't expose txl of
those sampler types and thus it's not supported in HW)
- optional lowering passes in mesa/st (lower_rect, YUV lowering, etc)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>
fix brw_kernel::stats member that was declared as a variable
but used as a pointer to array of 3 elements
CID: 1503279
Signed-off-by: Bozhenko Alexey <oleksii.bozhenko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15975>
Fixes a hang on Gfx9 GT1 : dEQP-VK.compute.zero_initialize_workgroup_memory.max_workgroup_memory.128
Tested-by: Mark Janes <markjanes@swizzler.org>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15596>
c78be5da30 ("intel/fs: lower ray query intrinsics") introduced a
helper function using nir_(push|pop)_if which invalidated dominance &
block_index for the replacement of nir_intrinsic_rt_trace_ray.
We can still keep dominance/block_index metadata for the lowering of
nir_intrinsic_rt_execute_callable though.
This change uses 2 different lowering function with correct metadata
preservation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15910>
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. The
maximum number of gradient components and coordinate components should
be 2. In spite of this limitation, the Bspec lists a mysterious R
component before the min_lod, so the maximum coordinate components is 3.
Fixes the following Vulkan CTS failures on DG2:
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1d_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2d_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1d_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2d_fragment
The Fixes: tag below is a bit misleading. This commit fixes some test
cases similar to ones fixed by the Fixes: commit. I just want to make
sure this commit gets applied everywhere that commit was also applied.
Fixes: 635ed58e52 ("intel/compiler: Lower txd for 3D samplers on XeHP.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15781>
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. Cube
maps and 3D surfaces were already handled, but 1D array and 2D array
surfaces were not.
Fixes the following Vulkan CTS failures on DG2:
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2darray_fragment
The Fixes: tag below is a bit misleading. This commit adds another
lowering, similar to the one in the Fixes: commit, that probably should
have been added at the same time. I just want to make sure this commit
gets applied everywhere that commit was also applied.
Fixes: 635ed58e52 ("intel/compiler: Lower txd for 3D samplers on XeHP.")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15681>
You should probably resize the sources array before accessing entries
that might be out of bounds. inst->resize_sources() always allocates
enough space for at least 3 sources, so this is really only an issue
when there are 4+ sources.
Fixes: a920979d4f ("intel/fs: Use split sends for surface writes on gen9+")
Fixes: 4f86a70599 ("intel/fs: Lower DW untyped r/w messages to LSC when available")
Fixes: d372abe397 ("intel/fs: Add surface OWORD BLOCK opcodes")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15632>
In general, an atomic intrinsic may perform separate atomics for every
enabled SIMD channel, as each channel may operate on different memory.
However, an extremely common case is for all channels to access the same
memory location. In this case, we can simply perform a reduction/scan
across the subgroup, and perform one atomic for the whole subgroup,
rather than one per channel. For example, if an intrinsic says to take
the minimum value of the existing memory and the value in each channel,
we can do a thread-local minimum of all enabled channels, then do a
single atomic to take the minimum of that and the existing memory.
Our hardware doesn't optimize the case where multiple channels ask for
atomics on the same memory location; it assumes the compiler will do so.
nir_opt_uniform_atomics() uses divergence analysis to detect this case,
adds the necessary subgroup operations, and moves the atomic inside a
conditional that disables all but a single invocation. It even detects
cases where the shader code already performs this kind of optimization,
and avoids doing it a second time.
This may not be the optimal solution for us. In the backend, we could
detect this case and emit send(1) instructions with NoMask, rather than
generating if...send(16)...endif, and a lot of unnecessary ALU ops. But
it's simple to do, reuses the same path as ACO, and still provides most
of the benefit by cutting up to 16x atomics down to a single atomic,
which is more merciful to the memory bus.
Improves performance of Shadow of the Tomb Raider by 5.5% on XeHP.
Improves performance of a customer-internal benchmark on XeHP at
3840x2160 and low settings by approximately 30%.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
We'll use this more shortly. For now, enable it to separately in case
anything bisects to this.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
Although we don't use divergence analysis yet, we've had several
work-in-progress series that make use of it. We may as well set
our options so that those series can assume they're in place.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
We haven't exposed this intrinsic as it doesn't directly correspond to
anything in SPIR-V. However, it's used internally by some NIR passes,
namely nir_opt_uniform_atomics().
We reuse most of the infrastructure in brw_find_live_channel, but with
LZD/ADD instead of FBL. A new SHADER_OPCODE_FIND_LAST_LIVE_CHANNEL is
like SHADER_OPCODE_FIND_LIVE_CHANNEL but from the other side.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
Instead of reusing the in/out slot mechanism, use a separated NIR
variable mode. This will make easier later to implement staging the
output in shared memory (and storing all at the end to the URB).
Note to get 64-bit type support we currently rely on the
brw_nir_lower_mem_access_bit_sizes() pass.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15022>
Use emit_predicate_on_sample_mask() helper that does check where to
get the correct mask depending on whether discard/demote was used or
not.
Fixes: 45f5db5a84 ("intel/fs: Implement "demote to helper invocation"")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15400>
Without this change, a check for "is helper invocation" could read
uninitialized values.
Fixes: 45f5db5a84 ("intel/fs: Implement "demote to helper invocation"")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15400>
As this function doesn't check for any control-flow
dependence, it only returns true for statically
(or globally) uniform values.
The same holds true for is_binding_dynamically_uniform()
in nir_opt_gcm().
Rename to better reflect that property.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14994>
We're now required to provide all modules to link at the same SPIRV
version.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15486>
We're running the io lowering twice so need to reset some fields so
the offset don't go over what is really needed.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15486>