This was originally turned into a separate struct for reuse between vec4
and fs backends, that's not needed anymore.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33334>
This will be useful for pulling constants in device bound shaders. A64
allows us to put the constants anywhere.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32895>
Now using the same model as the compute shader.
As a result we temporarily disable the use of the Inline register for
providing push constants on Task & Mesh shaders. Since that register
is also available on the compute shader we'll try to find a way to use
the same mechanism for all 3 shaders in another MR and bring back that
optimization.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32895>
Starting in Xe3, there's a variant of SEND that take the
register numbers from the ARF scalar register, and don't
require them to be contiguous. The new opcode added here
represents that kind of SEND.
To make the original sources still reachable, we keep them
around during the IR, just ignoring them at generator time.
This allow software scoreboard to properly reason the
dependencies without trying to decode the contents of ARF
scalar register being used.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32410>
This limits the address register to simple cases inside a block.
Validation ensures that the address register is only written once and
read once.
Instruction scheduling makes sure that instructions using the address
register in the generator are not scheduled while there is an usage of
the register in the IR.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
We want to reuse the brw::nr field as a virtual address register
identifer. So we can't use brw::file=ARF brw::nr=ADDRESS.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
The push_constant_loc[] array is always an identity mapping these days,
so it's kind of pointless. Just use the original uniform number and
skip the unnecessary "remap" step. With that gone, and shrinking UBO
ranges gone, assign_constant_locations() is now empty and can be removed
as well.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>
Now that we never shrink ranges in the backend, we never lower push
constants to pull constants late in the backend either. get_pull_loc
will never return true, and so all of brw_lower_constant_loads becomes
a noop.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>
Back in the bad old days (vec4?) we had a bunch of smarts in the backend
to dead code eliminate unused vector components and re-pack regular
uniforms, so we really couldn't decide how much data we were pushing
until very late in the backend. Nowadays we have none of that - we do
all of our elimination and packing in NIR. anv shrinks ranges to deal
with Vulkan API push constants, and iris treats everything as a UBO and
as of the previous commit will also shrink appropriately.
So we don't need to do this anymore...which will let us simplify quite
a bit of code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>
No shader-db changes on any Intel platform.
v2: Fix for Xe2.
v3: Rework the way that we determine that an intrinsic can actually be
convergent. This will now depend on whether or not the important
sources have previously be determined to be convergent. Fixes
intermitent failures in some test cases (including
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.push_constant_float_16_to_32.scalar_frag).
v4: s/the it/it/ in a comment. Noticed by Ken.
fossil-db:
No fossil-db changes on Lunar Lake.
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152743449 -> 152743161 (-0.00%)
Cycle count: 17399179660 -> 17399193488 (+0.00%)
Totals from 144 (0.02% of 633314) affected shaders:
Instrs: 5936 -> 5648 (-4.85%)
Cycle count: 51616 -> 65444 (+26.79%)
Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150646195 -> 150645907 (-0.00%)
Cycle count: 15618427818 -> 15618428942 (+0.00%)
Totals from 144 (0.02% of 632567) affected shaders:
Instrs: 6218 -> 5930 (-4.63%)
Cycle count: 39968 -> 41092 (+2.81%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
In SIMD16 and SIMD32, storing convergent values in full 16- or
32-channel registers is wasteful. It wastes register space, and in most
cases on SIMD32, it wastes instructions. Our register allocator is not
clever enough to handle scalar allocations. It's fundamental unit of
allocation is SIMD8. Start treating convergent values as SIMD8.
Add a tracking bit in brw_reg to specify that a register represents a
convergent, scalar value. This has two implications:
1. All channels of the SIMD8 register must contain the same value. In
general, this means that writes to the register must be
force_writemask_all and exec_size = 8;
2. Reads of this register can (and should) use <0,1,0> stride. SIMD8
instructions that have restrictions on source stride can us <8,8,1>.
Values that are vectors (e.g., results of load_uniform or texture
operations) will be stored as multiple SIMD8 hardware registers.
v2: brw_fs_opt_copy_propagation_defs fix from Ken. Fix for Xe2.
v3: Eliminte offset_to_scalar(). Remove mention of vec4 backend in
brw_reg.h. Both suggested by Caio. The offset_to_scalar() change
necessitates some trickery in the fs_builder offset() function, but I
think this is an improvement overall. There is also some rework in
find_value_for_offset to account for the possibility that is_scalar
sources in LOAD_PAYLOAD might be <8;8,1> or <0;1,0>.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
This isn't used now, but future commits will add uses. Doing this as a
separate commit removes a lot of "just typing" churn from commits that
have real changes to review.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
Almost all cases now handled with default arguments. The only real
extra work that was being done was pushed to the client code in
debug_optimizer().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32596>
In this case, num_sources is bigger than this->sources, so if we loop
up to num_sources (instead of this->sources) we'll end up reading past
the end of old_src[]. Only copy up to what we originally had.
This was found by code inspection, I'm not aware of any applications
failing due to the lack of this patch.
Fixes: d9e737212d ("intel/brw: Add a src array for the common case in fs_inst")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32600>
Moving it to intel_shader_enums.h
The plan is to make it visible to OpenCL shaders.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>
Add opcodes for VOTE_ALL, VOTE_ANY and VOTE_EQUAL. The first two
are also used for the quad variants. Move their lowering from
NIR conversion to brw_lower_subgroup_ops.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>
This intrinsic was initially dedicated to mesh/task shaders, but the
mechanism it exposes also exists in the compute shaders on Gfx12.5+.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>