Instead of having abstracted opcodes, we target directly the HW format
at the NIR translation.
The payload description gives us the order of the payload sources (we
can use that for pretty printing) and we don't have to have a
complicated scheme in the logical send lowering for the ordering. All
we have to do is build the header if needed as well as the descriptors.
PTL Fossil-db stats:
Totals from 66759 (13.54% of 492917) affected shaders:
Instrs: 44289221 -> 43957404 (-0.75%); split: -0.81%, +0.06%
Send messages: 2050378 -> 2042607 (-0.38%)
Cycle count: 3878874713 -> 3712848434 (-4.28%); split: -4.44%, +0.16%
Max live registers: 8773179 -> 8770104 (-0.04%); split: -0.06%, +0.03%
Max dispatch width: 1677408 -> 1707952 (+1.82%); split: +1.85%, -0.03%
Non SSA regs after NIR: 11407821 -> 11421041 (+0.12%); split: -0.03%, +0.15%
GRF registers: 5686983 -> 5838785 (+2.67%); split: -0.24%, +2.91%
LNL Fossil-db stats:
Totals from 57911 (15.72% of 368381) affected shaders:
Instrs: 39448036 -> 38923650 (-1.33%); split: -1.41%, +0.08%
Subgroup size: 1241360 -> 1241392 (+0.00%)
Send messages: 1846696 -> 1845137 (-0.08%)
Cycle count: 3834818910 -> 3784003027 (-1.33%); split: -2.33%, +1.00%
Spill count: 21866 -> 22168 (+1.38%); split: -0.07%, +1.45%
Fill count: 59324 -> 60339 (+1.71%); split: -0.00%, +1.71%
Scratch Memory Size: 1479680 -> 1483776 (+0.28%)
Max live registers: 7521376 -> 7447841 (-0.98%); split: -1.04%, +0.06%
Non SSA regs after NIR: 9744605 -> 10113728 (+3.79%); split: -0.01%, +3.80%
Only 2 titles negatively impacted (spilling) :
- Shadow of the Tomb Raider
- Red Dead Redemption 2
All impacted shaders were already spilling.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37171>
We start by assigning a backend opcode to all tex instructions, use
that to figure out if we have packed sources and apply the lowering
accordingly.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37171>
Centralize all the information in one place and also make the mapping
decision from nir_tex_instr -> HW opcode much earlier.
This will help knowning exactly what the payload looks like early in
the backend IR and when it needs to lowered to a smaller SIMD size due
to HW limits. It will also allow NIR lowering to know when to combine
parameters into a single packed component.
Finally, this also reduces the amount of LOAD_PAYLOAD we need to carry
in the backend IR, because we don't have to generate VEC()
LOAD_PAYLOAD() for coordinates etc... Those are useless if there is
any other parameter in the payload and we need need to add one more
LOAD_PAYLOAD() when doing the logical send lowering.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37171>
Somehow dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_1.atan_frag
was able to generate a bitfield_select with a constant first
parameter. That makes the big comment here completely false.
Don't be clever. If the constant is in the wrong place,
commute_immediates during copy propagation will fix it.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37891>
This prevents lower_regioning from doing bad things when the destination
and all the other sources are UW.
Other solutions considered:
- Have the type of src[3] match the destination type. This also required
changes in combine_constants to allow the type be UD or UW.
- Make a new subclass brw_bfn_inst, and store the Boolean function
selector outside the src[] array. This was a lot more code and a lot
more churn (+47,-27 vs +4).
Fixes: b948e6d503 ("brw: Use BFN to implement nir_opt_bitfield_select")
Suggested-by: Curro
Suggested-by: Ken
Closes: #14095
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37891>
This was something that came up in the slop MR. Not sure it's actually a
good idea or not but kind of curious what people think, given we have a
sound tool (Coccinelle) to do the transform. Saves a redundant branch
but means extra noninlined function calls.. likely no actual perf impact
but saves some code.
Via Coccinelle patches:
@@
expression ptr;
@@
-if (ptr) {
-free(ptr);
-}
+free(ptr);
@@
expression ptr;
@@
-if (ptr) {
-FREE(ptr);
-}
+FREE(ptr);
@@
expression ptr;
@@
-if (ptr) {
-ralloc_free(ptr);
-}
+ralloc_free(ptr);
@@
expression ptr;
@@
-if (ptr != NULL) {
-free(ptr);
-}
-
+free(ptr);
@@
expression ptr;
@@
-if (ptr != NULL) {
-FREE(ptr);
-}
-
+FREE(ptr);
@@
expression ptr;
@@
-if (ptr != NULL) {
-ralloc_free(ptr);
-}
-
+ralloc_free(ptr);
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3d]
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> [venus]
Reviewed-by: Frank Binns <frank.binns@imgtec.com> [powervr]
Reviewed-by: Janne Grunau <j@jannau.net> [asahi]
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> [radv]
Reviewed-by: Job Noorman <jnoorman@igalia.com> [ir3]
Acked-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Job Noorman <jnoorman@igalia.com>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37892>
dest_size is the number of outputs to be provided into the IR, but the
location of the sparse bitfield in the dst temporary SEND destination
might be different (shorter due to masking of unused components
computed above).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14094
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37876>
sfid is another field that is not preserved after brw_transform_inst_to_send()
so we need to store it before transform and retore it to preserve the sfid value.
Fixes: 0fcce2722f ("brw: Add brw_send_inst")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37823>
Opcodes SHADER_OPCODE_INTERLOCK and SHADER_OPCODE_MEMORY_FENCE are emitted as
brw_send_inst and at nir to brw conversion the desc field is set with scope and
flush type of the instruction.
But when brw_inst is converted to brw_send_inst all special fields of
brw_send_inst are set to 0, causing scope and flush type to always be 0.
So here calling lower_lsc_memory_fence_and_interlock() with brw_send_inst
parameter and storing the desc before brw_transform_inst_to_send().
I still have not figure out why we need do brw_transform_inst_to_send() even
if it is already a brw_send_inst but not doing so causes a segfault in
foreach_block_and_inst_safe(block, brw_inst, inst, s.cfg) of
brw_lower_logical_sends(), also other opcodes of that function does something
similar so I don't think that is wrong.
Fixes: 0fcce2722f ("brw: Add brw_send_inst")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37823>
This works around a long-standing synchronization issue consequence of
the HALT instruction used to implement FS discard not being considered
a control flow instruction by the back-end -- The fact that it doesn't
cause the CFG pass to introduce an edge in the graph means that the
software scoreboard pass is completely blind to the effect of discard
jumps on control flow, so it doesn't introduce the required
annotations to avoid data hazards when the discard path of the CFG is
taken. Note that because of the very limited set of instructions that
can follow the HALT target in a fragment shader this was very unlikely
to lead to issues in practice, but starting on xe3 it appears to have
become far more likely due to the use of SENDG, since SENDG requires
the scalar register to be set prior to the submission of the render
target write payloads, which can easily lead to a WaR hazard if there
was another SENDG before the HALT jump that wasn't done reading out
its payload from the GRF.
In an ideal world this would be avoided by having HALT be a normal
control flow instruction represented as an edge in the control flow
graph -- But unfortunately that would prevent the optimizations we
currently do that take advantage of the ability of reordering code
past the HALT instruction, so it would have a pretty large performance
cost. Instead this simply adds a SYNC.ALLWR instruction after the
HALT target to guarantee that all pending SEND messages have finished
execution -- That may also seem costly, however its cost in practice
appears to be minimal since at the point of the program when the
target HALT is executed there is almost nothing left to do other than
send out the render target write payloads, so any pending operations
had to be waited on at roughly this point of the program regardless.
There appear to be no statistically significant regressions in Traci
on neither BMG nor PTL. Fixes hangs observed on Dying Light 2 and
Cyberpunk on PTL.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13896
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13965
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14092
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37674>
By dynamic setting num_channel we can share more code in
lower_lsc_varying_pull_constant_logical_send().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37853>
cmod propagation needs more work. Since the result type is always UD,
BRW_CONDITION_G should be able to substitute for NZ. Either that or
users of the condition could be rewritten to use an inverted condition.
v2: Add a couple more unit tests. Suggested by Matt.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
v2 (idr): So much rebasing. Deleted a bunch of code that we're not
going to need yet.
v3 (Ken): bfn inst encoding fix
v4 (idr): Add BFN to brw_get_lowered_simd_width.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
The Intel EU fusion feature needs to be disabled on SEND messages
where either the texture handle, sampler handle, sampler header is not
identical on fused threads.
This is the case in particular with accesses on non-uniform
texture/sampler handles but could also strike with dynamic
programmable offsets (currently disabled).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
mesh doesn't use brw_vue_prog_data. Also, I had been catching TCS
shaders here, and shouldn't.
Fixes: bf76e86bc8 ("brw: Refactor clip/cull distance mask setting into a helper")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37809>
This is nearly identical, except for bindless sampler/texture/image
handling. But we only use it for inputs/outputs, not uniforms, where
there are no bindless handles to worry about.
Deletes a lot of mostly-duplicated code.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37784>
There are no 64-bit renderable formats so we can't have FS outputs that
are dvecs. This dates back to 2016 and a ton of the backend has been
rewritten, so I think whatever this was trying to solve is no longer a
problem.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37784>
replace is preferred when appropriate & should be faster. after is when
you use the result in your lowering itself.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37753>
This keeps the directory structure a bit more organized:
- brw specific code
- elk specific code
- common NIR passes that could be used in both places
It also means that you can now 'git grep' in the brw directory without
finding a bunch of elk code, or having to "grep thing b*".
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37755>
These are identical and are just hardware enum values, not related to
the structure of the backend compiler.
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37755>
We were compiling these twice, one for brw, one for elk. There's no
reason to do that, just compile the common code once and link against it
in both backends.
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37755>
This is from the pre-NIR era where we used GLSL IR expression opcodes
directly. We haven't done that in years.
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37755>
We want to be able to emit load_reloc_const_intel intrinsics from common
NIR passes (such as printf lowering). In order to do that, we need to
have the enum with the meaning of values in common code. Once you have
that, it's easy to see the (identical) data structures as a way for the
driver to communicate about relocations, rather than a compiler backend
specific thing. So we move it all up to common code, and re-unify.
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37755>
Usage of '--outdir' argument in python scripts makes it very
complicated for tools like ninja-to-soong to generate the Android
equivalent build file.
This is because the option is less clear on what will be generated.
Instead, change it for '--out' where we give the full path of the file
to generate. This has the good point of deduplicating the locations of
the file name to have it only in 'meson.build'.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37741>