This hint tells KMD and firmware to turn into low latency but high
power usage mode.
i915 already had it now it was implemented in Xe KMD.
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33214>
Lets query if this feature is supported only once, also in the next
patches support for this feature will be added to Xe KMD.
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33214>
This is an alternative Curro proposed to counting the number of
serialized messages.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37816>
The intermediate buffer between the 2 images is linear, its stride
should be a function of the tile's logical width.
Normally this should map to the values reported by ISL except for
TileW where for some reason it was decided to report 128 for TileW
instead of the actual 64 size (see isl_tiling_get_info() ISL_TILING_W
case)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37902>
Instead of having abstracted opcodes, we target directly the HW format
at the NIR translation.
The payload description gives us the order of the payload sources (we
can use that for pretty printing) and we don't have to have a
complicated scheme in the logical send lowering for the ordering. All
we have to do is build the header if needed as well as the descriptors.
PTL Fossil-db stats:
Totals from 66759 (13.54% of 492917) affected shaders:
Instrs: 44289221 -> 43957404 (-0.75%); split: -0.81%, +0.06%
Send messages: 2050378 -> 2042607 (-0.38%)
Cycle count: 3878874713 -> 3712848434 (-4.28%); split: -4.44%, +0.16%
Max live registers: 8773179 -> 8770104 (-0.04%); split: -0.06%, +0.03%
Max dispatch width: 1677408 -> 1707952 (+1.82%); split: +1.85%, -0.03%
Non SSA regs after NIR: 11407821 -> 11421041 (+0.12%); split: -0.03%, +0.15%
GRF registers: 5686983 -> 5838785 (+2.67%); split: -0.24%, +2.91%
LNL Fossil-db stats:
Totals from 57911 (15.72% of 368381) affected shaders:
Instrs: 39448036 -> 38923650 (-1.33%); split: -1.41%, +0.08%
Subgroup size: 1241360 -> 1241392 (+0.00%)
Send messages: 1846696 -> 1845137 (-0.08%)
Cycle count: 3834818910 -> 3784003027 (-1.33%); split: -2.33%, +1.00%
Spill count: 21866 -> 22168 (+1.38%); split: -0.07%, +1.45%
Fill count: 59324 -> 60339 (+1.71%); split: -0.00%, +1.71%
Scratch Memory Size: 1479680 -> 1483776 (+0.28%)
Max live registers: 7521376 -> 7447841 (-0.98%); split: -1.04%, +0.06%
Non SSA regs after NIR: 9744605 -> 10113728 (+3.79%); split: -0.01%, +3.80%
Only 2 titles negatively impacted (spilling) :
- Shadow of the Tomb Raider
- Red Dead Redemption 2
All impacted shaders were already spilling.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37171>
We start by assigning a backend opcode to all tex instructions, use
that to figure out if we have packed sources and apply the lowering
accordingly.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37171>
Centralize all the information in one place and also make the mapping
decision from nir_tex_instr -> HW opcode much earlier.
This will help knowning exactly what the payload looks like early in
the backend IR and when it needs to lowered to a smaller SIMD size due
to HW limits. It will also allow NIR lowering to know when to combine
parameters into a single packed component.
Finally, this also reduces the amount of LOAD_PAYLOAD we need to carry
in the backend IR, because we don't have to generate VEC()
LOAD_PAYLOAD() for coordinates etc... Those are useless if there is
any other parameter in the payload and we need need to add one more
LOAD_PAYLOAD() when doing the logical send lowering.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37171>
Somehow dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_1.atan_frag
was able to generate a bitfield_select with a constant first
parameter. That makes the big comment here completely false.
Don't be clever. If the constant is in the wrong place,
commute_immediates during copy propagation will fix it.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37891>
This prevents lower_regioning from doing bad things when the destination
and all the other sources are UW.
Other solutions considered:
- Have the type of src[3] match the destination type. This also required
changes in combine_constants to allow the type be UD or UW.
- Make a new subclass brw_bfn_inst, and store the Boolean function
selector outside the src[] array. This was a lot more code and a lot
more churn (+47,-27 vs +4).
Fixes: b948e6d503 ("brw: Use BFN to implement nir_opt_bitfield_select")
Suggested-by: Curro
Suggested-by: Ken
Closes: #14095
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37891>
This was something that came up in the slop MR. Not sure it's actually a
good idea or not but kind of curious what people think, given we have a
sound tool (Coccinelle) to do the transform. Saves a redundant branch
but means extra noninlined function calls.. likely no actual perf impact
but saves some code.
Via Coccinelle patches:
@@
expression ptr;
@@
-if (ptr) {
-free(ptr);
-}
+free(ptr);
@@
expression ptr;
@@
-if (ptr) {
-FREE(ptr);
-}
+FREE(ptr);
@@
expression ptr;
@@
-if (ptr) {
-ralloc_free(ptr);
-}
+ralloc_free(ptr);
@@
expression ptr;
@@
-if (ptr != NULL) {
-free(ptr);
-}
-
+free(ptr);
@@
expression ptr;
@@
-if (ptr != NULL) {
-FREE(ptr);
-}
-
+FREE(ptr);
@@
expression ptr;
@@
-if (ptr != NULL) {
-ralloc_free(ptr);
-}
-
+ralloc_free(ptr);
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3d]
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> [venus]
Reviewed-by: Frank Binns <frank.binns@imgtec.com> [powervr]
Reviewed-by: Janne Grunau <j@jannau.net> [asahi]
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> [radv]
Reviewed-by: Job Noorman <jnoorman@igalia.com> [ir3]
Acked-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Job Noorman <jnoorman@igalia.com>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37892>
dest_size is the number of outputs to be provided into the IR, but the
location of the sparse bitfield in the dst temporary SEND destination
might be different (shorter due to masking of unused components
computed above).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14094
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37876>
sfid is another field that is not preserved after brw_transform_inst_to_send()
so we need to store it before transform and retore it to preserve the sfid value.
Fixes: 0fcce2722f ("brw: Add brw_send_inst")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37823>
Opcodes SHADER_OPCODE_INTERLOCK and SHADER_OPCODE_MEMORY_FENCE are emitted as
brw_send_inst and at nir to brw conversion the desc field is set with scope and
flush type of the instruction.
But when brw_inst is converted to brw_send_inst all special fields of
brw_send_inst are set to 0, causing scope and flush type to always be 0.
So here calling lower_lsc_memory_fence_and_interlock() with brw_send_inst
parameter and storing the desc before brw_transform_inst_to_send().
I still have not figure out why we need do brw_transform_inst_to_send() even
if it is already a brw_send_inst but not doing so causes a segfault in
foreach_block_and_inst_safe(block, brw_inst, inst, s.cfg) of
brw_lower_logical_sends(), also other opcodes of that function does something
similar so I don't think that is wrong.
Fixes: 0fcce2722f ("brw: Add brw_send_inst")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37823>
This works around a long-standing synchronization issue consequence of
the HALT instruction used to implement FS discard not being considered
a control flow instruction by the back-end -- The fact that it doesn't
cause the CFG pass to introduce an edge in the graph means that the
software scoreboard pass is completely blind to the effect of discard
jumps on control flow, so it doesn't introduce the required
annotations to avoid data hazards when the discard path of the CFG is
taken. Note that because of the very limited set of instructions that
can follow the HALT target in a fragment shader this was very unlikely
to lead to issues in practice, but starting on xe3 it appears to have
become far more likely due to the use of SENDG, since SENDG requires
the scalar register to be set prior to the submission of the render
target write payloads, which can easily lead to a WaR hazard if there
was another SENDG before the HALT jump that wasn't done reading out
its payload from the GRF.
In an ideal world this would be avoided by having HALT be a normal
control flow instruction represented as an edge in the control flow
graph -- But unfortunately that would prevent the optimizations we
currently do that take advantage of the ability of reordering code
past the HALT instruction, so it would have a pretty large performance
cost. Instead this simply adds a SYNC.ALLWR instruction after the
HALT target to guarantee that all pending SEND messages have finished
execution -- That may also seem costly, however its cost in practice
appears to be minimal since at the point of the program when the
target HALT is executed there is almost nothing left to do other than
send out the render target write payloads, so any pending operations
had to be waited on at roughly this point of the program regardless.
There appear to be no statistically significant regressions in Traci
on neither BMG nor PTL. Fixes hangs observed on Dying Light 2 and
Cyberpunk on PTL.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13896
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13965
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14092
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37674>
By dynamic setting num_channel we can share more code in
lower_lsc_varying_pull_constant_logical_send().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37853>
Since we don't support filmGrainSupport on previous of gen20 for AV1
decoding by default, failures start happening due to the video tests
assuming always FG supported, which is a fault of CTS.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37762>
precisely.
This way we could provide more correct infomation to applications.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37762>
In Lua, modules (i.e. files with lua code) are loaded by using
the standard library require(), e.g.
```
local mylib = require("mylib")
mylib.do_something()
```
The require() will decide where to look by peeking at `package.path`
table. By default it doesn't include the scripts directory, so running
executor from the script directory vs. from the root of the repo would
yield different results (require works vs. require fail to find the
module). This patch includes the script directory to avoid this issue.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37805>