This optimization doesn't work when the ray query index isn't uniform across
the subgroup, which is something the spec allows. While there are some smart
ways to fix this and still avoid unnecessary spilling, its not worth investing
the time until we find a realtime raytracing workload that actually needs to
use multiple live ray queries for something.
Fixes: 1f1de7eb ("anv,brw: Allow multiple ray queries without spilling to a shadow stack")
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39445>
Disable RHWO by default for singlesample draws and for MSAA
draws if a drirc key is set (avoid perf hit if not needed).
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39404>
This commit change the BVH layout a little so that we can load the BVH
offset as constant rather than reading from memory.
We have to force the instance leaves pointer at the end which gets used
in copy.comp shader.
Totals:
Instrs: 54798 -> 54728 (-0.13%)
Send messages: 3854 -> 3847 (-0.18%)
Cycle count: 1915106 -> 1913954 (-0.06%); split: -0.07%, +0.01%
Non SSA regs after NIR: 18594 -> 18575 (-0.10%)
Totals from 7 (7.37% of 95) affected shaders:
Instrs: 5532 -> 5462 (-1.27%)
Send messages: 367 -> 360 (-1.91%)
Cycle count: 132592 -> 131440 (-0.87%); split: -1.01%, +0.14%
Non SSA regs after NIR: 1989 -> 1970 (-0.96%)
PERCENTAGE DELTAS Shaders Instrs Send messages Cycle count Non SSA regs after NIR
q2rtx-rt-pipeline 95 -0.13% -0.18% -0.06% -0.10%
--------------------------------------------------------------------------------------
All affected 7 -1.27% -1.91% -0.87% -0.96%
--------------------------------------------------------------------------------------
Total 95 -0.13% -0.18% -0.06% -0.10%
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39106>
Copying the last byte was pointless, since the next line overwrites it,
and resulted in a compiler warning:
../src/intel/common/intel_measure.c: In function 'intel_measure_init':
../src/intel/common/intel_measure.c:68:7: warning: 'strncpy' specified bound 1024 equals destination size [-Wstringop-truncation]
68 | strncpy(env_copy, env, 1024);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
This allows dropping -Wno-error=stringop-truncation from the
debian-x86_64-asan & debian-{arm64,x86_64}-ubsan CI jobs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39429>
In software scoreboard (Gfx12+) use information from previous
instructions to trim out-of-order dependencies. For example, in
send g1, g2 ($1)
mov g3, g1 ($1.dst) // Depends on g1 (destination of $1)
mov g4, g2 ($1.src) // Depends on g2 (source of $1)
mov g5, g1 ($1.dst) // Depends on g1 (destination of $1)
only the first `mov` needs to be annotated, because the execution will
stall until that dependency is fulfilled, which in this case means the
`send` is done and `g1` was already written.
Note that while `$x.dst` implies `$x.src`, the reverse is not true, so
if the first `mov` did not exist, both second and third `mov` in the
example would have to keep their annotations.
This patch add resolution of implicit out-of-order dependencies that are
visible inside a block.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3526>
There's agreement now these are helpful and widely supported. We can
always fallback to a custom vector class later if necessary.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3526>
So that we can put the coarse_pixel_dispatch value available to NIR
lowering.
LNL internal fossildb changes:
Totals from 40 (0.01% of 490838) affected shaders:
Instrs: 33321 -> 33311 (-0.03%); split: -0.04%, +0.01%
Cycle count: 780136 -> 779936 (-0.03%); split: -0.03%, +0.00%
Max live registers: 5292 -> 5298 (+0.11%)
Non SSA regs after NIR: 26638 -> 26464 (-0.65%)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38996>
Each node has their own opacity bits, so we don't need to track these
opacity flags at header level.
This commit also fixes the instance flag. Instance flag is 8bit wide,
but we were always using 4 lower bits.
Cc: mesa-stable
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39053>
Setting this bit always might hurt performance. It might forces
traversal to treat all leafs always valid.
Cc: mesa-stable
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39053>
Makes a bunch of copy propagation and other passes work much better.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39382>
Gfx 12.5 struct has only one major difference with gfx9, that is OaCntr lenght,
while on gfx 9 it is 36 uint64_t long on gfx 12.5 it is 38 uint64_t long.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
We are missing handling for gfx12.5 so to add it we will need a switch case over
verx.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
Looking at the reference code, there is no new struct for Xe3 so it should
use the same struct as Xe2.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
With no more users of intel_perf_load_configuration() it can be
removed with other i915 functions around it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
We have no usage of the information returned by
intel_perf_load_configuration(). It is only used to add a copy of the
configuration so we have the metric id but we could instead get the
metric id from sysfs, that is added by mdapi.
Xe KMD don't have a uAPI to query the metrics configuration, so
using sysfs also fixes the integration of mdapi with Xe KMD.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
There is no usage for register_config outside of
anv_AcquirePerformanceConfigurationINTEL(), so we don't need to store
it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Lukasz Stalmirski <lukasz.stalmirski@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32842>
Not only is it questionable for code quality to not call nir_opt_algebraic_late
after nir_opt_algebraic, it also breaks correctness for late lowerings.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39180>
We can have only one of those calls before the 'if GFX_VERx10 >= 125' block.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39362>
Fixes:
dEQP-VK.api.copy_and_blit.dedicated_allocation.resolve_image.whole_copy_before_resolving_transfer.2_bit
Otherwise we attempt to use blorp and hit various asserts later in:
- blorp_copy_supports_blitter
- blorp_xy_block_copy_blt
Fixes: 61287b00f3 ("anv: Stop using RCS companion for MSAA copy/clear on Xe3+")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39346>
Fixes a crash with:
dEQP-VK.api.external.semaphore.opaque_fd.signal_export_import_wait_temporary
when driver calls genX(CmdSetEvent2) -> emit_apply_pipe_flushes with
having NULL in emitted_flush_bits.
Fixes: 8834ef8bcd ("anv: use flushing PIPE_CONTROL for event signaling")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39343>
Allows a shader to have multiple ray queries without spilling them to a shadow
stack. Instead, the driver provides the shader with an array of multiple
RTDispatchGlobals structs to give each query its own dedicated stack.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38778>
Since the move to MEMORY_*_LOGICAL the result value was being ignored, so
change to use that.
Since the conversion to use new registers, some issues were introduced:
- Even with `has_64bit_int` ADD with 64-bit immediate value is not supported;
- `dst_high` was not being filled if there was no overflow;
- Only `dst_low` returned.
Found when writing some new code involving large block loads.
Fixes: b79e85a93f ("brw: always use new registers for load address increments")
Fixes: b55f77161d ("intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39282>
The previous approach does ensure that all entries are zero'd, but that
may not be clear to the reader (i.e., me). Using `{ 0 }` is clearer.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39245>
This gives us the infrastructure that allows us to slowly migrate
pieces of blorp shaders from NIR to OpenCL, which, IMHO, are much
easier to read. We can't fully migrate everything due to all the
conditional building we do with these shaders, but I'm sure we'll find
opportunities to replace some NIR with OpenCL eventually.
The conversion of blorp_check_in_bounds() serves as the first example.
I also plan to have the shaders from the new indirect copy extension
be OpenCL shaders (mixed with some NIR as well), so having this patch
merged now will reduce the diff for the extension later.
Thanks to Alyssa Rosenzweig for her help here.
v2:
- Use SPDX (Alyssa).
- Use nir_trim_vector() (Alyssa).
- Adjust CL variable declaration (Alyssa).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39046>
We have two distinct code paths sharing blorp_params->wm_inputs for
different purposes: the code from blorp_blit.c and the code from
blorp_clear.c. While blorp_blit.c uses most of the parameters (all
except clear_color), blorp_clear.c only uses clear_color and
bounds_rect. Split the parameters in two structs: one for blits and
the other for clears.
This not only helps save some space in the shader inputs, but it also
organizes things so it's more clear which parameters are used by what.
In addition, my plan is to later add struct blorp_wm_inputs_indirect,
which won't share anything that the others use, and would otherwise
grow the struct even more.
This change would reduce the size of struct blorp_wm_inputs from 96 to
80, but we have to add padding due to the assertion that compares it
to cs_prog_data->push.cross_thread.size. Still good, though.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39046>
Because blorp_params_get_clear_kernel() calls
blorp_params_get_clear_kernel_cs(), which reads params->num_samples,
which we have not properly set yet at this point.
I am also planning to have the functions that create the shader to
rely on params.op, which we have not set yet either.
I found this by inspection (when writing another patch), I'm not sure
if this fixes something relevant, but it may be relevant to ver >= 30
multi-sampled cases.
Fixes: de0c547448 ("blorp: Handle 2D MSAA array image copies on compute shader")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39046>
When I first looked at this struct, my tiny little brain felt
overwhelmed.
- Add some white spaces in order to group the parameters into
"logical" groups so it's easier to reason about everything.
- Change the parameter order just a little bit - without breaking the
logical groups - so the struct size decreases by 1.7% to 1864 bytes.
- Add a comment explaining what the void * pointers point to.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39046>