Minor adjustments to formatting of the copyright line, but keep
dates and holders. "Authors" entries that could be
obtained via Git logs were also removed.
The licenses in brw_disasm.c and elk_disasm.c don't directly match
any SPDX pattern I could find, so they are kept as is.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39503>
The previous Gfx12+ implementation using bit masking fails for FP8
types, so replace it with explicit lookup tables.
For float types, the encoding now aligns with brw_data_type_float, ensuring
correct behavior for DPAS and other 3-source instructions.
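
As a rough illustration of the lookup-table approach, here is a minimal C sketch. Everything in it is an assumption made for illustration: the `sketch_*` names, the type list, and the encoding values are placeholders, not the real brw_reg_type values or hardware encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical float types; not the real brw_reg_type enum. */
enum sketch_float_type {
   SKETCH_TYPE_HF,
   SKETCH_TYPE_F,
   SKETCH_TYPE_DF,
   SKETCH_TYPE_BF,
   SKETCH_TYPE_HF8,
   SKETCH_TYPE_BF8,
   SKETCH_TYPE_COUNT,
};

/* Placeholder encodings, one explicit entry per type. With a table there
 * is no assumption that the encoding can be derived from the type by
 * masking bits, so irregular entries are easy to express.
 */
static const uint8_t sketch_float_encoding[SKETCH_TYPE_COUNT] = {
   [SKETCH_TYPE_HF]  = 0x0,
   [SKETCH_TYPE_F]   = 0x1,
   [SKETCH_TYPE_DF]  = 0x2,
   [SKETCH_TYPE_BF]  = 0x3,
   [SKETCH_TYPE_HF8] = 0x8,
   [SKETCH_TYPE_BF8] = 0x9,
};

/* Type -> encoding: a direct table lookup. */
static uint8_t
sketch_encode(enum sketch_float_type type)
{
   assert(type < SKETCH_TYPE_COUNT);
   return sketch_float_encoding[type];
}

/* Encoding -> type: search the same table, so both directions always
 * agree. Returns SKETCH_TYPE_COUNT for an unknown encoding.
 */
static enum sketch_float_type
sketch_decode(uint8_t hw)
{
   for (int t = 0; t < SKETCH_TYPE_COUNT; t++) {
      if (sketch_float_encoding[t] == hw)
         return (enum sketch_float_type)t;
   }
   return SKETCH_TYPE_COUNT;
}
```

Using a single table for both directions keeps the encoder and the disassembler consistent by construction.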
Fixes: d1d4e3d530 ("brw: Add EU assembler support for float8")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39448>
The code kept track of blocks both in a linked list and
in an array. Change the client code of the list to
use the array, so we maintain only one.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39246>
The code currently doesn't remove blocks: when a block is about to become
empty, the code replaces the last instruction with a NOP.
If we want to have actual block removals again, there are other
strategies than removing them as we iterate (e.g. allow empty blocks
and then collect them in a pass or right after iteration).
So remove those macros.
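
The "replace with a NOP instead of emptying the block" behavior described above can be sketched in C as follows. The `sketch_*` types and the fixed-size instruction array are hypothetical simplifications, not the actual brw data structures.

```c
#include <assert.h>

/* Hypothetical opcodes and block layout, for illustration only. */
enum sketch_opcode { SKETCH_OP_NOP, SKETCH_OP_MOV, SKETCH_OP_ADD };

#define SKETCH_MAX_INSTS 8

struct sketch_block {
   enum sketch_opcode insts[SKETCH_MAX_INSTS];
   int num_insts;
};

/* Remove the instruction at index i. If it is the only instruction left,
 * turn it into a NOP instead, so the block never becomes empty and never
 * has to be removed while something is iterating over the blocks.
 */
static void
sketch_remove_inst(struct sketch_block *block, int i)
{
   assert(i >= 0 && i < block->num_insts);

   if (block->num_insts == 1) {
      block->insts[0] = SKETCH_OP_NOP;
      return;
   }

   /* Shift the remaining instructions down by one slot. */
   for (int j = i; j + 1 < block->num_insts; j++)
      block->insts[j] = block->insts[j + 1];
   block->num_insts--;
}
```

Because the block count never changes during such a removal, an iteration macro that tolerates block removal is not needed.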
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39246>
Change the few other cases to an inline function that
does the same job. This macro will change in ways that
are not compatible with the non-assembler usages.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39363>
This optimization doesn't work when the ray query index isn't uniform across
the subgroup, which is something the spec allows. While there are some smart
ways to fix this and still avoid unnecessary spilling, it's not worth investing
the time until we find a realtime raytracing workload that actually needs to
use multiple live ray queries for something.
Fixes: 1f1de7eb ("anv,brw: Allow multiple ray queries without spilling to a shadow stack")
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39445>
This commit changes the BVH layout a little so that we can load the BVH
offset as a constant rather than reading it from memory.
The instance leaves pointer, which is used in the copy.comp shader, has
to be forced to the end.
Totals:
Instrs: 54798 -> 54728 (-0.13%)
Send messages: 3854 -> 3847 (-0.18%)
Cycle count: 1915106 -> 1913954 (-0.06%); split: -0.07%, +0.01%
Non SSA regs after NIR: 18594 -> 18575 (-0.10%)
Totals from 7 (7.37% of 95) affected shaders:
Instrs: 5532 -> 5462 (-1.27%)
Send messages: 367 -> 360 (-1.91%)
Cycle count: 132592 -> 131440 (-0.87%); split: -1.01%, +0.14%
Non SSA regs after NIR: 1989 -> 1970 (-0.96%)
PERCENTAGE DELTAS    Shaders  Instrs  Send messages  Cycle count  Non SSA regs after NIR
q2rtx-rt-pipeline    95       -0.13%  -0.18%         -0.06%       -0.10%
--------------------------------------------------------------------------------------
All affected         7        -1.27%  -1.91%         -0.87%       -0.96%
--------------------------------------------------------------------------------------
Total                95       -0.13%  -0.18%         -0.06%       -0.10%
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39106>
In the software scoreboard (Gfx12+), use information from previous
instructions to trim out-of-order dependencies. For example, in
    send g1, g2 ($1)
    mov g3, g1  ($1.dst)    // Depends on g1 (destination of $1)
    mov g4, g2  ($1.src)    // Depends on g2 (source of $1)
    mov g5, g1  ($1.dst)    // Depends on g1 (destination of $1)
only the first `mov` needs to be annotated, because the execution will
stall until that dependency is fulfilled, which in this case means the
`send` is done and `g1` was already written.
Note that while `$x.dst` implies `$x.src`, the reverse is not true, so
if the first `mov` did not exist, both second and third `mov` in the
example would have to keep their annotations.
This patch adds resolution of implicit out-of-order dependencies that are
visible inside a block.
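
The trimming rule from the example can be sketched in C as below. This is only an illustration of the idea under stated assumptions: the `sketch_*` names, the per-block state array, and the token count are hypothetical and do not mirror the actual scoreboard pass.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NUM_SBID 16 /* assumed token count, for the sketch only */

/* Ordered so that a stronger wait compares greater: waiting on $x.dst
 * implies that $x.src is satisfied too, but not the other way around.
 */
enum sketch_wait {
   SKETCH_WAIT_NONE = 0,
   SKETCH_WAIT_SRC  = 1,
   SKETCH_WAIT_DST  = 2,
};

/* Strongest wait already annotated in the current block, per token. */
struct sketch_block_state {
   enum sketch_wait waited[SKETCH_NUM_SBID];
};

static void
sketch_block_start(struct sketch_block_state *s)
{
   for (int i = 0; i < SKETCH_NUM_SBID; i++)
      s->waited[i] = SKETCH_WAIT_NONE;
}

/* A new out-of-order instruction (e.g. a send) takes the token, so any
 * earlier wait on it says nothing about the new instruction.
 */
static void
sketch_token_allocated(struct sketch_block_state *s, unsigned sbid)
{
   assert(sbid < SKETCH_NUM_SBID);
   s->waited[sbid] = SKETCH_WAIT_NONE;
}

/* Returns true if the instruction must keep its annotation, false if an
 * earlier instruction in the block already stalled on an equal or
 * stronger dependency.
 */
static bool
sketch_needs_annotation(struct sketch_block_state *s, unsigned sbid,
                        enum sketch_wait mode)
{
   assert(sbid < SKETCH_NUM_SBID && mode != SKETCH_WAIT_NONE);

   if (s->waited[sbid] >= mode)
      return false;

   s->waited[sbid] = mode;
   return true;
}
```

Run against the example above, only the first `mov` keeps its annotation. Without the first `mov`, the `$1.src` wait is recorded first and the later `$1.dst` wait is still kept, since it is stronger.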
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3526>
There's now agreement that these are helpful and widely supported. We can
always fall back to a custom vector class later if necessary.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3526>
So that we can make the coarse_pixel_dispatch value available to NIR
lowering.
LNL internal fossildb changes:
Totals from 40 (0.01% of 490838) affected shaders:
Instrs: 33321 -> 33311 (-0.03%); split: -0.04%, +0.01%
Cycle count: 780136 -> 779936 (-0.03%); split: -0.03%, +0.00%
Max live registers: 5292 -> 5298 (+0.11%)
Non SSA regs after NIR: 26638 -> 26464 (-0.65%)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38996>
Makes a bunch of copy propagation and other passes work much better.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39382>
Not only is it questionable for code quality to not call nir_opt_algebraic_late
after nir_opt_algebraic, it also breaks correctness for late lowerings.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39180>
Allows a shader to have multiple ray queries without spilling them to a shadow
stack. Instead, the driver provides the shader with an array of multiple
RTDispatchGlobals structs to give each query its own dedicated stack.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38778>
Since the move to MEMORY_*_LOGICAL, the result value was being ignored, so
change the code to use it.
Since the conversion to use new registers, some issues were introduced:
- Even with `has_64bit_int`, ADD with a 64-bit immediate value is not supported;
- `dst_high` was not being filled if there was no overflow;
- Only `dst_low` was returned.
Found when writing some new code involving large block loads.
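
As a plain C analogy of the arithmetic involved (not the actual IR-building code), the sketch below adds a 32-bit offset to a 64-bit address held as two 32-bit halves. The `sketch_*` names are hypothetical. It shows the three points listed above: no 64-bit immediate is needed, the high half is written whether or not the low half overflows, and both halves are returned.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit address kept as two 32-bit halves (hypothetical type). */
struct sketch_addr64 {
   uint32_t low;
   uint32_t high;
};

static uint64_t
sketch_addr_pack(struct sketch_addr64 a)
{
   return ((uint64_t)a.high << 32) | a.low;
}

/* Add a 32-bit offset using only 32-bit operations. */
static struct sketch_addr64
sketch_addr_add(struct sketch_addr64 addr, uint32_t offset)
{
   struct sketch_addr64 dst;

   dst.low = addr.low + offset;              /* wraps modulo 2^32 */
   uint32_t carry = dst.low < addr.low;      /* 1 if the low half overflowed */
   dst.high = addr.high + carry;             /* written even when carry == 0 */

   /* Sanity check against a native 64-bit add. */
   assert(sketch_addr_pack(dst) == sketch_addr_pack(addr) + offset);

   return dst;                               /* both halves, not just low */
}
```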
Fixes: b79e85a93f ("brw: always use new registers for load address increments")
Fixes: b55f77161d ("intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39282>
Each group of 16 lanes inside a SIMD32 shader will load different globals.
In SIMD8/16 shaders, the divergence analysis will turn this load into
nir_load_global_constant_uniform_block_intel.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36181>
If it weren't for the workaround, it wouldn't be necessary to track
whether instructions are exec_all or not. The workaround affects
results when mixing a dep and an inst with different exec_all.
Add the predicate so that, when the workaround is disabled, none of
the effects of having different exec_all will kick in: all of them will
be considered `exec_all = true`.
This patch doesn't change any behavior, it just adds the predicate.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36659>
nr_params & params array are gone.
brw_ubo_range is no longer stored on the prog_data structure (Anv
already stored a copy of that with its own additional information).
The backend now only deals with load_push_data_intel. load_uniform &
load_push_constant have to be lowered by the driver.
Pre Gfx12.5 platforms have to provide a subgroup_id_param to specify
where the subgroup_id value is located in the push constants.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
Anv already manages this itself. This allows removing the logic from
the compiler.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
Drivers can do all the lowering to push constants to find the only
value useful in that array (subgroup_id). Then drivers call into
brw_cs_fill_push_const_info() to get the cross/per thread constant
layout computed in the prog_data.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>