This will save us the trouble of faking constant folding for the BVH level and
trace ray control values when we lower this intrinsic in the new backends.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/42006>
We now calculate it when emitting push input loads at the NIR level,
rather than in the backend.
v2: Fix missing interaction with legacy tesslevel remapping
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41821>
Calculate the end of our read as a byte offset and divide by the 32B
unit of URB reads. We were calculating 1 byte beyond the start offset
and dividing by 8 (the number of 4 byte DWords in 1 unit of URB read).
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41821>
Coverity notices that there is an error case where
`nir_get_io_data_src_number` could return `-1`, and that is then used to
index into an array. Given that that is an exceptional case, we can just
assert here.
CID: 1681480
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40146>
On Xe2 and Xe3, the flushing is necessary due to aliasing of TGM data
in L1 memory (HSD 14020414266). On newer platforms, it is necessary
for proper post-format data conversion handling (HSD 22020984324).
See the Instruction_Fence page (63969) for documentation on the fact
that the threadgroup scope ignores flushes.
Thanks to Francisco Jerez and Kenneth Graunke on their help for this
patch.
v2: restrict the flushing to TGM (Lionel).
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40732>
We generalize the sample_mask_in lowering to handle this too.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
In case of nir_intrinsic_load_inline_data_intel it was not using base_offset to
create the uniform, instead it was using only the special BRW_INLINE_PARAM_REG
value that later will be replaced by the inline_data fixed register.
So here using base_offset for both intrinsics, adding BRW_INLINE_PARAM_REG if
nir_intrinsic_load_inline_data_intel and then in brw_shader::assign_curb_setup
checking for inst->src[i].nr >= BRW_INLINE_PARAM_REG and adjusting brw_reg by
the remaining of the subtraction with BRW_INLINE_PARAM_REG.
Fixes: 7f19814414 ("brw/nir: handle inline_data_intel more like push_data_intel")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41607>
We'll get three new opcodes to properly model float multiply-add.
ffma_old is temporary and will be deleted at the end of this series.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165>
Passes the ACCESS_CAN_REORDER flag from NIR on to the backend so that we
can lower the loads to a non-volatile SEND. This allows the scheduler to
freely reorder them around stores or fences.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41008>
We were previously assuming that potentially stale divergence data was
valid. On some paths the register pressure estimator would recalculate
this, but, as is obvious from the results, not always.
v2: Add an assertion in brw_from_nir_emit_impl to ensure we don't end
up in this situation again.
v3: Call nir_divergence_analysis from
brw_nir_lower_deferred_urb_writes. This fixes assertion failures (the
assertion added in v2) in basically every graphics shader. The
altnerative was to call it from brw_compile_vs, brw_compile_gs, and
brw_compile_tes.
shader-db:
All Intel platformms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17050403 -> 17054033 (0.02%)
instructions in affected programs: 296344 -> 299974 (1.22%)
helped: 0 / HURT: 376
total cycles in shared programs: 876063126 -> 875817316 (-0.03%)
cycles in affected programs: 78627328 -> 78381518 (-0.31%)
helped: 91 / HURT: 276
LOST: 1
GAINED: 10
fossil-db:
All Intel platformms had similar results. (Lunar Lake shown)
Totals:
Instrs: 913770429 -> 916075391 (+0.25%); split: -0.00%, +0.26%
CodeSize: 14647414640 -> 14726176320 (+0.54%); split: -0.02%, +0.56%
Cycle count: 102308091527 -> 102290664775 (-0.02%); split: -0.26%, +0.24%
Spill count: 3469632 -> 3469124 (-0.01%); split: -0.08%, +0.07%
Fill count: 5007038 -> 4998674 (-0.17%); split: -0.51%, +0.34%
Max live registers: 192568853 -> 192595355 (+0.01%); split: -0.00%, +0.02%
Max dispatch width: 48713168 -> 48712880 (-0.00%); split: +0.00%, -0.00%
Non SSA regs after NIR: 140252767 -> 140253718 (+0.00%)
Totals from 223099 (11.11% of 2007586) affected shaders:
Instrs: 314077245 -> 316382207 (+0.73%); split: -0.01%, +0.75%
CodeSize: 5335583824 -> 5414345504 (+1.48%); split: -0.06%, +1.54%
Cycle count: 45868025821 -> 45850599069 (-0.04%); split: -0.58%, +0.54%
Spill count: 2062649 -> 2062141 (-0.02%); split: -0.14%, +0.11%
Fill count: 3343019 -> 3334655 (-0.25%); split: -0.76%, +0.51%
Max live registers: 36762498 -> 36789000 (+0.07%); split: -0.02%, +0.09%
Max dispatch width: 5542224 -> 5541936 (-0.01%); split: +0.03%, -0.03%
Non SSA regs after NIR: 43727142 -> 43728093 (+0.00%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> [v1]
Fixes: 1bff4f93ca ("brw: Basic infrastructure to store convergent values as scalars")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41370>
Add a new intrinsic to read the raw shading rate provided to the FS
payload, and lower load_frag_shading_rate in NIR using it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>
We'll need the raw coverage mask provided to the fragment shader in a
future patch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38879>
The new FRAG_RESULT_DUAL_SRC_BLEND option is easier to work with than
looking for FRAG_RESULT_DATA0 with an index of 1. This also means we
no longer care about the dual source blend index, and can just use the
FRAG_RESULT location. That cascades to meaning we no longer have to
store a tuple in driver_location. And, if we just need location, we
can avoid populating that at all and use nir_io_semantics to get it.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
We can just have iris look at its own program key and change the
fragment shader output variable's location/index in the NIR. By
doing this before lowering fragment shader outputs, the rest of
the output lowering does the right thing, and the backend no longer
has to consider hacks for broken OpenGL apps.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
Several things were wrong :
- incorrect offset in the FS push constant data
- incorrect encoding of the 32bit values with 2 fields (remap table offset & provoking vertex)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
To simplify things, our backend rounds convergent block loads up to a full
register. This causes page faults with the scratch page disabled since the
address is not always aligned to a register size. Loading smaller blocks is
slightly more difficult because the SEND instruction can only write back a
multiple of full registers, even if the actual data is smaller.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
Since inline parameter is the last field of the thread payload, the
backend can always assume they may exist. They won't affect the
position of other payload fields and the register allocator will
reuse any unused space.
In Anv, also update EmitInlineParameter for Task/Mesh/CS to reflect
previous changes in inline parameter setup. Remove/Update some stale
comments since we are here.
Finally, remove the prog_key/prog_data bits that tracked whether inline
data or a push address was needed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41230>
For DG2 (Bspec 47937) has the same programming note as of Xe2+,
"When this bit is set in the header, Trace Ray Message behaves like a
Ray Query. This message requires a write-back message indicating
RayQuery for all valid Rays (SIMD lanes) have completed."
So this patch is just passing a write back destination register when we
have ray query message.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41039>
This will let us share a common scratch swizzling between brw and jay.
Changes by Ken:
- Use an immediate SIMD width when known so we don't need to re-lower
- Switch to load_simd_width_intel because it may not match
info->api_subgroup_size on Vulkan without VK_EXT_subgroup_size_control
- Stop using DWord Scattered Write messages for scratch. These take an
offset in DWords, and our offsets are now always in bytes. This also
means that we no longer create MEMORY_OPCODE_* IR with inconsistent
units of either bytes or dwords. Yikes. We use byte scattered
messages now.
fossil-db stats on Battlemage:
Instrs: 500477504 -> 500450056 (-0.01%); split: -0.01%, +0.00%
CodeSize: 7807432368 -> 7806786192 (-0.01%); split: -0.01%, +0.00%
Cycle count: 62404008370 -> 62398437734 (-0.01%); split: -0.01%, +0.00%
Fill count: 546690 -> 546695 (+0.00%); split: -0.00%, +0.00%
Max live registers: 141257956 -> 141258100 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 72350283 -> 72336544 (-0.02%)
Totals from 99 (0.01% of 1581969) affected shaders:
Instrs: 366593 -> 339145 (-7.49%); split: -7.58%, +0.09%
CodeSize: 6425936 -> 5779760 (-10.06%); split: -10.06%, +0.00%
Cycle count: 2412009876 -> 2406439240 (-0.23%); split: -0.26%, +0.03%
Fill count: 19675 -> 19680 (+0.03%); split: -0.02%, +0.04%
Max live registers: 17600 -> 17744 (+0.82%); split: -0.09%, +0.91%
Non SSA regs after NIR: 37894 -> 24155 (-36.26%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40843>
This lets us emit NIR code based on the SIMD size. For non-fragment
stages, we'll replace it with a constant and optimize, but for FS,
we delay it until the backend.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40843>
Just return the register instead of having multiple functions stash the
register in an array of registers. Way too much hoopla here.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>
Use is_scalar to know if we can do transpose loading.
Also enable vectorization if 2 intrinsics share the same source (it
means the only difference is the base).
Fixes: e14d6b535c ("brw/nir: add new intrinsics to load data from the indirect address")
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40308>
This address is delivered on Gfx12.5+ in compute/mesh/task shaders
from the command stream instruction.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40174>
For Gfx125 workloads that use systolic mode, this might mean
an extra PIPELINE_SELECT when flipping between a compute shader
that use the mode and another that doesn't use the mode
(or vice-versa).
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40014>
It's pretty much the same mechanism, except it's a different register
location.
With this change we gain indirect loading support.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39405>
If we have extended bindless surface offset (ExBSO) support, we want to
use it. Consolidate the anv_physical_device and brw_compiler bits into
a single static inline that take devinfo.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39839>
This was incorrect for OpenCL due to the possibility of variable shared memory
existing despite shared_size == 0. Fortunately the optimization it was trying to
do should be done in NIR via nir_opt_barrier_modes so we can just drop the brw
code and move on with our merry lives. Fixes OpenCL tests on Iris:
non_uniform_work_group non_uniform_3d_barriers
basic async_strided_copy_local_to_global
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39795>
Will use the load/store_ssbo with nir_resource_intel_internal later in
this series.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>
The entire array is always initialized to zero and never modified.
Cuts the size of brw_wm_prog_data by 32%.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39791>
This is the shader key for the fragment shader. Nobody even knows
what the windowizer/masker unit is or does anymore. Even on Gen4-6,
"fs" is still clearer. This makes the codebase easier to read.
This is only about 15 years overdue.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
This is the program data for the fragment shader. Nobody even knows
what the windowizer/masker unit is or does anymore. Even on Gen4-6,
"fs" is still clearer. This makes the codebase easier to read.
This is only about 15 years overdue.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
This started out as dynamic configuration for MSAA related state, but
has since expanded to cover many dynamic fragment shader options.
We rename it to intel_fs_config, similar to intel_tess_config, to
better indicate its purpose.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>