The new FRAG_RESULT_DUAL_SRC_BLEND option is easier to work with than
looking for FRAG_RESULT_DATA0 with an index of 1. This also means we
no longer care about the dual source blend index, and can just use the
FRAG_RESULT location. That cascades to meaning we no longer have to
store a tuple in driver_location. And, if we just need location, we
can avoid populating that at all and use nir_io_semantics to get it.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
We can just have iris look at its own program key and change the
fragment shader output variable's location/index in the NIR. By
doing this before lowering fragment shader outputs, the rest of
the output lowering does the right thing, and the backend no longer
has to consider hacks for broken OpenGL apps.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
Several things were wrong :
- incorrect offset in the FS push constant data
- incorrect encoding of the 32bit values with 2 fields (remap table offset & provoking vertex)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31384>
To simplify things, our backend rounds convergent block loads up to a full
register. This causes page faults with the scratch page disabled since the
address is not always aligned to a register size. Loading smaller blocks is
slightly more difficult because the SEND instruction can only write back a
multiple of full registers, even if the actual data is smaller.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40149>
Since inline parameter is the last field of the thread payload, the
backend can always assume they may exist. They won't affect the
position of other payload fields and the register allocator will
reuse any unused space.
In Anv, also update EmitInlineParameter for Task/Mesh/CS to reflect
previous changes in inline parameter setup. Remove/Update some stale
comments since we are here.
Finally, remove the prog_key/prog_data bits that tracked whether inline
data or a push address was needed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41230>
For DG2 (Bspec 47937) has the same programming note as of Xe2+,
"When this bit is set in the header, Trace Ray Message behaves like a
Ray Query. This message requires a write-back message indicating
RayQuery for all valid Rays (SIMD lanes) have completed."
So this patch is just passing a write back destination register when we
have ray query message.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41039>
This will let us share a common scratch swizzling between brw and jay.
Changes by Ken:
- Use an immediate SIMD width when known so we don't need to re-lower
- Switch to load_simd_width_intel because it may not match
info->api_subgroup_size on Vulkan without VK_EXT_subgroup_size_control
- Stop using DWord Scattered Write messages for scratch. These take an
offset in DWords, and our offsets are now always in bytes. This also
means that we no longer create MEMORY_OPCODE_* IR with inconsistent
units of either bytes or dwords. Yikes. We use byte scattered
messages now.
fossil-db stats on Battlemage:
Instrs: 500477504 -> 500450056 (-0.01%); split: -0.01%, +0.00%
CodeSize: 7807432368 -> 7806786192 (-0.01%); split: -0.01%, +0.00%
Cycle count: 62404008370 -> 62398437734 (-0.01%); split: -0.01%, +0.00%
Fill count: 546690 -> 546695 (+0.00%); split: -0.00%, +0.00%
Max live registers: 141257956 -> 141258100 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 72350283 -> 72336544 (-0.02%)
Totals from 99 (0.01% of 1581969) affected shaders:
Instrs: 366593 -> 339145 (-7.49%); split: -7.58%, +0.09%
CodeSize: 6425936 -> 5779760 (-10.06%); split: -10.06%, +0.00%
Cycle count: 2412009876 -> 2406439240 (-0.23%); split: -0.26%, +0.03%
Fill count: 19675 -> 19680 (+0.03%); split: -0.02%, +0.04%
Max live registers: 17600 -> 17744 (+0.82%); split: -0.09%, +0.91%
Non SSA regs after NIR: 37894 -> 24155 (-36.26%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40843>
This lets us emit NIR code based on the SIMD size. For non-fragment
stages, we'll replace it with a constant and optimize, but for FS,
we delay it until the backend.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40843>
Just return the register instead of having multiple functions stash the
register in an array of registers. Way too much hoopla here.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40328>
Use is_scalar to know if we can do transpose loading.
Also enable vectorization if 2 intrinsics share the same source (it
means the only difference is the base).
Fixes: e14d6b535c ("brw/nir: add new intrinsics to load data from the indirect address")
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40308>
This address is delivered on Gfx12.5+ in compute/mesh/task shaders
from the command stream instruction.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40174>
For Gfx125 workloads that use systolic mode, this might mean
an extra PIPELINE_SELECT when flipping between a compute shader
that use the mode and another that doesn't use the mode
(or vice-versa).
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40014>
It's pretty much the same mechanism, except it's a different register
location.
With this change we gain indirect loading support.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39405>
If we have extended bindless surface offset (ExBSO) support, we want to
use it. Consolidate the anv_physical_device and brw_compiler bits into
a single static inline that take devinfo.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39839>
This was incorrect for OpenCL due to the possibility of variable shared memory
existing despite shared_size == 0. Fortunately the optimization it was trying to
do should be done in NIR via nir_opt_barrier_modes so we can just drop the brw
code and move on with our merry lives. Fixes OpenCL tests on Iris:
non_uniform_work_group non_uniform_3d_barriers
basic async_strided_copy_local_to_global
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39795>
Will use the load/store_ssbo with nir_resource_intel_internal later in
this series.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35160>
The entire array is always initialized to zero and never modified.
Cuts the size of brw_wm_prog_data by 32%.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39791>
This is the shader key for the fragment shader. Nobody even knows
what the windowizer/masker unit is or does anymore. Even on Gen4-6,
"fs" is still clearer. This makes the codebase easier to read.
This is only about 15 years overdue.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
This is the program data for the fragment shader. Nobody even knows
what the windowizer/masker unit is or does anymore. Even on Gen4-6,
"fs" is still clearer. This makes the codebase easier to read.
This is only about 15 years overdue.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
This started out as dynamic configuration for MSAA related state, but
has since expanded to cover many dynamic fragment shader options.
We rename it to intel_fs_config, similar to intel_tess_config, to
better indicate its purpose.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
For VS/TES/GS, we lower all outputs to temporaries and emit copies at the
end of the shader (or for GS, at each EmitVertex() call) from those
temporaries back to real outputs. We use vec8 URB writes without
writemasking, since our output area's contents are undefined anyhow.
This is simpler than what TCS and Mesh do, which allow for output
variables to be read/written at a per-component level at any time,
with the output memory being used for cross-thread communication.
Rather than using the complicated TCS/Mesh handling and relying on
vectorization, we port the emit_urb_writes() approach to NIR. This
also takes care of emitting the VUE header with default values when
fields aren't explicitly written by the shader.
We also handle multiview in the process. It simplifies things, and
also drops another case of non-semantic IO in brw.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>
For mesh/task shaders, the thread payload provides a local invocation
index, but it's always linear so it doesn't give the correct value when
quad derivatives are in use.
The lowering pass where all of this is done correctly for compute
shaders assumes load_local_invocation_index will be lowered in the
backend for mesh/task, calculates the values for the quads correctly but
then avoid replacing the original intrinsic and we remain with the wrong
results.
Add an intel specific intrinsic and always lower the generic one to that
(or whatever else was calculated) to avoid ambiguities and fix the value
for quad derivatives.
Fixes future CTS tests using mesh/task shaders under:
dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.*
Fixes: d89bfb1ff7 ("intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39276>
We were checking for 0xf which is fine for vec4, but vec8 gets 0xff.
Either way, nothing is writemasked, so we can skip sending the mask.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
Minor adjustments to formatting of the copyright line, but keep
dates and holders. "Authors" entries that could be
obtained via Git logs were also removed.
The license in brw_disasm.c and elk_disasm.c don't match directly
any SPDX pattern I could find, so kept as is.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39503>
Makes a bunch of copy propagation and other passes work much better.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39382>
Since the move to MEMORY_*_LOGICAL the result value was being ignored, so
change to use that.
Since the conversion to use new registers, some issues were introduced:
- Even with `has_64bit_int` ADD with 64-bit immediate value is not supported;
- `dst_high` was not being filled if there was no overflow;
- Only `dst_low` returned.
Found when writing some new code involving large block loads.
Fixes: b79e85a93f ("brw: always use new registers for load address increments")
Fixes: b55f77161d ("intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39282>
nr_params & params array are gone.
brw_ubo_range is not stored on the prog_data structure anymore (Anv
already stored a copy of that with its own additional information)
The backend now only deals with load_push_data_intel. load_uniform &
load_push_constant have to be lowered by the driver.
Pre Gfx12.5 platforms have to provide a subgroup_id_param to specify
where the subgroup_id value is located in the push constants.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
We leave GS pushed inputs using load_per_vertex_input for now - they're
relatively simple, and using load_attribute_payload doesn't work well
since it's assumed to be convergent (for TES, FS inputs) while GS inputs
are divergent.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38990>