The entire array is always initialized to zero and never modified.
Cuts the size of brw_wm_prog_data by 32%.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39791>
This is the shader key for the fragment shader. Nobody even knows
what the windowizer/masker unit is or does anymore. Even on Gen4-6,
"fs" is still clearer. This makes the codebase easier to read.
This is only about 15 years overdue.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
This is the program data for the fragment shader. Nobody even knows
what the windowizer/masker unit is or does anymore. Even on Gen4-6,
"fs" is still clearer. This makes the codebase easier to read.
This is only about 15 years overdue.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
This started out as dynamic configuration for MSAA related state, but
has since expanded to cover many dynamic fragment shader options.
We rename it to intel_fs_config, similar to intel_tess_config, to
better indicate its purpose.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39748>
For VS/TES/GS, we lower all outputs to temporaries and emit copies at the
end of the shader (or for GS, at each EmitVertex() call) from those
temporaries back to real outputs. We use vec8 URB writes without
writemasking, since our output area's contents are undefined anyhow.
This is simpler than what TCS and Mesh do, which allow for output
variables to be read/written at a per-component level at any time,
with the output memory being used for cross-thread communication.
Rather than using the complicated TCS/Mesh handling and relying on
vectorization, we port the emit_urb_writes() approach to NIR. This
also takes care of emitting the VUE header with default values when
fields aren't explicitly written by the shader.
We also handle multiview in the process. It simplifies things, and
also drops another case of non-semantic IO in brw.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39666>
For mesh/task shaders, the thread payload provides a local invocation
index, but it's always linear so it doesn't give the correct value when
quad derivatives are in use.
The lowering pass where all of this is done correctly for compute
shaders assumes load_local_invocation_index will be lowered in the
backend for mesh/task, calculates the values for the quads correctly but
then avoid replacing the original intrinsic and we remain with the wrong
results.
Add an intel specific intrinsic and always lower the generic one to that
(or whatever else was calculated) to avoid ambiguities and fix the value
for quad derivatives.
Fixes future CTS tests using mesh/task shaders under:
dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.*
Fixes: d89bfb1ff7 ("intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39276>
We were checking for 0xf which is fine for vec4, but vec8 gets 0xff.
Either way, nothing is writemasked, so we can skip sending the mask.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39250>
Minor adjustments to formatting of the copyright line, but keep
dates and holders. "Authors" entries that could be
obtained via Git logs were also removed.
The license in brw_disasm.c and elk_disasm.c don't match directly
any SPDX pattern I could find, so kept as is.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39503>
Makes a bunch of copy propagation and other passes work much better.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39382>
Since the move to MEMORY_*_LOGICAL the result value was being ignored, so
change to use that.
Since the conversion to use new registers, some issues were introduced:
- Even with `has_64bit_int` ADD with 64-bit immediate value is not supported;
- `dst_high` was not being filled if there was no overflow;
- Only `dst_low` returned.
Found when writing some new code involving large block loads.
Fixes: b79e85a93f ("brw: always use new registers for load address increments")
Fixes: b55f77161d ("intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39282>
nr_params & params array are gone.
brw_ubo_range is not stored on the prog_data structure anymore (Anv
already stored a copy of that with its own additional information)
The backend now only deals with load_push_data_intel. load_uniform &
load_push_constant have to be lowered by the driver.
Pre Gfx12.5 platforms have to provide a subgroup_id_param to specify
where the subgroup_id value is located in the push constants.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38975>
We leave GS pushed inputs using load_per_vertex_input for now - they're
relatively simple, and using load_attribute_payload doesn't work well
since it's assumed to be convergent (for TES, FS inputs) while GS inputs
are divergent.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38990>
(Split by Ken from a larger patch originally written by Lionel.)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
Before DG2, the value the HW gives us seems to be backwards, but
since DG2 this is supposed to be supported just fine.
However, due to Wa_22012766191, enable it only for Xe2 and up.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38641>
We were using this for indirect loads of the shader input thread
payload, but there's no reason we can't use it for constant access
too. In this case we can just MOV from the ATTR file directly
without a special opcode that turns into MOV_INDIRECT later.
We also allow it to load multiple components now. This is useful
for say, returning vec4 pushed inputs. And, we allow it in more
stages than just the fragment stage.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
We're going to change the intrinsic to a load(...) which puts "load" in
the name. Also, it's just more consistent with our usual terminology.
We also rename the corresponding backend opcode so they remain matched.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
These work the same regardless of stage.
v2 (Ken): Rebase, move from mesh to all stages, add reorderable load
variant, allow channel masks to be non-constant even on Xe2.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
Be consistent with lowering that happens after, so that it gets a full
vector register and can stride into it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
Allows us to better determine if we need Z/W payload delivery.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36392>
This is done by grep ALIGN( to align(
docs,*.xml,blake3 is excluded
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38365>
We add a bunch of new helpers to avoid the need to touch >parent_instr,
including the full set of:
* nir_def_is_*
* nir_def_as_*_or_null
* nir_def_as_* [assumes the right instr type]
* nir_src_is_*
* nir_src_as_*
* nir_scalar_is_*
* nir_scalar_as_*
Plus nir_def_instr() where there's no more suitable helper.
Also an existing helper is renamed to unify all the names, while we're
churning the tree:
* nir_src_as_alu_instr -> nir_src_as_alu
..and then we port the tree to use the helpers as much as possible, using
nir_def_instr() where that does not work.
Acked-by: Marek Olšák <maraeo@gmail.com>
---
To eliminate nir_def::parent_instr we need to churn the tree anyway, so I'm
taking this opportunity to clean up a lot of NIR patterns.
Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38313>
Xe2 uses byte offsets rather than OWord offsets. We've been storing the
per-slot offsets in bytes on Xe2 for a while, but kept the global offset
immediate in OWords for some reason, choosing to lower it during logical
send lowering.
This patch makes both offsets (global immediate, per-slot) in the same
units, so they could be added together if necessary without scaling.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>
I noticed that our backend was completely ignoring writemasks, despite
them appearing on many of the intrinsics we're implementing.
Rhys Perry pointed out that nir_lower_mem_access_bitsizes is removing
all non-trivial writemasking today, so ssbo/global/shared/scratch/etc.
stores should only ever see all components enabled. Which means what
we're doing is legitimate, if non-obvious. Add an assert to make it
obvious.
Thanks a lot to Rhys for helping me rediscover what made this work.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>
Release builds are noisy about flush_type and scope being used
uninitialized, even though they are always set.
Initialize them to the final else values to make GCC happy.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38357>
Instead of having abstracted opcodes, we target directly the HW format
at the NIR translation.
The payload description gives us the order of the payload sources (we
can use that for pretty printing) and we don't have to have a
complicated scheme in the logical send lowering for the ordering. All
we have to do is build the header if needed as well as the descriptors.
PTL Fossil-db stats:
Totals from 66759 (13.54% of 492917) affected shaders:
Instrs: 44289221 -> 43957404 (-0.75%); split: -0.81%, +0.06%
Send messages: 2050378 -> 2042607 (-0.38%)
Cycle count: 3878874713 -> 3712848434 (-4.28%); split: -4.44%, +0.16%
Max live registers: 8773179 -> 8770104 (-0.04%); split: -0.06%, +0.03%
Max dispatch width: 1677408 -> 1707952 (+1.82%); split: +1.85%, -0.03%
Non SSA regs after NIR: 11407821 -> 11421041 (+0.12%); split: -0.03%, +0.15%
GRF registers: 5686983 -> 5838785 (+2.67%); split: -0.24%, +2.91%
LNL Fossil-db stats:
Totals from 57911 (15.72% of 368381) affected shaders:
Instrs: 39448036 -> 38923650 (-1.33%); split: -1.41%, +0.08%
Subgroup size: 1241360 -> 1241392 (+0.00%)
Send messages: 1846696 -> 1845137 (-0.08%)
Cycle count: 3834818910 -> 3784003027 (-1.33%); split: -2.33%, +1.00%
Spill count: 21866 -> 22168 (+1.38%); split: -0.07%, +1.45%
Fill count: 59324 -> 60339 (+1.71%); split: -0.00%, +1.71%
Scratch Memory Size: 1479680 -> 1483776 (+0.28%)
Max live registers: 7521376 -> 7447841 (-0.98%); split: -1.04%, +0.06%
Non SSA regs after NIR: 9744605 -> 10113728 (+3.79%); split: -0.01%, +3.80%
Only 2 titles negatively impacted (spilling) :
- Shadow of the Tomb Raider
- Red Dead Redemption 2
All impacted shaders were already spilling.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37171>