Otherwise the compiler generates an extra MOV to load the constant into
a register first because reasons. 🤷 vote_any, vote_all, vote_ieq,
and vote_feq handling already do this.
No shader-db changes on any Intel plaform.
Fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165592451 -> 165557937 (-0.02%)
Cycles: 15133282615 -> 15133059360 (-0.00%); split: -0.00%, +0.00%
Totals from 33779 (5.15% of 656115) affected shaders:
Instrs: 4396576 -> 4362062 (-0.79%)
Cycles: 86867412 -> 86644157 (-0.26%); split: -0.37%, +0.11%
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
On Xe2+, we need to pack LOD with array index for cube array surfaces,
with that mlod parameter gets adjusted to different indices based on the
layout.
So track if we are packing LOD with array index in fs_inst and propogate
that to sampler lowering code to adjust param location.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
In the past, we didn't have a good solution for combining scalar loads
with a variable index plus a constant offset. To handle that, we took
our load offset and rounded it down to the nearest vec4, loaded an
entire vec4, and trusted in the backend CSE pass to detect loads from
the same address and remove redundant ones.
These days, nir_opt_load_store_vectorize() does a good job of taking
those scalar loads and combining them into vector loads for us, so we
no longer need to do this trick. In fact, it can be better not to:
our offset need only be 4 byte (scalar) aligned, but we were making it
16 byte (vec4) aligned. So if you wanted to load an unaligned vec2,
we might actually load two vec4's (___X | Y___) instead of doing a
single load at the starting offset.
This should also reduce the work the backend CSE pass has to do,
since we just emit a single VARYING_PULL_CONSTANT_LOAD instead of 4.
shader-db results on Alchemist:
- No changes in SEND count or spills/fills
- Instructions: helped 95, hurt 100, +/- 1-3 instructions
- Cycles: helped 3411 hurt 1868, -0.01% (-0.28% in affected)
- SIMD32: gained 5, lost 3
fossil-db results on Alchemist:
- Instrs: 161381427 -> 161384130 (+0.00%); split: -0.00%, +0.00%
- Cycles: 14258305873 -> 14145884365 (-0.79%); split: -0.95%, +0.16%
- SIMD32: Gained 42, lost 26
- Totals from 56285 (8.63% of 652236) affected shaders:
- Instrs: 13318308 -> 13321011 (+0.02%); split: -0.01%, +0.03%
- Cycles: 7464985282 -> 7352563774 (-1.51%); split: -1.82%, +0.31%
From this we can see that we aren't doing more loads than before
and the change is pretty inconsequential, but it requires less
optimizing to produce similar results.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27568>
Since this lowering is totally Intel specific, we don't have to
introduce the new texture source. We can use the nir_tex_src_backend1
source to pack LOD/LOD Bias and array index into 32 bit single value.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>
The extra assertions are just there to help validate
pack_lod_and_array_index (in nir_lower_tex.c).
v2: Split got_lod_or_bias into two variables. This simplifies some
changes that Sagar is working on. Suggested by Sagar.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
With regards to implicit masking of the shift counts for 8- and 16-bit
types, the PRMs are incorrect. They falsely state that on Gen9+ only the
low bits of src1 matching the size of src0 (e.g., 4-bits for W or UW
src0) are used. The Bspec (backed by data from experimentation) state
that 0x3f is used for Q and UQ types, and 0x1f is used for **all** other
types.
To match the behavior expected for the NIR opcodes, explicit masks for
8- and 16-bit types must be added.
This fixes (the updated version, see crucible!138) of
func.shader.shift.int16_t on all Intel platforms. According to Karol,
this also fixes "integer_ops integer_rotate" tests in OpenCL CTS.
No shader-db or fossil-db changes on any Intel platform.
Tested-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23001>
This change moves the instance id gs_thread_payload constructor and
lowering code will simply use that.
Also, this change takes the Xe2 register width in consideration that
fixes a couple of tests involving geometry shaders with gl_InvocationID
on Xe2.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26960>
Gen5 adds some boolean conversion instructions after nir emits,
but that nir srcs don't line up with them, so reemit the boolean
conversion if we reemit the inot.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 31b5f5a51f ("nir/opt_if: Simplify if's with general conditions")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26782>
v2: Fix parameter order in nir_intrinsic_dpas_intel to DPAS conversion.
v3: Fix float16 destination DPAS on DG2.
v4: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio.
v5: Rebase on !26323.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
The interpolation deltas of PS inputs now show up as a 12B vec3 (A0,
A1-A0, A2-A0) in the ATTR file, instead of the previously used 16B
format with an unused component.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
This includes the render target array index, viewport index, and
front/back facing fields, which are now replicated per pair of
subspans in order to support fixed-layout multi-polygon PS dispatch.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
Note from Caio: proper handling of brw_sample_mask_reg
will appear in later patches.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
Main motivation is that for multipolygon PS shaders the i-th plane
parameter for the j-th input attribute will no longer necessarily be a
scalar, since different channels may be processing different polygons
with different input plane parameters, so simply taking a component()
of the result of interp_reg() will no longer work. Instead of
duplicating the multipolygon handling logic in every caller of
interp_reg(), fold the component() call into interp_reg() so we can
replace it with multipolygon-correct code more easily.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
The fs_reg::nr field currently has a somewhat inconsistent meaning for
ATTR registers depending on the shader stage. In geometry stages it
has a similar effect as fs_reg::offset except it's expressed in 32B
units instead of B units. In the PS however it's expressed in units
of logical scalar attributes (16B on present platforms), which isn't
currently handled correctly throughout the back-end since some places
assume 32B units in all cases.
The different format of the PS setup data in multi-polygon dispatch
modes would make its behavior even more irregular, which would be
worsened further (for both geometry and pixel stages) by the register
size changes coming up on Xe2, particularly in brw_ir_fs.h helpers
where neither the devinfo struct nor the shader stage are available.
Instead of treating it as an offset simply consider different
fs_reg::nr indices as denoting disjoint spaces that can never be
accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Per Ian suggestion. Also clear up a few unnecessary casts around the code and
use `s` for fs_visitor ("shader"). Note to include a reference in ntf we need
to set it during initialization, so create an explicit mem_ctx for it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
Uses the dispatch_width from the shader (fs_visitor). This was not
possible before because the dispatch_width was not part of
backend_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
This will allow fs_builder have a reference to an fs_visitor (a
"fs_shader" really), instead of a reference to a backend_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
The remaining users can simply create a new builder at_end() if needed.
In many places a new builder object is already being constructed, so
just give more specific instructions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
Use the "current bld" in nir_to_brw_state more widely, and also replace it
with an annotated version when applicable (to associate it with a NIR
instruction being lowered). After filling a block we reset it back to
the original value.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
Create a nir_to_brw_state struct that is valid only during the
NIR to backend translation and use it for nir_ssa_values array.
This removes some NIR specific handling out of the fs_visitor -- nowadays
effectively an fs_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>