Remove references to the workaround number from the callsites. Instead
the function has "workaround" as part of the name and the number is
in its definition. Return bool for consistency with other passes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26887>
Rename to workaround_memory_fence_before_eot and return the already
present progress value for consistency.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26887>
Rename fixup to lower and return the already present
progress value for consistency.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26887>
In the past, we didn't have a good solution for combining scalar loads
with a variable index plus a constant offset. To handle that, we took
our load offset and rounded it down to the nearest vec4, loaded an
entire vec4, and trusted in the backend CSE pass to detect loads from
the same address and remove redundant ones.
These days, nir_opt_load_store_vectorize() does a good job of taking
those scalar loads and combining them into vector loads for us, so we
no longer need to do this trick. In fact, it can be better not to:
our offset need only be 4 byte (scalar) aligned, but we were making it
16 byte (vec4) aligned. So if you wanted to load an unaligned vec2,
we might actually load two vec4's (___X | Y___) instead of doing a
single load at the starting offset.
This should also reduce the work the backend CSE pass has to do,
since we just emit a single VARYING_PULL_CONSTANT_LOAD instead of 4.
shader-db results on Alchemist:
- No changes in SEND count or spills/fills
- Instructions: helped 95, hurt 100, +/- 1-3 instructions
- Cycles: helped 3411 hurt 1868, -0.01% (-0.28% in affected)
- SIMD32: gained 5, lost 3
fossil-db results on Alchemist:
- Instrs: 161381427 -> 161384130 (+0.00%); split: -0.00%, +0.00%
- Cycles: 14258305873 -> 14145884365 (-0.79%); split: -0.95%, +0.16%
- SIMD32: Gained 42, lost 26
- Totals from 56285 (8.63% of 652236) affected shaders:
- Instrs: 13318308 -> 13321011 (+0.02%); split: -0.01%, +0.03%
- Cycles: 7464985282 -> 7352563774 (-1.51%); split: -1.82%, +0.31%
From this we can see that we aren't doing more loads than before
and the change is pretty inconsequential, but it requires less
optimizing to produce similar results.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27568>
This change moves the instance id gs_thread_payload constructor and
lowering code will simply use that.
Also, this change takes the Xe2 register width in consideration that
fixes a couple of tests involving geometry shaders with gl_InvocationID
on Xe2.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26960>
Implements integer dot product lowering both with and without
DP4A. Implements half-float dot product lowering.
There are a couple FINISHME comments describing future optimizations.
v2: Add a brw_compiler::lower_dpas flag to track when the lowering
should be applied.
v3: Use is_null() instead of checking file != ARF. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
This extends fetch_payload_reg() to support fetching vector registers
like barycentrics stored on the payload as a contiguous sequence of
SIMD-wide vectors. In the SIMD32 case, both halves of the SIMD16
vector registers specified as regs[0] and regs[1] are zipped to
construct a single SIMD32-wide vector.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
It's not replicated per SIMD16 half of a SIMD32 thread on the PS
payload. Make fs_visitor::payload::depth_w_coef_reg a scalar rather
than an array.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Main motivation is that for multipolygon PS shaders the i-th plane
parameter for the j-th input attribute will no longer necessarily be a
scalar, since different channels may be processing different polygons
with different input plane parameters, so simply taking a component()
of the result of interp_reg() will no longer work. Instead of
duplicating the multipolygon handling logic in every caller of
interp_reg(), fold the component() call into interp_reg() so we can
replace it with multipolygon-correct code more easily.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
To allow specifying the number of polygons that will be processed per
SIMD thread.
Rework:
* Jordan: Add needs_register_pressure following
09cdb77a92 ("intel/fs: report max register pressure in shader stats")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
And use it in ANV in order to return a "SIMDNxM" name from
vkGetPipelineExecutablePropertiesKHR.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
In the past, multiple writes to a single register were pretty common,
but since we've transitioned to NIR, and leave the IR in SSA form for
everything not captured in a phi-web, the pattern of generating new
temporary registers at each step is a lot more common.
This pass isn't nearly as useful now. Across fossil-db on Alchemist,
this affects only 0.55% of shaders, which fall into two cases:
- Coarse pixel shading pixel-X/Y setup. There are a few cases where
we write a partial calculation into a register, then have a second
instruction read that as a source and overwrite it as a destination.
While we could use a temporary here, it doesn't actually help with
register pressure at all, since there's the same amount of values
live at both instructions regardless. So while this pass kicks in,
it doesn't do anything useful.
- Geometry shader control data bits (5 shaders total). We track masks
for handling EndPrimitive in a single register across the program,
and apparently in some cases can split the live range. However, it's
a single register...only in geometry shaders...which use EndPrimitive.
None of them appear to be in danger of spilling, either. So this tiny
benefit doesn't seem to justify the cost of running the pass.
So, just throw it out. It's not worth keeping.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26343>