Commit graph

680 commits

Author SHA1 Message Date
Ian Romanick
bf5d82654a intel/brw: Fix optimize_extract_to_float for i2f of unsigned extract
Fixes fs-uint-to-float-of-extract-int8.shader_test and
fs-uint-to-float-of-extract-int16.shader_test added by piglit!883.

No shader-db or fossil-db changes on any Intel platform.

v2: Expand the comment explaining the potential problem. Suggested by
Caio.

Fixes: 29ce110be6 ("i965/fs: Remove extract virtual opcodes.")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27891>
2024-05-03 15:01:43 -07:00
Kenneth Graunke
1b54b4fad5 intel/brw: Use VEC for NIR vec*() sources
This writes the whole destination register in a single builder call.
Eventually, VEC will write the whole destination register in one go,
allowing better visibility into how it is defined.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>
2024-04-30 17:16:50 -07:00
Kenneth Graunke
d4563747d9 intel/brw: Use VEC for output stores
This writes the whole destination register in a single builder call.
Eventually, VEC will write the whole destination register in one go,
allowing better visibility into how it is defined.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>
2024-04-30 17:16:49 -07:00
Kenneth Graunke
f0c29c9b71 intel/brw: Use VEC for FS outputs
This writes the whole destination register in a single builder call.
Eventually, VEC will write the whole destination register in one go,
allowing better visibility into how it is defined.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>
2024-04-30 17:16:49 -07:00
Kenneth Graunke
cbe7a13f2b intel/brw: Use VEC for TCS/TES/GS input/output loads
This writes the whole destination register in a single builder call.
Eventually, VEC will write the whole destination register in one go,
allowing better visibility into how it is defined.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>
2024-04-30 17:16:48 -07:00
Kenneth Graunke
a94e1bd0ac intel/brw: Use VEC for gl_FragCoord
This writes the whole destination register in a single builder call.
Eventually, VEC will write the whole destination register in one go,
allowing better visibility into how it is defined.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>
2024-04-30 17:16:47 -07:00
Kenneth Graunke
d0a24496fd intel/brw: Use VEC for load_const
This writes the whole destination register in a single builder call.
Eventually, VEC will write the whole destination register in one go,
allowing better visibility into how it is defined.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>
2024-04-30 17:16:45 -07:00
Kenneth Graunke
c194df565a intel/brw: Don't include unnecessary undefined values in texture results
When emitting a sampler message, we allocate a temporary destination
large enough to hold 4 values (or 5 for sparse).  This is the maximum
size needed to hold any result.  However, we shrink the size written by
the sampler message to skip writing any trailing components that NIR
tells us are never read.  So we may not write the entire temporary.

The NIR texture instruction has a destination VGRF which is sized
assuming that all components are present.  We issue a LOAD_PAYLOAD
instruction to copy our sampler result temporary to the NIR destination.
When we reduce the response length of the sampler messages, then some of
these temporary components have undefined values.  The correct way to
indicate that is by using a BAD_FILE source.  Unfortunately, we were
naively reading offsets of the temporary that were never written, but
are still part of a larger VGRF.  This complicates things.

For example, sampling and only using RGB (not RGBA) was producing this:

   txl_logical(8) (written: 3) vgrf3+0.0:F, ...
   undef(8) (written: 4) vgrf4:UD
   load_payload(8) (written: 4) vgrf4:F, vgrf3+0.0:F, vgrf3+1.0:F, vgrf3+2.0:F, vgrf3+3.0:F

The last source, vgrf3+3.0:F, is undefined, and should be BAD_FILE.
Doing so allows VGRF splitting and other optimizations to work better.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28971>
2024-04-30 17:16:41 -07:00
Kenneth Graunke
674e89953f intel/brw: Use new builder helpers that allocate a VGRF destination
With the previous commit, we now have new builder helpers that will
allocate a temporary destination for us.  So we can eliminate a lot
of the temporary naming and declarations, and build up expressions.

In a number of cases here, the code was confusingly mixing D-type
addresses with UD-immediates, or expecting a UD destination.  But the
underlying values should always be positive anyway.  To accomodate the
type inference restriction that the base types much match, we switch
these over to be purely UD calculations.  It's cleaner to do so anyway.

Compared to the old code, this may in some cases allocate additional
temporary registers for subexpressions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
2024-04-29 07:51:45 +00:00
Kenneth Graunke
319ba85e10 intel/brw: Add builder helpers for math functions
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
2024-04-29 07:51:45 +00:00
Kenneth Graunke
f5473e6edd intel/brw: Don't use inst return value when it isn't needed
We just want to emit an instruction, but we don't need to do anything
further with it, so we don't need to store the resulting inst pointer
anywhere.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
2024-04-29 07:51:45 +00:00
Lionel Landwerlin
1bbe2d9833 intel/brw: fixup wm_prog_data_barycentric_modes()
Always select sample barycentric when persample dispatch is unknown at
compile time and let the payload adjustments feed the expected value
based on dispatch.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27803>
2024-04-26 05:13:02 +00:00
Kenneth Graunke
545bb8fb6f intel/brw: Replace type_sz and brw_reg_type_to_size with brw_type_size_*
Both of these helpers do the same thing.  We now have brw_type_size_bits
and brw_type_size_bytes and can use whichever makes sense in that place.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
c22f44ff07 intel/brw: Replace brw_reg_type_from_bit_size by brw_type_with_size
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
f523bfcf90 intel/brw: Reindent after shortening BRW_REGISTER_TYPE_* to BRW_TYPE_*
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
873fcdff38 intel/brw: Stop using long BRW_REGISTER_TYPE enum names
s/BRW_REGISTER_TYPE/BRW_TYPE/g

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Karol Herbst
d22f936019 nir: remove workgroup_id_zero_base
This removes the need for drivers to handle both versions. The base will
get added once in nir_lower_system_values when converting from deref to
intrinsic and will be replaced by a zero for users not supporting it.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
2024-04-24 20:18:49 +00:00
Jordan Justen
4e5ed7ebd5 intel/brw: Avoid getting a stride of 0 for nir_intrinsic_exclusive_scan
Ref: 671745b616 ("intel/fs: Don't allow 0 stride on MOV destination")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28821>
2024-04-18 23:03:57 +00:00
Kenneth Graunke
e637c63239 intel/brw: Make an fs_builder::SYNC helper
We always want a null destination, so this saves some typing.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Kenneth Graunke
d5b8cec7a2 intel/brw: Replace FS_OPCODE_LINTERP with BRW_OPCODE_PLN
We no longer support the old LINE+MAC lowering, and we already lower
this to MAD in NIR on Gfx11+, so the LINTERP virtual opcode always
corresponds the PLN.  The only catch is that LINTERP's operands are
reversed from PLN, so we have to switch them.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Kenneth Graunke
46a7ee772e intel/brw: Drop default size of 1 from bld.vgrf() calls
This isn't necessary as 1 is the default value for the parameter.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Kenneth Graunke
217d56e9b1 intel/brw: Delete fs_visitor::vgrf helper
Just use fs_builder::vgrf instead of the older glsl_type-based one.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Ian Romanick
6d85f7129a intel/brw/xe2+: DPAS must be SIMD16 now
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
2024-03-29 21:12:32 +00:00
Ian Romanick
a8115221e5 nir: intel/brw: Change the order of sources for nir_dpas_intel
It was by pure luck that all sources (and the result) of nir_dpas_intel
had the same number of components. It is possible to support matrix
sizes where the accumlator matrix and the result matrix are larger
(e.g., 16x8 * 8x16 = 16x16).

This breaks all of the assumptions of NIR's infrastructure for code
generating intrinsics. Fix the by making the accumulator matrix be the
first source. The accumulator and the result will always have the same
dimensions (due to rules of matrix multiplication) and the same type
(due to restructions of the cooperative matrix extension). This forces
them to have the same number of components.

This doesn't fix all the potential problems. NIR expects that all
0-sized sources will have the same number of components. This just
ensures that the result has the correct number of components.

Fixes: 6b14da33ad ("intel/fs: nir: Add nir_intrinsic_dpas_intel")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
2024-03-29 21:12:32 +00:00
Rohan Garg
df3a1348d1 intel/brw: minor rework to de duplicate variable assignment
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27235>
2024-03-28 19:53:40 +00:00
Kenneth Graunke
d473004576 intel/fs: Avoid generating useless UNDEFs for every SSA def
Emitting UNDEF is only necessary when the instructions we generate to
produce the NIR def are considered partial writes.  By adding a simple
check (adapted from fs_inst::is_partial_write()), we can avoid creating
loads of unnecessary UNDEFs that we have to clean up later.

Our first dead code elimination pass does get rid of them pretty
quickly, but this should save memory and time during our first
split_virtual_grfs and dead_code_elimination passes.

This generates roughly 30% fewer instructions at the beginning.

Improves compilation time of shaders:
- Rise of the Tomb Raider: -3.51563% +/- 0.103951% (n=7)
- Borderlands 3: -3.64422% +/- 0.300951% (n=7).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28169>
2024-03-19 19:32:18 +00:00
Caio Oliveira
b22879e753 intel/brw: Use predicates for quad_vote_any and quad_vote_all when available
Up until Xe2, we can use the predicates ANY4H and ALL4H to achieve the
same result with less instructions.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27279>
2024-03-19 18:41:15 +00:00
Caio Oliveira
857e62e6ac intel/brw: Implement quad_vote_any and quad_vote_all
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27279>
2024-03-19 18:41:15 +00:00
Ian Romanick
671745b616 intel/fs: Don't allow 0 stride on MOV destination
Outside SIMD1 instructions, a destination stride of zero doesn't make
any sense. When such strides exist, they would be fixed by the FS
generator. Currently the only place that intentionally generates such a
stride is setup_barrier_message_payload_gfx125, and this commit changes
that.

The existence of a zero stride that won't really be a zero stride causes
a variety of problems with other optimization passes. Those passes don't
know that 0 actually means 1, and they make incorrect assumptions about
sizes written, etc.

The assertion helped catch many bugs in some other work in progress that
tries to store convergent values in SIMD8 registers regardless of the
dispatch width. That code would accidentally generate destination
strides of zero.

v2: Check stride differently depending on register file. Suggested by
Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28256>
2024-03-19 18:17:59 +00:00
Kenneth Graunke
97aec40111 intel/brw: Emit better code for read_invocation(x, constant)
For something as basic as read_invocation(x, 0), we were emitting:

   mov(8) vgrf67:D, 0d
   find_live_channel(8) vgrf236:UD, NoMask
   broadcast(8) vgrf237:D, vgrf67:D, vgrf236+0.0<0>:UD NoMask
   broadcast(8) vgrf235+0.0:W, vgrf197+0.0:W, vgrf237+0.0<0>:D NoMask
   mov(8) vgrf234+0.0:W, vgrf235+0.0<0>:W

This is way overcomplicated - if the invocation is a constant, we can
simply emit a single MOV which reads the desired channel index.  Not
only that, but it's difficult to clean up:

1. If this expression appears multiple times, CSE will find all the
   redundant emit_uniformize(invocation) and get rid of the duplicate
   (find_live_channel+broadcast) on future instructions.
2. Copy propagation will put the 0d directly in the first broadcast.
3. Dead code elimination will get rid of the vgrf67 temp holding 0.
4. Algebraic will replace the first broadcast(x, 0) with a MOV.
5. Copy propagation will put the 0d directly in the second broadcast.
6. Dead code elimination will get rid of the vgrf237 temp.
7. Algebraic will replace the second broadcast(x, 0) with a MOV.
8. Copy propagation will finally combine the two MOVs

That's at least 7-8 optimization passes and several loops through the
same passes just to clean up something we can do trivially.

Cuts 25% of the of the optimizer steps in pipeline 22200210259a2c9c
of fossil-db/google-meet-clvk/BgBlur.1f58fdf742c27594.1 (31 to 23).

Shortens compilation time of the google-meet-clvk/Relight pipeline by
-2.87717% +/- 0.509162% (n=150).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28097>
2024-03-12 21:58:27 +00:00
Rohan Garg
73d98848fa intel/compiler: Xe2+ can do URB load/store with a byte offset
Thanks to Ken for suggesting this URB refactoring change and pointing
out that the LSC can operate on the byte offset granularity.

This should fix the geometry shader test cases where we have more than
32 vertices since previously we were failing to write the correct
control data bits because of incorrect write mask.

Shader-db results for Xe2:

total instructions in shared programs: 153475 -> 153437 (-0.02%)
instructions in affected programs: 1374 -> 1336 (-2.77%)
helped: 11
HURT: 0
helped stats (abs) min: 3 max: 5 x̄: 3.45 x̃: 3
helped stats (rel) min: 1.67% max: 4.92% x̄: 3.23% x̃: 2.70%
95% mean confidence interval for instructions value: -3.92 -2.99
95% mean confidence interval for instructions %-change: -4.10% -2.36%
Instructions are helped.

total loops in shared programs: 140 -> 140 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cycles in shared programs: 16002649 -> 16002329 (<.01%)
cycles in affected programs: 9174 -> 8854 (-3.49%)
helped: 11
HURT: 0
helped stats (abs) min: 22 max: 38 x̄: 29.09 x̃: 32
helped stats (rel) min: 2.62% max: 5.54% x̄: 3.78% x̃: 3.85%
95% mean confidence interval for cycles value: -33.56 -24.62
95% mean confidence interval for cycles %-change: -4.48% -3.08%
Cycles are helped.

total spills in shared programs: 52 -> 52 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0

total fills in shared programs: 94 -> 94 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0

total sends in shared programs: 4240 -> 4240 (0.00%)
sends in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST:   0
GAINED: 0

Rework: (Sagar)
- Adjust offset/indirect offset calculation.
- Add shader-db results
- Always calculate dword index
- Drop changes for indirect writes

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27602>
2024-03-01 16:11:30 +00:00
Caio Oliveira
865ef36609 intel/brw: Remove brw_shader.h
Find a better home for its existing content.  Some functions are
now just static functions at the usage sites.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29 19:28:06 +00:00
Caio Oliveira
d9552fccf2 intel/brw: Remove extra stage_prog_data field in fs_visitor
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29 19:28:06 +00:00
Caio Oliveira
634dff403f intel/brw: Fold backend_shader into fs_visitor
The base class was used when we had vec4, but now we can fold it with
its only subclass.  Declare fs_visitor now as a struct to be able to
forward declare for C code without causing errors due to class/struct
being mixed.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29 19:28:05 +00:00
Lionel Landwerlin
99047451c9 intel/fs: add plumbing for embedded samplers
We can address samplers from 3 different locations :
   - binding table
   - dynamic state base address
   - bindless sampler base address (only Gfx11+)

Here we allow samplers to be address from the dynamic state base
address with the embedded sampler flag.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22151>
2024-02-29 07:05:06 +00:00
Caio Oliveira
aff961f423 intel/brw: Remove Gfx8- fields from *_prog_key structs
Those are not used or relevant anymore.  Also update Iris accordingly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
071e9f49f1 intel/brw: Remove F16TO32 and F32TO16 opcodes
These are done with MOVs and appropriate types in Gfx9+.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Caio Oliveira
6a03280af1 intel/brw: Remove Gfx8- code from NIR conversion
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Ian Romanick
8fb37ef985 intel/fs: Add fast path for ballot(true)
This doesn't help very much now. A later commit adds a NIR optimization
pass, tentatively called nir_opt_uniform_subgroup, that converts many
kinds of subgroup operations to things involving
bitCount(ballot(true)). This commit makes a huge difference in the
results of that later commit.

No shader-db changes on any Intel platform.

Fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165558033 -> 165557519 (-0.00%)
Cycles: 15156188362 -> 15156178922 (-0.00%); split: -0.00%, +0.00%

Totals from 299 (0.05% of 656117) affected shaders:
Instrs: 88293 -> 87779 (-0.58%)
Cycles: 3709498 -> 3700058 (-0.25%); split: -0.28%, +0.03%

v2: Rebase on splitting ELK from BRW. Remove devinfo->ver >= 8 check.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27 08:37:46 -08:00
Ian Romanick
c42830c64a intel/fs: Use constant of same type to write flag
Otherwise the compiler generates an extra MOV to load the constant into
a register first because reasons. 🤷 vote_any, vote_all, vote_ieq,
and vote_feq handling already do this.

No shader-db changes on any Intel plaform.

Fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165592451 -> 165557937 (-0.02%)
Cycles: 15133282615 -> 15133059360 (-0.00%); split: -0.00%, +0.00%

Totals from 33779 (5.15% of 656115) affected shaders:
Instrs: 4396576 -> 4362062 (-0.79%)
Cycles: 86867412 -> 86644157 (-0.26%); split: -0.37%, +0.11%

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27 08:37:15 -08:00
Ian Romanick
56a3f031f4 intel/fs: Delete stale comment in nir_intrinsic_ballot implementation
Discard actually uses f1.x, so this implementation of ballot is fine.

Trivial.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27 08:36:34 -08:00
Sagar Ghuge
6f0ab5e4d5 intel/compiler: Add texture gather offset LOD/Bias message support
v2: (Ian)
- Space formatting on conditional statement

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
2024-02-27 00:22:46 +00:00
Sagar Ghuge
79af0ac29a intel/compiler: Add gather4_i/l/[_c]/b sampler message
v2: (Ian)
- Format comment

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
2024-02-27 00:22:46 +00:00
Sagar Ghuge
b34b2bdff3 intel/compiler: Adjust sample_b parameter according to new layout
On Xe2+, we need to pack LOD with array index for cube array surfaces,
with that mlod parameter gets adjusted to different indices based on the
layout.

So track if we are packing LOD with array index in fs_inst and propogate
that to sampler lowering code to adjust param location.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
2024-02-27 00:22:46 +00:00
Kenneth Graunke
c12300844d intel/fs: Don't rely on CSE for VARYING_PULL_CONSTANT_LOAD
In the past, we didn't have a good solution for combining scalar loads
with a variable index plus a constant offset.  To handle that, we took
our load offset and rounded it down to the nearest vec4, loaded an
entire vec4, and trusted in the backend CSE pass to detect loads from
the same address and remove redundant ones.

These days, nir_opt_load_store_vectorize() does a good job of taking
those scalar loads and combining them into vector loads for us, so we
no longer need to do this trick.  In fact, it can be better not to:
our offset need only be 4 byte (scalar) aligned, but we were making it
16 byte (vec4) aligned.  So if you wanted to load an unaligned vec2,
we might actually load two vec4's (___X | Y___) instead of doing a
single load at the starting offset.

This should also reduce the work the backend CSE pass has to do,
since we just emit a single VARYING_PULL_CONSTANT_LOAD instead of 4.

shader-db results on Alchemist:
- No changes in SEND count or spills/fills
- Instructions: helped 95, hurt 100, +/- 1-3 instructions
- Cycles: helped 3411 hurt 1868, -0.01% (-0.28% in affected)
- SIMD32: gained 5, lost 3

fossil-db results on Alchemist:
- Instrs: 161381427 -> 161384130 (+0.00%); split: -0.00%, +0.00%
- Cycles: 14258305873 -> 14145884365 (-0.79%); split: -0.95%, +0.16%
- SIMD32: Gained 42, lost 26

- Totals from 56285 (8.63% of 652236) affected shaders:
- Instrs: 13318308 -> 13321011 (+0.02%); split: -0.01%, +0.03%
- Cycles: 7464985282 -> 7352563774 (-1.51%); split: -1.82%, +0.31%

From this we can see that we aren't doing more loads than before
and the change is pretty inconsequential, but it requires less
optimizing to produce similar results.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27568>
2024-02-20 23:16:27 -08:00
Caio Oliveira
7d85d2c7fd intel/compiler: Rename DISPATCH_MODE_* enums to INTEL_DISPATCH_MODE_*
And move to the intel_shader_enums.h file.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27475>
2024-02-14 22:31:23 -08:00
Caio Oliveira
26dd1f0bba intel/compiler: Rename BRW_WM_MSAA_* enums to INTEL_MSAA_*
And move to the intel_shader_enums.h file.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27475>
2024-02-14 22:31:23 -08:00
Jordan Justen
820e04ead4 intel/compiler: Implement nir_intrinsic_load_topology_id_intel for xe2
Rework:
 * Sagar: Rework BRW_TOPOLOGY_ID_DSS, BRW_TOPOLOGY_ID_EU_THREAD_SIMD
   calculations

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27529>
2024-02-14 20:07:13 +00:00
Sagar Ghuge
15129c7634 intel/compiler: Use nir_tex_src_backend1 to pack LOD and array index
Since this lowering is totally Intel specific, we don't have to
introduce the new texture source. We can use the nir_tex_src_backend1
source to pack LOD/LOD Bias and array index into 32 bit single value.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>
2024-02-12 21:25:48 +00:00
Ian Romanick
84de7a88d3 intel/compiler/xe2: Emit texture instructions w/ combined LOD and array index
The extra assertions are just there to help validate
pack_lod_and_array_index (in nir_lower_tex.c).

v2: Split got_lod_or_bias into two variables. This simplifies some
changes that Sagar is working on. Suggested by Sagar.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
2024-02-02 02:39:10 +00:00