Commit graph

3427 commits

Author SHA1 Message Date
Kenneth Graunke
674e89953f intel/brw: Use new builder helpers that allocate a VGRF destination
With the previous commit, we now have new builder helpers that will
allocate a temporary destination for us.  So we can eliminate a lot
of the temporary naming and declarations, and build up expressions.

In a number of cases here, the code was confusingly mixing D-type
addresses with UD-immediates, or expecting a UD destination.  But the
underlying values should always be positive anyway.  To accomodate the
type inference restriction that the base types much match, we switch
these over to be purely UD calculations.  It's cleaner to do so anyway.

Compared to the old code, this may in some cases allocate additional
temporary registers for subexpressions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
2024-04-29 07:51:45 +00:00
Kenneth Graunke
4c2c49f7bc intel/brw: Add builder helpers that allocate temporary destinations
In many cases, we calculate an expression by generating a series of
instructions.  We'd either overwrite the same register repeatedly,
or call vgrf(BRW_TYPE_X) repeatedly to allocate temporaries for each
intermediate step.  In many cases, we overwrote the same register simply
because allocating and naming temporaries for each step was annoying.

This commit adds new builder helpers that will allocate a temporary
destination for you, using simple type interference: unary operations
use the source type, and binary operations require a matching base type
and return the largest of the two types.

The helpers return the destination register, allowing us to write in an
expression-tree style, chaining together builder operations to produce
whole values.  Sort of like nir_builder.  We still optionally will write
out the fs_inst pointer in case the caller wants to do things like set
predicates or saturation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
2024-04-29 07:51:45 +00:00
Kenneth Graunke
319ba85e10 intel/brw: Add builder helpers for math functions
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
2024-04-29 07:51:45 +00:00
Kenneth Graunke
cf8ed9925f intel/brw: Make a helper for finding the largest of two types
Some instructions can operate on mixed types.  Typically this is
something like a binary operation with UD and UW sources resulting
in a UD destination.  In order to make it easier to find the result
type of such operations, let's make a type helper that returns the
larger of the two types (but requires the base type to match).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
2024-04-29 07:51:45 +00:00
Kenneth Graunke
f5473e6edd intel/brw: Don't use inst return value when it isn't needed
We just want to emit an instruction, but we don't need to do anything
further with it, so we don't need to store the resulting inst pointer
anywhere.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
2024-04-29 07:51:45 +00:00
Lionel Landwerlin
b80dd22d57 intel/brw: add min_sample_shading value in wm_prog_data
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27803>
2024-04-26 05:13:03 +00:00
Lionel Landwerlin
bdfa25dc77 intel/fs: decouple alphaToCoverage from per sample dispatch
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27803>
2024-04-26 05:13:03 +00:00
Lionel Landwerlin
1bbe2d9833 intel/brw: fixup wm_prog_data_barycentric_modes()
Always select sample barycentric when persample dispatch is unknown at
compile time and let the payload adjustments feed the expected value
based on dispatch.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27803>
2024-04-26 05:13:02 +00:00
Kenneth Graunke
df6cfb4dd0 intel/brw: Rename brw_reg_type_to_hw_type to brw_type_encode
And similarly brw_hw_type_to_reg_type to brw_type_decode.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
9205f6ff51 intel/brw: Combine a1/a16 3src type decoding functions
Align16 is only used on Gfx9, while Align1 is used on Gfx11+.  We can
decode both kinds of encodings in the same function with a simple
devinfo check.  One snag is that the align16 encodings didn't have a
separate exec_type field, but we can just pass 0.

This lets us have a single function named brw_type_decode_for_3src,
which is much less of a mouthful.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
28034aac34 intel/brw: Combine a1/a16 3src type encoding functions
Align16 is only used on Gfx9, while Align1 is used on Gfx11+.  We can
handle both encodings in the same function with a simple devinfo check,
and give that function a simple name like brw_type_encode_for_3src.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
545bb8fb6f intel/brw: Replace type_sz and brw_reg_type_to_size with brw_type_size_*
Both of these helpers do the same thing.  We now have brw_type_size_bits
and brw_type_size_bytes and can use whichever makes sense in that place.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
c22f44ff07 intel/brw: Replace brw_reg_type_from_bit_size by brw_type_with_size
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
007d891239 intel/brw: Use newer brw_type_is_* shorter names
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
f523bfcf90 intel/brw: Reindent after shortening BRW_REGISTER_TYPE_* to BRW_TYPE_*
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
873fcdff38 intel/brw: Stop using long BRW_REGISTER_TYPE enum names
s/BRW_REGISTER_TYPE/BRW_TYPE/g

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
9d8f2c4421 intel/brw: Rework BRW_REGISTER_TYPE's representation semantics
In ancient days, we directly used the hardware register type encodings
throughout the compiler.  As more GPU generations came out, encodings
shifted, and we moved to an abstract enum that we could encode/decode
to a particular GPU's hardware encoding.  But there was no particular
meaning behind any particular value.

One downside to this approach is that we end up with switch statements
galore.  Want to know a type's size?  Switch.  Convert a unsigned type
to a signed one?  Switch.  Get a type with the same base type, but
different bit size?  Switch.  This is both inefficient and inconvenient.

In contrast, nir_alu_type takes a nicer approach - the type encoding has
certain bits representing the base type, and others encoding the size of
the type.  Switching base types or sizes is a simple matter of masking
out the relevant field and substituting a different one.

Tigerlake's encoding adopts a similar approach: two bits represent the
size as a 2-bit unsigned number n, where the bit size is (8 * 2^n).
Two more bits represent the base type.  Past encodings were a bit ad hoc
as new data types were added over time, but Gfx12 is organized (mostly).

This patch converts our brw_reg_type enum over to a new system that's
patterned after the Tigerlake style (for easy conversion) while
deviating in a few ways that make our vector immediate type size
handling simpler.  Should we add additional base types, we're likely
to continue deviating.  Still, converting is much simpler.

Type size calculations (which are performed all the time) are now a
simple mask and shift, instead of a switch.

We also adopt the name BRW_TYPE_* instead of BRW_REGISTER_TYPE_* because
it's much shorter and easier to type.  Similarly, we create new helper
functions named brw_type_* for working with these types, with a cleaner
naming convention.  Legacy names still exist but will we dropped over
the next few patches as pieces get cleaned up.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
c45e235df5 intel/brw: Drop NF type support
Icelake removed the PLN instruction for interpolating fragment shader
inputs, instead adding a special "Native Float" (NF) data type which
was a 66-bit floating point data type that could only be used with the
accumulator.  On Tigerlake, they dropped NF support in favor of just
doing the interpolation with MAD instructions.

We stopped using NF years ago (commit 9ea90aae1e),
instead just using the fs_visitor::lower_linterp() pass to emit MADs.

Since this existed only for a short time, and had very limited utility,
we drop it from the compiler.  One downside is that we can no longer
disassemble Icelake shaders containing NF types properly, but I doubt
anyone really minds.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Kenneth Graunke
1c6f863fc7 intel/brw: Delete gfx10 table for align1 3src type encoding
align1 three-source instructions do not exist on gfx9, and this
compiler does not support gfx10.  So the oldest case is gfx11.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Karol Herbst
d22f936019 nir: remove workgroup_id_zero_base
This removes the need for drivers to handle both versions. The base will
get added once in nir_lower_system_values when converting from deref to
intrinsic and will be replaced by a zero for users not supporting it.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
2024-04-24 20:18:49 +00:00
Sagar Ghuge
46e4354940 intel/compiler: Disassemble mlen/rlen/ex_mlen in units of registers
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28637>
2024-04-23 23:46:26 +00:00
Caio Oliveira
ff89e83178 intel/brw: Lower VGRFs to FIXED_GRFs earlier
Moves the lowering of VGRFs into FIXED_GRFs from the code generation
to (almost) right after the register allocation.

This will allow: (1) later passes not worry about VGRFs (and what they
mean in a post reg alloc phase) and (2) make easier to add certain
types of validation post reg alloc phase using the backend IR.

Note that a couple of passes still take advantage of seeing "allocated
VGRFs", so perform lowering after they run.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28604>
2024-04-23 23:17:57 +00:00
Caio Oliveira
5b3d4c757d intel/brw: Support FIXED_GRF when generating code for CLUSTER_BROADCAST
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28604>
2024-04-23 23:17:57 +00:00
Francisco Jerez
62aab1437e intel/fs/gfx20+: Handle subdword integer regioning restrictions in copy propagation.
This makes sure that copy propagation doesn't undo the lowering of
restricted sub-dword integer regions done by brw_fs_lower_regioning().

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28698>
2024-04-22 18:02:32 -07:00
Francisco Jerez
217d412360 intel/fs/gfx20+: Implement sub-dword integer regioning restrictions.
This patch introduces code to enforce the pages-long regioning
restrictions introduced by Xe2 that apply to sub-dword integer
datatypes (See BSpec page 56640).  They impose a number of
restrictions on what the regioning parameters of a source can be
depending on the source and destination datatypes as well as the
alignment of the destination.  The tricky cases are when the
destination stride is smaller than 32 bits and the source stride is at
least 32 bits, since such cases require the destination and source
offsets to be in agreement based on an equation determined by the
source and destination strides.  The second source of instructions
with multiple sources is even more restricted, and due to the
existence of hardware bug HSDES#16012383669 it basically requires the
source data to be packed in the GRF if the destination stride isn't
dword-aligned.

In order to address those restrictions this patch leverages the
existing infrastructure from brw_fs_lower_regioning.cpp.  The same
general approach can be used to handle this restriction we were using
to handle restrictions of the floating-point pipeline in previous
generations: Unsupported source regions are lowered by emitting an
additional copy before the instruction that shuffles the data in a way
that allows using a valid region in the original instruction.  The
main difficulty that wasn't encountered in previous platforms is that
it is non-trivial to come up with a copy instruction that doesn't
break the regioning restrictions itself, since on previous platforms
we could just bitcast floating-point data and use integer copies in
order to implement arbitrary regioning, which is unfortunately no
longer a choice lacking a magic third pipeline able to do the
regioning modes the integer pipeline is no longer able to do.

The required_src_byte_stride() and required_src_byte_offset() helpers
introduced here try to calculate parameters for both regions that
avoid that situation, but it isn't always possible, and actually in
some cases that involve the second source of ALU instructions a chain
of multiple copy instructions will be required, so the
lower_instruction() routine needs to be applied recursively to the
instructions emitted to lower the original instruction.

XXX - Allow more flexible regioning for the second source of an
      instruction if bug HSDES#16012383669 is fixed in a future
      hardware platform.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28698>
2024-04-22 18:02:07 -07:00
Caio Oliveira
13093ceb3c intel/brw: Move validate out of fs_visitor
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28534>
2024-04-22 13:38:41 -07:00
Caio Oliveira
671d216f39 intel/brw: Remove two duplicated validate calls in optimizer
The OPT macro will call validate() after each pass, so both cases
removed by this patch are just redundant calls.  Will only affect
Debug builds since in Release builds validation is a no-op.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28534>
2024-04-22 13:38:41 -07:00
Caio Oliveira
8a6fe54409 intel/brw: Refactor FS validation macros
Use `a` and `b` (already identified as that in the output message)
instead of `f` and `s` for the two values being compared, since in
a later patch `s` will be used to hold the fs_visitor shader.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28534>
2024-04-22 13:38:41 -07:00
Ian Romanick
a5adbae6f6 nir: intel/brw: Remove cmat_signed_mask from dpas_intel intrinsic
It is not used. The signedness is inferred from src_type and dest_type.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28822>
2024-04-19 09:53:29 -07:00
Ian Romanick
2ce558d928 intel/brw: Fix handling of cmat_signed_mask
For integer types, the signedness is determined by flags on the muladd
instruction. The types of the sources play no role. Previously we were
using the signedness of the type and ignoring the mask.

Adjust the types passed to the dpas_intel intrinsic to match.

Fixes various
dEQP-VK.compute.*.cooperative_matrix.khr_*.matrixmuladd_cross.* tests on
different Intel platforms. Some platforms had failing tests, and some
platforms failed EU validation before the tests could fail.

Fixes: 6b14da33ad ("intel/fs: nir: Add nir_intrinsic_dpas_intel")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28822>
2024-04-19 09:53:27 -07:00
Mike Blumenkrantz
042b8a65d3 brw/lower_a2c: fix for scalarized fs outputs
it's legal for a fs to write xyzw components separately,
and this pass should handle such cases

cc: mesa-stable

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28752>
2024-04-18 23:27:22 +00:00
Jordan Justen
4e5ed7ebd5 intel/brw: Avoid getting a stride of 0 for nir_intrinsic_exclusive_scan
Ref: 671745b616 ("intel/fs: Don't allow 0 stride on MOV destination")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28821>
2024-04-18 23:03:57 +00:00
Ian Romanick
90e12ed843 intel/brw/xe2+: Only apply Wa 22016140776 to math instructions
The check in has_invalid_src_region incorrectly omitted inst->is_math()
from the condition.

Fixes: 0e817ba548 ("intel/brw/xe2+: Implement Wa 22016140776")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28821>
2024-04-18 23:03:57 +00:00
Kenneth Graunke
e637c63239 intel/brw: Make an fs_builder::SYNC helper
We always want a null destination, so this saves some typing.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Kenneth Graunke
d5b8cec7a2 intel/brw: Replace FS_OPCODE_LINTERP with BRW_OPCODE_PLN
We no longer support the old LINE+MAC lowering, and we already lower
this to MAD in NIR on Gfx11+, so the LINTERP virtual opcode always
corresponds the PLN.  The only catch is that LINTERP's operands are
reversed from PLN, so we have to switch them.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Kenneth Graunke
12b0e03bd2 intel/brw: Use SHADER_OPCODE_SEND for coherent framebuffer reads
We already have a logical opcode and lower to what is basically a send
instruction.  We just weren't using SHADER_OPCODE_SEND, instead having
extra redundant infrastructure for no real gain.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Kenneth Graunke
46a7ee772e intel/brw: Drop default size of 1 from bld.vgrf() calls
This isn't necessary as 1 is the default value for the parameter.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Kenneth Graunke
217d56e9b1 intel/brw: Delete fs_visitor::vgrf helper
Just use fs_builder::vgrf instead of the older glsl_type-based one.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Kenneth Graunke
f29a56a4ac intel/brw: Delete if_depth_in_loop
This was only used prior to Sandybridge.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Kenneth Graunke
bd6a430c94 intel/brw: Drop gfx7 scratch message setup code
Nothing uses this.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
2024-04-16 02:14:49 +00:00
Mike Blumenkrantz
39b66f9c84 intel: set compact_arrays in compiler options
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28601>
2024-04-12 18:43:48 +00:00
Tapani Pälli
0413729bc3 intel/compiler: add assert for Wa_22017182272
According to the workaround description:
   "For all Data Port messages, DP_FLUSH_TYPE should not be
    programmed to Discard."

This issue happens only with certain circumstances but as we are not
using discard, add assert and deal with it later if discard is taken in
to use.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24422>
2024-04-10 06:03:58 +00:00
Lionel Landwerlin
6a7e576017 intel/fs: fixup instruction scheduling last grf write tracking
When I bumped the max size of VGRFs, I should have bumped the values
in the scheduler too.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d33aff783d ("intel/fs: add support for sparse accesses")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28188>
2024-04-05 19:46:40 +00:00
Lionel Landwerlin
d59612f5e5 intel/fs: printout a couple of more late compile steps
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28188>
2024-04-05 19:46:40 +00:00
Ian Romanick
0e817ba548 intel/brw/xe2+: Implement Wa 22016140776
HF sources to math instructions cannot be scalar. This is very similar
to an old Gfx6 restriction on POW, so let's fix it in a similar way.

As an extra bit of saftey, lower any occurances that might slip through
in brw_fs_lower_regioning.

The primary change is to prevent copy propagation from violating the
restriction. With that change, nothing should be able to generate these
invalid source strides. The modification to fs_visitor::validate should
detect potential problems sooner rather than later.

Previous attempts to implement this Wa when emitting the math
instruction (in brw_eu_emit.c gfx6_math) didn't work for several
reasons. The lowering happens after the SWSB pass, so the scoreboarding
was incorrect (thanks to Curro for finding that). In addition, the
lowering happens after register allocation, so it's impossible to
allocate a non-scalar register to expand the scalar value.

Fixes 113 tests in the dEQP-VK.spirv_assembly.* group on LNL.

v2: Add changes to brw_fs_lower_regioning. Suggested by Curro.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28480>
2024-04-04 21:04:09 -07:00
Ian Romanick
0b67d3d909 intel/elk: Delete stray nir_opt_dce
No shader-db changes on any Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
2024-04-04 23:42:28 +00:00
Ian Romanick
24cdbbdaa2 intel/brw: Delete stray nir_opt_dce
No shader-db or fossil-db changes on any Intel platform.

Fixes: f76f4be301 ("intel/compiler: move gen5 final pass to actually be final pass")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
2024-04-04 23:42:27 +00:00
Ian Romanick
44fb57b827 intel/elk: Don't call nir_opt_remove_phis before nir_convert_from_ssa
shader-db:

All platforms had similar results. (Ivy Bridge shown)
total instructions in shared programs: 15831424 -> 15831637 (<.01%)
instructions in affected programs: 38880 -> 39093 (0.55%)
helped: 0 / HURT: 179

total cycles in shared programs: 432140353 -> 432170199 (<.01%)
cycles in affected programs: 11798080 -> 11827926 (0.25%)
helped: 77 / HURT: 123

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
2024-04-04 23:42:27 +00:00
Ian Romanick
6377e8fd29 intel/brw: Don't call nir_opt_remove_phis before nir_convert_from_ssa
Per discussion in #10727, removing phis breaks LCSSA form which in turn
invalidates divergence analysis.

shader-db:

All Skylake and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20299612 -> 20299695 (<.01%)
instructions in affected programs: 20829 -> 20912 (0.40%)
helped: 6 / HURT: 13

total cycles in shared programs: 842149085 -> 842148399 (<.01%)
cycles in affected programs: 15146222 -> 15145536 (<.01%)
helped: 40 / HURT: 45

fossil-db:

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165505077 -> 165505603 (+0.00%); split: -0.00%, +0.00%
Cycles: 15144183575 -> 15144235695 (+0.00%); split: -0.00%, +0.00%
Spill count: 45213 -> 45220 (+0.02%)
Fill count: 74166 -> 74184 (+0.02%)

Totals from 94 (0.01% of 656116) affected shaders:
Instrs: 263079 -> 263605 (+0.20%); split: -0.00%, +0.20%
Cycles: 28411487 -> 28463607 (+0.18%); split: -0.18%, +0.37%
Spill count: 3474 -> 3481 (+0.20%)
Fill count: 6713 -> 6731 (+0.27%)

Fixes: 6dbb5f1e07 ("intel/fs: rerun divergence analysis prior to convert_from_ssa")
Closes: #10727
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
2024-04-04 23:42:27 +00:00
Ian Romanick
87101e7d83 intel/compiler: Ensure load_barycentric_at_sample and load_interpolated_input remain together
This previously worked by luck because we were incorrectly calling
nir_opt_remove_phis before calling nir_convert_from_ssa. See also #10727.

No shader-db or fossil-db changes on any Intel platform.

v2: Handle the load_interpolated_input and load_barycentric_at_sample as
separate passes. Based on discussion with Ken starting at
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136#note_2330424.

Fixes: 74a40cc4b6 ("intel/fs: move lower of non-uniform at_sample barycentric to NIR")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
2024-04-04 23:42:27 +00:00