Commit graph

3713 commits

Author SHA1 Message Date
Lionel Landwerlin
aa494cbacf brw: align spilling offsets to physical register sizes
In commit fe3d90aedf ("intel/fs/xe2+: Fix calculation of spill message
width for Xe2 regs.") we aligned the width of scratch messages to
physical register sizes (32B prior to Xe2, 64B for Xe2+).

But our spilling offsets are computed using the register allocations
sizes which are in units of 32B. That means on Xe2, you can end up
spilling a virtual register allocated at 32B (which we use for surface
state computations with exec_all) and then the spilling of that
register will be emitted in SIMD16, having the upper 8 lanes
overwriting the next spilled register.

We could potentially limit spills to SIMD8 messages on Xe2 (only
writing 32B of data), but we're also unlikely to have all 32B virtual
register spilled next to one another. And if not tightly packed, we
would have 64B registers stored on 2 different cachelines which sounds
inefficient.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: fe3d90aedf ("intel/fs/xe2+: Fix calculation of spill message width for Xe2 regs.")
Backport-to: 24.2
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30983>
2024-09-04 23:05:31 +00:00
Caio Oliveira
74be809237 compiler: Allow derivative_group to be used for all stages in shader_info
These will now also be used by stages that have workgroups.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30950>
2024-09-03 20:03:18 +00:00
Caio Oliveira
3f6b5ea27a intel/brw: Use linear walk when shader requires DERIVATIVE_GROUP_LINEAR
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30955>
2024-08-30 20:24:42 +00:00
Caio Oliveira
e4f090d3a6 intel/brw: Remove special treatment for 2-src in emit() helper
For Gfx9+ no 2-src instructions need sources to fixed up.  Special
treatment remains for 3-src instructions.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30911>
2024-08-30 04:33:47 +00:00
Ian Romanick
73f365e208 intel/brw: load_offset cannot be constant on this path
Literally inside an if-statement (about 26 lines before this hunk)
that checks for !nir_src_is_const(instr->src[1]).

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Ian Romanick
fef175de09 intel/brw: Enable constant propagation for a couple more logical sends
This prevents some regressions later in the MR. Once load_const
operations are marked as is_scalar, they will cesase to get the
automatic constant propagation that occurs in try_rebuild_source.

No shader-db or fossil-db changes on any Intel platform.

v2: Slightly relax source restrictions on
SHADER_OPCODE_UNALIGNED_OWORD_BLOCK_READ_LOGICAL. Add a comment
explaining the restriction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Ian Romanick
c6a8b382fd intel/brw: Relax is_partial_write check in cmod propagation
The is_partial_write check is too strict because it tests two separate
things. It tests whether or not the instruction always writes a value
(i.e., is it predicated), and it tests whether or not the instruction
writes a complete register. This latter check is problematic as it
perevents cmod propagation in SIMD1, and it prevents cmod propagation in
SIMD8 when the destination size is 16 bits.

This check is unnecessary. Cmod propagation already checks that the
region written and region read overlap. It also already checks that the
execution sizes of the instructions match. Further restriction based on
the specific parts of the register written only generates false
negatives.

v2: Relax all of the calls to is_partial_write. Suggested by Caio.

No shader-db changes on any Intel platform.

fossil-db:

Meteor Lake
Totals:
Instrs: 151505520 -> 151502923 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17201385104 -> 17194901423 (-0.04%); split: -0.06%, +0.02%
Spill count: 80827 -> 80837 (+0.01%)
Fill count: 152693 -> 152692 (-0.00%); split: -0.01%, +0.01%

Totals from 346 (0.05% of 630198) affected shaders:
Instrs: 1257205 -> 1254608 (-0.21%); split: -0.21%, +0.00%
Cycle count: 5532845647 -> 5526361966 (-0.12%); split: -0.18%, +0.06%
Spill count: 32903 -> 32913 (+0.03%)
Fill count: 64338 -> 64337 (-0.00%); split: -0.03%, +0.03%

DG2
Totals:
Instrs: 151531440 -> 151528055 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17200238927 -> 17197996676 (-0.01%); split: -0.03%, +0.02%
Spill count: 81003 -> 80971 (-0.04%); split: -0.04%, +0.00%
Fill count: 152975 -> 152912 (-0.04%); split: -0.05%, +0.01%

Totals from 346 (0.05% of 630198) affected shaders:
Instrs: 1260363 -> 1256978 (-0.27%); split: -0.27%, +0.00%
Cycle count: 5532019670 -> 5529777419 (-0.04%); split: -0.09%, +0.05%
Spill count: 33046 -> 33014 (-0.10%); split: -0.11%, +0.01%
Fill count: 64581 -> 64518 (-0.10%); split: -0.13%, +0.03%

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 149972324 -> 149972289 (-0.00%)
Cycle count: 15566495293 -> 15565151171 (-0.01%); split: -0.01%, +0.00%

Totals from 16 (0.00% of 629912) affected shaders:
Instrs: 351194 -> 351159 (-0.01%)
Cycle count: 3922227030 -> 3920882908 (-0.03%); split: -0.04%, +0.00%

Skylake
Totals:
Instrs: 140787999 -> 140787983 (-0.00%); split: -0.00%, +0.00%
Cycle count: 14665614947 -> 14665515855 (-0.00%); split: -0.00%, +0.00%
Spill count: 58500 -> 58501 (+0.00%)
Fill count: 102097 -> 102100 (+0.00%)

Totals from 16 (0.00% of 625685) affected shaders:
Instrs: 343560 -> 343544 (-0.00%); split: -0.01%, +0.01%
Cycle count: 3354997898 -> 3354898806 (-0.00%); split: -0.01%, +0.01%
Spill count: 16864 -> 16865 (+0.01%)
Fill count: 27479 -> 27482 (+0.01%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Ian Romanick
13332c236b intel/brw: Unconditionally run optimizations after nir_opt_uniform_subgroup
I observed some ray tracing shaders where a resource_intel inside a
loop was non-uniform, and some code was lowered to account for
that. Eventually the loop containing the resource_intel was unrolled,
and the resource_intel became uniform.

For example, nir_opt_uniform_subgroup can transform something like

    con loop {
        con block b5:        // preds: b4 b8
        con 32    %330 = @read_first_invocation (%329)
        con 1     %331 = ieq %330, %329
                         // succs: b6 b7
        if %331 {
            con block b6:        // preds: b5
            con 32    %332 = iadd %120.b, %330
            con 32    %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless|non-uniform, resource_block_intel=-1)
            div 32x4  %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler)
                             break
                             // succs: b9
        } else {
            con block b7:  // preds: b5, succs: b8
        }
        con block b8:  // preds: b7, succs: b5
    }

into

    con loop {
        con block b5:        // preds: b4 b8
        con 1     %331 = ieq %329, %329
                         // succs: b6 b7
        if %331 {
            con block b6:        // preds: b5
            con 32    %332 = iadd %120.b, %329
            con 32    %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless|non-uniform, resource_block_intel=-1)
            div 32x4  %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler)
                             break
                             // succs: b9
        } else {
            con block b7:  // preds: b5, succs: b8
        }
        con block b8:  // preds: b7, succs: b5
    }

Notice that %331 is now a tautology. Running brw_nir_optimize again
eliminates the loop.

v2: Add a comment in the code explaining the rationale. Suggested by
Ken. Update the commit message. Suggested by Caio.

shader-db:

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 19733448 -> 19733330 (<.01%)
instructions in affected programs: 14120 -> 14002 (-0.84%)
helped: 32 / HURT: 3

total cycles in shared programs: 916254496 -> 916226288 (<.01%)
cycles in affected programs: 2035116 -> 2006908 (-1.39%)
helped: 19 / HURT: 13

total spills in shared programs: 5807 -> 5807 (0.00%)
spills in affected programs: 26 -> 26 (0.00%)
helped: 1 / HURT: 1

total fills in shared programs: 6794 -> 6792 (-0.03%)
fills in affected programs: 84 -> 82 (-2.38%)
helped: 1 / HURT: 1

LOST:   1
GAINED: 1

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20393084 -> 20392971 (<.01%)
instructions in affected programs: 21750 -> 21637 (-0.52%)
helped: 31 / HURT: 4

total cycles in shared programs: 880273065 -> 880247818 (<.01%)
cycles in affected programs: 2546748 -> 2521501 (-0.99%)
helped: 18 / HURT: 9

total spills in shared programs: 4628 -> 4630 (0.04%)
spills in affected programs: 287 -> 289 (0.70%)
helped: 1 / HURT: 2

total fills in shared programs: 5381 -> 5376 (-0.09%)
fills in affected programs: 711 -> 706 (-0.70%)
helped: 2 / HURT: 2

LOST:   1
GAINED: 1

fossil-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 151513669 -> 151505520 (-0.01%); split: -0.01%, +0.00%
Send messages: 7459339 -> 7459396 (+0.00%)
Loop count: 49111 -> 47588 (-3.10%)
Cycle count: 17208178205 -> 17201385104 (-0.04%); split: -0.05%, +0.01%
Spill count: 80830 -> 80827 (-0.00%); split: -0.02%, +0.01%
Fill count: 152754 -> 152693 (-0.04%); split: -0.04%, +0.00%
Scratch Memory Size: 4136960 -> 4130816 (-0.15%)
Max live registers: 32016493 -> 32015955 (-0.00%); split: -0.00%, +0.00%

Totals from 672 (0.11% of 630198) affected shaders:
Instrs: 1352428 -> 1344279 (-0.60%); split: -0.78%, +0.17%
Send messages: 54302 -> 54359 (+0.10%)
Loop count: 6124 -> 4601 (-24.87%)
Cycle count: 1260266379 -> 1253473278 (-0.54%); split: -0.69%, +0.16%
Spill count: 15967 -> 15964 (-0.02%); split: -0.09%, +0.08%
Fill count: 36245 -> 36184 (-0.17%); split: -0.18%, +0.01%
Scratch Memory Size: 740352 -> 734208 (-0.83%)
Max live registers: 50699 -> 50161 (-1.06%); split: -1.45%, +0.39%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 149976046 -> 149971100 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 7685264 -> 7685256 (-0.00%)
Cycle count: 15566401168 -> 15566405478 (+0.00%); split: -0.00%, +0.00%
Spill count: 61238 -> 61240 (+0.00%)
Fill count: 107301 -> 107289 (-0.01%)
Max live registers: 31992969 -> 31993857 (+0.00%); split: -0.00%, +0.00%

Totals from 553 (0.09% of 629912) affected shaders:
Instrs: 557027 -> 552081 (-0.89%); split: -0.90%, +0.01%
Subgroup size: 8648 -> 8640 (-0.09%)
Cycle count: 150154496 -> 150158806 (+0.00%); split: -0.23%, +0.24%
Spill count: 181 -> 183 (+1.10%)
Fill count: 440 -> 428 (-2.73%)
Max live registers: 33698 -> 34586 (+2.64%); split: -0.02%, +2.65%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Ian Romanick
65eb7ed5fc intel/brw: Run intel_nir_lower_conversions only after brw_nir_optimize
Without this, the next commit tiggers assertions.

v2: Unconditionally do the lowering after brw_nir_optimize. Suggested by
Caio.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Ian Romanick
572e00dd66 intel/brw: Copy prop from raw integer moves with mismatched types
The specific pattern from the unit test was observed in ray tracing
trampoline shaders.

v2: Refactor the is_raw_move tests out to a utility function. Suggested
by Ken.

v3: Fix a regression caused by being too picky about source
modifiers. This was introduced somewhere between when I did initial
shader-db runs an v2.

v4: Fix typo in comment. Noticed by Caio.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19734086 -> 19733997 (<.01%)
instructions in affected programs: 135388 -> 135299 (-0.07%)
helped: 76 / HURT: 2

total cycles in shared programs: 916290451 -> 916264968 (<.01%)
cycles in affected programs: 41046002 -> 41020519 (-0.06%)
helped: 32 / HURT: 29

fossil-db:

Meteor Lake, DG2, and Skylake had similar results. (Meteor Lake shown)
Totals:
Instrs: 151531355 -> 151513669 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17209372399 -> 17208178205 (-0.01%); split: -0.01%, +0.00%
Max live registers: 32016490 -> 32016493 (+0.00%)

Totals from 17361 (2.75% of 630198) affected shaders:
Instrs: 2642048 -> 2624362 (-0.67%); split: -0.67%, +0.00%
Cycle count: 79803066 -> 78608872 (-1.50%); split: -1.75%, +0.25%
Max live registers: 421668 -> 421671 (+0.00%)

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 149995644 -> 149977326 (-0.01%); split: -0.01%, +0.00%
Cycle count: 15567293770 -> 15566524840 (-0.00%); split: -0.02%, +0.01%
Spill count: 61241 -> 61238 (-0.00%)
Fill count: 107304 -> 107301 (-0.00%)
Max live registers: 31993109 -> 31993112 (+0.00%)

Totals from 17813 (2.83% of 629912) affected shaders:
Instrs: 3738236 -> 3719918 (-0.49%); split: -0.49%, +0.00%
Cycle count: 4251157049 -> 4250388119 (-0.02%); split: -0.06%, +0.04%
Spill count: 28268 -> 28265 (-0.01%)
Fill count: 50377 -> 50374 (-0.01%)
Max live registers: 470648 -> 470651 (+0.00%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
2024-08-30 03:39:31 +00:00
Jesse Natalie
03655dfda1 compiler, vk: Support subgroup size of 4
Relax the assert and assign it an enum value

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30876>
2024-08-29 03:30:31 +00:00
Kenneth Graunke
da395e6985 intel/brw: Fix extract_imm for subregion reads of 64-bit immediates
We could be trying to extract a D/UD from a Q/UQ, for example.  We were
ignoring the top 32-bits, which is incorrect.

Fixes: 580e1c592d ("intel/brw: Introduce a new SSA-based copy propagation pass")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30884>
2024-08-28 12:33:26 -07:00
Kenneth Graunke
51c85e0363 intel/brw: Drop misguided sign extension attempts in extract_imm()
This function never expands a type - it only narrows it.  As such, we
don't need to ever sign extend to fill additional new bits.  I think
this code was left over from earlier versions of my optimization pass
that was buggy and trying to handle cases it should not have.

Fixes: 580e1c592d ("intel/brw: Introduce a new SSA-based copy propagation pass")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30884>
2024-08-28 12:33:26 -07:00
Caio Oliveira
695f5314d6 intel/brw: Simplify fs_inst annotation
When INTEL_DEBUG=ann is also set, the disassembler would annotate the
output with either a string or the string verison of a NIR instruction.
This was done by keeping two pointers (but only using one at a time).

Change the code to print the instruction into a string instead of
keeping it pointer around (peg the string to the shader).  That way,
only one pointer is needed for annotations.  Because that serialization
is not free, only do that when the environment variable is set.

Since we are here, move the annotation string field to the end, moving
it to the least commonly used cacheline.  Further packing might allow
the entire fs_inst to fit in two cachelines.

For release builds, don't even add the debug annotation to the struct.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30822>
2024-08-28 03:59:50 +00:00
Caio Oliveira
ec15cdfa2a intel/brw: Pack brw_reg struct
The alignment required for the second union (has 64-bit size) causes
a hole between the first and second union.  Move the remaining data
there.

In 64-bit build, shrinks brw_reg from 24 bytes to 16 bytes.  And by
consequence, shirnks fs_inst from 200 bytes to 160 bytes, making it
use one less cacheline.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30822>
2024-08-28 03:59:50 +00:00
Lionel Landwerlin
e97b968aeb brw: add a comment what Gfx12.5 URB fences
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30849>
2024-08-27 13:38:14 +00:00
Lionel Landwerlin
93fba40389 brw: switch mesh/task URB fence prior to EOT to GPU
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30849>
2024-08-27 13:38:14 +00:00
Kenneth Graunke
437bda3013 intel/brw: Get rid of the lsc_msg_desc_wcmask helper
The LOAD/STORE opcodes take a vector size, while the LOAD/STORE_CMASK
opcodes take a channel mask.  The two are mutually exclusive.  So we
can just have the lsc_msg_desc() helper take one or the other in the
same parameter.  This more closely matches the actual descriptor.

We couldn't do this until the previous commit, since we were previously
relying on the lsc_msg_desc() function to calculate a cmask out of the
number of vector components.  But now we don't need it to do that.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30632>
2024-08-27 09:25:59 +00:00
Kenneth Graunke
55f193a105 intel/brw: Switch from LSC CMASK opcodes to regular LOAD/STORE
The LOAD/STORE opcodes take a vector size (number of components), while
the LOAD/STORE_CMASK opcodes take a channel mask.  For some reason, we
were passing a number of channels to lsc_msg_desc(), then using it to
construct a channel mask with all channels enabled, and always using the
CMASK message variants.

Considering we don't actually want to mask off any channels, we should
probably just use the regular LOAD/STORE opcodes, as they're more
flexible anyway.

One exception is that typed messages on Xe2 apparently only support
LOAD_CMASK/STORE_CMASK and not regular LOAD/STORE.  So we keep using
those there.  (Thanks to Sagar Ghuge for catching this!)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30632>
2024-08-27 09:25:58 +00:00
Sviatoslav Peleshko
09122e2be0 brw,elk: Fix opening flags on dumping shader binaries
Truncation is needed for overwriting correctly in cases when old file is
bigger than the one we want to dump (e.g. when the old one was edited
inplace). Also, creation permissions are way too broad.

Fixes: 4f41c44d ("intel/compiler: Add variable to dump binaries of all compiled shaders")
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30581>
2024-08-27 08:26:08 +00:00
Caio Oliveira
31dfb04fd3 intel/brw: Remove long register file names
The long names were originally meant to map to the HW encoding but
nowadays the actual encoding values depend on gfx version, whether
instruction is 3src, etc.

Suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
2024-08-25 22:08:14 +00:00
Caio Oliveira
6bdf2de4d2 intel/brw: Remove unused ARF values and helpers
These were used by old Gfx versions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
2024-08-25 22:08:14 +00:00
Caio Oliveira
72b687abb4 intel/brw: Make BAD_FILE the zero value for brw_reg_file
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
2024-08-25 22:08:14 +00:00
Caio Oliveira
e8f921678a intel/brw: Explicitly map brw_reg_file into hardware values
For now this is a no-op, but will be useful when changing the enum
to values that don't match the hardware.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
2024-08-25 22:08:14 +00:00
Caio Oliveira
e7179232c9 intel/brw: Move encoding of Gfx11 3-src inside the inst helpers
Create specific helper for register file encoding and handle it there.
Use ad-hoc structs to let the macro take optional named arguments.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
2024-08-25 22:08:14 +00:00
Caio Oliveira
d31c8bfb6f intel/brw: Remove more uses of variable length arrays
In these cases there's a clear bound we can use.  In C++ this is a
compiler extension and not compatible with zero initializing a
regular struct -- which will happen in a later change.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
2024-08-25 22:08:14 +00:00
Caio Oliveira
86c20e2910 intel/brw: Use a helper for common VEC pattern
In the helper, instead of using the Variable Length Array, use a
fixed size array to NIR_MAX_VEC_COMPONENTS.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
2024-08-25 22:08:14 +00:00
Caio Oliveira
abc535a3b4 intel/brw: Remove unused variable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
2024-08-25 22:08:13 +00:00
Kenneth Graunke
b97e10208c intel/brw: Add a file parameter to idom_tree::dump()
The other dump methods in this file also take a file parameter,
defaulting to stderr.  Dumping dot files to stdout is probably not
what anybody really wanted.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30530>
2024-08-22 22:54:45 +00:00
Kenneth Graunke
bb4f05005e intel/brw: Print blocks in brw_print_instructions_to_file()
Useful when examining the control flow graph.  For some reason,
we printed this for the final assembly but not the IR.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30530>
2024-08-22 22:54:45 +00:00
Kenneth Graunke
2d73e42333 intel/brw: Fix OOB reads when printing instructions post-reg-alloc
Post-register allocation, but before brw_fs_lower_vgrfs_to_fixed_grfs,
we have registers with the VGRF file but they are actually fixed GRFs.

brw_print_instructions_to_file() was seeing VGRFs and trying to access
their size, but using bogus register numbers that could be out-of-bound.

Detect when we're post-RA and avoid doing this.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30530>
2024-08-22 22:54:45 +00:00
Lionel Landwerlin
d9406658ed brw: remove unused prog_data field
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30713>
2024-08-22 19:44:40 +00:00
Kenneth Graunke
6a292c2699 intel: Fix bad align_offset on global_constant_uniform_block_intel
We were specifying align_offset = 64 and align_mul = 64, which is
invalid.  nir_combined_align() asserts that align_offset < align_mul.

Our intention here is to perform cacheline-aligned (64B-aligned) block
loads, so we should set align_mul = 64 and can leave align_offset = 0.

Fixes: fbafa9cabd ("intel/nir: remove load_global_const_block_intel intrinsic")
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30755>
2024-08-21 20:44:57 +00:00
Ian Romanick
c96ceb50d0 intel/brw/xe2: Allow int64 conversions
As far as I can tell from looking at the Bspec, MOV between integers
of all sizes appears to be supported.

shader-db:

total instructions in shared programs: 17480631 -> 17480535 (<.01%)
instructions in affected programs: 26284 -> 26188 (-0.37%)
helped: 21 / HURT: 13

total cycles in shared programs: 897601907 -> 897664293 (<.01%)
cycles in affected programs: 10929664 -> 10992050 (0.57%)
helped: 48 / HURT: 45

fossil-db:

Totals:
Instrs: 140686824 -> 140686155 (-0.00%); split: -0.00%, +0.00%
Cycle count: 21525129188 -> 21524717729 (-0.00%); split: -0.01%, +0.00%
Spill count: 70778 -> 70776 (-0.00%)
Fill count: 139172 -> 139168 (-0.00%)
Max live registers: 47513859 -> 47513795 (-0.00%)

Totals from 612 (0.11% of 549272) affected shaders:
Instrs: 964441 -> 963772 (-0.07%); split: -0.09%, +0.02%
Cycle count: 1215564312 -> 1215152853 (-0.03%); split: -0.09%, +0.06%
Spill count: 16172 -> 16170 (-0.01%)
Fill count: 37962 -> 37958 (-0.01%)
Max live registers: 70749 -> 70685 (-0.09%)

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30700>
2024-08-21 20:16:00 +00:00
Francisco Jerez
71ca8529c5 intel/brw/gfx12.5+: Fix IR of sub-dword atomic LSC operations.
We were currently emitting logical atomic instructions with a packed
destination region for sub-dword LSC atomics, along the lines of:

> untyped_atomic_logical(32) dst<1>:HF, ...

However, these instructions use an LSC data size D16U32, which means
that the 16b data on the return payload is expanded to 32b by the LSC
shared function, so we were lying to the compiler about the location
of the individual channels on the return payload, its execution
masking, etc.  This is why the hacks that manually set the
'inst->size_written' of the instruction were required.

In some cases this worked, but any non-trivial manipulation of the
instruction destination by lowering or optimization passes could have
led to corruption, as has been reproduced in deqp-vk during
lower_simd_width() for shaders that use 16-bit atomics in SIMD32
dispatch mode.

Note that LSC sub-dword reads aren't affected by this because they use
raw UD destinations and specify the actual bit size of the operation
datatype as the immediate SURFACE_LOGICAL_SRC_IMM_ARG, which doesn't
work for atomic operations since that immediate specifies the atomic
opcode.

Instead, have the logical operation implement the behavior of 16-bit
destinations correctly instead of silently replacing the 16-bit region
with an inconsistent 32-bit region -- This is done by emitting the MOV
instructions used to pack the data from the UD temporary into the
packed destination from the lower_logical_sends() pass instead of from
the NIR translation pass.

Fixes: 43169dbbe5 ("intel/compiler: Support 16 bit float ops")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30683>
2024-08-21 02:33:12 +00:00
Kenneth Graunke
d22d6d814d intel/brw: Fix Xe2+ SWSB encoding/decoding for DPAS instructions
SBID SET can only be used on SEND, SENDC, or DPAS instructions.  The
existing code was handling SET for SEND/SENDC, but was using the wrong
encoding for DPAS.  Add a new case to handle that and make it clear that
the existing code is only for SEND/SENDC.

While here, rewrite the encoder to use 2-bit binary immediates shifted
up into the mode [9:8] field, rather than pre-shifted hex values.  This
matches the documentation better and is a little easier to follow.

On the decode side, we were incorrectly decoding MATH instructions.
Because they're marked is_unordered, we were hitting the SEND/SENDC
decoding, which is incorrect for MATH.

Fixes 22 cooperative matrix tests on Lunar Lake.

Huge thanks to Paulo Zanoni for bisecting failures to one of my commits,
then analyzing shaders and experimenting to discover that the failure
was really an unrelated bug, just being provoked by different choices of
registers.  His work narrowing the problem down made it much easier to
discover and fix this bug.

Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30705>
2024-08-20 19:09:37 +00:00
Kenneth Graunke
89f9a6e10b intel/brw: Pass opcode to brw_swsb_encode/decode
We're going to need to handle encoding/decoding differently for DPAS vs.
SEND/SENDC vs. other instructions.  Pass the opcode so we can figure out
the encodings for each type of instruction.

Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30705>
2024-08-20 19:09:37 +00:00
Caio Oliveira
40f77b6936 intel/brw: Avoid modifying the shader in assign_curb_setup if not needed
If there are no uniforms to push, don't emit the AND or invalidate the
shader analysis.  This affects only compute shaders.

Not a significant impact since lots of shaders end up pushing
uniforms.  Fossil-db numbers (restricted to compute pipelines only) for DG2

```
Totals:
Instrs: 3071016 -> 3070894 (-0.00%)
Cycle count: 8320268863 -> 8320264519 (-0.00%)

Totals from 122 (2.70% of 4520) affected shaders:
Instrs: 10675 -> 10553 (-1.14%)
Cycle count: 2060003 -> 2055659 (-0.21%)
```

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30631>
2024-08-17 16:25:01 -07:00
Sagar Ghuge
c4f2a8d984 intel/compiler: Fix indirect offset in GS input read for Xe2+
Make sure to take new GRF size into consideration and adjust the
indirect offset according to new size so that when we do the indirect
load with address register, we load right values.

This helps pass the following tests:
   - dEQP-VK.binding_model.descriptor_buffer.mutable_descriptor.*geom*
   - dEQP-VK.ray_query.*geometry_shader.*

Backport-to: 24.2
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30679>
2024-08-16 18:40:13 +00:00
Ian Romanick
c8038643b8 intel/brw: Make ifind_msb SSA friendly
No shader-db changes on any Intel platform.

v2: Use negate(tmp) instead of creating a new temporary. Suggested by
Ken.

fossil-db:

Meteor Lake, DG2, and Skylake had similar results. (Meteor Lake shown)
Totals:
Instrs: 152535897 -> 152535883 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17112329592 -> 17112406110 (+0.00%); split: -0.06%, +0.06%

Totals from 40 (0.01% of 633223) affected shaders:
Instrs: 458813 -> 458799 (-0.00%); split: -0.01%, +0.00%
Cycle count: 4358016282 -> 4358092800 (+0.00%); split: -0.23%, +0.24%

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150560511 -> 150560465 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15484534441 -> 15482372893 (-0.01%); split: -0.12%, +0.11%
Spill count: 59795 -> 59794 (-0.00%)
Fill count: 103513 -> 103509 (-0.00%)

Totals from 40 (0.01% of 632445) affected shaders:
Instrs: 368877 -> 368831 (-0.01%); split: -0.01%, +0.00%
Cycle count: 3918398264 -> 3916236716 (-0.06%); split: -0.49%, +0.43%
Spill count: 16896 -> 16895 (-0.01%)
Fill count: 27819 -> 27815 (-0.01%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>
2024-08-16 14:52:04 +00:00
Ian Romanick
e9c151fde6 intel/brw: Make 16-bit ishl, ishr, and ushr SSA friendly
No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 152536266 -> 152535897 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17124901233 -> 17112329592 (-0.07%); split: -0.07%, +0.00%
Spill count: 78571 -> 78525 (-0.06%)
Fill count: 148178 -> 148132 (-0.03%)

Totals from 210 (0.03% of 633223) affected shaders:
Instrs: 514525 -> 514156 (-0.07%); split: -0.16%, +0.08%
Cycle count: 4003540698 -> 3990969057 (-0.31%); split: -0.32%, +0.00%
Spill count: 15632 -> 15586 (-0.29%)
Fill count: 26241 -> 26195 (-0.18%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>
2024-08-16 14:52:04 +00:00
Lionel Landwerlin
fbafa9cabd intel/nir: remove load_global_const_block_intel intrinsic
load_global_constant_uniform_block_intel is equivalent in terms of
loading, then for the predicate we just do a bcsel afterward in places
where that is required.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30659>
2024-08-16 11:12:39 +00:00
Caio Oliveira
6267585778 intel/brw: Also return the size of the assembled shader
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30062>
2024-08-14 03:03:46 +00:00
Sagar Ghuge
83c2524124 intel/compiler: Adjust trace ray control field on Xe2
Bspec 64643: Structure_TraceRayPayload::Trace Ray Control

Bit field moved from 9-8 to 10-8 on Xe2.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>
2024-08-13 20:02:24 +00:00
Sagar Ghuge
c3c62e493f intel/compiler: Ray query requires write-back register
Bspec 57508: Structure_SIMD16TraceRayMessage:: RayQuery Enable

   "When this bit is set in the header, Trace Ray Message behaves like a
   Ray Query. This message requires a write-back message indicating
   RayQuery for all valid Rays (SIMD lanes) have completed."

If we don't pass the write-back register, somehow it was stepping on
over R0 register and can mess up the scratch space accesses which could
potentially lead to GPU hang. It can be noticed while running it under
simulator trace.

send.rta (16|M0)         null     r124  r126:1  0x0            0x02000100           {$15} // wr:1+1, rd:0; simd16 trace ray
R0 = 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>
2024-08-13 20:02:24 +00:00
Alyssa Rosenzweig
5f437aa24d elk: fix compute shader derivatives
derivatives are not fs only so move to be with the rest of subgroup ops.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11674
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30634>
2024-08-13 12:19:30 +00:00
Lionel Landwerlin
aaff191356 brw/rt: fix ray_object_(direction|origin) for closest-hit shaders
When closest hit shader is called, the BVH object level
brw_nir_rt_load_mem_ray origin/direction is 0. What we should be using
is the ray origin/direction and apply the transform of the current
instance.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9ba7d459a3 ("intel/rt: Implement the new ray-tracing system values")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30578>
2024-08-13 10:28:50 +00:00
Ian Romanick
119801e647 intel/brw: Move fsat instructions closer to the source
Intel GPUs have a saturate destination modifier, and
brw_fs_opt_saturate_propagation tries to replace explicit saturate
operations with this destination modifier. That pass is limited in
several ways. If the source of the explicit saturate is in a different
block or if the source of the explicit saturate is live after the
explicit saturate, brw_fs_opt_saturate_propagation will be unable to
make progress.

This optimization exists to help brw_fs_opt_saturate_propagation make
more progress. It tries to move NIR fsat instructions to the same block
that contains the definition of its source. It does this only in cases
where it will not create additional live values. It also attempts to do
this only in cases where the explicit saturate will ultimiately be
converted to a destination modifier.

v2: Fix metadata_preserve when theres no progress and use
nir_metadata_control_flow when there is progress. All suggested by
Alyssa.

v3: Fix a typo in the file header comment. Noticed by Ken.  Don't
require nir_metadata_instr_index. Use nir_def_rewrite_uses_after instead
of open-coding something slightly more specific. Both suggested by Ken.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19733645 -> 19733028 (<.01%)
instructions in affected programs: 193300 -> 192683 (-0.32%)
helped: 246
HURT: 1
helped stats (abs) min: 2 max: 48 x̄: 2.51 x̃: 2
helped stats (rel) min: 0.18% max: 0.39% x̄: 0.33% x̃: 0.34%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.31% max: 0.31% x̄: 0.31% x̃: 0.31%
95% mean confidence interval for instructions value: -2.87 -2.13
95% mean confidence interval for instructions %-change: -0.34% -0.32%
Instructions are helped.

total cycles in shared programs: 916180971 -> 916264656 (<.01%)
cycles in affected programs: 30197180 -> 30280865 (0.28%)
helped: 194
HURT: 142
helped stats (abs) min: 1 max: 21251 x̄: 872.75 x̃: 19
helped stats (rel) min: <.01% max: 23.17% x̄: 2.59% x̃: 0.23%
HURT stats (abs)   min: 1 max: 28058 x̄: 1781.68 x̃: 399
HURT stats (rel)   min: <.01% max: 37.21% x̄: 4.85% x̃: 1.63%
95% mean confidence interval for cycles value: -196.84 694.97
95% mean confidence interval for cycles %-change: -0.17% 1.27%
Inconclusive result (value mean confidence interval includes 0).

fossil-db:

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 151512021 -> 151511351 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17209013596 -> 17209840995 (+0.00%); split: -0.02%, +0.02%
Max live registers: 32013312 -> 32013549 (+0.00%)
Max dispatch width: 5512304 -> 5512136 (-0.00%)

Totals from 774 (0.12% of 630172) affected shaders:
Instrs: 1559285 -> 1558615 (-0.04%); split: -0.05%, +0.01%
Cycle count: 1312656268 -> 1313483667 (+0.06%); split: -0.24%, +0.30%
Max live registers: 82195 -> 82432 (+0.29%)
Max dispatch width: 6664 -> 6496 (-2.52%)

Ice Lake
Totals:
Instrs: 151416791 -> 151416137 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15162468885 -> 15163298824 (+0.01%); split: -0.00%, +0.01%
Max live registers: 32471367 -> 32471603 (+0.00%)
Max dispatch width: 5623752 -> 5623712 (-0.00%)

Totals from 733 (0.12% of 635598) affected shaders:
Instrs: 877965 -> 877311 (-0.07%); split: -0.09%, +0.01%
Cycle count: 190763628 -> 191593567 (+0.44%); split: -0.21%, +0.64%
Max live registers: 72067 -> 72303 (+0.33%)
Max dispatch width: 6216 -> 6176 (-0.64%)

Skylake
Totals:
Instrs: 140794845 -> 140794075 (-0.00%); split: -0.00%, +0.00%
Cycle count: 14665159301 -> 14665320514 (+0.00%); split: -0.00%, +0.01%
Max live registers: 31783341 -> 31783662 (+0.00%); split: -0.00%, +0.00%

Totals from 659 (0.11% of 625670) affected shaders:
Instrs: 829061 -> 828291 (-0.09%); split: -0.09%, +0.00%
Cycle count: 185478478 -> 185639691 (+0.09%); split: -0.33%, +0.41%
Max live registers: 67491 -> 67812 (+0.48%); split: -0.01%, +0.48%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
2024-08-09 14:26:10 -07:00
Ian Romanick
f5815a003e intel/brw: Use def analysis for simple cases of saturate propagation
I had hoped this would improve compilation performance too. I tried
several different long running fossils, and there was no difference.

Fossil-db results are all over the place from platform to platform.

All of the Tiger Lake shaders hurt for spills and fills are fragment
shaders in rdr2.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19734088 -> 19733645 (<.01%)
instructions in affected programs: 71200 -> 70757 (-0.62%)
helped: 186
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 2.38 x̃: 1
helped stats (rel) min: 0.06% max: 2.79% x̄: 0.83% x̃: 0.48%
95% mean confidence interval for instructions value: -2.69 -2.07
95% mean confidence interval for instructions %-change: -0.93% -0.72%
Instructions are helped.

total cycles in shared programs: 916290473 -> 916180971 (-0.01%)
cycles in affected programs: 3403719 -> 3294217 (-3.22%)
helped: 89
HURT: 88
helped stats (abs) min: 1 max: 36685 x̄: 1424.13 x̃: 10
helped stats (rel) min: <.01% max: 26.75% x̄: 1.66% x̃: 0.46%
HURT stats (abs)   min: 1 max: 8750 x̄: 195.98 x̃: 7
HURT stats (rel)   min: <.01% max: 17.12% x̄: 1.57% x̃: 0.19%
95% mean confidence interval for cycles value: -1199.88 -37.43
95% mean confidence interval for cycles %-change: -0.66% 0.56%
Inconclusive result (%-change mean confidence interval includes 0).

fossil-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 151458346 -> 151457413 (-0.00%)
Cycle count: 17202426472 -> 17202406469 (-0.00%); split: -0.00%, +0.00%
Max live registers: 31989626 -> 31989959 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5500560 -> 5500384 (-0.00%)

Totals from 479 (0.08% of 628970) affected shaders:
Instrs: 398836 -> 397903 (-0.23%)
Cycle count: 18064565 -> 18044562 (-0.11%); split: -0.40%, +0.29%
Max live registers: 36663 -> 36996 (+0.91%); split: -0.02%, +0.92%
Max dispatch width: 4392 -> 4216 (-4.01%)

Tiger Lake
Totals:
Instrs: 149913036 -> 149912182 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15560086488 -> 15560135139 (+0.00%); split: -0.00%, +0.00%
Spill count: 61241 -> 61251 (+0.02%)
Fill count: 107304 -> 107314 (+0.01%)
Max live registers: 31964752 -> 31965119 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5517568 -> 5517248 (-0.01%)

Totals from 486 (0.08% of 628673) affected shaders:
Instrs: 396065 -> 395211 (-0.22%); split: -0.23%, +0.01%
Cycle count: 17677691 -> 17726342 (+0.28%); split: -0.23%, +0.51%
Spill count: 1302 -> 1312 (+0.77%)
Fill count: 3746 -> 3756 (+0.27%)
Max live registers: 37538 -> 37905 (+0.98%); split: -0.02%, +0.99%
Max dispatch width: 4576 -> 4256 (-6.99%)

Ice Lake
Totals:
Instrs: 151348422 -> 151347463 (-0.00%)
Cycle count: 15155678386 -> 15155691726 (+0.00%); split: -0.00%, +0.00%
Fill count: 108114 -> 108111 (-0.00%)
Max live registers: 32444479 -> 32444814 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5611288 -> 5611256 (-0.00%)

Totals from 483 (0.08% of 634352) affected shaders:
Instrs: 393333 -> 392374 (-0.24%)
Cycle count: 16706439 -> 16719779 (+0.08%); split: -0.14%, +0.22%
Fill count: 3654 -> 3651 (-0.08%)
Max live registers: 37246 -> 37581 (+0.90%); split: -0.02%, +0.92%
Max dispatch width: 4312 -> 4280 (-0.74%)

Skylake
Totals:
Instrs: 140741190 -> 140734481 (-0.00%); split: -0.00%, +0.00%
Cycle count: 14659096516 -> 14659116346 (+0.00%); split: -0.00%, +0.00%
Max live registers: 31757558 -> 31757725 (+0.00%)
Max dispatch width: 5470040 -> 5469920 (-0.00%)

Totals from 3542 (0.57% of 624449) affected shaders:
Instrs: 3081309 -> 3074600 (-0.22%); split: -0.22%, +0.00%
Cycle count: 228843073 -> 228862903 (+0.01%); split: -0.11%, +0.12%
Max live registers: 304531 -> 304698 (+0.05%)
Max dispatch width: 31016 -> 30896 (-0.39%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
2024-08-09 14:26:05 -07:00
Ian Romanick
adcce2bba4 intel/brw: Small code refactor in brw_fs_opt_saturate_propagation
This bit of code will have a second use in the next commit.

v2: Fix some broken indentation. Noticed by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
2024-08-09 14:26:03 -07:00