Commit graph

540 commits

Author SHA1 Message Date
Caio Oliveira
6ffa597bcf intel/compiler: Keep track of compiled/spilled in brw_simd_selection_state
We still update the cs_prog_data, but don't rely on it for this state anymore.
This will allow use the SIMD selector with shaders that don't use cs_prog_data.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
a0580dadfd intel/compiler: Create a struct to hold SIMD selection state
This is a preparation to decouple the storage of what SIMDs
compiled/spilled from the cs_prog_data.  This will allow reuse
of SIMD selection code by Bindless Shaders.

And since we have a struct now, move the error array there so
reduce the boilerplate of the users.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
8cda6cd774 intel/compiler: Simplify usage of brw_simd_select_for_workgroup_size()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
ecc2dfc503 intel/compiler: Use std::unique_ptr for tracking the fs_visitors
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19605>
2022-11-10 18:01:52 +00:00
Ian Romanick
9abeb3d739 intel/fs: Optimize integer multiplication of large constants by factoring
Many Intel platforms can only perform 32x16 bit multiplication.  The
straightforward way to implement 32x32 bit multiplications is by
splitting one of the operands into high and low parts called H and L,
repsectively.  The full multiplication can be implemented as:

         ((A * H) << 16) + (A * L)

On Intel platforms, special register accesses can be used to eliminate
the shift operation.  This results in three instructions and a temporary
register for most values.

If H or L is 1, then one (or both) of the multiplications will later be
eliminated.  On some platforms it may be possible to eliminate the
multiplication when H is 256.

If L is zero (note that H cannot be zero), one of the multiplications
will also be eliminated.

Instead of splitting the operand into high and low parts, it may
possible to factor the operand into two 16-bit factors X and Y.  The
original multiplication can be replaced with (A * (X * Y)) = ((A * X) *
Y).  This requires two instructions without a temporary register.

I may have gone a bit overboard with optimizing the factorization
routine.  It was a fun brainteaser, and I couldn't put it down. :) On my
1.3GHz Ice Lake, a standalone test could chug through 1,000,000 randomly
selected values in about 5.7 seconds.  This is about 9x the performance
of the obvious, straightforward implementation that I started with.

v2: Drop an unnecessary return.  Rearrange logic slightly and rename
variables in factor_uint32 to better match the names used in the large
comment.  Both suggested by Caio. Rearrange logic to avoid possibly
using `a` uninitialized. Noticed by Marcin.

v3: Use DIV_ROUND_UP instead of open coding it. Noticed by Caio.

Tiger Lake, Ice Lake, Haswell, and Ivy Bridge had similar results. (Ice Lake shown)
total instructions in shared programs: 19912558 -> 19912526 (<.01%)
instructions in affected programs: 3432 -> 3400 (-0.93%)
helped: 10 / HURT: 0

total cycles in shared programs: 856413218 -> 856412810 (<.01%)
cycles in affected programs: 122032 -> 121624 (-0.33%)
helped: 9 / HURT: 0

No shader-db changes on any other Intel platforms.

Tiger Lake and Ice Lake had similar results. (Ice Lake shown)
Instructions in all programs: 141997227 -> 141996923 (-0.0%)
Instructions helped: 71

Cycles in all programs: 9162524757 -> 9162523886 (-0.0%)
Cycles helped: 63
Cycles hurt: 5

No fossil-db changes on any other Intel platforms.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00
Ian Romanick
90d267b2d1 intel/fs: Fix bounds checking for integer multiplication lowering
The previous bounds checking would cause

    mul(8)          g121<1>D        g120<8,8,1>D    0xec4dD

to be lowered to

    mul(8)          g121<1>D        g120<8,8,1>D    0xec4dUW
    mul(8)          g41<1>D         g120<8,8,1>D    0x0000UW
    add(8)          g121.1<2>UW     g121.1<16,8,2>UW g41<16,8,2>UW

Instead of picking the bounds (and the new type) based on the old type,
pick the new type based on the value only.

This helps a few fossil-db shaders in Witcher 3 and Geekbench5.  No
changes on any other Intel platforms.

Tiger Lake
Instructions in all programs: 157581069 -> 157580768 (-0.0%)
Instructions helped: 24

Cycles in all programs: 7566979620 -> 7566977172 (-0.0%)
Cycles helped: 22
Cycles hurt: 4

Ice Lake
Instructions in all programs: 141998965 -> 141998667 (-0.0%)
Instructions helped: 26

Cycles in all programs: 9162568666 -> 9162565297 (-0.0%)
Cycles helped: 24
Cycles hurt: 2

Skylake
No changes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00
Lionel Landwerlin
e59c4a912b intel/fs: use fs implementation of dump_instructions
This specialized version prints out the liveness count as well as the
maximum liveness count. It was eye opening when seeing the max
liveness jump after lowering of packing instructions which should not
have changed the count.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-10-27 21:05:00 +00:00
Lionel Landwerlin
e5dfff0946 intel/fs: reduce liveness of variables in lowering passes
When lowering a single instruction with a destination VGRF to 2 or
more, the VGRF is now considered partially written by each generated
instruction and that increases its liveness especially in loops. Thus
potentially increasing the number of spills/fills due to register
allocation.

Putting an UNDEF instruction in front of the lowered instructions
allows the IR to limit the liveness of the VGRF, reducing register
pressure.

This has a pretty dramatic effect on spills/fills for RT shaders. Here
the stats on Q2RTX shaders on DG2 (wipping out any spills/fills due to
register allocation) :

Instructions in all programs: 26150 -> 24955 (-4.6%)
SENDs in all programs: 1148 -> 1148 (+0.0%)
Loops in all programs: 4 -> 4 (+0.0%)
Cycles in all programs: 392179 -> 332787 (-15.1%)
Spills in all programs: 132 -> 116 (-12.1%)
Fills in all programs: 262 -> 154 (-41.2%)

Shader-db results on TGL :

total instructions in shared programs: 21158140 -> 21158377 (<.01%)
instructions in affected programs: 76629 -> 76866 (0.31%)
helped: 18
HURT: 20
helped stats (abs) min: 1 max: 60 x̄: 18.89 x̃: 12
helped stats (rel) min: 0.21% max: 3.61% x̄: 1.02% x̃: 0.77%
HURT stats (abs)   min: 1 max: 79 x̄: 28.85 x̃: 18
HURT stats (rel)   min: 0.04% max: 2.81% x̄: 1.13% x̃: 0.79%
95% mean confidence interval for instructions value: -4.82 17.30
95% mean confidence interval for instructions %-change: -0.34% 0.57%
Inconclusive result (value mean confidence interval includes 0).

total loops in shared programs: 5753 -> 5753 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cycles in shared programs: 798856834 -> 798870688 (<.01%)
cycles in affected programs: 6208395 -> 6222249 (0.22%)
helped: 22
HURT: 17
helped stats (abs) min: 2 max: 8794 x̄: 1438.18 x̃: 782
helped stats (rel) min: 0.05% max: 2.28% x̄: 0.63% x̃: 0.44%
HURT stats (abs)   min: 2 max: 19178 x̄: 2676.12 x̃: 1358
HURT stats (rel)   min: 0.04% max: 23.49% x̄: 2.25% x̃: 0.71%
95% mean confidence interval for cycles value: -952.19 1662.65
95% mean confidence interval for cycles %-change: -0.64% 1.90%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 4078 -> 4066 (-0.29%)
spills in affected programs: 40 -> 28 (-30.00%)
helped: 2
HURT: 0

total fills in shared programs: 2856 -> 2832 (-0.84%)
fills in affected programs: 127 -> 103 (-18.90%)
helped: 2
HURT: 0

total sends in shared programs: 998554 -> 998554 (0.00%)
sends in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST:   0
GAINED: 0

Total CPU time (seconds): 2346.06 -> 2304.80 (-1.76%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-10-27 21:05:00 +00:00
Lionel Landwerlin
dd6d40429b intel/fs: make split_virtual_grfs deal with partial undefs
v2: fix up UNDEFs instructions (Curro)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-10-27 21:05:00 +00:00
Tapani Pälli
c9c9a5b78d intel/fs: mark debug variables with ASSERTED
To clean up compilation warnings about unused variables
when asserts are disabled.

v2: UNUSED -> ASSERTED (Eric Engestrom)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19016>
2022-10-11 04:14:30 +00:00
Lionel Landwerlin
23c7142cd6 anv: disable SIMD16 for RT shaders
Since divergence is a lot more likely in RT than compute, it makes
sense to limit ourselves to SIMD8.

The trampoline shader defaults to SIMD16 since this one is uniform.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:37 +00:00
Lionel Landwerlin
9dba8d8aa1 intel/fs: take a builder arg for resolve_source_modifiers()
There will be situations where we will want to use a local builder
rather than the one associated with NIR->backend translation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Kenneth Graunke
be21d54aca intel/compiler: Use an existing URB write to end TCS threads when viable
VS, TCS, TES, and GS threads must end with a URB write message with the
EOT (end of thread) bit set.  For VS and TES, we shadow output variables
with temporaries and perform all stores at the end of the shader, giving
us an existing message to do the EOT.

In tessellation control shaders, we don't defer output stores until the
end of the thread like we do for vertex or evaluation shaders.  We just
process store_output and store_per_vertex_output intrinsics where they
occur, which may be in control flow.  So we can't guarantee that there's
a URB write being at the end of the shader.

Traditionally, we've just emitted a separate URB write to finish TCS
threads, doing a writemasked write to an single patch header DWord.
On Broadwell, we need to set a "TR DS Cache Disable" bit, so this is
a convenient spot to do so.  But on other platforms, there's no such
field, and this write is purely wasteful.

Insetad of emitting a separate write, we can just look for an existing
URB write at the end of the program and tag that with EOT, if possible.
We already had code to do this for geometry shaders, so just lift it
into a helper function and reuse it.

No changes in shader-db.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>
2022-09-27 18:17:42 -07:00
Lionel Landwerlin
139e8f4635 intel/fs: fixup a64 messages
And run algebraic when either int64 for float64 are not supported so
those don't end up in the generated code.

Cc: mesa-stable
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Caio Oliveira
0b6e613de8 intel/compiler: Create and use struct for CS thread payload
Move subgroup_id, that's only used by CS for verx10 < 125, as part of
the payload too -- even though is not, strictly speaking.

Note the thread execution of Task/Mesh is similar enough, so we make
their common struct inherit from cs_thread_payload.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
d8461e975a intel/compiler: Export brw_get_subgroup_id_param_index()
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
9de790760e intel/compiler: Create and use struct for Bindless thread payload
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
5b6987daee intel/compiler: Create and use struct for GS thread payload
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
7664c85b1d intel/compiler: Create and use struct for TASK and MESH thread payloads
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
0ca65b3c4c intel/compiler: Create and use struct for VS thread payload
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
19c6e1b447 intel/compiler: Create and use struct for TES thread payload
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
9a9b1119b4 intel/compiler: Store Patch URB output in TCS thread payload struct
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
e21359ed0e intel/compiler: Create struct for TCS thread payload
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
73920b7e2f intel/compiler: Use FS thread payload only for FS
Move the setup into the FS thread payload constructor.  Consolidate
payload setup for that in brw_fs_thread_payload.cpp file.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Marcin Ślusarz
2e1b96bb1b intel/compiler: implement EXT_mesh_shader
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18371>
2022-09-02 17:40:47 +00:00
Kenneth Graunke
d689ef7482 intel/compiler: Change dg2_plus check to devinfo->verx10 >= 125
Less special casing and possibly more future-proof.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17990>
2022-08-29 10:28:32 +00:00
Lionel Landwerlin
f242c9af76 intel/fs: bump max SIMD size for A64 atomics with LSC
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>
2022-08-24 17:51:40 +00:00
Lionel Landwerlin
a81ca32f96 intel/fs: remove unused opcode
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>
2022-08-24 17:51:40 +00:00
Lionel Landwerlin
aa65f83203 intel/fs: switch compute push constant loads to LSC
We're now able to load up to 8 GRFs in one send.

v2: Switch to use transpose + vector of up to 64 (Thanks Curro!)

v3: Increase parallelism by not reusing the same register for push
    constant offset (Curro)

v4: Drop dead ADD() instruction (Curro)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>
2022-08-24 17:51:40 +00:00
Caio Oliveira
a1b1fdf70d intel/compiler: Rename 8_PATCH to MULTI_PATCH
Make it clearer we are dealing with multiple patches,
works better in constrast with SINGLE_PATCH.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18151>
2022-08-24 00:39:57 +00:00
Kenneth Graunke
2cea0d6ef6 intel/compiler: Drop variable group size lowering
This backend lowering code has been dead since the removal of i965 -
nothing in the current source tree ever sets the flag.

This is handled by iris_setup_uniforms() and crocus_setup_uniforms().
Variable group size does not appear to be a feature in anv.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18055>
2022-08-18 16:17:03 +00:00
Lionel Landwerlin
734384e8bc intel/fs: fixup simd selection with shader calls
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17908>
2022-08-05 11:51:31 +00:00
Lionel Landwerlin
9cb9390962 intel/fs: store num of resume shaders in prog_data
That way we can look at the SBT entries for debug purposes.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17908>
2022-08-05 11:51:31 +00:00
Marcin Ślusarz
883acc4150 intel/compiler: use NIR_PASS more
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17619>
2022-08-02 10:07:05 +00:00
Marcin Ślusarz
7ebae85955 intel/compiler: insert URB fence before task/mesh termination
Bspec 53421 says:
"A URB fence memory is typically performed prior the thread
exit message, so that the next thread dispatch that reads
that URB memory will see it."

Cc: 22.1 <mesa-stable>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16665>
2022-08-02 09:31:24 +00:00
Kenneth Graunke
49ee3ae9e8 intel/compiler: Lower FIND_[LAST_]LIVE_CHANNEL in IR on Gfx8+
This allows the software scoreboarding pass, scheduler, and so on
to handle the individual instructions and handle them, rather than
trusting in the generator to do scoreboarding correctly when expanding
the virtual instruction to multiple actual instructions.

By using SHADER_OPCODE_READ_SR_REG, we also correctly handle the
software scoreboarding workaround when reading DMask/VMask.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17530>
2022-08-02 08:41:43 +00:00
Ian Romanick
377246318a intel/fs: Eliminate "masked" and "per slot offset" URB messages
All of this information can be inferred from the sources.

v2: Fix "error: unused variable 'opcode'" detected by marge-bot.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
2022-07-26 17:25:19 +00:00
Ian Romanick
1b17f8fc5a intel/fs: Make logical URB read instructions more like other logical instructions
No shader-db changes on any Intel platform

Fossil-db results:

Tiger Lake
Instructions in all programs: 156926440 -> 156926470 (+0.0%)
Instructions hurt: 15

Cycles in all programs: 7513099349 -> 7513099402 (+0.0%)
Cycles hurt: 15

Ice Lake and Skylake had similar results. (Ice Lake shown)
Cycles in all programs: 9099036492 -> 9099036489 (-0.0%)
Cycles helped: 1

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
2022-07-26 17:25:19 +00:00
Ian Romanick
349a040f68 intel/fs: Make logical URB write instructions more like other logical instructions
The changes to fs_visitor::validate() helped track down a place where I
initially forgot to convert a message to the new sources layout.  This
had caused a different validation failure in
dEQP-GLES31.functional.tessellation.tesscoord.triangles_equal_spacing,
but this were not detected until after SENDs were lowered.

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 19951145 -> 19951133 (<.01%)
instructions in affected programs: 2429 -> 2417 (-0.49%)
helped: 8 / HURT: 0

total cycles in shared programs: 858904152 -> 858862331 (<.01%)
cycles in affected programs: 5702652 -> 5660831 (-0.73%)
helped: 2138 / HURT: 1255

Broadwell
total cycles in shared programs: 904869459 -> 904835501 (<.01%)
cycles in affected programs: 7686744 -> 7652786 (-0.44%)
helped: 2861 / HURT: 2050

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 141442369 -> 141442032 (-0.0%)
Instructions helped: 337

Cycles in all programs: 9099270231 -> 9099036492 (-0.0%)
Cycles helped: 40661
Cycles hurt: 28606

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
2022-07-26 17:25:18 +00:00
Emma Anholt
94bd06256a intel/fs: Simplify brw_barycentric_mode() args.
Reduce a bit of mode lookup noise I was tracing through trying to resolve
the previous bug.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17381>
2022-07-19 01:25:47 +00:00
Lionel Landwerlin
2d1f021e16 intel/fs: Set NonPerspectiveBarycentricEnable when the interpolator needs it.
[anholt: changed to make all drivers do the right thing by moving the
payload barycentric check into the compiler]

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17381>
2022-07-19 01:25:47 +00:00
Jason Ekstrand
3346f6918f intel/fs,anv: Rework handling of coarse and sample shading
Now that this information is accurately gathered by spirv_to_nir, we no
longer need the hack.  We just need to fix up the way we handle some of
the key bits.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
2022-07-13 20:28:42 +00:00
Jason Ekstrand
d0b154319d intel/fs: Simplify persample_dispatch
Thanks to the previous commit, we no longer need this check.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
2022-07-13 20:28:42 +00:00
Jason Ekstrand
fd17aaf430 intel/fs: Use nir_lower_single_sampled
This lets us drop demote_sample_qualifiers as well as a back-end check
for key->multisample_fbo.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
2022-07-13 20:28:42 +00:00
Jason Ekstrand
ca9f0f72db intel/fs: Use shader_info::fs::uses_sample_shading
NIR constructs this information for us as part of nir_gather_info these
days so we can simplify our logic a bit.  This will also let us be more
correct once we move uses_sample_shading scraping earlier.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
2022-07-13 20:28:42 +00:00
Jason Ekstrand
530de844ef intel,anv,iris,crocus: Drop subgroup size from the shader key
Use nir->info.subgroup_size instead.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17337>
2022-07-08 22:47:22 +00:00
Ian Romanick
bbcb881f46 intel/fs: Remove non-_LOGICAL URB messages
The _LOGICAL versions are lowered direct to SEND, so nothing can ever
generate these messages.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Ian Romanick
a477587b4a intel/fs: Add _LOGICAL versions of URB messages
The lowering is currently fake.  It just changes the opcode from the
_LOGICAL version to the non-_LOGICAL version.

v2: Remove some rebase cruft.  's/gfx8_//;s/simd8_/' in
brw_instruction_name.  Both suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Ian Romanick
07b9bfacc7 intel/compiler: Move logical-send lowering to a separate file
brw_fs.cpp was 10kloc.  Now it's only 7.5kloc.  Ugh.

v2: Rebase on 9680e0e4a2.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Lionel Landwerlin
9680e0e4a2 intel/fs: ray query fix for global address
With stages dispatching with a mask, we can run into situations where
we don't have the global address in all lanes. The existing code
always assumed we had the addres in at least lane0.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: bb40e999d1 ("intel/nir: use a single intel intrinsic to deal with ray traversal")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17330>
2022-07-08 00:36:04 +00:00