Commit graph

433 commits

Author SHA1 Message Date
Kenneth Graunke
b84939c678 intel/compiler: Delete fs_visitor::nir_emit_{ssbo,shared}_atomic_float()
These are now basically identical to their non-float counterparts.  The
only thing that differed was the opcode checking to determine which
operands existed.  Now that we have a unified opcode enum and a helper
for the number of data operands, we can just use that.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
f7b29d7924 intel/compiler: Drop redundant 32-bit expansion for shared float atomics
We already expanded data to 32-bit a few lines earlier, so this is just
redundantly doing it a second time.

Fixes: 43169dbbe5 ("intel/compiler: Support 16 bit float ops")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
02129eee3a intel/compiler: Eliminate SHADER_OPCODE_UNTYPED_ATOMIC_FLOAT
The only reason for the separate opcode was because of the overlapping
BRW_AOP_* enums, making it impossible to tell whether a particular AOP
was the integer or float operation.  Now that we use the lsc_opcode
enums, we can just have the legacy lowering inspect the opcode and
select the right descriptor.  No need for a separate opcode.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
90a2137cd5 intel/compiler: Use LSC opcode enum rather than legacy BRW_AOPs
This gets our logical atomic messages using the lsc_opcode enum rather
than the legacy BRW_AOP_* defines.  We have to translate one way or
another, and using the modern set makes sense going forward.

One advantage is that the lsc_opcode encoding has opcodes for both
integer and floating point atomics in the same enum, whereas the legacy
encoding used overlapping values (BRW_AOP_AND == 1 == BRW_AOP_FMAX),
which made it impossible to handle both sensibly in common code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
8d2dc52a14 intel/compiler: Move atomic op translation into emit_*_atomic()
There's no need to pass both the intrinsic and an opcode computed from
that same intrinsic.  Just do it in the functions themselves.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Francisco Jerez
4a2e7306dd intel/fs/gfx12: Ensure that prior reads have executed before barrier with acquire semantics.
This avoids a violation of the Vulkan memory model that was leading to
intermittent failures of at least 8k test-cases of the Vulkan CTS
(within the group dEQP-VK.memory_model.*) on TGL and DG2 platforms.
In theory the issue may be reproducible on earlier platforms like IVB
and ICL, but the SYNC.ALLWR instruction is not available on those
platforms so a different (likely costlier) fix will be needed.

The issue occurs within the sequence we emit for a NIR memory barrier
with acquire semantics requiring the synchronization of multiple
caches, e.g. in pseudocode for a barrier involving the TGM and UGM
caches on DG2:

 x <- load.ugm // Atomic read sequenced-before the barrier
 y <- fence.ugm
 z <- fence.tgm
 wait(y, z)
 w <- load.tgm // Read sequenced-after the barrier

In the example we must provide the guarantee that the memory load for
x is completed before the one for w, however this ordering can be
reversed with the intervention of a concurrent thread, since the UGM
fence will block on the prior UGM load and potentially take a long
time, while the TGM fence may complete and invalidate the TGM cache
immediately, so a concurrent thread could pollute the TGM cache with
stale contents for the w location *before* the UGM load has completed,
leading to an inversion of the expected memory ordering.

v2: Apply the workaround regardless of whether the NIR barrier
    intrinsic specifies multiple storage classes or a single one,
    since an acquire barrier is required to order subsequent requests
    relative to previous atomic requests of unknown storage class not
    necessarily specified by the memory scope information of the
    intrinsic.

Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20690>
2023-01-18 21:34:33 -08:00
Ian Romanick
eb76cee9f8 nir: Eliminate nir_op_i2b
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b.  Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.

At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.

The failing test on d3d12 is a pre-existing bug that is triggered by
this change.  I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.

v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.

v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.

v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress.  The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).

v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(bf2(x)) optimization.

v6: Just eliminate the i2b instruction.

v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷

No shader-db changes on any Intel platform.

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2

Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2

The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.

Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Kenneth Graunke
88918baf5c intel/compiler: Delete key->msaa_16
None of the drivers have used this since we dropped i965, and BLORP
no longer uses it as of the previous commit.  We can also drop the
former compressed_multisample_tex_mask (now padding) field so that
things remain 64-bit aligned.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20223>
2022-12-09 10:18:25 +00:00
Kenneth Graunke
584e18863e intel: Drop compressed_multisample_layout_mask from the compiler keys
The compiler looks at this key field to determine whether to perform
an MCS fetch for a txf_ms or samples_identical texture message, if a
nir_tex_src_ms_mcs_intel source wasn't provided.  If it isn't set,
it instead uses constant 0 (nothing is compressed).

All of the drivers (iris, crocus, anv, hasvk) unconditionally set this
to ~0 because we don't want to pay for costly shader recompiles (which
can cause nasty stuttering).  Most textures are compressed anyway, and
the hardware ignores the l2dms MCS parameter if MCS is disabled.

The only user was BLORP, which sets the key field based on whether the
texture's aux usage has MCS.  But if it has MCS, it also does the MCS
fetch itself and supplies it directly.  Otherwise, it relies on the
compiler to fill in the 0 value.  But it could easily just provide the
0 value itself in that case and not rely on the compiler at all.

With that fixed, we can just drop the key fields entirely.  We leave
them as padding for now to avoid repacking structures; we won't need
to after the next commits anyway.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20223>
2022-12-09 10:18:25 +00:00
Jason Ekstrand
7d2e3f660c intel/fs: Support load_workgroup_id_zero_base
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20068>
2022-12-01 04:56:48 +00:00
Lionel Landwerlin
9c1c1888d9 intel/fs: put scratch surface in the surface state heap
In 4ceaed7839 we made scratch surface state allocations part of the
internal heap (mapped to STATE_BASE_ADDRESS::SurfaceStateBaseAddress)
so that it doesn't uses slots in the application's expected 1M
descriptors (especially with vkd3d-proton).

But all our compiler code relies on BSS
(STATE_BASE_ADDRESS::BindlessSurfaceStateBaseAddress).

The additional issue is that there is only 26bits of surface offset
available in CS instruction (CFE_STATE, 3DSTATE_VS, etc...) for
scratch surfaces. So we need the drivers to put the scratch surfaces
in the first chunk of STATE_BASE_ADDRESS::SurfaceStateBaseAddress
(hence all the driver changes).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4ceaed7839 ("anv: split internal surface states from descriptors")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7687
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19727>
2022-11-19 14:58:58 +00:00
Ian Romanick
351b8c6aec intel/fs: Enable nir_op_imul_32x16 and nir_op_umul_32x16 on pre-Gfx7
Even though Intel's CI doesn't test these old platforms anymore, the
validation added in "intel/eu/validate: Validate integer multiplication
source size restrictions" combined with full shader-db runs gives me
confidence in the changes.

Sandy Bridge
total instructions in shared programs: 13902341 -> 13902167 (<.01%)
instructions in affected programs: 30771 -> 30597 (-0.57%)
helped: 66 / HURT: 0

total cycles in shared programs: 741795500 -> 741791931 (<.01%)
cycles in affected programs: 987602 -> 984033 (-0.36%)
helped: 28 / HURT: 5

Iron Lake
total instructions in shared programs: 8365806 -> 8365754 (<.01%)
instructions in affected programs: 1766 -> 1714 (-2.94%)
helped: 10 / HURT: 0

total cycles in shared programs: 248542694 -> 248542378 (<.01%)
cycles in affected programs: 29836 -> 29520 (-1.06%)
helped: 9 / HURT: 0

GM45
total instructions in shared programs: 5187127 -> 5187101 (<.01%)
instructions in affected programs: 891 -> 865 (-2.92%)
helped: 5 / HURT: 0

total cycles in shared programs: 163643914 -> 163643750 (<.01%)
cycles in affected programs: 22206 -> 22042 (-0.74%)
helped: 5 / HURT: 0

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>
2022-11-09 21:34:26 +00:00
Ian Romanick
293ad13e3f intel/fs: Slightly restructure emitting nir_op_imul_32x16 and nir_op_umul_32x16
There are no immediate values at this point, so all of this code was
bunk. :face_palm:

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>
2022-11-09 21:34:26 +00:00
Caio Oliveira
22d8ed84b8 intel/compiler: Remove unused fs_visitor::emit_percomp()
Since 7ef7738a61 ("i965: Write gl_FragCoord directly to the destination.") this
is not used.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19586>
2022-11-08 07:33:09 +00:00
Rohan Garg
43169dbbe5 intel/compiler: Support 16 bit float ops
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17988>
2022-10-17 15:56:28 +02:00
Lionel Landwerlin
9dba8d8aa1 intel/fs: take a builder arg for resolve_source_modifiers()
There will be situations where we will want to use a local builder
rather than the one associated with NIR->backend translation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
José Roberto de Souza
f4857591e1 intel/compiler/fs: Use DF to load constants when has_64bit_int is not supported
This was already been done to gen7 platforms, so now extending to all
platforms without has_64bit_int.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18577>
2022-09-14 19:32:43 +00:00
Caio Oliveira
0b6e613de8 intel/compiler: Create and use struct for CS thread payload
Move subgroup_id, that's only used by CS for verx10 < 125, as part of
the payload too -- even though is not, strictly speaking.

Note the thread execution of Task/Mesh is similar enough, so we make
their common struct inherit from cs_thread_payload.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
9de790760e intel/compiler: Create and use struct for Bindless thread payload
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
a70378f292 intel/compiler: Store start of ICP handles in GS thread payload struct
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
5b6987daee intel/compiler: Create and use struct for GS thread payload
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
19c6e1b447 intel/compiler: Create and use struct for TES thread payload
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
eb837dd23b intel/compiler: Store start of ICP handles in TCS thread payload struct
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
2622fc3af1 intel/compiler: Store Primitive ID in TCS thread payload struct
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
9a9b1119b4 intel/compiler: Store Patch URB output in TCS thread payload struct
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Jordan Justen
af8ab4a889 intel/compiler: Use builder to allocate fs regs for gs control data bits
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18537>
2022-09-12 10:00:28 -07:00
Caio Oliveira
00b8f9a3a6 intel/compiler: Use builder to allocate fs regs for TCS store output
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18537>
2022-09-12 10:00:18 -07:00
Caio Oliveira
55db3aaa3a intel/compiler: Create fs_visitor::emit_tcs_barrier()
Allow us to implement this in brw_fs_visitor.cpp, which then will
let us deduplicate code between the CS-like barrier and the TCS
barrier in a later patch.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18362>
2022-09-09 09:35:08 -07:00
Kenneth Graunke
19fc870ac6 intel/compiler: Use subgroup invocation for ICP handle loads
When loading a TCS or GS input, we generate some code to read the URB
handle for a particular input control point (ICP handle), which often
involves indirect addressing due to a non-constant vertex.

For example:

   mov(8) vgrf148+0.0:UW, 76543210V
   shl(8) vgrf149:UD, vgrf148+0.0:UW, 2u
   shl(8) vgrf150:UD, vgrf145:UD, 5u
   add(8) vgrf151:UD, vgrf150:UD, vgrf149:UD
   mov_indirect(8) vgrf147:UD, g2:UD, vgrf151:UD, 96u

Unfortunately, the first load with 76543210V is considered a partial
write because the 8 channels of 16-bit UW data doesn't fill an entire
register, and we can't allocate VGRFs at sub-register granularity.

This causes none of the above math to be CSE'd, even though the first
two instructions are common to *all* input loads, and the rest may be
reused sometimes as well.

To work around this, we stop emitting 76543210V to a temporary, and
instead use nir_system_values[SYSTEM_VALUE_SUBGROUP_INVOCATION], which
already contains this value, and is unconditionally set up for us.
With all input loads using the same register for the sequence, our
CSE pass is able to eliminate the rest of the common math.

shader-db results on Tigerlake:

   total instructions in shared programs: 20748243 -> 20744844 (-0.02%)
   instructions in affected programs: 73410 -> 70011 (-4.63%)
   helped: 242 / HURT: 21
   helped stats (abs) min: 1 max: 37 x̄: 14.17 x̃: 15
   helped stats (rel) min: 0.17% max: 19.58% x̄: 6.13% x̃: 6.32%
   HURT stats (abs)   min: 1 max: 4 x̄: 1.38 x̃: 1
   HURT stats (rel)   min: 0.18% max: 1.31% x̄: 0.58% x̃: 0.58%
   95% mean confidence interval for instructions value: -13.73 -12.12
   95% mean confidence interval for instructions %-change: -6.00% -5.19%
   Instructions are helped.

   total cycles in shared programs: 785828951 -> 785788480 (<.01%)
   cycles in affected programs: 597593 -> 557122 (-6.77%)
   helped: 227 / HURT: 13
   helped stats (abs) min: 6 max: 624 x̄: 182.19 x̃: 185
   helped stats (rel) min: 0.24% max: 18.22% x̄: 7.85% x̃: 7.80%
   HURT stats (abs)   min: 2 max: 153 x̄: 68.08 x̃: 36
   HURT stats (rel)   min: 0.03% max: 7.79% x̄: 2.97% x̃: 1.25%
   95% mean confidence interval for cycles value: -182.55 -154.71
   95% mean confidence interval for cycles %-change: -7.84% -6.69%
   Cycles are helped.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18455>
2022-09-08 15:12:41 +00:00
Marcin Ślusarz
66bc9aec65 intel/compiler: add support for non-zero base in [load|store]_shared intrins
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17618>
2022-08-29 12:42:40 +00:00
Lionel Landwerlin
407f2beb97 intel/fs: port block a64/surface messages to use LSC
v2: Fixup block load/store on surfaces/shared-memory (Rohan)

v3: drop write specific size_written case (Rohan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>
2022-08-24 17:51:40 +00:00
Caio Oliveira
bee2df64d2 intel/compiler: Use fs_reg helpers for GS icp_handle selection
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18221>
2022-08-24 01:42:23 +00:00
Caio Oliveira
b4aff6ab49 intel/compiler: Use fs_reg helpers for TCS icp_handle selection
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18221>
2022-08-24 01:42:22 +00:00
Caio Oliveira
a1b1fdf70d intel/compiler: Rename 8_PATCH to MULTI_PATCH
Make it clearer we are dealing with multiple patches,
works better in constrast with SINGLE_PATCH.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18151>
2022-08-24 00:39:57 +00:00
Lionel Landwerlin
3c78e94ff3 intel/fs: fixup scratch load/store handling on Gfx12.5+
We did not handle the operation with data size < 4. It works fine on
all other messages (global/shared). The initial commit was just too
restrictive.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1e242785c3 ("intel/fs: Implement load/store_scratch on XeHP")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16964>
2022-08-23 22:19:16 +00:00
Lionel Landwerlin
46a13404c0 intel/fs: fix load_scratch intrinsic
The selection of the internal opcode to deal with load_scratch is
incorrect.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c643979228 ("intel/fs: Choose memory message type based on bit size")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16964>
2022-08-23 22:19:16 +00:00
Kenneth Graunke
2cea0d6ef6 intel/compiler: Drop variable group size lowering
This backend lowering code has been dead since the removal of i965 -
nothing in the current source tree ever sets the flag.

This is handled by iris_setup_uniforms() and crocus_setup_uniforms().
Variable group size does not appear to be a feature in anv.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18055>
2022-08-18 16:17:03 +00:00
Kenneth Graunke
bb5d09da6c intel/compiler: Use named NIR intrinsic const index accessors
In the early days of NIR, you had to prod at inst->const_index[]
directly, but a long while back, we added handy accessor functions
that let you use the actual name of the thing you want instead of
memorizing the exact order of parameters.

Also rewrite a comment I had a hard time parsing.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18067>
2022-08-16 05:44:30 +00:00
Marcin Ślusarz
30c0f2bfbb intel/compiler: there are 4 types of fences on gfx >= 12.5
Found by code inspection.

There's an assert later checking that we haven't overflown
this array, so this change probably doesn't matter for any
workload.

Cc: 22.1 <mesa-stable>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16665>
2022-08-02 09:31:24 +00:00
Marcin Ślusarz
2bd148c990 intel/compiler: emit URB fences for TASK/MESH
Cc: 22.1 <mesa-stable>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16665>
2022-08-02 09:31:24 +00:00
Ian Romanick
377246318a intel/fs: Eliminate "masked" and "per slot offset" URB messages
All of this information can be inferred from the sources.

v2: Fix "error: unused variable 'opcode'" detected by marge-bot.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
2022-07-26 17:25:19 +00:00
Ian Romanick
1b17f8fc5a intel/fs: Make logical URB read instructions more like other logical instructions
No shader-db changes on any Intel platform

Fossil-db results:

Tiger Lake
Instructions in all programs: 156926440 -> 156926470 (+0.0%)
Instructions hurt: 15

Cycles in all programs: 7513099349 -> 7513099402 (+0.0%)
Cycles hurt: 15

Ice Lake and Skylake had similar results. (Ice Lake shown)
Cycles in all programs: 9099036492 -> 9099036489 (-0.0%)
Cycles helped: 1

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
2022-07-26 17:25:19 +00:00
Ian Romanick
349a040f68 intel/fs: Make logical URB write instructions more like other logical instructions
The changes to fs_visitor::validate() helped track down a place where I
initially forgot to convert a message to the new sources layout.  This
had caused a different validation failure in
dEQP-GLES31.functional.tessellation.tesscoord.triangles_equal_spacing,
but this were not detected until after SENDs were lowered.

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 19951145 -> 19951133 (<.01%)
instructions in affected programs: 2429 -> 2417 (-0.49%)
helped: 8 / HURT: 0

total cycles in shared programs: 858904152 -> 858862331 (<.01%)
cycles in affected programs: 5702652 -> 5660831 (-0.73%)
helped: 2138 / HURT: 1255

Broadwell
total cycles in shared programs: 904869459 -> 904835501 (<.01%)
cycles in affected programs: 7686744 -> 7652786 (-0.44%)
helped: 2861 / HURT: 2050

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 141442369 -> 141442032 (-0.0%)
Instructions helped: 337

Cycles in all programs: 9099270231 -> 9099036492 (-0.0%)
Cycles helped: 40661
Cycles hurt: 28606

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
2022-07-26 17:25:18 +00:00
Emma Anholt
94bd06256a intel/fs: Simplify brw_barycentric_mode() args.
Reduce a bit of mode lookup noise I was tracing through trying to resolve
the previous bug.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17381>
2022-07-19 01:25:47 +00:00
Lionel Landwerlin
2d1f021e16 intel/fs: Set NonPerspectiveBarycentricEnable when the interpolator needs it.
[anholt: changed to make all drivers do the right thing by moving the
payload barycentric check into the compiler]

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17381>
2022-07-19 01:25:47 +00:00
Ian Romanick
a477587b4a intel/fs: Add _LOGICAL versions of URB messages
The lowering is currently fake.  It just changes the opcode from the
_LOGICAL version to the non-_LOGICAL version.

v2: Remove some rebase cruft.  's/gfx8_//;s/simd8_/' in
brw_instruction_name.  Both suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Lionel Landwerlin
9680e0e4a2 intel/fs: ray query fix for global address
With stages dispatching with a mask, we can run into situations where
we don't have the global address in all lanes. The existing code
always assumed we had the addres in at least lane0.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: bb40e999d1 ("intel/nir: use a single intel intrinsic to deal with ray traversal")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17330>
2022-07-08 00:36:04 +00:00
Lionel Landwerlin
1b6c74c48d intel/fs: make sure memory writes have landed for thread dispatch
The thread dispatch SEND instructions will dispatch new threads
immediately even before the caller of the SEND instruction has reached
EOT. So we really need to make sure all the memory writes are visible
to other threads within the DSS before the SEND.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15755>
2022-07-07 09:48:20 +03:00
Marcin Ślusarz
f4386b81e6 intel: fix typos found by codespell
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17191>
2022-06-27 10:20:55 +00:00
Marcin Ślusarz
f871aa10a1 intel/compiler: assert that base is 0 for [load|store]_shared intrins
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17143>
2022-06-22 10:32:13 +00:00