Commit graph

446 commits

Author SHA1 Message Date
Faith Ekstrand
83fd7a5ed1 intel: Use nir_lower_tex_options::lower_index_to_offset
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21546>
2023-03-06 21:38:32 +00:00
Caio Oliveira
c80268a20d intel/compiler: Mark various memory barriers intrinsics unreachable
Now that both SPIR-V and GLSL are using scoped barriers, we can stop
handling the specialized ones.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3339>
2023-02-27 20:24:01 +00:00
Daniel Schürmann
2bb369dd8d nir: add assertions that loops don't have a Continue Construct
Hoping that I didn't miss any, this *should* add assertions
to all functions and passes which explicitly handle 'nir_loop'.

Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>
2023-02-21 10:41:11 +00:00
Lionel Landwerlin
9ac192d79d intel/fs: bound subgroup invocation read to dispatch size
This is to avoid out of bound register accesses (potentially leading
to hangs) when the dispatch size is smaller than when is reported in
the NIR subgroup_size.

v2: Implement bounding with a mask (since workgroup sizes are powers of 2) (Faith)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 530de844ef ("intel,anv,iris,crocus: Drop subgroup size from the shader key")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21282>
2023-02-14 21:29:42 +00:00
Jason Ekstrand
949b42c4dc intel/compiler: Convert wm_prog_key::multisample_fbo to a tri-state
This allows us to communicate to the back-end that we don't actually
know if the framebuffer is multisampled or not.  No drivers set anything
but ALWAYS/NEVER and we still have a few ALWAYS/NEVER assumptions but
those should be asserted.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>
2023-02-06 09:12:18 +00:00
Ian Romanick
ea413e826b nir: Eliminate nir_op_f2b
Builds on the work of !15121.  This gets to delete even more code
because many drivers shared a lot of code for i2b and f2b.

No shader-db or fossil-db changes on any Intel platform.

v2: Rebase on 1a35acd8d9.

v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin.

v4: Another rebase. Remove f2b stuff from Midgard.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>
2023-02-03 22:39:57 +00:00
Sagar Ghuge
0c083d29a5 intel/fs: Always stall between the fences on Gen11+
Be conservative in Gfx11+ and always stall in a fence.  Since there are
two different fences, and shader might want to synchronize between them.

This change also brings back the original code block for the stall
between the fence and comment from the commit
b390ff3517.

v2: (Caio)
 - Re-arrange code block.
 - Adjust comment.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6958

Fixes: f7262462 ("intel/fs: Rework fence handling in brw_fs_nir.cpp")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Tested-by: Mark Janes <markjanes@swizzler.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20996>
2023-02-02 00:21:21 +00:00
Amber
ab4c2990ed intel/compiler: use lower_image_samples_to_one
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewer-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Amber Amber <amber@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20813>
2023-02-01 19:52:49 +00:00
Lionel Landwerlin
13cca48920 intel/fs: drop FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GFX7
We can lower FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD into other more
generic sends and drop this internal opcode.

The idea behind this change is to allow bindless surfaces to be used
for UBO pulls and why it's interesting to be able to reuse
setup_surface_descriptors(). But that will come in a later change.

No shader-db changes on TGL & DG2.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20416>
2023-01-26 11:26:53 +00:00
Marcin Ślusarz
9bb18a4f9e intel/compiler: fix generation of vec8/vec16 alu instruction
I stumbled on this when I inserted some suboptimal lowering code after all
optimizations. Adding certain subset of optimizations after my lowering code
actually avoided this bug, so I think it's not possible to hit this on upstream.

Let's fix this for the next person generating suboptimal code...

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20857>
2023-01-24 13:15:58 +00:00
Kenneth Graunke
16b66ab659 intel/compiler: Drop dest checking in atomic code
NIR atomic operation intrinsics all have destinations.  This is just
copy and pasted from other generic intrinsic handling where that may
or may not be the case.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
780f3e2e6b intel/compiler: Delete all the A64 atomic variants for type sizes
These are handled identically in almost all cases.  There is one place
in the legacy surface lowering that was obtaining the bitsize from the
opcode, but the LSC-based lowering uses (type_sz(inst->dst.type) * 8)
for that and works just fine.  If we just do that in the legacy lowering
too, then we don't need this plethora of opcodes.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
03ddde1230 intel/compiler: Combine nir_emit_{ssbo,shared}_atomic into one helper
These are basically identical save for:
- shared has surface hardcoded to SLM rather than an SSBO index
- shared has to handle adding the 'base' const_index (SSBO have none)
- the NIR source index for data is shifted by one

It's not worth copy and pasting the entire function for this.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
b84939c678 intel/compiler: Delete fs_visitor::nir_emit_{ssbo,shared}_atomic_float()
These are now basically identical to their non-float counterparts.  The
only thing that differed was the opcode checking to determine which
operands existed.  Now that we have a unified opcode enum and a helper
for the number of data operands, we can just use that.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
f7b29d7924 intel/compiler: Drop redundant 32-bit expansion for shared float atomics
We already expanded data to 32-bit a few lines earlier, so this is just
redundantly doing it a second time.

Fixes: 43169dbbe5 ("intel/compiler: Support 16 bit float ops")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
02129eee3a intel/compiler: Eliminate SHADER_OPCODE_UNTYPED_ATOMIC_FLOAT
The only reason for the separate opcode was because of the overlapping
BRW_AOP_* enums, making it impossible to tell whether a particular AOP
was the integer or float operation.  Now that we use the lsc_opcode
enums, we can just have the legacy lowering inspect the opcode and
select the right descriptor.  No need for a separate opcode.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
90a2137cd5 intel/compiler: Use LSC opcode enum rather than legacy BRW_AOPs
This gets our logical atomic messages using the lsc_opcode enum rather
than the legacy BRW_AOP_* defines.  We have to translate one way or
another, and using the modern set makes sense going forward.

One advantage is that the lsc_opcode encoding has opcodes for both
integer and floating point atomics in the same enum, whereas the legacy
encoding used overlapping values (BRW_AOP_AND == 1 == BRW_AOP_FMAX),
which made it impossible to handle both sensibly in common code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
8d2dc52a14 intel/compiler: Move atomic op translation into emit_*_atomic()
There's no need to pass both the intrinsic and an opcode computed from
that same intrinsic.  Just do it in the functions themselves.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Francisco Jerez
4a2e7306dd intel/fs/gfx12: Ensure that prior reads have executed before barrier with acquire semantics.
This avoids a violation of the Vulkan memory model that was leading to
intermittent failures of at least 8k test-cases of the Vulkan CTS
(within the group dEQP-VK.memory_model.*) on TGL and DG2 platforms.
In theory the issue may be reproducible on earlier platforms like IVB
and ICL, but the SYNC.ALLWR instruction is not available on those
platforms so a different (likely costlier) fix will be needed.

The issue occurs within the sequence we emit for a NIR memory barrier
with acquire semantics requiring the synchronization of multiple
caches, e.g. in pseudocode for a barrier involving the TGM and UGM
caches on DG2:

 x <- load.ugm // Atomic read sequenced-before the barrier
 y <- fence.ugm
 z <- fence.tgm
 wait(y, z)
 w <- load.tgm // Read sequenced-after the barrier

In the example we must provide the guarantee that the memory load for
x is completed before the one for w, however this ordering can be
reversed with the intervention of a concurrent thread, since the UGM
fence will block on the prior UGM load and potentially take a long
time, while the TGM fence may complete and invalidate the TGM cache
immediately, so a concurrent thread could pollute the TGM cache with
stale contents for the w location *before* the UGM load has completed,
leading to an inversion of the expected memory ordering.

v2: Apply the workaround regardless of whether the NIR barrier
    intrinsic specifies multiple storage classes or a single one,
    since an acquire barrier is required to order subsequent requests
    relative to previous atomic requests of unknown storage class not
    necessarily specified by the memory scope information of the
    intrinsic.

Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20690>
2023-01-18 21:34:33 -08:00
Ian Romanick
eb76cee9f8 nir: Eliminate nir_op_i2b
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b.  Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.

At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.

The failing test on d3d12 is a pre-existing bug that is triggered by
this change.  I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.

v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.

v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.

v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress.  The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).

v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(bf2(x)) optimization.

v6: Just eliminate the i2b instruction.

v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷

No shader-db changes on any Intel platform.

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2

Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2

The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.

Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Kenneth Graunke
88918baf5c intel/compiler: Delete key->msaa_16
None of the drivers have used this since we dropped i965, and BLORP
no longer uses it as of the previous commit.  We can also drop the
former compressed_multisample_tex_mask (now padding) field so that
things remain 64-bit aligned.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20223>
2022-12-09 10:18:25 +00:00
Kenneth Graunke
584e18863e intel: Drop compressed_multisample_layout_mask from the compiler keys
The compiler looks at this key field to determine whether to perform
an MCS fetch for a txf_ms or samples_identical texture message, if a
nir_tex_src_ms_mcs_intel source wasn't provided.  If it isn't set,
it instead uses constant 0 (nothing is compressed).

All of the drivers (iris, crocus, anv, hasvk) unconditionally set this
to ~0 because we don't want to pay for costly shader recompiles (which
can cause nasty stuttering).  Most textures are compressed anyway, and
the hardware ignores the l2dms MCS parameter if MCS is disabled.

The only user was BLORP, which sets the key field based on whether the
texture's aux usage has MCS.  But if it has MCS, it also does the MCS
fetch itself and supplies it directly.  Otherwise, it relies on the
compiler to fill in the 0 value.  But it could easily just provide the
0 value itself in that case and not rely on the compiler at all.

With that fixed, we can just drop the key fields entirely.  We leave
them as padding for now to avoid repacking structures; we won't need
to after the next commits anyway.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20223>
2022-12-09 10:18:25 +00:00
Jason Ekstrand
7d2e3f660c intel/fs: Support load_workgroup_id_zero_base
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20068>
2022-12-01 04:56:48 +00:00
Lionel Landwerlin
9c1c1888d9 intel/fs: put scratch surface in the surface state heap
In 4ceaed7839 we made scratch surface state allocations part of the
internal heap (mapped to STATE_BASE_ADDRESS::SurfaceStateBaseAddress)
so that it doesn't uses slots in the application's expected 1M
descriptors (especially with vkd3d-proton).

But all our compiler code relies on BSS
(STATE_BASE_ADDRESS::BindlessSurfaceStateBaseAddress).

The additional issue is that there is only 26bits of surface offset
available in CS instruction (CFE_STATE, 3DSTATE_VS, etc...) for
scratch surfaces. So we need the drivers to put the scratch surfaces
in the first chunk of STATE_BASE_ADDRESS::SurfaceStateBaseAddress
(hence all the driver changes).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4ceaed7839 ("anv: split internal surface states from descriptors")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7687
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19727>
2022-11-19 14:58:58 +00:00
Ian Romanick
351b8c6aec intel/fs: Enable nir_op_imul_32x16 and nir_op_umul_32x16 on pre-Gfx7
Even though Intel's CI doesn't test these old platforms anymore, the
validation added in "intel/eu/validate: Validate integer multiplication
source size restrictions" combined with full shader-db runs gives me
confidence in the changes.

Sandy Bridge
total instructions in shared programs: 13902341 -> 13902167 (<.01%)
instructions in affected programs: 30771 -> 30597 (-0.57%)
helped: 66 / HURT: 0

total cycles in shared programs: 741795500 -> 741791931 (<.01%)
cycles in affected programs: 987602 -> 984033 (-0.36%)
helped: 28 / HURT: 5

Iron Lake
total instructions in shared programs: 8365806 -> 8365754 (<.01%)
instructions in affected programs: 1766 -> 1714 (-2.94%)
helped: 10 / HURT: 0

total cycles in shared programs: 248542694 -> 248542378 (<.01%)
cycles in affected programs: 29836 -> 29520 (-1.06%)
helped: 9 / HURT: 0

GM45
total instructions in shared programs: 5187127 -> 5187101 (<.01%)
instructions in affected programs: 891 -> 865 (-2.92%)
helped: 5 / HURT: 0

total cycles in shared programs: 163643914 -> 163643750 (<.01%)
cycles in affected programs: 22206 -> 22042 (-0.74%)
helped: 5 / HURT: 0

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>
2022-11-09 21:34:26 +00:00
Ian Romanick
293ad13e3f intel/fs: Slightly restructure emitting nir_op_imul_32x16 and nir_op_umul_32x16
There are no immediate values at this point, so all of this code was
bunk. :face_palm:

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>
2022-11-09 21:34:26 +00:00
Caio Oliveira
22d8ed84b8 intel/compiler: Remove unused fs_visitor::emit_percomp()
Since 7ef7738a61 ("i965: Write gl_FragCoord directly to the destination.") this
is not used.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19586>
2022-11-08 07:33:09 +00:00
Rohan Garg
43169dbbe5 intel/compiler: Support 16 bit float ops
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17988>
2022-10-17 15:56:28 +02:00
Lionel Landwerlin
9dba8d8aa1 intel/fs: take a builder arg for resolve_source_modifiers()
There will be situations where we will want to use a local builder
rather than the one associated with NIR->backend translation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
José Roberto de Souza
f4857591e1 intel/compiler/fs: Use DF to load constants when has_64bit_int is not supported
This was already been done to gen7 platforms, so now extending to all
platforms without has_64bit_int.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18577>
2022-09-14 19:32:43 +00:00
Caio Oliveira
0b6e613de8 intel/compiler: Create and use struct for CS thread payload
Move subgroup_id, that's only used by CS for verx10 < 125, as part of
the payload too -- even though is not, strictly speaking.

Note the thread execution of Task/Mesh is similar enough, so we make
their common struct inherit from cs_thread_payload.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
9de790760e intel/compiler: Create and use struct for Bindless thread payload
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
a70378f292 intel/compiler: Store start of ICP handles in GS thread payload struct
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
5b6987daee intel/compiler: Create and use struct for GS thread payload
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
19c6e1b447 intel/compiler: Create and use struct for TES thread payload
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
eb837dd23b intel/compiler: Store start of ICP handles in TCS thread payload struct
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
2622fc3af1 intel/compiler: Store Primitive ID in TCS thread payload struct
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
9a9b1119b4 intel/compiler: Store Patch URB output in TCS thread payload struct
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Jordan Justen
af8ab4a889 intel/compiler: Use builder to allocate fs regs for gs control data bits
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18537>
2022-09-12 10:00:28 -07:00
Caio Oliveira
00b8f9a3a6 intel/compiler: Use builder to allocate fs regs for TCS store output
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18537>
2022-09-12 10:00:18 -07:00
Caio Oliveira
55db3aaa3a intel/compiler: Create fs_visitor::emit_tcs_barrier()
Allow us to implement this in brw_fs_visitor.cpp, which then will
let us deduplicate code between the CS-like barrier and the TCS
barrier in a later patch.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18362>
2022-09-09 09:35:08 -07:00
Kenneth Graunke
19fc870ac6 intel/compiler: Use subgroup invocation for ICP handle loads
When loading a TCS or GS input, we generate some code to read the URB
handle for a particular input control point (ICP handle), which often
involves indirect addressing due to a non-constant vertex.

For example:

   mov(8) vgrf148+0.0:UW, 76543210V
   shl(8) vgrf149:UD, vgrf148+0.0:UW, 2u
   shl(8) vgrf150:UD, vgrf145:UD, 5u
   add(8) vgrf151:UD, vgrf150:UD, vgrf149:UD
   mov_indirect(8) vgrf147:UD, g2:UD, vgrf151:UD, 96u

Unfortunately, the first load with 76543210V is considered a partial
write because the 8 channels of 16-bit UW data doesn't fill an entire
register, and we can't allocate VGRFs at sub-register granularity.

This causes none of the above math to be CSE'd, even though the first
two instructions are common to *all* input loads, and the rest may be
reused sometimes as well.

To work around this, we stop emitting 76543210V to a temporary, and
instead use nir_system_values[SYSTEM_VALUE_SUBGROUP_INVOCATION], which
already contains this value, and is unconditionally set up for us.
With all input loads using the same register for the sequence, our
CSE pass is able to eliminate the rest of the common math.

shader-db results on Tigerlake:

   total instructions in shared programs: 20748243 -> 20744844 (-0.02%)
   instructions in affected programs: 73410 -> 70011 (-4.63%)
   helped: 242 / HURT: 21
   helped stats (abs) min: 1 max: 37 x̄: 14.17 x̃: 15
   helped stats (rel) min: 0.17% max: 19.58% x̄: 6.13% x̃: 6.32%
   HURT stats (abs)   min: 1 max: 4 x̄: 1.38 x̃: 1
   HURT stats (rel)   min: 0.18% max: 1.31% x̄: 0.58% x̃: 0.58%
   95% mean confidence interval for instructions value: -13.73 -12.12
   95% mean confidence interval for instructions %-change: -6.00% -5.19%
   Instructions are helped.

   total cycles in shared programs: 785828951 -> 785788480 (<.01%)
   cycles in affected programs: 597593 -> 557122 (-6.77%)
   helped: 227 / HURT: 13
   helped stats (abs) min: 6 max: 624 x̄: 182.19 x̃: 185
   helped stats (rel) min: 0.24% max: 18.22% x̄: 7.85% x̃: 7.80%
   HURT stats (abs)   min: 2 max: 153 x̄: 68.08 x̃: 36
   HURT stats (rel)   min: 0.03% max: 7.79% x̄: 2.97% x̃: 1.25%
   95% mean confidence interval for cycles value: -182.55 -154.71
   95% mean confidence interval for cycles %-change: -7.84% -6.69%
   Cycles are helped.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18455>
2022-09-08 15:12:41 +00:00
Marcin Ślusarz
66bc9aec65 intel/compiler: add support for non-zero base in [load|store]_shared intrins
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17618>
2022-08-29 12:42:40 +00:00
Lionel Landwerlin
407f2beb97 intel/fs: port block a64/surface messages to use LSC
v2: Fixup block load/store on surfaces/shared-memory (Rohan)

v3: drop write specific size_written case (Rohan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>
2022-08-24 17:51:40 +00:00
Caio Oliveira
bee2df64d2 intel/compiler: Use fs_reg helpers for GS icp_handle selection
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18221>
2022-08-24 01:42:23 +00:00
Caio Oliveira
b4aff6ab49 intel/compiler: Use fs_reg helpers for TCS icp_handle selection
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18221>
2022-08-24 01:42:22 +00:00
Caio Oliveira
a1b1fdf70d intel/compiler: Rename 8_PATCH to MULTI_PATCH
Make it clearer we are dealing with multiple patches,
works better in constrast with SINGLE_PATCH.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18151>
2022-08-24 00:39:57 +00:00
Lionel Landwerlin
3c78e94ff3 intel/fs: fixup scratch load/store handling on Gfx12.5+
We did not handle the operation with data size < 4. It works fine on
all other messages (global/shared). The initial commit was just too
restrictive.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1e242785c3 ("intel/fs: Implement load/store_scratch on XeHP")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16964>
2022-08-23 22:19:16 +00:00
Lionel Landwerlin
46a13404c0 intel/fs: fix load_scratch intrinsic
The selection of the internal opcode to deal with load_scratch is
incorrect.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c643979228 ("intel/fs: Choose memory message type based on bit size")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16964>
2022-08-23 22:19:16 +00:00
Kenneth Graunke
2cea0d6ef6 intel/compiler: Drop variable group size lowering
This backend lowering code has been dead since the removal of i965 -
nothing in the current source tree ever sets the flag.

This is handled by iris_setup_uniforms() and crocus_setup_uniforms().
Variable group size does not appear to be a feature in anv.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18055>
2022-08-18 16:17:03 +00:00