Commit graph

2433 commits

Author SHA1 Message Date
Jason Ekstrand
43ca7f4178 intel/compiler: Convert brw_wm_aa_enable to brw_sometimes
There are other cases where we want a tri-state logic like this.  May as
well have one enum for all the cases.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>
2023-02-06 09:12:17 +00:00
Jason Ekstrand
5d1c538449 intel/fs: Return early in a couple builtin setup helpers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>
2023-02-06 09:12:17 +00:00
Jason Ekstrand
714a291673 intel/compiler: Use SHADER_OPCODE_SEND for PI messages
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>
2023-02-06 09:12:17 +00:00
Jason Ekstrand
d25e5310bc intel/nir: Lower barycentrics to per-sample in a dedicated pass
This is more similar to what we do for single-sample and it should be
more clear going forward once our lowering gets more complex.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>
2023-02-06 09:12:17 +00:00
Jason Ekstrand
991d546102 intel/compiler: Document wm_prog_key::persample_interp
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>
2023-02-06 09:12:17 +00:00
Ian Romanick
ea413e826b nir: Eliminate nir_op_f2b
Builds on the work of !15121.  This gets to delete even more code
because many drivers shared a lot of code for i2b and f2b.

No shader-db or fossil-db changes on any Intel platform.

v2: Rebase on 1a35acd8d9.

v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin.

v4: Another rebase. Remove f2b stuff from Midgard.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>
2023-02-03 22:39:57 +00:00
Sagar Ghuge
0c083d29a5 intel/fs: Always stall between the fences on Gen11+
Be conservative in Gfx11+ and always stall in a fence.  Since there are
two different fences, and shader might want to synchronize between them.

This change also brings back the original code block for the stall
between the fence and comment from the commit
b390ff3517.

v2: (Caio)
 - Re-arrange code block.
 - Adjust comment.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6958

Fixes: f7262462 ("intel/fs: Rework fence handling in brw_fs_nir.cpp")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Tested-by: Mark Janes <markjanes@swizzler.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20996>
2023-02-02 00:21:21 +00:00
Amber
ab4c2990ed intel/compiler: use lower_image_samples_to_one
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewer-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Amber Amber <amber@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20813>
2023-02-01 19:52:49 +00:00
Marcin Ślusarz
af9e2b8bf1 intel/compiler/mesh: remove dead code path supporting >4 dword writes
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20858>
2023-01-31 18:28:21 +00:00
Marcin Ślusarz
be82ed28f0 intel/compiler/mesh: support longer write messages
Allowing longer writes reduces the number of send messages needed
to support unaligned 4-component writes.

Note: nothing currently generates 8-component writes, so this change
makes "second_mask" code path in emit_urb_direct_writes and
emit_urb_indirect_writes_mod dead.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20858>
2023-01-31 18:28:21 +00:00
Marcin Ślusarz
3131c2fc7a intel/compiler/mesh: optimize indirect writes
Our hardware requires that we write to URB using full vec4s at aligned
addresses. It gives us an ability to mask-off dwords within vec4 we don't
want to write, but we have to know their positions at compile time.

Let's assume that:
- V represents one dword we want to write
- ? is an unitinitialized value
- "|" is a vec4 boundary.

When we want to write 2-dword value at offset 0 we generate 1 write message:
| V1 V2 ? ? |
with mask:
| 1  1  0 0 |

When we want to write 4-dword value at offset 2 we generate 2 write messages:
| ? ? V1 V2 | V3 V4 ? ? |
with mask:
| 0 0 1  1  | 1  1  0 0 |

However if we don't know the offset within vec4 at *compile time* we
currently generate 4 write messages:
| V1 V1 V1 V1 |
| 0  0  1  0  |

| V2 V2 V2 V2 |
| 0  0  0  1  |

| V3 V3 V3 V3 |
| 1  0  0  0  |

| V4 V4 V4 V4 |
| 0  1  0  0  |

where masks are determined at *run time*.

This is quite wasteful and slow.

However, if we could determine the offset modulo 4 statically at compile time,
we could generate only 1 or 2 write messages (1 if modulo is 0) instead of 4.

This is what this patch does: it analyzes the addressing expression for
modulo 4 value and if it can determine it at compile time, we generate
1 or 2 writes, and if it can't we fallback to the old 4 writes method.

In mesh shader, the value of offset modulo 4 should be known for all outputs,
with an exception of primitive indices.

The modulo value should be known because of MUE layout restrictions, which
require that user per-primitive and per-vertex data start at address aligned
to 8 dwords and we should statically always know the offset from this base.

There can be some cases where the offset from the base is more dynamic
(e.g. indirect array access inside a per-vertex value), so we always do
the analysis.

Primitive indices are an exception, because they form vec3s (for triangles),
which means that the offset will not be easy to analyse.

When U888X index format lands, primitive indices will use only one dword
per triangle, which means that we'll always write them using one message.

Task shaders don't have any predetermined structure of output memory, so
always do the analysis.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20050>
2023-01-31 13:50:08 +00:00
Marcin Ślusarz
432e263284 intel/compiler: fine-grained control of dispatch widths
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> [v2]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20854>
2023-01-27 11:00:41 +00:00
Lionel Landwerlin
13cca48920 intel/fs: drop FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GFX7
We can lower FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD into other more
generic sends and drop this internal opcode.

The idea behind this change is to allow bindless surfaces to be used
for UBO pulls and why it's interesting to be able to reuse
setup_surface_descriptors(). But that will come in a later change.

No shader-db changes on TGL & DG2.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20416>
2023-01-26 11:26:53 +00:00
Francisco Jerez
7b5e933629 intel/fs: Fix src and dst types of LOAD_PAYLOAD ACP entries during copy propagation.
The ACP entries created by copy propagation to track the implied
copies of LOAD_PAYLOAD instructions don't model the behavior of
LOAD_PAYLOAD correctly, since (as of 41868bb682) header
moves are implicitly retyped to UD and the destination of non-header
copies implicitly uses the same type as the corresponding source, even
though the ACP entries created for such copies could incorrectly
represent a type conversion, which can lead to mis-optimization of the
program.

According to Marcin, this fixes the func.mesh.ext.workgroup_id.task.q0
crucible test.

Fixes: 41868bb682 ("i965/fs: Rework the fs_visitor LOAD_PAYLOAD instruction")
Reported-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Tested-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18980>
2023-01-25 22:22:12 +00:00
Marcin Ślusarz
536a2acfc2 intel/compiler/mesh: handle const data in task & mesh programs
Started showing up when nir_opt_large_constants call was moved in 88756cee8d.
Fixes dEQP-VK.mesh_shader.ext.smoke.monolithic.fullscreen_gradient*

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 88756cee8d ("intel/compiler: Run nir_opt_large_constants before scalarizing consts")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20876>
2023-01-24 14:47:21 +00:00
Marcin Ślusarz
9bb18a4f9e intel/compiler: fix generation of vec8/vec16 alu instruction
I stumbled on this when I inserted some suboptimal lowering code after all
optimizations. Adding certain subset of optimizations after my lowering code
actually avoided this bug, so I think it's not possible to hit this on upstream.

Let's fix this for the next person generating suboptimal code...

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20857>
2023-01-24 13:15:58 +00:00
Lionel Landwerlin
a50d2fdb46 intel/fs: avoid cmod optimization on instruction with different write_mask
I've been running into failures with tests like :

dEQP-VK.robustness.robustness2.bind.notemplate.rgba32i.unroll.nonvolatile.uniform_buffer_dynamic.no_fmt_qual.len_4.samples_1.1d.frag

With the load_global_const_block_intel NIR intrinsic, you can load a
vec8/vec16 with a predicate. The predicate is correctly uniformized to
feed into the SEND instruction's flag register.

The problem is that a series of optimization first remove the
find_live_channel and then changes the broadcast into a simple MOV
instruction, on the assumption that the first channel is always active
if there is not control flow. This is correct.

But after that the cmod optimzation will remove this instruction :

   mov.nz.f0.0(16) null:D, vgrf16+0.0<0>:D NoMask

because it seems to be equivalent to :

   cmp.g.f0.0(16) vgrf16:D, vgrf12:D, 63d

In this case vgrf16 is the predicate to the load block SEND
instruction. Since the execution mask is different between both, some
of the channels of the SEND instruction end up not being loaded or
loaded with the wrong predication and we end up with incorrect UBO
data.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20852>
2023-01-24 07:35:42 +00:00
Kenneth Graunke
7092c1218a intel/compiler: Use more symbolic source names in components_read()
Rather than hardcoding source 1, source 2, etc.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
16b66ab659 intel/compiler: Drop dest checking in atomic code
NIR atomic operation intrinsics all have destinations.  This is just
copy and pasted from other generic intrinsic handling where that may
or may not be the case.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
780f3e2e6b intel/compiler: Delete all the A64 atomic variants for type sizes
These are handled identically in almost all cases.  There is one place
in the legacy surface lowering that was obtaining the bitsize from the
opcode, but the LSC-based lowering uses (type_sz(inst->dst.type) * 8)
for that and works just fine.  If we just do that in the legacy lowering
too, then we don't need this plethora of opcodes.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
03ddde1230 intel/compiler: Combine nir_emit_{ssbo,shared}_atomic into one helper
These are basically identical save for:
- shared has surface hardcoded to SLM rather than an SSBO index
- shared has to handle adding the 'base' const_index (SSBO have none)
- the NIR source index for data is shifted by one

It's not worth copy and pasting the entire function for this.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
b84939c678 intel/compiler: Delete fs_visitor::nir_emit_{ssbo,shared}_atomic_float()
These are now basically identical to their non-float counterparts.  The
only thing that differed was the opcode checking to determine which
operands existed.  Now that we have a unified opcode enum and a helper
for the number of data operands, we can just use that.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
f7b29d7924 intel/compiler: Drop redundant 32-bit expansion for shared float atomics
We already expanded data to 32-bit a few lines earlier, so this is just
redundantly doing it a second time.

Fixes: 43169dbbe5 ("intel/compiler: Support 16 bit float ops")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
02129eee3a intel/compiler: Eliminate SHADER_OPCODE_UNTYPED_ATOMIC_FLOAT
The only reason for the separate opcode was because of the overlapping
BRW_AOP_* enums, making it impossible to tell whether a particular AOP
was the integer or float operation.  Now that we use the lsc_opcode
enums, we can just have the legacy lowering inspect the opcode and
select the right descriptor.  No need for a separate opcode.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
284f0c9a57 intel/compiler: Add an lsc_op_num_data_values() helper
There are a number of places that need to know how many operands an LSC
atomic takes (0 for inc/dec, 1 for most things, 2 for cmpxchg).  We can
add a helper for that and eliminate some code (with more to come).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
90a2137cd5 intel/compiler: Use LSC opcode enum rather than legacy BRW_AOPs
This gets our logical atomic messages using the lsc_opcode enum rather
than the legacy BRW_AOP_* defines.  We have to translate one way or
another, and using the modern set makes sense going forward.

One advantage is that the lsc_opcode encoding has opcodes for both
integer and floating point atomics in the same enum, whereas the legacy
encoding used overlapping values (BRW_AOP_AND == 1 == BRW_AOP_FMAX),
which made it impossible to handle both sensibly in common code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
8d2dc52a14 intel/compiler: Move atomic op translation into emit_*_atomic()
There's no need to pass both the intrinsic and an opcode computed from
that same intrinsic.  Just do it in the functions themselves.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Francisco Jerez
f40e17059a intel/fs/gfx12+: Drop redundant handling of SHADER_OPCODE_BROADCAST in exec pipe inference.
Commit c80c0ed943 introduced handling of
SHADER_OPCODE_BROADCAST into inferred_exec_pipe(), but it was already
being handled, drop the redundant handling.  Shouldn't lead to any
functional changes.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20543>
2023-01-19 06:14:03 +00:00
Francisco Jerez
b867d1b851 intel/eu/gfx12+: Implement decoding of 64-bit immediates.
C.f. a12533f2ce.  The corresponding
change for the decoding path was never implemented so the disassembler
was printing incorrect immediate values.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20543>
2023-01-19 06:14:03 +00:00
Francisco Jerez
f80f29dc4b intel/disasm/gfx12+: Fix print out of non-existing condmod field with 64-bit immediate.
The conditional mode field doesn't exist for instructions with a
64-bit immediate, so this would currently print garbage.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20543>
2023-01-19 06:14:03 +00:00
Francisco Jerez
f3352745ad intel/disasm/gfx12+: Use helper instead of hardcoded bit access for 64-bit immediates.
So we don't have to duplicate code to handle differences in the
encoding of 64-bit immediates across platforms.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20543>
2023-01-19 06:14:03 +00:00
Francisco Jerez
4a2e7306dd intel/fs/gfx12: Ensure that prior reads have executed before barrier with acquire semantics.
This avoids a violation of the Vulkan memory model that was leading to
intermittent failures of at least 8k test-cases of the Vulkan CTS
(within the group dEQP-VK.memory_model.*) on TGL and DG2 platforms.
In theory the issue may be reproducible on earlier platforms like IVB
and ICL, but the SYNC.ALLWR instruction is not available on those
platforms so a different (likely costlier) fix will be needed.

The issue occurs within the sequence we emit for a NIR memory barrier
with acquire semantics requiring the synchronization of multiple
caches, e.g. in pseudocode for a barrier involving the TGM and UGM
caches on DG2:

 x <- load.ugm // Atomic read sequenced-before the barrier
 y <- fence.ugm
 z <- fence.tgm
 wait(y, z)
 w <- load.tgm // Read sequenced-after the barrier

In the example we must provide the guarantee that the memory load for
x is completed before the one for w, however this ordering can be
reversed with the intervention of a concurrent thread, since the UGM
fence will block on the prior UGM load and potentially take a long
time, while the TGM fence may complete and invalidate the TGM cache
immediately, so a concurrent thread could pollute the TGM cache with
stale contents for the w location *before* the UGM load has completed,
leading to an inversion of the expected memory ordering.

v2: Apply the workaround regardless of whether the NIR barrier
    intrinsic specifies multiple storage classes or a single one,
    since an acquire barrier is required to order subsequent requests
    relative to previous atomic requests of unknown storage class not
    necessarily specified by the memory scope information of the
    intrinsic.

Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20690>
2023-01-18 21:34:33 -08:00
Tapani Pälli
53de48f1c4 intel/compiler: add cpp_std=c++17 when building tests
Otherwise build fails:

"../src/intel/compiler/brw_private.h:40:4: note:
 ‘std::variant’ is only available from C++17 onwards"

Fixes: 6c194ddd18 ("intel/compiler: Prepare SIMD selection helpers to handle different prog_datas")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20725>
2023-01-17 13:58:03 +00:00
Nico Cortes
29adbb132f Revert "intel/compiler: fine-grained control of dispatch widths"
This reverts commit bed18ab3e2.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8063
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20654>
2023-01-12 00:33:25 +00:00
Marcin Ślusarz
bed18ab3e2 intel/compiler: fine-grained control of dispatch widths
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20535>
2023-01-11 08:17:12 +00:00
Ian Romanick
51be623372 intel/eu/validate: Check predication and cmod for SEL, CMP, and CMPN
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
e0f409c5d8 intel/eu/validate: Add validation for csel
v2: Also check the condition modifier. Suggested by Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
3a7c23973b intel/eu/validate: Add validation for bfi2
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
f34821d998 intel/eu/validate: More validation for logic ops
v2: Use number of source to condition validating src1 instead of using
the opcode. Suggested by Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
8be7406c81 intel/compiler: Assert that ARF used is the accumulator
v2: Move the new check to be with similar existing checks. Suggested by
Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
3b579a2ea8 intel/compiler: Validate 3-source instruction source strides
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
c5684019f6 intel/compiler: Validate 3-source instruction sources have same base type
This can't be checked in EU validation because the bits to describe the
base type of the individual sources no longer exist.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Lionel Landwerlin
6b494745be intel/fs: only avoid SIMD32 if strictly inferior in throughput
This enabled SIMD32 in blorp shaders and seems to be give a small FPS
bump when using a DG2 GPU as secondary (requires copies to linear
buffers to exchange with main GPU).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19341>
2023-01-09 08:41:47 +00:00
Ian Romanick
8ab7ec0129 intel/compiler: Enable lower_bitfield_extract_to_shifts and lower_bitfield_insert_to_shifts for pre-Gfx7
GLSL IR opcodes generated for bitfieldExtract and bitfieldInsert are
lowered by lower_instructions.  4dff3ff005 ("nir/opt_algebraic:
Optimize open coded bfm.") adds an optimization that can rematerialize
nir_op_bfm that was prevented by the GLSL IR lowering.

It appears that every piece of hardware, except older Intel GPUS, that
has real integers (i.e., lower_bitops is not set) also sets
lower_bitfield_extract_to_shifts and lower_bitfield_insert_to_shifts.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 4dff3ff005 ("nir/opt_algebraic: Optimize open coded bfm.")
Closes: #7874
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20323>
2023-01-03 18:37:53 -08:00
Lionel Landwerlin
25608659a0 intel/compiler: mark shader_record_ptr as uniform
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20413>
2022-12-23 09:22:13 +00:00
Jordan Justen
78a75e0d25 intel/common/intel_genX_state.h: Add intel_set_ps_dispatch_state()
This replaces brw_fs_get_dispatch_enables(), which was added in
b9403b1c47 ("intel: factor out dispatch PS enabling logic"), but this
function will not work well for future changes to 3DSTATE_PS.

So, instead, this moves the related code into a "genX" file which can
directly update 3DSTATE_PS for the given platform.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20329>
2022-12-15 00:54:59 -08:00
Ian Romanick
eb76cee9f8 nir: Eliminate nir_op_i2b
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b.  Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.

At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.

The failing test on d3d12 is a pre-existing bug that is triggered by
this change.  I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.

v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.

v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.

v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress.  The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).

v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(bf2(x)) optimization.

v6: Just eliminate the i2b instruction.

v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷

No shader-db changes on any Intel platform.

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2

Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2

The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.

Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
edae161d98 intel/fs: Use nir_type_convert instead of nir_type_conversion_op
In a future commit, nit_type_conversion_op won't be able to handle i2b
(and in a much later commit f2b), so switch many users to the fully
featured function.

No shader-db or fossil-db changes on any Intel platform.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Lionel Landwerlin
94bb4a13fa intel/fs: make Wa_1806565034 conditional to non robust access
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20280>
2022-12-13 18:05:19 +00:00
Marcin Ślusarz
75375233f6 intel/compiler/mesh: extract emit_urb_direct_vec4_write
No functional changes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20292>
2022-12-13 13:00:49 +00:00