Commit graph

12701 commits

Author SHA1 Message Date
José Roberto de Souza
dec5a624e9 anv: Check if vkCreateQueryPool() is being created in a supported queue
Turns out not even VK CTS was calling
vkEnumeratePhysicalDeviceQueueFamilyPerformanceQueryCountersKHR()
to check if queue supports query.
So here adding a explicity check in our implementation of
vkCreateQueryPool().

https://github.com/KhronosGroup/VK-GL-CTS/pull/482

Cc: 24.2 <mesa-stable>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30652>
2024-09-18 15:29:16 +00:00
José Roberto de Souza
141e7eaca7 anv: Make sure all previous vm binds are done before execute perf query pool
The query pool batch buffer or other bos could not be bound when
exec starts.

Cc: 24.2 <mesa-stable>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30652>
2024-09-18 15:29:16 +00:00
José Roberto de Souza
0a19d92ca5 anv: Add warning about mismatch between query queues
Cc: 24.2 <mesa-stable>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30652>
2024-09-18 15:29:16 +00:00
José Roberto de Souza
c5d79d533a anv: Fix context id or exec queue used to open perf stream
It was always using device->context_id what is not valid in i915 when
has_vm_control is true or when running with Xe KMD.

But anv_AcquireProfilingLockKHR() don't have the queue information so
at least for now we will only support queries in a single queue.

And for consistency doing the same in
anv_QueueSetPerformanceConfigurationINTEL() although here we have the
queue parameter but queries are only supported in render engine
so it would only expose other queues if user set some parameters.

Cc: 24.2 <mesa-stable>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30652>
2024-09-18 15:29:16 +00:00
Dylan Baker
67bcdbf4a1 hasvk: remove useless uint >= 0 check
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31214>
2024-09-17 21:16:36 +00:00
Dylan Baker
27dd9fd677 anv: remove useless uint >= 0 check
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31214>
2024-09-17 21:16:36 +00:00
Lionel Landwerlin
45377dc5c4 brw: fix vecN rebuilds
When loading a 64bit address from the push constants, we'll load a
vec2, so we need to allocate 2 GRFs and MOV each component.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11831
Fixes: 339630ab05 ("brw: enable A64 loads source rematerialization")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31010>
2024-09-17 14:22:23 +00:00
Lionel Landwerlin
c16b27f66f brw: use a builder of the size of the physical register for uniforms
Should avoid any partial write non-sense on Xe2+.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 339630ab05 ("brw: enable A64 loads source rematerialization")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31010>
2024-09-17 14:22:23 +00:00
Lionel Landwerlin
02b124846f brw: fix TGM messages to use cmask lsc opcodes
This is a restriction for TGM.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b55f7716 ("intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31199>
2024-09-17 09:28:58 +00:00
Lionel Landwerlin
2159e17da0 brw: remove (load|store)_raw_intel
Those are Elk specific intrinsics.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b8f264cfe4 ("intel/brw: Handle load/stores in lsc_op_for_nir_intrinsic()")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31199>
2024-09-17 09:28:58 +00:00
Dylan Baker
3f3cb1e2fa intel/elk: delete copy constructor and copy-assignment-operator
To keep the rule-of-three. This points out that the implicit copy
operations would be dangerous when there is an explicit constructor and
destructor, since the class is holding un-managed memory.

Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29667>
2024-09-16 20:31:45 +00:00
Dylan Baker
5809209316 anv: enforce state->cmd_buffer is never null in emit_Simpler_shader_init_fragment
We have a couple of checks where we allow this to be NULL, but later we
unconditionally and unavoidably dereference the pointer, which means
there's no way that it ever could have been NULL. Change the assert at
the top to not allow NULL, and remove checks for it being NULL

CID: 1616544
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31156>
2024-09-16 19:16:58 +00:00
Dylan Baker
5ebdfc8813 anv: assert we don't write past the end of an array
Our array has a fixed size of 32, and we know at the start of the block
that our type_count is < 32, but in the loop we grow the block, in
theory up to 31 times. Coverity notes that, and points out we could
write off the end of the array. Add an assert in the loop to ensure we
don't, and to help Coverity out.

CID: 1615171
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31173>
2024-09-16 17:42:40 +00:00
Dylan Baker
7556521417 intel: replace (uint64_t - uint64_t) > 0 with uint64_t > uint64_t
As coverity points out, if the second uint64_t was greater than the
first (I don't think it actually can be), then the overflow would result
in the check succeeding when it shouldn't. We could cast this to an
integer type, but since we have uint64_t, we'd need int128_t for that.
Instead, replace the comparison to 0 with a direct comparison, since
that would give the correct result without potential to overflow.

CID: 1604833
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31175>
2024-09-16 17:12:17 +00:00
Rohan Garg
daea7e1651 intel/compiler: use the correct cache enum for loads and stores
Fixes: 74efde7 ('intel/brw/xehp+: Drop redundant arguments of lsc_msg_desc*()')

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30742>
2024-09-16 15:18:31 +00:00
Rohan Garg
b99fd944e8 intel/compiler: version can never be above 11 due to the previous check
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30742>
2024-09-16 15:18:31 +00:00
Dylan Baker
ed8d1d3c9b anv: if queue is NULL in vm_bind return early
In the error handling path we end up creating a vk_sync and then later
we vk_sync_wait() on it. If that wait fails somehow we'll end up calling
vk_queue_set_lost(&queue->vk, ...) which would segfault if queue is
NULL.

If we end up in this situation (no queue), return directly whatever the
backend's vm_bind function returned, propagating the error up if
necessary.

Fixes: dd5362c78a ("anv/xe: try harder when the vm_bind ioctl fails")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31048>
2024-09-13 20:17:40 +00:00
Caio Oliveira
5e47c5f94a intel/executor: Fix a couple of memory leaks in the tool
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31120>
2024-09-13 01:21:24 +00:00
Ian Romanick
447dae7c13 intel/brw: Use nir_opt_generate_bfi
No shader-db changes on any Intel platform.

The "regression" in SEND messages occurs because a loop containing a
SEND is unrolled.

v2: Move after nir_opt_algebraic. Suggested by Georg.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19787034 -> 19785933 (<.01%)
instructions in affected programs: 373573 -> 372472 (-0.29%)
helped: 541 / HURT: 6

total cycles in shared programs: 906012612 -> 905626304 (-0.04%)
cycles in affected programs: 58456516 -> 58070208 (-0.66%)
helped: 382 / HURT: 180

fossil-db:

Lunar Lake
Totals:
Instrs: 140671401 -> 140670495 (-0.00%); split: -0.00%, +0.00%
Send messages: 12891430822 -> 12891430834 (+0.00%)
Loop count: 46905 -> 46904 (-0.00%)
Cycle count: 21527511599 -> 21530278999 (+0.01%); split: -0.00%, +0.02%
Spill count: 70728 -> 70766 (+0.05%)
Fill count: 139397 -> 139254 (-0.10%); split: -0.13%, +0.02%
Max live registers: 47512432 -> 47512500 (+0.00%)

Totals from 355 (0.06% of 549270) affected shaders:
Instrs: 878953 -> 878047 (-0.10%); split: -0.18%, +0.08%
Send messages: 19289 -> 19301 (+0.06%)
Loop count: 1243 -> 1242 (-0.08%)
Cycle count: 1434664642 -> 1437432042 (+0.19%); split: -0.06%, +0.25%
Spill count: 15826 -> 15864 (+0.24%)
Fill count: 38454 -> 38311 (-0.37%); split: -0.46%, +0.08%
Max live registers: 52530 -> 52598 (+0.13%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152516575 -> 152516147 (-0.00%); split: -0.00%, +0.00%
Send messages: 7491001 -> 7491013 (+0.00%)
Loop count: 47588 -> 47587 (-0.00%)
Cycle count: 17124433133 -> 17126147156 (+0.01%); split: -0.01%, +0.02%
Max live registers: 31854704 -> 31854764 (+0.00%)

Totals from 402 (0.06% of 633223) affected shaders:
Instrs: 839338 -> 838910 (-0.05%); split: -0.09%, +0.04%
Send messages: 20203 -> 20215 (+0.06%)
Loop count: 1243 -> 1242 (-0.08%)
Cycle count: 1327042160 -> 1328756183 (+0.13%); split: -0.11%, +0.24%
Max live registers: 33237 -> 33297 (+0.18%)

Tiger Lake
*** Shaders only in 'before' results are ignored:
fossil-db/steam-native/wolfenstein_youngblood/b8cefe7f700304c4/fs.32/0
from 1 apps: fossil-db/steam-native/wolfenstein_youngblood

Totals:
Instrs: 150549467 -> 150548952 (-0.00%); split: -0.00%, +0.00%
Send messages: 7495582 -> 7495594 (+0.00%)
Loop count: 46605 -> 46604 (-0.00%)
Cycle count: 15472381586 -> 15472247085 (-0.00%); split: -0.00%, +0.00%
Spill count: 59776 -> 59775 (-0.00%)
Fill count: 103475 -> 103464 (-0.01%)
Scratch Memory Size: 2384896 -> 2383872 (-0.04%)
Max live registers: 31760724 -> 31760787 (+0.00%)
Max dispatch width: 5569928 -> 5569912 (-0.00%)

Totals from 525 (0.08% of 632443) affected shaders:
Instrs: 349074 -> 348559 (-0.15%); split: -0.25%, +0.11%
Send messages: 24355 -> 24367 (+0.05%)
Loop count: 849 -> 848 (-0.12%)
Cycle count: 187080291 -> 186945790 (-0.07%); split: -0.19%, +0.12%
Spill count: 483 -> 482 (-0.21%)
Fill count: 1372 -> 1361 (-0.80%)
Scratch Memory Size: 22528 -> 21504 (-4.55%)
Max live registers: 36705 -> 36768 (+0.17%)
Max dispatch width: 6272 -> 6256 (-0.26%)

Ice Lake
Totals:
Instrs: 151804923 -> 151804396 (-0.00%); split: -0.00%, +0.00%
Send messages: 7553216 -> 7553228 (+0.00%)
Loop count: 46196 -> 46195 (-0.00%)
Cycle count: 15099805668 -> 15099533898 (-0.00%); split: -0.00%, +0.00%
Fill count: 103978 -> 103979 (+0.00%)
Max live registers: 32168254 -> 32168323 (+0.00%)

Totals from 527 (0.08% of 637191) affected shaders:
Instrs: 347482 -> 346955 (-0.15%); split: -0.25%, +0.10%
Send messages: 24586 -> 24598 (+0.05%)
Loop count: 849 -> 848 (-0.12%)
Cycle count: 191147758 -> 190875988 (-0.14%); split: -0.16%, +0.02%
Fill count: 1392 -> 1393 (+0.07%)
Max live registers: 37379 -> 37448 (+0.18%)

Skylake
Totals:
Instrs: 140981504 -> 140980647 (-0.00%); split: -0.00%, +0.00%
Cycle count: 14653477192 -> 14653249734 (-0.00%); split: -0.00%, +0.00%
Fill count: 99636 -> 99637 (+0.00%)
Max live registers: 31472062 -> 31472126 (+0.00%)

Totals from 523 (0.08% of 626432) affected shaders:
Instrs: 335551 -> 334694 (-0.26%); split: -0.26%, +0.01%
Cycle count: 178047284 -> 177819826 (-0.13%); split: -0.14%, +0.02%
Fill count: 1100 -> 1101 (+0.09%)
Max live registers: 36734 -> 36798 (+0.17%)

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31006>
2024-09-13 00:21:00 +00:00
Kenneth Graunke
02482604e5 intel/brw: Delete old-style surface and A64 message opcodes
These have now been replaced by the MEMORY_*_LOGICAL opcodes.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
7090578c35 intel/brw: Switch load_ubo_uniform_block_intel over to memory intrinsics
While there are many cases that turn into the *_PULL_CONSTANT_LOAD ops
or push constants, this one piece was emitting surface block loads.
Switch it over to use the new intrinsics to delete a bunch of code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
b55f77161d intel/brw: Switch to emitting MEMORY_*_LOGICAL opcodes
We introduce a new fs_nir_emit_memory_access() helper that can handle
image, bindless image, SSBO, shared, global, and scratch memory, and
handles loads, stores, atomics, and block loads.  It translates each
of these NIR intrinsics into the new MEMORY_*_LOGICAL intrinsics.

As a result, we delete a lot of similar surface access emitter code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
3ba97176d6 intel/brw: Switch load_num_workgroups to the new memory intrinsic
A simple case we handle directly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
dc4770b005 intel/brw: Lower MEMORY_OPCODE_*_LOGICAL to HDC messages
This is more complicated.  We map the MEMORY_*_LOGICAL opcodes to the
older HDC messages: typed and untyped surface read/write/atomic (whether
float or integer), DWord and Byte scattered messages, OWord block, and
both A64, BTI, and stateless messages.

- MEMORY_MODE_* is used to select stateless-scratch, typed, or untyped.
- MEMORY_FLAG_TRANSPOSE is used to select block access.
- MEMORY_BINDING_TYPE = FLAT and 64-bit address size selects A64.
- Alignment and data type size select between byte/dword scattered or
  surface messages.

While we may not be able to handle the full generality of message
possibilities, we can handle everything we generate currently.  The plan
here is to assert/validate that we don't generate MEMORY_*_LOGICAL ops
on HDC-based platforms which can't support those particular messages.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
3255c9cc49 intel/brw: Lower MEMORY_OPCODE_*_LOGICAL to LSC messages
This is pretty straightforward, as the new MEMORY_*_LOGICAL opcodes
are designed to match the new LSC's capabilities.  The main part is
constructing the message payload.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
a82e8b1c6b intel/brw: Pretty-print memory logical opcodes
The new MEMORY_*_LOGICAL intrinsics have a lot of control sources with
a bunch of LSC_* enums (opcode, memory type, address type, address and
data sizes), as well as flags, coordinate components vs. components...
they unfortunately are nigh-unreadable with the default printing since
there's just a string of unreadable UD immediates in some order.

To fix this, we add some basic pretty-printing.  If a control source is
simply an enum whose value communicates the entire purpose, we print it.
If it has a numeric value (i.e. alignment, or data), we add a label.

For example:

   memory_store(16) (null):UD store shared flat addr: %2:UD coord_comps:1u align:16u d32 comps:2u data0: %3:UD

   memory_store(16) (null):UD store typed bti:%2+0.0<0>:UD addr: %3+0.0:D coord_comps:2u align:0u d32 comps:4u data0: %4:UD

This make them much easier to read.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
2c67729386 intel/brw: Expose functions to convert LSC enums to strings
We had tables for these in the disassembler already, but I'd like to use
them in brw_print.cpp as well.  Just wrap the tables in convenience
functions we can use there.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
d5f38be713 intel/brw: Introduce new MEMORY_*_LOGICAL opcodes
This is a new unified set of opcodes for memory access loosely patterned
after the new LSC-style data port messages introduced on Alchemist GPUs.

Rather than creating an opcode for every type of memory access, it has
only three opcodes: load, store, and atomic.  It has various sources to
indicate the rest:

- Binding type (raw pointer, pointer to surface state, or BT index)
- Address size (A64, A32, A16)
- Data size (bit size, number of components)
- Opcode (atomic opcode, or LOAD/STORE vs. LOAD_CMASK/STORE_CMASK)
- Mode (typed vs. untyped vs. shared-local vs. scratch)
- Address (and its dimensionality)
- Data (0 for loads, 1 for stores, 2 for atomics)
- Whether we want block access

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
b8f264cfe4 intel/brw: Handle load/stores in lsc_op_for_nir_intrinsic()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
8a6903e50d intel/brw: Rename lsc_aop_for_nir_intrinsic to "op" instead of "aop"
This is going to handle more than atomics shortly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
e8883bd40b intel/brw: Use size_written for NoMask instructions in is_partial_write
The intention of inst->is_partial_write() is that it should return true
when any REG_SIZE (32B) chunk of inst's destination is written but not
fully overwritten.  This can be used to tell whether inst combines new
data with existing data, or screens off any previous writes, so the old
values are no longer required.

The existing (exec_size * brw_type_size_bytes(this->dst.type) < 32)
check doesn't work in a number of cases.  For example, LSC block loads
have exec_size == 1 and force_writemask_all set, but may write multiple
full registers of data.  (Currently, we only see them with exec_size 1
after logical-send-lowering, so our SHADER_OPCODE_SEND special case
was covering those.)  We had also special cased UNDEF.

Instead, we can simply check:

   1. Predication
   2. !inst->dst.contiguous()
   3. inst->dst.offset % REG_SIZE != 0
   4. inst->size_written % REG_SIZE != 0

We had the first three already, but #4 is new.  If either #3 or #4
are true, then that implies there is a REG_SIZE chunk of the destination
which is written, but not entirely written, so it's a partial write.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
Kenneth Graunke
ab0b9b6792 intel/brw: Use NUM_BRW_OPCODES in can_omit_write() check
The intention here is to detect ALU hardware instructions, but not
virtual instructions that haven't been explicitly whitelisted.

For some reason we had arbitrarily hardcoded 128 here, but our virtual
opcodes don't start at 128.  They start at NUM_BRW_OPCODES.  So, use
that instead.

This prevents regressions later when we delete some opcodes, shifting
some virtual opcodes into the 72-128 range.

Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
2024-09-12 20:54:36 +00:00
David Heidelberg
e4b247ec9b ci/intel: Officially switch intel-adl-cl to pre-merge
It has proven to be useful.

Due to the .rusticl-rules reference, job was already running in pre-merge,
so let's make it official.

Reviewed-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Signed-off-by: David Heidelberg <david@ixit.cz>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31144>
2024-09-12 15:40:03 +00:00
Nanley Chery
e0157abec6 anv,iris: Pack depth pixels into initialized arrays
Coverity alerts that the uint32_t pointer I was passing into
isl_color_value_pack() could possibly be used as an array. The value is
being used as such, but only the first element of that array should be
accessed. That's because the depth buffer formats I'm also passing into
the function only have a single channel, R. Nonetheless, let's update
the code to avoid the warning.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31123>
2024-09-11 22:04:30 +00:00
Iván Briano
e4ee0a2ce1 anv: be consistent about aux usage with modifiers
In c1a7d520f3, we disabled AUX usage for imported images when they are
using an explicit modifier that doesn't support it.
We need to do the same when the modifier is picked by the driver,
otherwise the memory requirements reported for an exported image don't
match those we report for import.

Fixes: c1a7d520f3 ("anv: Disable aux if the explicit modifier lacks it")

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31051>
2024-09-10 22:09:41 +00:00
Caio Oliveira
eb68e6e84c anv: Advertise VK_KHR_compute_shader_derivatives
This was promoted from VK_NV_compute_shader_derivatives.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30956>
2024-09-10 18:22:42 +00:00
Dave Airlie
7531f6fd9c radv/anv/video: handling encoding both sps and pps in same buffer
This API should allow encoding these back to back into the same
buffer, so handle it properly.

Cc: mesa-stable
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31086>
2024-09-10 06:03:15 +00:00
Jordan Justen
c5c349a690 intel/dev: Fix warning for max_threads_per_psd when devinfo->verx10 == 120
Although we don't want to rely on hwconfig for devinfo->verx10 == 120,
due to the dependence on closed source software, we do check to see if
hwconfig reports different values in the DEVINFO_HWCONFIG macro.

Matt was seeing this warning on 8086:a7a0:

> MESA: warning: INTEL_HWCONFIG_TOTAL_PS_THREADS (128) != devinfo->max_threads_per_psd (64)

Reported-by: Matt Turner <mattst88@gmail.com>
Fixes: 3e4f73b3a0 ("intel/dev: Update hwconfig => max_threads_per_psd for Xe2")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31077>
2024-09-10 03:21:12 +00:00
Nanley Chery
c92e49e8f4 intel/isl: Always set EnableUnormPathInColorPipe
The TGL PRM says,

   This bit should never be programmed to 0

So, set it to true. I chose not to use the MBO attribute in genxml
because the field lacks the "Format: MBO" line in the PRM.

We previously made this programming conditional with commit 2e1be771e4
because of tests failing in
dEQP-GLES3.functional.texture.specification.tex*depth*. However, those
failures were fixed when we started using gl_FragDepth for depth buffer
copies in commit 6cec618e82.

Note: when bisecting this, I cherry-picked commit 7a68045b5d in order
to get past build failures related to a deprecated python function.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31066>
2024-09-09 23:48:31 +00:00
Sviatoslav Peleshko
fa51595c7f brw: Fix mov cmod propagation when there's int signedness mismatch
If there's difference between scan_inst dest type and inst src type we
should be more careful, because difference in signedness can cause
incorrect results after the propagation.

Updated ror-default.trace hash, as the change fixes misrendering there.

Fixes: b23432c5 ("intel/fs: Fix a cmod prop bug when the source type of a mov doesn't match the dest type of scan_inst")
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30998>
2024-09-09 22:13:08 +00:00
Lionel Landwerlin
05dc524c75 anv: selectively disable binding table usage on Gfx20
Workaround broken Gfx20 dynamic BTI.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e9f63df2f2 ("intel/dev: Enable LNL PCI IDs without INTEL_FORCE_PROBE")
Backport-to: 24.2
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30931>
2024-09-09 20:33:25 +00:00
Rohan Garg
7f65035078 hasvk: enable VK_KHR_shader_relaxed_extended_instruction
The extension only affects non semantic instructions that need no
handling in the backend compiler.

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31098>
2024-09-09 17:46:32 +00:00
Rohan Garg
5f3339e44a anv: enable the VK_KHR_shader_relaxed_extended_instruction feature
Fixes: 29a2e5 ('anv: enable KHR_shader_relaxed_extended_instruction')
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31098>
2024-09-09 17:46:32 +00:00
Daniel Stone
a78539e704 intel/tests: Reduce load from anv_tests
anv_tests tries to create a large number of threads, all of which wait
to be able to execute simultaneously, then launch a reasonable-size
workload.

Under load, cloning each of the 16 threads takes 15ms serially, for a
delay of 240ms before the tests start running; running the test 64
times gives us 15.36s for a single testcase in isolation, assuming that
the bits which aren't forking are free.

To give it the best shot at completing in time, mark it as a
non-parallelisable test (since Meson will also try to parallelise it
out), and also halve the number of runs it attempts. And then give it a
longer timeout so it doesn't fail even in extremis.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31059>
2024-09-09 12:54:34 +00:00
Caio Oliveira
2a5a12cb71 intel/executor: Small fixes to the help message
Add missing @eot to the example.
Reword INTEL_DEBUG=color description.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31076>
2024-09-07 16:32:50 +00:00
Alyssa Rosenzweig
1753bf599c ci: update traces
🤕

thanks Mike

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934>
2024-09-07 00:54:35 +00:00
Tapani Pälli
39a1f53890 anv: initialize pixel struct to zero when setting clear color
Otherwise we can end up with uninitialized values, this fixes following
valgrind warning:

==31283== Uninitialised byte(s) found during client check request
==31283==    at 0x503E4DE: anv_batch_bo_finish (anv_batch_chain.c:345)
==31283==    by 0x504220A: anv_cmd_buffer_end_batch_buffer (anv_batch_chain.c:1103)
==31283==    by 0x55A0E4F: end_command_buffer (genX_cmd_buffer.c:3455)
==31283==    by 0x55A0E82: gfx11_EndCommandBuffer (genX_cmd_buffer.c:3466)
==31283==    by 0x11233A: ??? (in /usr/bin/vkcube)
==31283==    by 0x10BDEE: ??? (in /usr/bin/vkcube)
==31283==    by 0x49B5149: (below main) (in /usr/lib64/libc.so.6)
==31283==  Address 0xc10c4d8 is 1,240 bytes inside a block of size 8,192 client-defined
==31283==    at 0x5036EF6: anv_bo_pool_alloc (anv_allocator.c:1284)
==31283==    by 0x503E0E1: anv_batch_bo_create (anv_batch_chain.c:262)
==31283==    by 0x5040D3F: anv_cmd_buffer_init_batch_bo_chain (anv_batch_chain.c:868)
==31283==    by 0x504F9C1: anv_create_cmd_buffer (anv_cmd_buffer.c:147)
==31283==    by 0x6B718C4: vk_common_AllocateCommandBuffers (vk_command_pool.c:206)
==31283==    by 0x4FB06B2: vkAllocateCommandBuffers (trampoline.c:1996)
==31283==    by 0x111E6B: ??? (in /usr/bin/vkcube)
==31283==    by 0x10BDEE: ??? (in /usr/bin/vkcube)
==31283==    by 0x49B5149: (below main) (in /usr/lib64/libc.so.6)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30990>
2024-09-06 13:19:04 +00:00
David Heidelberg
d16581652f ci/iris: implement nightly CL testing using piglit on ADL
Reviewed-by: Eric Engestrom <eric@igalia.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29516>
2024-09-05 08:57:51 +00:00
Lionel Landwerlin
aa494cbacf brw: align spilling offsets to physical register sizes
In commit fe3d90aedf ("intel/fs/xe2+: Fix calculation of spill message
width for Xe2 regs.") we aligned the width of scratch messages to
physical register sizes (32B prior to Xe2, 64B for Xe2+).

But our spilling offsets are computed using the register allocations
sizes which are in units of 32B. That means on Xe2, you can end up
spilling a virtual register allocated at 32B (which we use for surface
state computations with exec_all) and then the spilling of that
register will be emitted in SIMD16, having the upper 8 lanes
overwriting the next spilled register.

We could potentially limit spills to SIMD8 messages on Xe2 (only
writing 32B of data), but we're also unlikely to have all 32B virtual
register spilled next to one another. And if not tightly packed, we
would have 64B registers stored on 2 different cachelines which sounds
inefficient.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: fe3d90aedf ("intel/fs/xe2+: Fix calculation of spill message width for Xe2 regs.")
Backport-to: 24.2
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30983>
2024-09-04 23:05:31 +00:00
Jordan Justen
f817870aa9 anv: Don't warn about unsupported devices if INTEL_FORCE_PROBE was used
The user must have used INTEL_FORCE_PROBE to force the device to be
loaded, so they specifically opted-in to enabled unsupported device
support.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31011>
2024-09-04 12:09:12 -07:00