Commit graph

9902 commits

Author SHA1 Message Date
Lionel Landwerlin
300cc829de intel/nir: handle image_sparse_load in storage format lowering
The last component of sparse load is the residency data. We don't want
to touch/convert that value with the format lowering.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23882>
2023-07-27 02:02:34 +03:00
Lionel Landwerlin
d33aff783d intel/fs: add support for sparse accesses
Purely from the backend point of view it's just an additional
parameter to sampler messages.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23882>
2023-07-27 02:02:30 +03:00
Lionel Landwerlin
50c29e1ffa anv: simplify buffer address+size loads from descriptor buffer
Only a couple of titles were found to be helped by this:

 PERCENTAGE DELTAS Shaders   Instrs    Cycles
 cyberpunk_2077    10388     -0.00%    -0.00%
 -----------------------------------------------
 All affected      1         -2.24%    -0.39%
 -----------------------------------------------
 Total             10388     -0.00%    -0.00%

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles
 red_dead_redemption2 5949      -0.10%    -0.00%
 --------------------------------------------------
 All affected         111       -0.74%    -0.14%
 --------------------------------------------------
 Total                5949      -0.10%    -0.00%

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23318>
2023-07-26 09:41:23 +00:00
Lionel Landwerlin
f1f58c3bea isl: add ability to store buffer size in unused RENDER_SURFACE_STATE fields
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23318>
2023-07-26 09:41:23 +00:00
Lionel Landwerlin
d099e47de0 intel/fs: add more UNDEFs around SEND messages
lower_find_live_channel() in particular is used a lot in control flow
to find the live channel for the surface/sampler handle. Adding UNDEFs
on the temporary registers used for finding the live channels helps
reduce the liveness of those temporary registers, especially in loops.

Some titles affected :

Rise Of The Tomb Raider:
Totals from 2780 (22.58% of 12311) affected shaders:
Instrs: 1294455 -> 1294592 (+0.01%); split: -0.15%, +0.16%
Cycles: 1473136441 -> 1471302617 (-0.12%); split: -1.52%, +1.40%
Max live registers: 144282 -> 143595 (-0.48%)
Max dispatch width: 22200 -> 22232 (+0.14%)

Red Dead Redemption 2:
Totals from 435 (7.28% of 5972) affected shaders:
Instrs: 488472 -> 487594 (-0.18%); split: -0.31%, +0.14%
Cycles: 11354732 -> 11384928 (+0.27%); split: -0.44%, +0.71%
Spill count: 1217 -> 1172 (-3.70%)
Fill count: 3521 -> 3447 (-2.10%)
Scratch Memory Size: 64512 -> 62464 (-3.17%)
Max live registers: 35997 -> 35798 (-0.55%)

Fallout 4:
Totals from 8 (0.49% of 1638) affected shaders:
Instrs: 41908 -> 40509 (-3.34%)
Cycles: 3638464 -> 3555680 (-2.28%); split: -2.67%, +0.39%
Spill count: 717 -> 665 (-7.25%)
Fill count: 2542 -> 2438 (-4.09%)
Scratch Memory Size: 32768 -> 16384 (-50.00%)
Max live registers: 567 -> 534 (-5.82%)

Cyberpunk 2077:
Totals from 2984 (28.97% of 10301) affected shaders:
Instrs: 3888874 -> 3891600 (+0.07%); split: -0.20%, +0.27%
Cycles: 67906489 -> 67767721 (-0.20%); split: -0.68%, +0.47%
Spill count: 200 -> 98 (-51.00%)
Fill count: 237 -> 90 (-62.03%)
Scratch Memory Size: 10240 -> 8192 (-20.00%)
Max live registers: 215715 -> 212727 (-1.39%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24282>
2023-07-26 08:48:33 +00:00
Lionel Landwerlin
5c72724819 intel/fs: consider UNDEF as non-partial write
A few titles show max live register reductions, but nothing
significant in instruction count or other stats.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24282>
2023-07-26 08:48:32 +00:00
Lionel Landwerlin
d62e494b37 intel/vec4: fix log_data pointer
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3384f029be ("intel/compiler: rework input parameters")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9421
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24307>
2023-07-26 06:36:18 +00:00
Iván Briano
377c2a045f intel/compiler: call brw_nir_adjust_payload from brw_postprocess_nir
Calling anything after nir_trivialize_registers() risks undoing some of
its work.
In this case, brw_nir_adjust_payload() will do a constant folding pass
if any payload adjusting happened, and that can turn a bunch of
@store_regs into basically noops.

Fixes dEQP-VK.subgroups.*task

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24325>
2023-07-25 22:48:09 +00:00
Ian Romanick
cb0de0a1d3 intel/fs: Constant fold OR and AND
The path taken in fs_visitor::swizzle_nir_scratch_addr for DG2 generates
some AND and OR instructions before the SHL. This commit folds those so
the whole calculation becomes a constant (like on older platforms).

v2: Fix return type of src_as_uint. Noticed by Marcin.

shader-db results:

DG2
total instructions in shared programs: 23190475 -> 23179540 (-0.05%)
instructions in affected programs: 36026 -> 25091 (-30.35%)
helped: 7 / HURT: 0

total cycles in shared programs: 841196807 -> 841142563 (<.01%)
cycles in affected programs: 1660670 -> 1606426 (-3.27%)
helped: 7 / HURT: 0

No shader-db changes on any older Intel platforms.

fossil-db results:

DG2
Totals:
Instrs: 197780372 -> 197773966 (-0.00%)
Cycles: 14066410782 -> 14066399378 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 8438104 -> 8438112 (+0.00%)
Send messages: 8049445 -> 8049446 (+0.00%)
Scratch Memory Size: 14263296 -> 14264320 (+0.01%)

Totals from 9 (0.00% of 668055) affected shaders:
Instrs: 24547 -> 18141 (-26.10%)
Cycles: 1984791 -> 1973387 (-0.57%); split: -0.98%, +0.40%
Subgroup size: 88 -> 96 (+9.09%)
Send messages: 867 -> 868 (+0.12%)
Scratch Memory Size: 69632 -> 70656 (+1.47%)

No fossil-db changes on any older Intel platforms.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23884>
2023-07-25 22:11:21 +00:00
Ian Romanick
61c786bad5 intel/fs: Constant fold SHL
This is a modified version of a commit originally in !7698. This version
adds the changes to brw_fs_copy_propagation. If the address passed to
fs_visitor::swizzle_nir_scratch_addr is a constant, that function will
generate SHL with two constant sources.
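
Folding an SHL whose two sources are constants can be sketched as below. This is only an illustration: fold_shl_constant and its parameters are hypothetical names, not the code added by the commit, and it assumes the shift count has already been reduced below the type width.

```c
#include <assert.h>
#include <stdint.h>

/* Fold "value << shift" for a destination that is type_bits wide
 * (16 for W/UW, 32 for D/UD, 64 for Q/UQ). Hypothetical helper, not
 * the code from the commit. */
uint64_t fold_shl_constant(uint64_t value, unsigned shift, unsigned type_bits)
{
   assert(type_bits == 16 || type_bits == 32 || type_bits == 64);
   /* Assumption: the caller already reduced the shift count below the
    * type width, so the C shift is well defined. */
   assert(shift < type_bits);

   /* Building the mask as (1 << 64) - 1 would be undefined behaviour,
    * so the 64-bit case is handled separately. */
   const uint64_t mask =
      type_bits == 64 ? UINT64_MAX : (UINT64_C(1) << type_bits) - 1;

   /* Bits shifted past the type width are discarded by the mask. */
   return (value << shift) & mask;
}
```

With this sketch, a UD instruction with sources 2 and 8 would fold to the constant 512, which a later pass can then treat as a plain MOV of an immediate.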

DG2 uses a different path to generate those addresses, so the constant
folding can't occur there yet. That will be addressed in the next
commit.

What follows is the commit change history from that older MR.

v2: Previously this commit was after `intel/fs: Combine constants for
integer instructions too`.  However, this commit can create invalid
instructions that are only cleaned up by `intel/fs: Combine constants
for integer instructions too`.  That would potentially affect the
shader-db results of each commit, but I did not collect new data for
the reordering.

v3: Fix masking for W/UW and for Q/UQ types. Add an assertion for
!saturate. Both suggested by Ken. Also add an assertion that B/UB types
don't magically come back.

v4: Fix sources count. See also ed3c2f73db ("intel/fs: fixup sources
number from opt_algebraic").

v5: Fix typo in comment added in v3. Noticed by Marcin. Fix a typo in a
comment added when pulling this commit out of !7698. Noticed by Ken.

shader-db results:

DG2
No changes.

Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
total instructions in shared programs: 20655696 -> 20651648 (-0.02%)
instructions in affected programs: 23125 -> 19077 (-17.50%)
helped: 7 / HURT: 0

total cycles in shared programs: 858436639 -> 858407749 (<.01%)
cycles in affected programs: 8990532 -> 8961642 (-0.32%)
helped: 7 / HURT: 0

Broadwell and Haswell had similar results. (Broadwell shown)
total instructions in shared programs: 18500780 -> 18496630 (-0.02%)
instructions in affected programs: 24715 -> 20565 (-16.79%)
helped: 7 / HURT: 0

total cycles in shared programs: 946100660 -> 946087688 (<.01%)
cycles in affected programs: 5838252 -> 5825280 (-0.22%)
helped: 7 / HURT: 0

total spills in shared programs: 17588 -> 17572 (-0.09%)
spills in affected programs: 1206 -> 1190 (-1.33%)
helped: 2 / HURT: 0

total fills in shared programs: 25192 -> 25156 (-0.14%)
fills in affected programs: 156 -> 120 (-23.08%)
helped: 2 / HURT: 0

No shader-db changes on any older Intel platforms.

fossil-db results:

DG2
Totals:
Instrs: 197780415 -> 197780372 (-0.00%); split: -0.00%, +0.00%
Cycles: 14066412266 -> 14066410782 (-0.00%); split: -0.00%, +0.00%

Totals from 16 (0.00% of 668055) affected shaders:
Instrs: 16420 -> 16377 (-0.26%); split: -0.43%, +0.17%
Cycles: 220133 -> 218649 (-0.67%); split: -0.69%, +0.01%

Tiger Lake, Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 153425977 -> 153423678 (-0.00%)
Cycles: 14747928947 -> 14747929547 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 8535968 -> 8535976 (+0.00%)
Send messages: 7697606 -> 7697607 (+0.00%)
Scratch Memory Size: 4380672 -> 4381696 (+0.02%)

Totals from 6 (0.00% of 662749) affected shaders:
Instrs: 13893 -> 11594 (-16.55%)
Cycles: 5386074 -> 5386674 (+0.01%); split: -0.42%, +0.43%
Subgroup size: 80 -> 88 (+10.00%)
Send messages: 675 -> 676 (+0.15%)
Scratch Memory Size: 91136 -> 92160 (+1.12%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23884>
2023-07-25 22:11:21 +00:00
Ian Romanick
56e6186dcf intel/fs: Always do opt_algebraic after opt_copy_propagation makes progress
opt_copy_propagation can create invalid instructions like

    shl(8) vgrf96:UD, 2d, 8u

These instructions will be cleaned up by opt_algebraic.  The irony is
opt_algebraic converts these to simple mov instructions that
opt_copy_propagation should clean up.  I don't think we want a loop like

   do {
      progress = false;
      if (OPT(opt_copy_propagation)) {
         OPT(opt_algebraic);
         OPT(dead_code_eliminate);
      }
   } while (progress);

But maybe we do?

Maybe this would be sufficient:

   while (OPT(opt_copy_propagation))
      OPT(opt_algebraic);
   OPT(dead_code_eliminate);

No shader-db or fossil-db changes (yet) on any Intel platform.  This is
expected.

v2: Do opt_algebraic immediately after every call to
opt_copy_propagation instead of being clever. Suggested by Lionel.

Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23884>
2023-07-25 22:11:21 +00:00
José Roberto de Souza
f59d272e93 anv: Request Xe KMD to place BOs to CPU visible VRAM when required
This is required to support discrete GPUs placed in systems where a
large PCI BAR or resizable PCI BAR is not available or is disabled.

Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23781>
2023-07-25 19:33:16 +00:00
José Roberto de Souza
f9fcd7168a intel/dev/xe: Add support for small-bar setups
This adds support for discrete GPUs placed in systems where a large PCI
BAR or resizable PCI BAR is not available or is disabled.

Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23781>
2023-07-25 19:33:15 +00:00
Faith Ekstrand
94f36cfaa3 intel/fs: Assume NIR is in SSA form
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24310>
2023-07-25 16:25:11 +00:00
Faith Ekstrand
965bbe5286 intel/fs: Rework the overlapping mov/vec case
Now that we're using load/store_reg intrinsics, the previous checks for
registers aren't what we want.  Instead, we need to be looking for a mov
or vec where both the destination and a source are load/store_reg with
matching decl_reg.

Fixes: b8209d69ff ("intel/fs: Add support for new-style registers")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24310>
2023-07-25 16:25:11 +00:00
Faith Ekstrand
45ee952efb intel/fs: Use write masks from store_reg intrinsics
Fixes: b8209d69ff ("intel/fs: Add support for new-style registers")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24310>
2023-07-25 16:25:10 +00:00
Marcin Ślusarz
4f1125e4ae intel/compiler/test: fix crashes when TEST_DEBUG is set
Dumping instructions requires that ISA info is not empty.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24274>
2023-07-25 15:13:29 +00:00
Illia Polishchuk
c2724b4d37 s/Intel: fix/anv: fix: potentially overflowing expression in genX
CID 1528164 (#1 of 1): Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
overflow_before_widen: Potentially overflowing expression
pool->n_passes * pool->khr_perf_preamble_stride with type
unsigned int (32 bits, unsigned) is evaluated using 32-bit arithmetic,
and then used in a context that expects an expression of type uint64_t (64 bits, unsigned).
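
The pattern Coverity describes, and the usual way to avoid it, can be sketched as follows. The struct here is a hypothetical stand-in that only borrows the two field names from the report; it is not the driver's actual pool type, and the fix shown (widening one operand) is an assumption about the general remedy, not a copy of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the pool fields named in the report. */
struct perf_pool {
   uint32_t n_passes;
   uint32_t khr_perf_preamble_stride;
};

/* Problematic shape: the multiply is done in 32 bits and wraps before
 * the result is widened to uint64_t. */
uint64_t preamble_size_narrow(const struct perf_pool *pool)
{
   assert(pool);
   return pool->n_passes * pool->khr_perf_preamble_stride;
}

/* Safer shape: widen one operand first so the multiply is 64-bit. */
uint64_t preamble_size_wide(const struct perf_pool *pool)
{
   assert(pool);
   return (uint64_t)pool->n_passes * pool->khr_perf_preamble_stride;
}
```

The two functions only differ once the product no longer fits in 32 bits, which is why this kind of bug tends to go unnoticed with small inputs.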

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Illia Polishchuk <illia.a.polishchuk@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20893>
2023-07-25 08:55:56 +00:00
Jianxun Zhang
75452f611e intel/common: Only set op mask on instructions in decoder
When a default value of a struct's field, which is in the
higher half of the first dword, is specified in a gen xml
file, setting op mask makes decoder treat the field as a
header (intel_field_is_header()). As a result, it won't
output the field in batch dump. This is not a common case
but can happen once a gen xml file includes such fields.

The op mask is only meaningful to instructions, so we fix
the above issue by not setting op mask of structs (also
registers).

Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24268>
2023-07-24 22:56:59 +00:00
Nanley Chery
1d12b29b3f intel/blorp: Ambiguate after CCS resolves on gfx7-8
ISL's state-machine of CCS_D describes full resolves as leaving the aux
buffer in the pass-through state. Hardware doesn't behave this way on
gfx8 however. On that platform, full resolves transition the aux buffer
to the resolved state. This was verified by dumping the CCS before and
after a full resolve on BDW (gfx7 is simply assumed to behave the same).
Ambiguate after resolving to match driver expectations.

Prevents iris from failing piglit's fcc-write-after-clear on BDW with a
future patch which relies on fast-clear encodings being removed after a
resolve. The avoided failure is:

   Testing implicit read of partial block UNORM -> SNORM
   Probe color at (0,1,0)
     Expected:  1.000000 1.000000 1.000000 1.000000
     Observed:  0.000000 0.000000 0.000000 0.000000

Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23676>
2023-07-24 22:29:01 +00:00
Lionel Landwerlin
8cbf730145 intel/fs: don't try to rebuild sequences of non ssa values
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 04777171e0 ("intel/fs: try to rematerialize surface computation code")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9378
Reviewed-by: Illia Polishchuk <illia.a.polishchuk@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24228>
2023-07-24 20:04:24 +00:00
Emma Anholt
61ec26db26 ci/tgl: Improve the info for ANGLE's MSAA regression on TGL.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24200>
2023-07-24 16:07:28 +00:00
Faith Ekstrand
079e8a9674 anv,hasvk,iris: sampler_prog_key::swizzles is only used on crocus
The field is no longer consumed by brw_compile_* and is instead handled
directly by the crocus driver.  Therefore, it's safe to leave it zero
and not even bother setting it.  This removes our reliance on the
SWIZZLE_* macros in prog_instructions.h.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24288>
2023-07-24 15:40:40 +00:00
Marcin Ślusarz
48885c7fe3 intel/compiler: load debug mesh compaction options once
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>
2023-07-24 07:55:29 +00:00
Marcin Ślusarz
c1685f08dd intel/compiler,anv: put some vertex and primitive data in headers
Both per-primitive and per-vertex space is allocated in MUE in 8-dword
chunks and those 8-dword chunks (granularity of
3DSTATE_SBE_MESH.Per[Primitive|Vertex]URBEntryOutputReadLength)
are passed to fragment shaders as inputs (either non-interpolated
for per-primitive and flat vertex attributes or interpolated
for non-flat vertex attributes).

Some attributes have a special meaning and must be placed in a separate
8/16-dword slot called Primitive Header or Vertex Header.

Primitive Header contains 4 such attributes (Cull Primitive,
ViewportIndex, RTAIndex, CPS), leaving 4 dwords (the rest of 8-dword
slot) potentially unused.

Vertex Header is similar - it starts with 3 unused dwords, 1 dword for
Point Size (but if we declare that shader doesn't produce Point Size
then we can reuse it), followed by 4 dwords for Position and optionally
8 dwords for clip distances.

This means we have an interesting optimization problem - we can put
some user attributes into holes in Primitive and Vertex Headers, which
may lead to smaller MUE size and potentially more mesh threads running
in parallel, but we have to be careful to use those holes only when
we need it, otherwise we could force HW to pass too much data to
fragment shader.

Example 1:
Let's assume that Primitive Header is enabled and user defined
12 dwords of per-primitive attributes.

Without packing we would consume 8 + ALIGN(12, 8) = 24 dwords of
MUE space and pass ALIGN(12, 8) = 16 dwords to fragment shader.

With packing, we'll consume 4 + 4 + ALIGN(12 - 4, 8) = 16 dwords of
MUE space and pass ALIGN(4, 8) + ALIGN(12 - 4, 8) = 16 dwords to
fragment shader.

16/16 is better than 24/16, so packing makes sense.

Example 2:
Now let's assume that Primitive Header is enabled and user defined
16 dwords of per-primitive attributes.

Without packing we would consume 8 + ALIGN(16, 8) = 24 dwords of
MUE space and pass ALIGN(16, 8) = 16 dwords to fragment shader.

With packing, we'll consume 4 + 4 + ALIGN(16 - 4, 8) = 24 dwords of
MUE space and pass ALIGN(4, 8) + ALIGN(16 - 4, 8) = 24 dwords to
fragment shader.

24/24 is worse than 24/16, so packing doesn't make sense.
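
The arithmetic of the two examples can be reproduced with a small sketch. All names here (mue_cost, cost_unpacked, cost_packed, packing_helps) are hypothetical, and the decision rule in packing_helps is an assumption chosen to match the two examples, not necessarily the heuristic the compiler uses. The formulas are taken directly from the examples and are only applied for 4 or more user dwords.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mue_cost {
   uint32_t mue_dwords; /* per-primitive MUE space consumed */
   uint32_t fs_dwords;  /* dwords passed to the fragment shader */
};

/* Round v up to a multiple of a, where a is a power of two. */
uint32_t align_u32(uint32_t v, uint32_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}

/* Primitive Header (8 dwords) followed by the user attributes. */
struct mue_cost cost_unpacked(uint32_t user_dwords)
{
   struct mue_cost c = {
      .mue_dwords = 8 + align_u32(user_dwords, 8),
      .fs_dwords = align_u32(user_dwords, 8),
   };
   return c;
}

/* 4 user dwords placed in the unused half of the Primitive Header. */
struct mue_cost cost_packed(uint32_t user_dwords)
{
   assert(user_dwords >= 4); /* the examples only cover this case */
   uint32_t rest = align_u32(user_dwords - 4, 8);
   struct mue_cost c = {
      .mue_dwords = 4 + 4 + rest,
      .fs_dwords = align_u32(4, 8) + rest,
   };
   return c;
}

/* Assumed rule: pack only if MUE shrinks and FS input does not grow. */
bool packing_helps(uint32_t user_dwords)
{
   struct mue_cost u = cost_unpacked(user_dwords);
   struct mue_cost p = cost_packed(user_dwords);
   return p.mue_dwords < u.mue_dwords && p.fs_dwords <= u.fs_dwords;
}
```

For 12 user dwords this gives 24/16 unpacked against 16/16 packed, and for 16 user dwords 24/16 against 24/24, matching Example 1 and Example 2.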

This change doesn't affect vk_meshlet_cadscene in default configuration,
but it speeds it up by up to 25% with "-extraattributes N", where
N is some small value divisible by 2 (by default N == 1) and we
are bound by URB size.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>
2023-07-24 07:55:29 +00:00
Marcin Ślusarz
a252123363 intel/compiler/mesh: compactify MUE layout
Instead of using 4 dwords for each output slot, use only the amount
of memory actually needed by each variable.

There are some complications from this "obvious" idea:
- flat and non-flat variables can't be merged into the same vec4 slot,
  because flat inputs mask has vec4 stride
- multi-slot variables can have different layout:
   float[N] requires N 1-dword slots, but
   i64vec3 requires 1 fully occupied 4-dword slot followed by 2-dword slot
- some output variables occur both in single-channel/component split
  and combined variants
- crossing vec4 boundary requires generating more writes, so avoiding them
  if possible is beneficial

This patch fixes some issues with arrays in per-vertex and per-primitive data
(func.mesh.ext.outputs.*.indirect_array.q0 in crucible) and, by reducing
the size of a single MUE, allows spawning more threads at the same time.

Note: this patch doesn't improve vk_meshlet_cadscene performance because
default layout is already optimal enough.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>
2023-07-24 07:55:29 +00:00
Zhang Ning
06db9bd3f6 Revert "intel/ci: disable iris-jsl-deqp because it always fails for an AMD MR"
This reverts commit da4b5b4a47.

Signed-off-by: Zhang Ning <zhangn1985@outlook.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23815>
2023-07-24 03:02:14 +00:00
Alyssa Rosenzweig
1466014184 nir: Rename lower_locals_to_reg_intrinsics back
The short name is freed up.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24253>
2023-07-21 11:25:49 +00:00
Alyssa Rosenzweig
a08286f993 intel/fs: Don't read reg.base_offset
It's not set in the new intrinsics path.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24253>
2023-07-21 11:25:48 +00:00
Rohan Garg
01965a2fe9 anv: drop CFE state validation checks
anv no longer needs to track if the CFE state is valid since we ensure
that the state is valid at pipeline creation time.

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23934>
2023-07-21 10:46:08 +00:00
Rohan Garg
e7e7042093 anv,iris: program the maximum number of threads on compute queue init
Fixes: 90a39cac87 ("intel/blorp: Emit compute program based on BLORP_BATCH_USE_COMPUTE")
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23934>
2023-07-21 10:46:08 +00:00
Marcin Ślusarz
06046a02f8 anv: merge cases leading to the same code
Added in: 688968e888 ("anv: add support for direct descriptor in allocation/writes")

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24260>
2023-07-21 07:22:22 +00:00
Marcin Ślusarz
0eb2679cdb anv: drop unused function
Added in: 02cecffe2b ("anv: add a pass to partially lower resource_intel")

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24260>
2023-07-21 07:22:22 +00:00
Marcin Ślusarz
3c83ac8002 intel/compiler: remove redundant code
has_lsc is checked a few lines above, so this code doesn't matter.

Added in: a358b97c58 ("intel/fs: optimize uniform SSBO & shared loads")

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24260>
2023-07-21 07:22:22 +00:00
Hyunjun Ko
e3ecba3266 anv: use ycbcr_info for P010 format
Since !24096 landed, we can just use ycbcr_info to get information
of an image of the P010 format.

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24265>
2023-07-21 06:15:30 +00:00
Nanley Chery
99ffa4043e intel/isl: Add a score for DG2_RC_CCS
This enables the DG2 render compression modifier in anv. When I tested
this against vkcube, I observed that the full resolve which happened at
the end of every frame was converted to a partial resolve, allowing the
framebuffer to retain compression.

According to Caleb Callaway's testing, enabling this modifier positively
impacts the FPS of the following game benchmarks:

 - Strange Brigade.vk-g6              +12.78%
 - Strange Brigade.dx12vk-g6          + 9.33%
 - Shadow of the Tomb Raider.vk-g6-lx + 2.37%
 - Dota 2 (replay Jul 2020).vk-g6     + 2.28%

Thanks to Felix DeGrood for pointing out that Strange Brigade would
benefit from this optimization.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24120>
2023-07-20 20:53:27 +00:00
Nanley Chery
15dec30877 intel/isl: Move the Tile4 modifier score case down
Group modifiers by platform first, then the score. I find it easier to
read this way.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24120>
2023-07-20 20:53:27 +00:00
Nanley Chery
d9bdffa708 intel: Describe modifier compression with booleans
Replace the aux_usage field with two booleans: one for render
compression and one for media compression.

This more accurately describes how CCS_E is used on gfx12. On those
platforms, the FCV feature may be enabled or disabled, but ISL's
modifier table has been using the FCV aux-usage for every gfx12 render
compression modifier. Instead, set the newly-added render compression
boolean to true.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24120>
2023-07-20 20:53:27 +00:00
Nanley Chery
f5f61c5bb7 hasvk: Delete modifier with aux code
Modifiers with compression are not supported.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24120>
2023-07-20 20:53:26 +00:00
Nanley Chery
569f80f2df anv: Reduce accesses of isl_mod_info->aux_usage
This field will be replaced in an upcoming patch.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24120>
2023-07-20 20:53:26 +00:00
Nanley Chery
f2dab434d8 anv: Handle explicit surface layout of DG2_RC_CCS
We're going to enable the DG2 modifier. Account for the reduced plane
count that exists with it.

Also add an assert to make it clearer that the aux in use is CCS.
Otherwise, it may not be obvious because of the generic compression
names being used here.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24120>
2023-07-20 20:53:26 +00:00
Nanley Chery
47565d31e1 intel: Add and use isl_drm_modifier_get_plane_count
We're going to enable the DG2_RC_CCS modifier in anv. Add and use this
function to prepare for the new plane count that comes with that
modifier.

iris is left alone for now because it supports more modifiers than
isl_drm_modifier_get_score is aware of.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24120>
2023-07-20 20:53:26 +00:00
Nanley Chery
e50af52e3d anv: Don't support ASTC images with modifiers
Before this change, anv_get_image_format_features2 reported support for
ASTC formats with any modifier (even those not supported by anv). But,
we didn't intend to support that compressed image format with modifiers.

With this change, the format feature function reports no support for
modifiers on ASTC-formatted images.

This prevents the next patch from causing assertion failures due to
unsupported modifiers.

Fixes: 355f318843 ("anv: Allow transfer-only linear ASTC images")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24120>
2023-07-20 20:53:26 +00:00
Rohan Garg
ba071ee81c anv: use the correct GFX_VERx10 macro for WA
Fixes: 60b0d2c2cb ("add required invalidate/flush for Wa_14014427904")
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23937>
2023-07-20 20:25:12 +00:00
Rohan Garg
097f3b4a98 anv: use the WA infrastructure where possible when generating state
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23937>
2023-07-20 20:25:12 +00:00
Felix DeGrood
d04be9770b intel/compiler: use shader source hash in shader dump code
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>
2023-07-20 09:08:08 +00:00
Felix DeGrood
6ac8a9a030 intel: use shader source hash in INTEL_MEASURE
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>
2023-07-20 09:08:08 +00:00
Felix DeGrood
124973c635 anv: Add Source hash field to VkPipelineExecutableStatisticKHR
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>
2023-07-20 09:08:08 +00:00
Felix DeGrood
b145d05381 anv: save a shader source uint32_t hash in gfx/compute pipelines
Save lowest dword of shader source sha1 in pipeline object for use
later as hash for uniquely identifying shader in debug outputs.
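
Reducing a SHA-1 digest to a 32-bit identifier could look like the sketch below. The function name is hypothetical, and it assumes that "lowest dword" means the first four digest bytes read as a little-endian value; the commit message does not spell out the byte order.

```c
#include <assert.h>
#include <stdint.h>

#define SHA1_DIGEST_BYTES 20

/* Hypothetical helper: build a 32-bit hash from the first four bytes
 * of a SHA-1 digest, interpreted as little-endian (an assumption). */
uint32_t source_hash_from_sha1(const uint8_t sha1[SHA1_DIGEST_BYTES])
{
   assert(sha1);
   return (uint32_t)sha1[0] |
          ((uint32_t)sha1[1] << 8) |
          ((uint32_t)sha1[2] << 16) |
          ((uint32_t)sha1[3] << 24);
}
```

A 32-bit value is short enough to print in debug output, at the cost of a small chance that two different shaders share the same identifier.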

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>
2023-07-20 09:08:08 +00:00
Lionel Landwerlin
3384f029be intel/compiler: rework input parameters
Use a struct for various common parameters rather than per stage
structure or arguments to stage specific entrypoints.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>
2023-07-20 09:08:08 +00:00