Commit graph

289 commits

Author SHA1 Message Date
Caio Oliveira
0b73d163d4 intel/brw: Remove Gfx8- passes from optimize()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26887>
2024-02-26 20:54:24 +00:00
Kenneth Graunke
c12300844d intel/fs: Don't rely on CSE for VARYING_PULL_CONSTANT_LOAD
In the past, we didn't have a good solution for combining scalar loads
with a variable index plus a constant offset.  To handle that, we took
our load offset and rounded it down to the nearest vec4, loaded an
entire vec4, and trusted in the backend CSE pass to detect loads from
the same address and remove redundant ones.

These days, nir_opt_load_store_vectorize() does a good job of taking
those scalar loads and combining them into vector loads for us, so we
no longer need to do this trick.  In fact, it can be better not to:
our offset need only be 4 byte (scalar) aligned, but we were making it
16 byte (vec4) aligned.  So if you wanted to load an unaligned vec2,
we might actually load two vec4's (___X | Y___) instead of doing a
single load at the starting offset.

This should also reduce the work the backend CSE pass has to do,
since we just emit a single VARYING_PULL_CONSTANT_LOAD instead of 4.
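
To make the alignment argument concrete, here is a minimal C++ sketch, not the driver's code. The helper names `vec4_blocks_touched` and `loads_from_start` are hypothetical; they only count how many 16-byte loads a read would need under the old round-down-to-vec4 scheme versus a load issued at the 4-byte-aligned starting offset.

```cpp
#include <cstdint>

// Hypothetical helper (not a Mesa function): number of 16-byte (vec4)
// blocks overlapped by the byte range [offset, offset + size), i.e. how
// many vec4 loads the old "round down to vec4" scheme would need.
// Assumes size > 0.
unsigned vec4_blocks_touched(uint32_t offset, uint32_t size)
{
   const uint32_t first = offset / 16;
   const uint32_t last = (offset + size - 1) / 16;
   return last - first + 1;
}

// Hypothetical helper: number of 16-byte loads needed when the load may
// start directly at the (4-byte aligned) offset of the data.
unsigned loads_from_start(uint32_t size)
{
   return (size + 15) / 16;
}
```

For a vec2 (8 bytes) at byte offset 12, `vec4_blocks_touched(12, 8)` gives 2 while `loads_from_start(8)` gives 1, which is the (___X | Y___) case described above.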

shader-db results on Alchemist:
- No changes in SEND count or spills/fills
- Instructions: helped 95, hurt 100, +/- 1-3 instructions
- Cycles: helped 3411 hurt 1868, -0.01% (-0.28% in affected)
- SIMD32: gained 5, lost 3

fossil-db results on Alchemist:
- Instrs: 161381427 -> 161384130 (+0.00%); split: -0.00%, +0.00%
- Cycles: 14258305873 -> 14145884365 (-0.79%); split: -0.95%, +0.16%
- SIMD32: Gained 42, lost 26

- Totals from 56285 (8.63% of 652236) affected shaders:
- Instrs: 13318308 -> 13321011 (+0.02%); split: -0.01%, +0.03%
- Cycles: 7464985282 -> 7352563774 (-1.51%); split: -1.82%, +0.31%

From this we can see that we aren't doing more loads than before
and the change is pretty inconsequential, but it requires less
optimizing to produce similar results.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27568>
2024-02-20 23:16:27 -08:00
Caio Oliveira
26dd1f0bba intel/compiler: Rename BRW_WM_MSAA_* enums to INTEL_MSAA_*
And move to the intel_shader_enums.h file.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27475>
2024-02-14 22:31:23 -08:00
Kenneth Graunke
2e38024fd8 intel: Use hardware generated compute shader local invocation IDs
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27167>
2024-01-25 08:43:04 +00:00
Sagar Ghuge
6fcec87090 intel/fs: Track instance id in gs_thread_payload
This change moves the instance id setup into the gs_thread_payload
constructor, and the lowering code will simply use that.

Also, this change takes the Xe2 register width into consideration, which
fixes a couple of tests involving geometry shaders with gl_InvocationID
on Xe2.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26960>
2024-01-22 22:15:38 +00:00
Ian Romanick
3756f60558 intel/fs: DPAS lowering
Implements integer dot product lowering both with and without
DP4A. Implements half-float dot product lowering.
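
For readers unfamiliar with the operation, the following is a rough C++ reference for a signed 4-way 8-bit dot product with accumulate, which is the arithmetic a DP4A-style instruction performs. It is written as plain multiply-adds, presumably similar in spirit to what a lowering without DP4A has to emit. The function name `dp4a_reference` is made up for illustration, and the sketch ignores saturation and mixed signedness.

```cpp
#include <cstdint>

// Hypothetical reference (not Mesa code): treat `a` and `b` as four packed
// signed 8-bit values each, multiply them pairwise and add the products
// to the 32-bit accumulator.
int32_t dp4a_reference(int32_t acc, uint32_t a, uint32_t b)
{
   for (unsigned i = 0; i < 4; i++) {
      const int8_t ai = static_cast<int8_t>((a >> (8 * i)) & 0xff);
      const int8_t bi = static_cast<int8_t>((b >> (8 * i)) & 0xff);
      acc += static_cast<int32_t>(ai) * static_cast<int32_t>(bi);
   }
   return acc;
}
```

For example, `dp4a_reference(5, 0x01020304u, 0x01010101u)` accumulates 4 + 3 + 2 + 1 on top of 5.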

There are a couple FINISHME comments describing future optimizations.

v2: Add a brw_compiler::lower_dpas flag to track when the lowering
should be applied.

v3: Use is_null() instead of checking file != ARF. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
2023-12-29 20:27:15 -08:00
Francisco Jerez
49a867f67e intel/fs: Add support for vector payload values to fetch_payload_reg().
This extends fetch_payload_reg() to support fetching vector registers
like barycentrics stored on the payload as a contiguous sequence of
SIMD-wide vectors.  In the SIMD32 case, both halves of the SIMD16
vector registers specified as regs[0] and regs[1] are zipped to
construct a single SIMD32-wide vector.
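
One way to picture the SIMD32 case is the sketch below. It assumes each SIMD16 half stores its vector components as consecutive 16-lane rows and that "zipped" means interleaving those rows component by component. Both the layout and the name `zip_simd16_halves` are assumptions for illustration, not the actual fetch_payload_reg() implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: half0 and half1 each hold `components` rows of 16
// lanes (lanes 0..15 and lanes 16..31 of the thread respectively).
// The result holds `components` rows of 32 lanes.
std::vector<int> zip_simd16_halves(const std::vector<int> &half0,
                                   const std::vector<int> &half1,
                                   std::size_t components)
{
   constexpr std::size_t lanes = 16;
   std::vector<int> simd32;
   simd32.reserve(2 * lanes * components);
   for (std::size_t c = 0; c < components; c++) {
      for (std::size_t l = 0; l < lanes; l++)
         simd32.push_back(half0[c * lanes + l]);   // lanes 0..15 of component c
      for (std::size_t l = 0; l < lanes; l++)
         simd32.push_back(half1[c * lanes + l]);   // lanes 16..31 of component c
   }
   return simd32;
}
```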

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
2023-12-28 11:07:03 -08:00
Francisco Jerez
4672fcbc76 intel/fs: Fix PS thread payload setup for depth_w_coef_reg.
It's not replicated per SIMD16 half of a SIMD32 thread on the PS
payload.  Make fs_visitor::payload::depth_w_coef_reg a scalar rather
than an array.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
2023-12-22 18:05:31 +00:00
Francisco Jerez
83a0252e8d intel/fs: Pass builder to per_primitive_reg().
Matches prototype of interp_reg(), will be useful in a subsequent commit.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
2023-12-22 18:05:30 +00:00
Francisco Jerez
8e9f09dbe5 intel/fs: Provide component index explicitly to interp_reg().
Main motivation is that for multipolygon PS shaders the i-th plane
parameter for the j-th input attribute will no longer necessarily be a
scalar, since different channels may be processing different polygons
with different input plane parameters, so simply taking a component()
of the result of interp_reg() will no longer work.  Instead of
duplicating the multipolygon handling logic in every caller of
interp_reg(), fold the component() call into interp_reg() so we can
replace it with multipolygon-correct code more easily.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
2023-12-22 18:05:30 +00:00
Francisco Jerez
e4aca2ebaa intel/fs: Add separate constructor of fs_visitor for fragment shaders.
To allow specifying the number of polygons that will be processed per
SIMD thread.

Rework:
 * Jordan: Add needs_register_pressure following
   09cdb77a92 ("intel/fs: report max register pressure in shader stats")

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
2023-12-22 18:05:30 +00:00
Francisco Jerez
1eff2fcb62 intel/compiler: Add polygon count statistic to brw_compile_stats.
And use it in ANV in order to return a "SIMDNxM" name from
vkGetPipelineExecutablePropertiesKHR.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
2023-12-22 18:05:30 +00:00
Kenneth Graunke
49b8ccbcdc intel/fs: Drop opt_register_renaming()
In the past, multiple writes to a single register were pretty common,
but since we've transitioned to NIR, and leave the IR in SSA form for
everything not captured in a phi-web, the pattern of generating new
temporary registers at each step is a lot more common.

This pass isn't nearly as useful now.  Across fossil-db on Alchemist,
this affects only 0.55% of shaders, which fall into two cases:

- Coarse pixel shading pixel-X/Y setup.  There are a few cases where
  we write a partial calculation into a register, then have a second
  instruction read that as a source and overwrite it as a destination.
  While we could use a temporary here, it doesn't actually help with
  register pressure at all, since there's the same number of values
  live at both instructions regardless.  So while this pass kicks in,
  it doesn't do anything useful.

- Geometry shader control data bits (5 shaders total).  We track masks
  for handling EndPrimitive in a single register across the program,
  and apparently in some cases can split the live range.  However, it's
  a single register...only in geometry shaders...which use EndPrimitive.
  None of them appear to be in danger of spilling, either.  So this tiny
  benefit doesn't seem to justify the cost of running the pass.

So, just throw it out.  It's not worth keeping.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26343>
2023-12-19 11:07:18 +00:00
Caio Oliveira
a8b2426419 intel/compiler: Use reference instead of pointer for fs_visitor
Per Ian's suggestion.  Also clear up a few unnecessary casts around the code and
use `s` for fs_visitor ("shader").  Note to include a reference in ntf we need
to set it during initialization, so create an explicit mem_ctx for it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:14 +00:00
Caio Oliveira
4e5fcccd01 intel/compiler: Create and use nir_to_brw() function
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:14 +00:00
Caio Oliveira
cf730adc58 intel/compiler: Make fs_builder include fs_visitor and not the other way
This will allow fs_builder to have a reference to an fs_visitor (a
"fs_shader" really), instead of a reference to a backend_shader.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:14 +00:00
Caio Oliveira
f5032c4d52 intel/compiler: Make fs_visitor not depend on fs_builder
At this point this is more a header dependency due to inline functions,
so shuffle them around.  The end goal is to allow fs_builder to have a
reference to an fs_visitor (really an fs_shader).

Note the header is still included; a later patch will move the includes
to the call-sites.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:14 +00:00
Caio Oliveira
4f991dec00 intel/compiler: Remove fs_visitor::bld
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:14 +00:00
Caio Oliveira
79735fa783 intel/compiler: Move remaining NIR conversion fields to nir_to_brw_state
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:13 +00:00
Caio Oliveira
5cb189636d intel/compiler: Move nir_ssa_value into a local structure
Create a nir_to_brw_state struct that is valid only during the
NIR to backend translation and use it for nir_ssa_values array.

This removes some NIR specific handling out of the fs_visitor -- nowadays
effectively an fs_shader.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:13 +00:00
Caio Oliveira
afe75d65be intel/compiler: Make NIR resources helpers static
Remove get_nir_src_block() since it is not used anywhere.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:13 +00:00
Caio Oliveira
a7a27ee95e intel/compiler: Make NIR atomic conversion functions static
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:13 +00:00
Caio Oliveira
5777943381 intel/compiler: Make non-intrinsic NIR conversion functions static
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:13 +00:00
Caio Oliveira
2385d6087a intel/compiler: Make setup functions of NIR emission static
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:13 +00:00
Caio Oliveira
3899e6b1d8 intel/compiler: Make functions for NIR control flow conversion static
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:13 +00:00
Caio Oliveira
860ec33f9a intel/compiler: Make more functions in NIR conversion static
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:13 +00:00
Caio Oliveira
acca9dbf6b intel/compiler: Make NIR intrinsic emission functions static
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
2023-12-12 19:36:13 +00:00
Caio Oliveira
5de5a0d475 intel/compiler: Don't use fs_visitor::bld in thread payload classes
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26301>
2023-11-28 19:53:51 +00:00
Caio Oliveira
a9f95bf687 intel/compiler: Reuse same scheduler for all pre-RA scheduling modes
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
2023-11-13 23:05:47 +00:00
Kenneth Graunke
b35f1fc910 intel/compiler: Delete unused emit_dummy_fs()
This code is compiled out, but has been left in place in case we wanted
to use it for debugging something.  In the olden days, we'd use it for
platform enabling.  I can't think of the last time we did that, though.

I also used to use it for debugging.  If something was misrendering, I'd
iterate through shaders 0..N, replacing them with "draw hot pink" until
whatever shader was drawing the bad stuff was brightly illuminated.
Once it was identified, I'd start investigating that shader.

These days, we have frameretrace and renderdoc which are like, actual
tools that let you highlight draws and replace shaders.  So we don't
need to resort to iterative driver hacks anymore.  Again, I can't think of
the last time I actually did that.

So, this code is basically just dead.  And it's using legacy MRF paths,
which we could update...or we could just delete it.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
2023-10-30 23:03:23 +00:00
Caio Oliveira
b91ed68fa0 intel/compiler: Don't emit calls to validate() in release build
While the fs_visitor::validate() implementation is empty in release
builds, we still emit calls to it since it is defined in a different
compilation unit from its callers.  To fix this, just expose an inline
empty function in the header for the release mode.
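
The pattern can be sketched roughly as follows. This is a simplified single-file illustration, assuming the build distinguishes debug from release via `NDEBUG`; the `shader` struct and its `num_instructions` field are hypothetical stand-ins, not the real fs_visitor.

```cpp
#include <cassert>

// Hypothetical stand-in for the shader class; only validate() matters here.
struct shader {
   int num_instructions = 0;

#ifndef NDEBUG
   // Debug build: real check, defined out of line (in practice this
   // definition would live in another compilation unit).
   void validate() const;
#else
   // Release build: the empty body is visible to every caller, so the
   // compiler can drop the call entirely.
   void validate() const {}
#endif
};

#ifndef NDEBUG
void shader::validate() const
{
   assert(num_instructions >= 0);
}
#endif
```

Callers keep writing `s.validate()` in both configurations; only the header decides whether a real call is emitted.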

Fossil run time differences in TGL laptop (difference at 95.0% confidence):

```
Rise of the Tomb Raider (Native) [n=7]
        -0.482857 +/- 0.010932
        -1.60608% +/- 0.0363621%

Cyberpunk 2077 (DXVK) [n=7]
        -0.987143 +/- 0.0904516
        -0.82996% +/- 0.076049%

Batman Arkham City (DXVK) [n=7]
        -7.74857 +/- 0.329561
        -1.46298% +/- 0.0622231%
```

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25847>
2023-10-24 21:10:35 +00:00
Caio Oliveira
8944ac7d6c intel/fs/xe2+: Update BS payload setup for Xe2 reg size.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
14e1b9ee69 intel/fs/xe2+: Update TES payload setup for Xe2 reg size.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Ian Romanick
0b23df3951 intel/compiler/xe2: Update fs_visitor::setup_vs_payload to account for Xe2 reg size
[ Francisco Jerez: Simplify. ]

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
2b7419d090 intel/fs: Fix signedness of payload_node_count argument of calculate_payload_ranges().
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Ian Romanick
c262752d74 intel/fs: Make opt_copy_propagation_local file private
This annoyed me during development of this MR. Every time I changed the
parameters to this internal function, I had to modify a public header
file... and trigger a much larger rebuild.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
b5b2338c5c intel/fs: Make try_constant_propagate and try_copy_propagate file private
This annoyed me during development of this MR. Every time I changed the
parameters to this internal function, I had to modify a public header
file... and trigger a much larger rebuild.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:22 +00:00
Kenneth Graunke
d7daf78f62 intel/compiler: Respect NIR_DEBUG_PRINT_INTERNAL for DEBUG_OPTIMIZER
If the NIR_DEBUG_PRINT_INTERNAL flag is not set, don't print debugging
information for internal shaders in INTEL_DEBUG=optimizer dumps.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24684>
2023-08-17 18:19:53 +00:00
Faith Ekstrand
ce8b157b94 intel/fs: Stop passing around nir_dest and nir_alu_dest
We want to get rid of nir_dest so back-ends need to stop storing it
in structs and passing it through helpers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>
2023-08-14 21:22:53 +00:00
Alyssa Rosenzweig
09d31922de nir: Drop "SSA" from NIR language
Everything is SSA now.

   sed -e 's/nir_ssa_def/nir_def/g' \
       -e 's/nir_ssa_undef/nir_undef/g' \
       -e 's/nir_ssa_scalar/nir_scalar/g' \
       -e 's/nir_src_rewrite_ssa/nir_src_rewrite/g' \
       -e 's/nir_gather_ssa_types/nir_gather_types/g' \
       -i $(git grep -l nir | grep -v relnotes)

   git mv src/compiler/nir/nir_gather_ssa_types.c \
          src/compiler/nir/nir_gather_types.c

   ninja -C build/ clang-format
   cd src/compiler/nir && find *.c *.h -type f -exec clang-format -i \{} \;

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24585>
2023-08-12 16:44:41 -04:00
Lionel Landwerlin
0e244d56e3 intel/fs: track more steps with INTEL_DEBUG=optimizer
One particularly nice thing to have is the first generated backend IR
before validation. Especially if you made a mistake in the NIR
translation, you can at least look at it before validation tells you
off.

Then the last 2 steps of the optimize() function can be interesting to
look at.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24552>
2023-08-10 06:39:57 +00:00
Faith Ekstrand
45ee952efb intel/fs: Use write masks from store_reg intrinsics
Fixes: b8209d69ff ("intel/fs: Add support for new-style registers")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24310>
2023-07-25 16:25:10 +00:00
Marcin Ślusarz
a252123363 intel/compiler/mesh: compactify MUE layout
Instead of using 4 dwords for each output slot, use only the amount
of memory actually needed by each variable.

There are some complications from this "obvious" idea:
- flat and non-flat variables can't be merged into the same vec4 slot,
  because flat inputs mask has vec4 stride
- multi-slot variables can have different layout:
   float[N] requires N 1-dword slots, but
   i64vec3 requires 1 fully occupied 4-dword slot followed by 2-dword slot
- some output variables occur both in single-channel/component split
  and combined variants
- crossing vec4 boundary requires generating more writes, so avoiding them
  if possible is beneficial
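
The slot arithmetic in the multi-slot bullet can be illustrated with a small C++ sketch. The helpers `split_into_slots` and `array_slots` are hypothetical and only model dword counts per slot, assuming a slot holds at most 4 dwords and each array element starts its own slot; they are not the layout code from this patch.

```cpp
#include <vector>

// Hypothetical helper: split `total_dwords` of contiguous data into slots
// holding at most 4 dwords each, returning the dwords used per slot.
std::vector<unsigned> split_into_slots(unsigned total_dwords)
{
   std::vector<unsigned> slots;
   while (total_dwords > 0) {
      const unsigned n = total_dwords < 4 ? total_dwords : 4;
      slots.push_back(n);
      total_dwords -= n;
   }
   return slots;
}

// Hypothetical helper: an array of `length` elements where every element
// starts a new slot and occupies `elem_dwords` dwords.
std::vector<unsigned> array_slots(unsigned elem_dwords, unsigned length)
{
   std::vector<unsigned> slots;
   for (unsigned i = 0; i < length; i++) {
      const std::vector<unsigned> elem = split_into_slots(elem_dwords);
      slots.insert(slots.end(), elem.begin(), elem.end());
   }
   return slots;
}
```

Under these assumptions, `array_slots(1, N)` models float[N] as N one-dword slots, while `split_into_slots(6)` models i64vec3 (three 2-dword components) as a full 4-dword slot followed by a 2-dword slot.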

This patch fixes some issues with arrays in per-vertex and per-primitive data
(func.mesh.ext.outputs.*.indirect_array.q0 in crucible) and, by reducing the
size of a single MUE, allows spawning more threads at the same time.

Note: this patch doesn't improve vk_meshlet_cadscene performance because
the default layout is already optimal enough.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>
2023-07-24 07:55:29 +00:00
Lionel Landwerlin
3384f029be intel/compiler: rework input parameters
Use a struct for various common parameters rather than per stage
structure or arguments to stage specific entrypoints.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>
2023-07-20 09:08:08 +00:00
Faith Ekstrand
39b5bb0809 intel/fs: Drop support for nir_register
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24104>
2023-07-19 02:11:57 +00:00
Lionel Landwerlin
0cd9f0c3d3 intel/fs: fix bindless/shared surface mistake
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 068bf1378d ("intel/fs: enable SSBO accesses through the bindless heap")
Tested-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23536>
2023-06-14 07:42:57 +00:00
Caio Oliveira
26f6ea5c30 intel/compiler: Remove unused functions and declarations
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23539>
2023-06-09 20:09:51 +00:00
Caio Oliveira
2bb26cc01d intel/compiler: Refactor dump_instruction(s)
Delete unnecessary virtual functions; we need just two.  Refactor code
so the 'default behavior' logic (stderr and/or creating file) is not
duplicated.

Rename the virtuals so overrides don't hide the common convenience
functions.  Finally, provide a variant of dump_instructions() with
a `FILE *` parameter.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23457>
2023-06-08 22:00:21 +00:00
Lionel Landwerlin
04777171e0 intel/fs: try to rematerialize surface computation code
This helps a lot with accessing surface handles in control flow. Our
resource_intel intrinsic has a non_uniform flag, in which case we
cannot apply this optimization. But in uniform cases, this is just a
massive win. We drop all kinds of pipeline stalls due to
find_live_channel. We also reduce register pressure by doing the
surface handle computation in a single GRF (instead of 2 or 4).

There are some regressions in max dispatch width but those I think are
only on SIMD32 and due to the current heuristic disabling it after
throughput comparison with SIMD16. We know this heuristic is not
perfect, it should probably be updated in another change.

Here are some stats (all titles seem to have similar gains):

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 red_dead_redemption2 5860     -36.80%    -5.67%      +0.77%        +0.06%      -81.26%     -79.16%        -70.62%             -8.63%             -6.93%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------
 All affected         4716     -37.29%    -5.67%      +0.95%        +0.07%      -81.26%     -79.16%        -70.62%             -9.15%             -8.47%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Total                5860     -36.80%    -5.67%      +0.77%        +0.06%      -81.26%     -79.16%        -70.62%             -8.63%             -6.93%

 PERCENTAGE DELTAS          Shaders   Instrs    Cycles  Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 rise_of_the_tomb_raider_g2 12010    -37.19%   -22.12%      +0.01%        +0.00%      -99.01%     -99.14%        -98.65%             -7.62%             -4.96%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 All affected               11732    -37.27%   -22.14%      +0.01%        +0.00%      -99.01%     -99.14%        -98.65%             -7.67%             -5.11%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Total                      12010    -37.19%   -22.12%      +0.01%        +0.00%      -99.01%     -99.14%        -98.65%             -7.62%             -4.96%

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 total_war_warhammer2 462      -27.45%   -12.42%    -82.35%     -88.46%        -66.67%             -5.52%             -5.62%
 -----------------------------------------------------------------------------------------------------------------------------------
 All affected         335      -28.31%   -12.77%    -82.35%     -88.46%        -66.67%             -6.25%             -7.24%
 -----------------------------------------------------------------------------------------------------------------------------------
 Total                462      -27.45%   -12.42%    -82.35%     -88.46%        -66.67%             -5.52%             -5.62%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 witcher_3_dxvk_g2 1049     -36.94%   -57.82%      +0.06%        +0.01%      -98.52%     -97.29%        -98.10%             -7.81%             -1.00%
 ------------------------------------------------------------------------------------------------------------------------------------------------------------
 All affected      693      -41.93%   -58.45%      +0.09%        +0.01%      -98.52%     -97.29%        -98.10%             -10.25%            -1.33%
 ------------------------------------------------------------------------------------------------------------------------------------------------------------
 Total             1049     -36.94%   -57.82%      +0.06%        +0.01%      -98.52%     -97.29%        -98.10%             -7.81%             -1.00%

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
ad9bc1ffb5 intel/fs: enable UBO accesses through bindless heap
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00