Note GFX9_DATAPORT_DC_PORT1_A64_SCATTERED_READ is marked as Gfx9 but
it is in the bspec and the PRM does mention it (although not in the
list), so keep it around since we've been using it for a while now.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27629>
On Gfx4, we had to emulate SIMD8 texturing with SIMD16 for some message
types. This ceased to be a thing with Gfx5 and hasn't come up again.
So, we can simply assert that we are truly "SIMD splitting", and assume
that the lowered size is smaller than the original instruction size.
This avoids some mental complexity as we can always think of the split
instructions as taking apart, operating on, and recombining subsets of
the original values.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27959>
This copy propagation can create MADs with immediates in src1, which
need to be cleaned up by constant combining (which puts them back in
VGRFs).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27876>
opt_register_coalesce can sometimes unpleasantly coalesce both
SENDS payload sources into the larger of the two registers.
This can break the assumption that the VGRFs for sources 2-3
must occupy no more than 16 registers, so they fit in g112-127.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27876>
Sometimes one source can be a larger register than the other, especially
since opt_register_coalesce can sometimes coalesce those sources into
larger registers.
Copy the smaller of mlen and ex_mlen. It's less copying.
shader-db and fossil-db on Alchemist show 47 shaders affected with
small 1-2 instruction improvements each, and no regressions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27876>
We were handling FIXED_GRF, but we probably also ought to handle ATTR
(pushed inputs) and UNIFORM (pushed constants). Just check if file
isn't VGRF to handle everything.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27876>
brw_lower_find_live_channel doesn't actually write a flag register,
but elk_find_live_channel notes that the flag was used on Gfx7.
This allows more CSE on FIND[_LAST]_LIVE_CHANNEL.
shader-db and fossil-db on Alchemist show minor reductions in cycles
and instruction count, a few minor increases, but it doesn't seem to
be a large effect in either direction.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27862>
When lower_simd_width() encounters an instruction that needs a larger
SIMD, for example SHADER_OPCODE_TXS_LOGICAL in Gfx4 needs at least
SIMD16. In this case the builder needs to be at least as large as
max_width, otherwise the group() setup will assert.
Turns out this did not assert before "by accident", since it was
relying on the default fs_visitor builder that had a dispatch width of 64,
a bogus placeholder value, expected not to be used.
However, when we changed the code to remove that builder (and the bogus
value), we created a new builder in the pass shader dispatch_width --
which work fine except in the case where we want to "lower" the SIMD above
the shader dispatch width. The fix is to also consider the already
calculated max_width when creating the builder.
Fixes: 5b8ec015f2 ("intel/compiler: Don't use fs_visitor::bld in remaining places")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10338
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27782>
We first generate the logical opcodes, and these days fully lower to
SHADER_OPCODE_SEND. In the past, we lowered to a non-logical variant
and handled that in the generator. These days, we were just using the
non-logical opcodes as an awkward intermediate opcode change during
the lowering...which isn't really necessary at all.
This patch eliminates them by using the original logical opcodes.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>
We're not really doing any fancy texturing here, just emitting a TEX
instruction that writes multiple destination registers. I plan to
remove the non-logical TEX instruction in the next commit, so we swap
these over to use the logical version instead. It should work just as
well for the purposes of the test.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>
Thanks to Ken for suggesting this URB refactoring change and pointing
out that the LSC can operate on the byte offset granularity.
This should fix the geometry shader test cases where we have more than
32 vertices since previously we were failing to write the correct
control data bits because of incorrect write mask.
Shader-db results for Xe2:
total instructions in shared programs: 153475 -> 153437 (-0.02%)
instructions in affected programs: 1374 -> 1336 (-2.77%)
helped: 11
HURT: 0
helped stats (abs) min: 3 max: 5 x̄: 3.45 x̃: 3
helped stats (rel) min: 1.67% max: 4.92% x̄: 3.23% x̃: 2.70%
95% mean confidence interval for instructions value: -3.92 -2.99
95% mean confidence interval for instructions %-change: -4.10% -2.36%
Instructions are helped.
total loops in shared programs: 140 -> 140 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 16002649 -> 16002329 (<.01%)
cycles in affected programs: 9174 -> 8854 (-3.49%)
helped: 11
HURT: 0
helped stats (abs) min: 22 max: 38 x̄: 29.09 x̃: 32
helped stats (rel) min: 2.62% max: 5.54% x̄: 3.78% x̃: 3.85%
95% mean confidence interval for cycles value: -33.56 -24.62
95% mean confidence interval for cycles %-change: -4.48% -3.08%
Cycles are helped.
total spills in shared programs: 52 -> 52 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 94 -> 94 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0
total sends in shared programs: 4240 -> 4240 (0.00%)
sends in affected programs: 0 -> 0
helped: 0
HURT: 0
LOST: 0
GAINED: 0
Rework: (Sagar)
- Adjust offset/indirect offset calculation.
- Add shader-db results
- Always calculate dword index
- Drop changes for indirect writes
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27602>
This code uses cfg_t which we are going to rework a bit as part of
flattening the IR types. It is easier if it can see C++ types for now.
At the end we can change this back if needed.
To avoid casting and be consistent with existing structs,
use int for some offset parameters in the functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27866>
The compilers (brw and elk) static libraries depend only on
idep_nir_headers instead of idep_nir. This was done to
increase the parallelism in the build. One side effect is that
consumers of the compilers must depend on idep_nir themselves to
ensure nir symbols are resolved.
Various intel tools don't use NIR directly, so don't depend on it,
and only use a few functions of the compiler, that *mostly* don't
depend on linking NIR functions except for the case of nir_print_instr.
The current code adds a weak empty function to take its place in case
it is not linked. This is sort of a hack because if we change the
compiler in ways that use NIR differently, or we use different functions
of the compiler in the tools, we will end up having to add other
dummy definitions.
A better solution here (suggested by Dylan) is to add the idep_nir
to the list of dependencies of the compilers idep's. The static
libraries of the compilers still don't depend directly on NIR,
but any user of idep_compiler_* will get that dependency.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27865>
Instead of link_with, use meson dependency for the compilers. Will
be useful later to propagate some extra dependencies.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27865>
The base class was used when we had vec4, but now we can fold it with
its only subclass. Declare fs_visitor now as a struct to be able to
forward declare for C code without causing errors due to class/struct
being mixed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>