Sometimes one source can be a larger register than the other, especially
since opt_register_coalesce can sometimes coalesce those sources into
larger registers.
Copy the smaller of mlen and ex_mlen. It's less copying.
shader-db and fossil-db on Alchemist show 47 shaders affected with
small 1-2 instruction improvements each, and no regressions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27876>
We were handling FIXED_GRF, but we probably also ought to handle ATTR
(pushed inputs) and UNIFORM (pushed constants). Just check if file
isn't VGRF to handle everything.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27876>
brw_lower_find_live_channel doesn't actually write a flag register,
but elk_find_live_channel notes that the flag was used on Gfx7.
This allows more CSE on FIND[_LAST]_LIVE_CHANNEL.
shader-db and fossil-db on Alchemist show minor reductions in cycles
and instruction count, a few minor increases, but it doesn't seem to
be a large effect in either direction.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27862>
When lower_simd_width() encounters an instruction that needs a larger
SIMD, for example SHADER_OPCODE_TXS_LOGICAL in Gfx4 needs at least
SIMD16. In this case the builder needs to be at least as large as
max_width, otherwise the group() setup will assert.
Turns out this did not assert before "by accident", since it was
relying on the default fs_visitor builder that had a dispatch width of 64,
a bogus placeholder value, expected not to be used.
However, when we changed the code to remove that builder (and the bogus
value), we created a new builder in the pass shader dispatch_width --
which work fine except in the case where we want to "lower" the SIMD above
the shader dispatch width. The fix is to also consider the already
calculated max_width when creating the builder.
Fixes: 5b8ec015f2 ("intel/compiler: Don't use fs_visitor::bld in remaining places")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10338
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27782>
We first generate the logical opcodes, and these days fully lower to
SHADER_OPCODE_SEND. In the past, we lowered to a non-logical variant
and handled that in the generator. These days, we were just using the
non-logical opcodes as an awkward intermediate opcode change during
the lowering...which isn't really necessary at all.
This patch eliminates them by using the original logical opcodes.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>
We're not really doing any fancy texturing here, just emitting a TEX
instruction that writes multiple destination registers. I plan to
remove the non-logical TEX instruction in the next commit, so we swap
these over to use the logical version instead. It should work just as
well for the purposes of the test.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27908>
Thanks to Ken for suggesting this URB refactoring change and pointing
out that the LSC can operate on the byte offset granularity.
This should fix the geometry shader test cases where we have more than
32 vertices since previously we were failing to write the correct
control data bits because of incorrect write mask.
Shader-db results for Xe2:
total instructions in shared programs: 153475 -> 153437 (-0.02%)
instructions in affected programs: 1374 -> 1336 (-2.77%)
helped: 11
HURT: 0
helped stats (abs) min: 3 max: 5 x̄: 3.45 x̃: 3
helped stats (rel) min: 1.67% max: 4.92% x̄: 3.23% x̃: 2.70%
95% mean confidence interval for instructions value: -3.92 -2.99
95% mean confidence interval for instructions %-change: -4.10% -2.36%
Instructions are helped.
total loops in shared programs: 140 -> 140 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 16002649 -> 16002329 (<.01%)
cycles in affected programs: 9174 -> 8854 (-3.49%)
helped: 11
HURT: 0
helped stats (abs) min: 22 max: 38 x̄: 29.09 x̃: 32
helped stats (rel) min: 2.62% max: 5.54% x̄: 3.78% x̃: 3.85%
95% mean confidence interval for cycles value: -33.56 -24.62
95% mean confidence interval for cycles %-change: -4.48% -3.08%
Cycles are helped.
total spills in shared programs: 52 -> 52 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 94 -> 94 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0
total sends in shared programs: 4240 -> 4240 (0.00%)
sends in affected programs: 0 -> 0
helped: 0
HURT: 0
LOST: 0
GAINED: 0
Rework: (Sagar)
- Adjust offset/indirect offset calculation.
- Add shader-db results
- Always calculate dword index
- Drop changes for indirect writes
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27602>
This code uses cfg_t which we are going to rework a bit as part of
flattening the IR types. It is easier if it can see C++ types for now.
At the end we can change this back if needed.
To avoid casting and be consistent with existing structs,
use int for some offset parameters in the functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27866>
The compilers (brw and elk) static libraries depend only on
idep_nir_headers instead of idep_nir. This was done to
increase the parallelism in the build. One side effect is that
consumers of the compilers must depend on idep_nir themselves to
ensure nir symbols are resolved.
Various intel tools don't use NIR directly, so don't depend on it,
and only use a few functions of the compiler, that *mostly* don't
depend on linking NIR functions except for the case of nir_print_instr.
The current code adds a weak empty function to take its place in case
it is not linked. This is sort of a hack because if we change the
compiler in ways that use NIR differently, or we use different functions
of the compiler in the tools, we will end up having to add other
dummy definitions.
A better solution here (suggested by Dylan) is to add the idep_nir
to the list of dependencies of the compilers idep's. The static
libraries of the compilers still don't depend directly on NIR,
but any user of idep_compiler_* will get that dependency.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27865>
Instead of link_with, use meson dependency for the compilers. Will
be useful later to propagate some extra dependencies.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27865>
The base class was used when we had vec4, but now we can fold it with
its only subclass. Declare fs_visitor now as a struct to be able to
forward declare for C code without causing errors due to class/struct
being mixed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
Register blocks and interp_mode[] were for Gfx4-5.
The binding table section doesn't seem to be used anymore, nor does
color_outputs_written.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27872>
Lowering/layout is pretty much the same as direct descriptors. The
caveats is that since the descriptor buffers are not visible from the
binding tables we can't promote anything to the binding table (except
push descriptors).
The reason for this is that there is nothing that prevents an
application to use both types of descriptors and because descriptor
buffers have visible address + capture replay, we can't merge the 2
types in the same virtual address space location (limited to 4Gb max,
limited 2Gb with binding tables).
If we had the guarantee that both are not going to be used at the same
time, we could consider a 2Gb VA for descriptor buffers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22151>
We can address samplers from 3 different locations :
- binding table
- dynamic state base address
- bindless sampler base address (only Gfx11+)
Here we allow samplers to be address from the dynamic state base
address with the embedded sampler flag.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22151>
Different sets were needed for SIMD8/SIMD16 in old Gfx versions, but now
we can use a single one regardless of the SIMD size.
Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>