Specifically, allow two immediate sources for BFE on Gfx12+. I stumbled
on this while trying some stuff with !31852.
v2: Don't be lazy. Add proper assertions for all the things on all the
platforms. Based on a suggestion by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 7bed11fbde ("intel/brw: Allow immediates in the BFE instruction on Gfx12+")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31858>
Prior to Cherryview, align16 3src instruction sources had to have their
subregister number be DWord-aligned. Cherryview added a discontiguous
bit in the encoding to represent bit 1 of the subregister number. This
allows us to use packed HF sources.
Update the ISA encoding helpers to properly handle bit 1. While we're
at it, make them take a full subregister number and adjust accordingly,
rather than making the callers divide or multiply by some alignment.
Note that the destination subregister must still be DWord aligned, so
HF destinations must be strided.
Thanks to Ian Romanick for discovering that we were botching this.
BSpec: 12054, 12081
v2 (idr): Fix ordering of high and low bit parameters to brw_inst_bits.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>
The long names were originally meant to map to the HW encoding but
nowadays the actual encoding values depend on gfx version, whether
instruction is 3src, etc.
Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
Create specific helper for register file encoding and handle it there.
Use ad-hoc structs to let the macro take optional named arguments.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
We're going to need to handle encoding/decoding differently for DPAS vs.
SEND/SENDC vs. other instructions. Pass the opcode so we can figure out
the encodings for each type of instruction.
Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30705>
The r0.5 thread payload register contains Surface State Offset bits
[27:6] as bits [31:10], so we need to shift the register right by 4 in
order to get the surface state offset expected in ExBSO mode, which is
the only extended descriptor encoding supported by the UGM shared
function for SS addressing on Xe2+.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29543>
The enums were mixed up. Code was working because they were being
used only for their numerical values.
Fixes: e666872c75 ("intel/compiler: Initial bits for DPAS instruction")
Acked-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29762>
Rework:
* Francisco Jerez: Specify the src1 length value in the correct
units. Don't break earlier platforms.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>
We'll use the delta for an upcoming internal printf mechanism, where
the PARAM_IDX will be the base printf reloc identifier and the BASE
will be the string id.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>
Both of these helpers do the same thing. We now have brw_type_size_bits
and brw_type_size_bytes and can use whichever makes sense in that place.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
Icelake removed the PLN instruction for interpolating fragment shader
inputs, instead adding a special "Native Float" (NF) data type which
was a 66-bit floating point data type that could only be used with the
accumulator. On Tigerlake, they dropped NF support in favor of just
doing the interpolation with MAD instructions.
We stopped using NF years ago (commit 9ea90aae1e),
instead just using the fs_visitor::lower_linterp() pass to emit MADs.
Since this existed only for a short time, and had very limited utility,
we drop it from the compiler. One downside is that we can no longer
disassemble Icelake shaders containing NF types properly, but I doubt
anyone really minds.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
We already have a logical opcode and lower to what is basically a send
instruction. We just weren't using SHADER_OPCODE_SEND, instead having
extra redundant infrastructure for no real gain.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
align16 support is only used on Gen9 for 3-source instructions, quad
swizzling, and dPdy calculations. We don't need it for broadcast.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
brw_broadcast and generate_mov_indirect both had similar comments, both
with typos ("insead"). One still referred to IVB bugs, while the other
dropped that during the compiler split. The one that dropped the
comment mentioned "both of these" issues, while citing only one issue;
there was in fact a third issue (no-Q/UQ) that wasn't mentioned in
either comment. One also had some bad grammar in the comments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
Folks, there's more than one accumulator. In general, when the
register file is ARF, the upper 4 bits of the register number specify
which ARF, and the lower 4 bits specify which one of that ARF. This
can be further partitioned by the subregister number.
This is already mostly handled correctly for flags register, but lots
of places wanted to check the register number for equality with
BRW_ARF_ACCUMULATOR. If acc1 is ever specified, that won't work.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28281>
Having the reg set with predication disabled shouldn't cause any problems
during the execution. But when decompiling such instruction the flag won't
be shown in the output, so the recompiling will cause
functionally-identical but binary-different code. Fixing this makes
disasm/asm testing easier.
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25657>
v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix
overlapping register allocation (via has_source_and_destination_hazard). Fix
incorrect destination register file encoding.
v3: Prevent lower_regioning from trying to "fix" DPAS sources.
v4: Add instruction latency information for scheduling and perf
estimates.
v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update
the comment in fs_inst::has_source_and_destination_hazard. Suggested
by Caio.
v6: Add some comments near the src2 calculation in
fs_inst::size_read. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
align is a function and when we want use it, the align variable will shadow it
So replace it with other names
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25997>
Gives use 4Gb of bindless surface state on Gfx12.5+ instead of 64Mb.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
The instructions manipulation cr0 use the default mask on lane0. So if
for some reason that lane is disabled in some of the dispatchs, we can
end up not executing the instructions.
Fixes flakyness in dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.uniform_float_32_to_16.uniform_matrix_float_rtz_frag
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22314>
Now that we aren't using them on Gfx8+ we can drop a lot of cruft.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
This hardware bug is the result of a control flow optimization present
in Gfx8-9 meant to prevent the ELSE instruction from disabling all
channels and update the control flow stack only to have them
re-enabled at the ENDIF instruction executed immediately after it.
Instead, on Gfx8-9 an ELSE instruction that would normally have ended
up with all channels disabled would pop off the last element of the
stack and jump directly to JIP+1 instead of to the ENDIF at JIP,
skipping over the ENDIF instruction. In simple cases this would work
okay (though it's actual performance benefit is questionable), but in
cases where a branch instruction within the IF block (e.g. BREAK or
CONTINUE) caused all active channels to jump outside the IF
conditional, the optimization would break the JIP chain of "join"
instructions by skipping the ENDIF, causing the block of instructions
immediately after the ENDIF to execute with all channels disabled
until execution reaches the reconvergence point.
This issue was observed on SKL in the
dEQP-VK.reconvergence.subgroup_uniform_control_flow_elect.compute.nesting4.0.38
test in combination with some Vulkan binding model changes Lionel is
working on. In such cases the execution with all channels disabled
was leading to corruption of an indirect message descriptor, causing a
hang.
Unfortunately the hardware bug doesn't provide a recommended
workaround. In order to fix the problem we point the JIP of an ELSE
instruction to the instruction immediately before the ENDIF -- However
that's not expected to work due to the restriction that JIP and UIP
must be equal if and only if BranchCtrl is disabled -- So this patch
also enables BranchCtrl, which is intended to support join
instructions within the "ELSE" block, which in turn disables the
optimization described above, which in turn causes us to execute the
instruction immediately *before* the ENDIF with all channels disabled
-- So in order to avoid further fallout from executing code with all
channels disabled we need to insert a NOP before ENDIF instructions
that have a matching ELSE instruction.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20921>
This will affect MTL which will have fp64 support without int64
support.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19284>