Gives use 4Gb of bindless surface state on Gfx12.5+ instead of 64Mb.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
The instructions manipulation cr0 use the default mask on lane0. So if
for some reason that lane is disabled in some of the dispatchs, we can
end up not executing the instructions.
Fixes flakyness in dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.uniform_float_32_to_16.uniform_matrix_float_rtz_frag
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22314>
Now that we aren't using them on Gfx8+ we can drop a lot of cruft.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
This hardware bug is the result of a control flow optimization present
in Gfx8-9 meant to prevent the ELSE instruction from disabling all
channels and update the control flow stack only to have them
re-enabled at the ENDIF instruction executed immediately after it.
Instead, on Gfx8-9 an ELSE instruction that would normally have ended
up with all channels disabled would pop off the last element of the
stack and jump directly to JIP+1 instead of to the ENDIF at JIP,
skipping over the ENDIF instruction. In simple cases this would work
okay (though it's actual performance benefit is questionable), but in
cases where a branch instruction within the IF block (e.g. BREAK or
CONTINUE) caused all active channels to jump outside the IF
conditional, the optimization would break the JIP chain of "join"
instructions by skipping the ENDIF, causing the block of instructions
immediately after the ENDIF to execute with all channels disabled
until execution reaches the reconvergence point.
This issue was observed on SKL in the
dEQP-VK.reconvergence.subgroup_uniform_control_flow_elect.compute.nesting4.0.38
test in combination with some Vulkan binding model changes Lionel is
working on. In such cases the execution with all channels disabled
was leading to corruption of an indirect message descriptor, causing a
hang.
Unfortunately the hardware bug doesn't provide a recommended
workaround. In order to fix the problem we point the JIP of an ELSE
instruction to the instruction immediately before the ENDIF -- However
that's not expected to work due to the restriction that JIP and UIP
must be equal if and only if BranchCtrl is disabled -- So this patch
also enables BranchCtrl, which is intended to support join
instructions within the "ELSE" block, which in turn disables the
optimization described above, which in turn causes us to execute the
instruction immediately *before* the ENDIF with all channels disabled
-- So in order to avoid further fallout from executing code with all
channels disabled we need to insert a NOP before ENDIF instructions
that have a matching ELSE instruction.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20921>
This will affect MTL which will have fp64 support without int64
support.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19284>
v2: drop the hardcoded inst->mlen=1 (Rohan)
v3: Move back to LOAD/STORE messages (limited to SIMD16 for LSC)
v4: Also use 4 GRFs transpose loads for fills (Curro)
v5: Reduce amount of needed register to build per lane offsets (Curro)
Drop some now useless SIMD32 code
Unify unspill code
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>
Fixes the following EU validation error:
ERROR: Header must be present for all URB messages.
The message header is ignored for URB fence messages, so I doubt that
this actually matters in practice. But we should probably mark it as
present, because you have to send something, and according to the
documentation, there is a message header, it's just ignored.
Fixes: e6a9501aa2 ("intel/fs: Add the URB fence message")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
This structure will contain the opcode mapping tables in the next
commit. For now, this is the mechanical change to plumb it into all
the necessary places, and it continues simply holding devinfo.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
We haven't exposed this intrinsic as it doesn't directly correspond to
anything in SPIR-V. However, it's used internally by some NIR passes,
namely nir_opt_uniform_atomics().
We reuse most of the infrastructure in brw_find_live_channel, but with
LZD/ADD instead of FBL. A new SHADER_OPCODE_FIND_LAST_LIVE_CHANNEL is
like SHADER_OPCODE_FIND_LIVE_CHANNEL but from the other side.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
For any fence greater than local scope, always set flush type to at
least invalidate so that fence goes on properly.
v2: Fixup condition to trigger workaround (Lionel)
v3: Simplify workaround (Curro)
v4: Don't drop the existing WA (Curro)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: 22.0 <mesa-stable>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14947>
brw_find_next_block_end() scans through the instructions to find the end
of the block. We were calling it for every instruction in the program
which is, if you have a single basic block, makes the whole mess a nice
clean O(n^2) when it really doesn't need to be. Instead, only call
brw_find_next_block_end() as-needed. This brings it back to O(n) like
it should have been.
This cuts the runtime of the following Vulkan CTS on my SKL box by 5%
from 1:51 to 1:45: dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
Having an integer destination type instead of a float destination type
confuses the SWSB code. This causes problems on some Intel GPUs. Fix
this by using the correct type in the destination of the F32TOF16
opcode.
Gfx7 doesn't have the HF type, so continue to emit W on that platform.
The assertions in brw_F32TO16 (brw_eu_emit.c) are updated to reflect
this. In scalar mode, UD is never emitted as a destination type for
this opcode, so remove it from the allowed types in the assertion.
I also condidered doing something like de55fd358f ("intel/fs/xehp:
Teach SWSB pass about the exec pipeline of
FS_OPCODE_PACK_HALF_2x16_SPLIT."), but Curro recommended that just using
the correct types is a better fix. I agree.
v2: Add missing changes to fs_generator::generate_pack_half_2x16_split.
I'm not sure how I (and the Intel CI) missed that the first time. :(
v3: Fix copy-and-paste issue in the v2 fix. Noticed by Tapani.
Reviewed-by: Francisco Jerez <currojerez@riseup.net> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14181>
When they re-arranged all the dataport stuff and added the LSC, doing
URB fencing through the dataport no longer makes sense. Instead, there
is now a fence message on the URB shared function.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13092>
v2: Very significant rebase on changes to previous commits.
Specifically, brw_fs_nir.cpp changes were pretty much rewritten from
scratch after changing the NIR opcode names and types.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
Setting it to GPU can cause an L3$ flush in certain cases. That's not
what we want as we really only care about coherency within the GPU.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Sagar Ghuge <sagar@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12291>
This commit adds a delta to be added to the relocated value as well as
the possibility of multiple types of relocations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
There were two bugs which crep in here as part of 64551610d1:
forgetting that exec sizes in HW are in log2 space and having the
exec_size condition for the subtype backwards.
Fixes: 64551610d1 "intel/compiler: rework message descriptors..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10588>
It's a Gen6 XFB thing. It's never used for anything else so there's no
point in having a target cache switch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
v2: Use the new inst->ex_desc field (Jason)
v3: Drop CPS LoD compensation from sampler messages (Lionel)
v4: Drop useless uses_rate_shading (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
Render target message descriptors are slightly different from the
dataport ones. In particular the msg_type field is on bits 14:17 for
RT while bits 14:18 for DP.
v2: Drop unused send_commit_msg field in brw_fb_write_desc() (Ken)
v3: Rebase on top renaming (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
The execution units of XeHP platforms have multiple asynchronous ALU
pipelines instead of (as far as software is concerned) the single
in-order pipeline that handled most ALU instructions except for
extended math in the original Xe. It's now the compiler's
responsibility to identify cross-pipeline dependencies and insert
synchronization annotations whenever necessary, which are encoded as
some additional bits of the SWSB instruction field.
This commit represents the cross-pipeline synchronization annotations
as part of the existing tgl_swsb structure used for codegen. The
existing tgl_swsb_*() helpers used by hand-crafted assembly are
extended to default to TGL_PIPE_ALL big-hammer synchronization in
order to ensure backwards compatibility with the existing assembly.
The following commits will extend the software scoreboard lowering
pass in order to keep track of cross-pipeline dependencies across IR
instructions, and insert more specific pipeline annotations in the
SWSB field.
The disassembler is also extended here to print out any existing
pipeline sync annotations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>
clang generates a warning if there's no explicit break or fall-through
annotation. The latter would be kind of silly in this case, and not
robust against any future changes turning the fall-through invalid.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
One exception is src/amd/addrlib/, for which -Wimplicit-fallthrough is
explicitly disabled.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
I meant to do this years ago when I first added SHADER_OPCODE_SEND. At
the time, the only use for the extended descriptor was bindless handles
which were always one thing and never non-constant. However, it doesn't
actually require any extra instructions because we have to OR in ex_mlen
anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8748>
The immediate case is pretty uncommon to see but it can happen, in
theory. BROADCAST is typically used to uniformize values and those are
usually 32-bit. However, it does come up in some subgroup ops.
Fixes: 49c21802cb "intel/compiler: Split has_64bit_types into float/int"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6211>
We don't want to use it on gen5 and earlier because only RNDD can be
done with a single instruction and we can implement RNDU(x) as -RNDD(-x)
so it's better to just do that when we have the instruction. On gen6
and above, we may as well just use the right instruction.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5596>