Instead of making it a fragment-specific thing based on uses_kill, track
whether or not we need one in fs_visitor and emit HALT_TARGET at the end
of emit_nir_code() if needed.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>
This means the pass has to walk all the instructions but it was doing
that in a bunch of cases anyway when it didn't have a HALT_TARGET.
However, removing HALT_TARGET frees up the scheduler a bit because
HALT_TARGET is considered a scheduling barrier. The shader-db results
are kind-of a wash but we're about to add HALT_TARGET unconditionally so
we want to be able to get rid of it.
Shader-db results on Ice Lake:
total instructions in shared programs: 19935623 -> 19935623 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 976758472 -> 976766135 (<.01%)
cycles in affected programs: 11097707 -> 11105370 (0.07%)
helped: 1750
HURT: 875
helped stats (abs) min: 1 max: 866 x̄: 26.39 x̃: 4
helped stats (rel) min: <.01% max: 39.24% x̄: 1.25% x̃: 0.46%
HURT stats (abs) min: 1 max: 1678 x̄: 61.54 x̃: 10
HURT stats (rel) min: <.01% max: 65.69% x̄: 1.86% x̃: 0.42%
95% mean confidence interval for cycles value: -2.48 8.32
95% mean confidence interval for cycles %-change: -0.40% -0.03%
Inconclusive result (value mean confidence interval includes 0).
LOST: 62
GAINED: 46
All of the lost/gained programs are SIMD32 fragment shaders.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>
We're about to start using it to implement nir_jump_halt which has
nothing inherently to do with fragment shaders or discards. May as well
name it for the HW instruction it generates.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>
The code works but is a bit fragile if we ever add a case that has a
less strict requirement (a smaller gen) than the case above. To avoid
having to reason about this, refactor code to use a variable to
indicate whether the SFID is supported or not.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7742>
They're not really "push" anymore but that's because there is no such
thing as push constants in bindless shaders on Intel. They should be
fast enough, though. There is some room for debate here as to whether
we want to do the pull in NIR or push it into the back-end. The
advantage of doing it in the back-end is that it'd be easier to use
MOV_INDIRECT for indirect push constant access rather than falling back
to a dataport message.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
For triangle geometry, the hit attributes are always two floats which
contain the barycentric coordinates of the hit. For procedural
geometry, they're an arbitrary blob of data passed from the intersection
shader to the hit shaders. In our implementation, we stash that data
right after the HW RayQuery in the ray stack.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Unlike graphics and compute pipelines, Vulkan ray-tracing pipelines do
not have a single entrypoint. Instead, the raygen shader is specified
as a one-element shader binding table in the vkCmdTraceRay call. This
means that raygen shaders have to be bindless shaders just like any
other ray tracing shader. To launch them, we have a tiny compute shader
that acts as a trampoline and sets up the hotzone and uses btd_spawn to
fire off the raygen shader.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Most of the work for this is done for us by spirv_to_nir which gives us
a load_global from a memory address based on the shader_record_ptr
system values.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
This is a little bit more work than executeCallable() because we also
have to set up the MemRay data structure which the ray traversal
hardware uses to keep its state.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Both traceRay() and executeCallable() take a payload parameter which
gets passed from the caller to the callee and which the callee can write
to pass data back to the caller. We implement these by passing a
pointer to the data structure in the callee to the caller as the second
QWord on its stack. Coming out of spirv_to_nir, the incoming call
payloads get the nir_var_shader_call_data variable mode allowing us to
easily identify them. Outgoing call payloads get assigned the
nir_var_shader_temp mode and will have been turned into function_temp by
nir_lower_global_vars_to_local. All we have to do is crawl the shader
looking for references to the nir_var_shader_call_data variable and
rewrite those to use the passed in pointer. nir_lower_explicit_io will
do the rest for us.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
These are required for ray-tracing. There are many cases where the
ray-tracing hardware may decide to execute some but not all of our
shaders. In these cases, it needs a shader to execute at the end which
will pop the stack back to the shader which called traceRay().
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Each callable ray-tracing shader shader stage has to perform a return
operation at the end. In the case of raygen shaders, it retires the
bindless thread because the raygen shader is always the root of the call
tree. In the case of any-hit shaders, the default action is accep the
hit. For callable, miss, and closest-hit shaders, it does a return
operation. The assumption is that the calling shader has placed a
BINDLESS_SHADER_RECORD address for the return in the first QWord of the
callee's scratch space. The return operation simply loads this value
and calls a btd_spawn intrinsic to jump to it.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
In ray-tracing shader stages, we have a real call stack and so we can't
use the normal scratch mechanism. Instead, the invocation's stack lives
in a memory region of the RT scratch buffer that sits after the HW ray
stacks. We handle this by asking nir_lower_io to lower local variables
to 64-bit global memory access. Unlike nir_lower_io for 32-bit offset
scratch, when 64-bit global access is requested, nir_lower_io generates
an address calculation which starts from a load_scratch_base_ptr. We
then lower this intrinsic to the appropriate address calculation in
brw_nir_lower_rt_intrinsics.
When a COMPUTE_WALKER command is sent to the hardware with the BTD Mode
bit set to true, the hardware generates a set of stack IDs, one for each
invocation. These then get passed along from one shader invocation to
the next as we trace the ray. We can use those stack IDs to figure out
which stack our invocation needs to access. Because we may not be the
first shader in the stack, there's a per-stack offset that gets stored
in the "hotzone".
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
These will eventually contain per-stage lowering for various ray-tracing
things. This is separate from brw_nir_lower_rt_intrinsics because, for
reasons that will become apparent later, brw_nir_lower_rt_intrinsics has
to be run very late in the compile process, right before brw_compile_bs.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
The new intrinsics we added for doing address calculations are all
things we fetch from the RT_DISPATCH_GLOBALS struct. We could emit an
RT_DISPATCH_GLOBALS load at every point we want it and trust NIR to CSE
it for us but it's easier to use intermediate intrinsics.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
The Intel bindless thread dispatch model is very simple. When a compute
shader is to be used for bindless dispatch, it can request a set of
stack IDs. These are allocated per-dual-subslice by the hardware and
recycled automatically when the stack ID is returned. Passed to the
bindless dispatch are a global argument address, a stack ID, and an
address of the BINDLESS_SHADER_RECORD to invoke. When the bindless
shader is dispatched, it is passed its stack ID as well as the global
and local argument pointers. The local argument pointer is the address
of the BINDLESS_SHADER_RECORD plus some offset which is specified as
part of the BINDLESS_SHADER_RECORD.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
I checked the Bspec for both Gen11 and Gen12, and it appears that rotate
instructions cannot have source modifiers or saturate modifer. Saturate
was already handled.
Fixes: 1e92e83856 ("intel/compiler: Emit ROR and ROL instruction")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7650>
Icelake's sampler message header introduces a field in m0.3 bit 0
which controls whether the sampler state pointer should be relative
to bindless sampler state base address or dynamic state base address.
g0.3 bit 0 is part of the per-thread scratch space field. On older
hardware, we were able to copy that along because the sampler ignored
bits 4:0. Now, however, we need to mask them out.
Fixes various textureGatherOffsets piglit tests when forcing the FS
to run with 2048 bytes of per-thread scratch space (which is a
per-thread scratch space encoding of 1, meaning bit 0 will be set).
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6735>
In our source languages, interpolateAtOffset() takes a floating point
offset in the range [-0.5, +0.5]. However, the hardware takes integer
valued offsets in the range [-8, 7], in units of 1/16th of a pixel.
So, we need to multiply and clamp the coordinates. We were doing this
in the FS backend, but with the advent of IBC, I'd like to avoid doing
it twice. This patch instead moves the lowering to NIR so we can reuse
it across both backends.
v2: Use nir_shader_instructions_pass (suggested by Eric Anholt).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6193>
Motivation is to detect earlier certain bugs that can occur when
missing a check for the stage before using the downcast.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7540>
In commit eda3e4e055, Eric added names
to various programs. In that patch, he also renamed our passthrough
TCS shader from "passthrough" to "passthrough TCS". The passthrough
TCS directly supplies the VUE headers rather than doing the whole
"patch parameters are in backwards order" reswizzling dance.
We failed to detect this and started trying to supply vec4s starting
at component 3, leading to a stack smash on an array of 7 sources,
not to mention the values were being put in the wrong place.
Easy fix: update the code for the new name.
Fixes: eda3e4e055 ("nir/builder: Add a name format arg to nir_builder_init_simple_shader().")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3777
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7564>
This cleans up a bunch of gross sprintfs and keeps the caller from needing
to remember to ralloc_strdup. I added a couple of '"%s", name ? name :
""' to radv where I didn't fully trace through whether a non-null name was
being passed in.
I also took the liberty of adding a basic name to a few shaders (pan_blit,
unit tests)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7323>
These two consumers were the only ones out of the ~65 calls to
init_simple_shader, so there's a pretty clear consensus on how to allocate
simple shaders. I suspect that actually these would be just fine with
b.shader being the mem_ctx, but that would take a bit more rework.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7323>
Intel hardware supports 8-bit arithmetic but it's tricky and annoying:
- Byte operations don't actually execute with a byte type. The
execution type for byte operations is actually word. (I don't know
if this has implications for the HW implementation. Probably?)
- Destinations are required to be strided out to at least the
execution type size. This means that B-type operations always have
a stride of at least 2. This means wreaks havoc on the back-end in
multiple ways.
- Thanks to the strided destination, we don't actually save register
space by storing things in bytes. We could, in theory, interleave
two byte values into a single 2B-strided register but that's both a
pain for RA and would lead to piles of false dependencies pre-Gen12
and on Gen12+, we'd need some significant improvements to the SWSB
pass.
- Also thanks to the strided destination, all byte writes are treated
as partial writes by the back-end and we don't know how to copy-prop
them.
- On Gen11, they added a new hardware restriction that byte types
aren't allowed in the 2nd and 3rd sources of instructions. This
means that we have to emit B->W conversions all over to resolve
things. If we emit said conversions in NIR, instead, there's a
chance NIR can get rid of some of them for us.
We can get rid of a lot of this pain by just asking NIR to get rid of
8-bit arithmetic for us. It may lead to a few more conversions in some
cases but having back-end copy-prop actually work is probably a bigger
bonus. There is still a bit we have to handle in the back-end. In
particular, basic MOVs and conversions because 8-bit load/store ops
still require 8-bit types.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7482>
We can't really support these directly on any platform. May as well let
NIR lower them. The NIR lowering is potentially one more instruction
for scan/reduce ops thanks to not being able to do the B->W conversion
as part of SEL_EXEC. For imax/imin exclusive scan, it's yet another
instruction thanks to the extra imax/imin NIR has to insert to deal with
the fact that the first live channel will contain the identity value
which, when signed, will cast wrong. However, it does let us drop some
complexity from our back-end so it's probably worth it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7482>