This is a little bit more work than executeCallable() because we also
have to set up the MemRay data structure which the ray traversal
hardware uses to keep its state.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Both traceRay() and executeCallable() take a payload parameter which
gets passed from the caller to the callee and which the callee can write
to pass data back to the caller. We implement these by passing a
pointer to the data structure in the callee to the caller as the second
QWord on its stack. Coming out of spirv_to_nir, the incoming call
payloads get the nir_var_shader_call_data variable mode allowing us to
easily identify them. Outgoing call payloads get assigned the
nir_var_shader_temp mode and will have been turned into function_temp by
nir_lower_global_vars_to_local. All we have to do is crawl the shader
looking for references to the nir_var_shader_call_data variable and
rewrite those to use the passed in pointer. nir_lower_explicit_io will
do the rest for us.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
These are required for ray-tracing. There are many cases where the
ray-tracing hardware may decide to execute some but not all of our
shaders. In these cases, it needs a shader to execute at the end which
will pop the stack back to the shader which called traceRay().
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Each callable ray-tracing shader shader stage has to perform a return
operation at the end. In the case of raygen shaders, it retires the
bindless thread because the raygen shader is always the root of the call
tree. In the case of any-hit shaders, the default action is accep the
hit. For callable, miss, and closest-hit shaders, it does a return
operation. The assumption is that the calling shader has placed a
BINDLESS_SHADER_RECORD address for the return in the first QWord of the
callee's scratch space. The return operation simply loads this value
and calls a btd_spawn intrinsic to jump to it.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
In ray-tracing shader stages, we have a real call stack and so we can't
use the normal scratch mechanism. Instead, the invocation's stack lives
in a memory region of the RT scratch buffer that sits after the HW ray
stacks. We handle this by asking nir_lower_io to lower local variables
to 64-bit global memory access. Unlike nir_lower_io for 32-bit offset
scratch, when 64-bit global access is requested, nir_lower_io generates
an address calculation which starts from a load_scratch_base_ptr. We
then lower this intrinsic to the appropriate address calculation in
brw_nir_lower_rt_intrinsics.
When a COMPUTE_WALKER command is sent to the hardware with the BTD Mode
bit set to true, the hardware generates a set of stack IDs, one for each
invocation. These then get passed along from one shader invocation to
the next as we trace the ray. We can use those stack IDs to figure out
which stack our invocation needs to access. Because we may not be the
first shader in the stack, there's a per-stack offset that gets stored
in the "hotzone".
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
These will eventually contain per-stage lowering for various ray-tracing
things. This is separate from brw_nir_lower_rt_intrinsics because, for
reasons that will become apparent later, brw_nir_lower_rt_intrinsics has
to be run very late in the compile process, right before brw_compile_bs.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
The new intrinsics we added for doing address calculations are all
things we fetch from the RT_DISPATCH_GLOBALS struct. We could emit an
RT_DISPATCH_GLOBALS load at every point we want it and trust NIR to CSE
it for us but it's easier to use intermediate intrinsics.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
The Intel bindless thread dispatch model is very simple. When a compute
shader is to be used for bindless dispatch, it can request a set of
stack IDs. These are allocated per-dual-subslice by the hardware and
recycled automatically when the stack ID is returned. Passed to the
bindless dispatch are a global argument address, a stack ID, and an
address of the BINDLESS_SHADER_RECORD to invoke. When the bindless
shader is dispatched, it is passed its stack ID as well as the global
and local argument pointers. The local argument pointer is the address
of the BINDLESS_SHADER_RECORD plus some offset which is specified as
part of the BINDLESS_SHADER_RECORD.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
The RT_DISPATCH_GLOBALS struct is half HW-defined by the ray-tracing
spec and half SW-defined. However, due to the addresses in it, it's
convenient for it to all be in GenXML.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
The cloned version is the one that has updated start and end bits
fields. We're about to start passing those through to a new
__gen_address function and we need the correct start/end in order to do
that reliably.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
This is the first of the HW data structures added for ray-tracing.
These are added to their own file because it's not really associated
with any hardware we've enabled in Mesa just yet. Eventually, these
will likely get folded into the appropriate genX.xml file as they are
hardware data structures and needed to be tracked as such.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Halt is like a return for the entire shader or exit() if you prefer to
think of it that way. Once an invocation hits a halt, it's 100% dead.
Any writes to output variables which happened before the halt do,
however, still apply.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
If valgrind is installed, these components need to find valgrind.h.
Fixes: 53f7d539cd ("util: Add helgrind support for simple_mtx")
Closes: #3876
Acked-by: Rob Clark <robclark@freedesktop.org>
Fixes helgrind complaint found with piglit glx-multithread-clearbuffer.
This is a legit race because override[api].version is cleared before
parsing the override string.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7644>
Also, since there is a second call-path into st_init_extensions() from
get_version(), add an extra st_debug_init() call.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7644>
Makes the code more concise, and makes helgrind/drd happy at the same
time!
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7644>
A fairly common pattern for debug envvars is something like:
static int should_print = -1;
if (should_print < 0)
should_print = env_var_as_unsigned("NIR_PRINT", 0);
Unfortunately helgrind doesn't realize that we expect to always get the
same return value, so we don't actually care about the race condition
here.
Add a helper get_once() and do_once macros, with extra locking to make
helgrind/drd happy. Note that other than the nir usages (which are
limited to debug builds), other usages are not in hot-paths.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7644>
In the final version of SPV_KHR_ray_tracing, these are now block
terminators like OpKill or OpReturn. This means that they need special
handling in vtn_cfg.c.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7734>
The old NV version (and the provisional KHR version) specified the data
payload via an integer location. This was quite annoying for the parser
and potentially error-prone. The final KHR version of the SPIR-V
ray-tracing spec replaces these integers with actual pointers. We don't
really need to implement the NV versions but we have the code and
someone might want to parse some NV ray-tracing shaders.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7734>