This optimization doesn't work when the ray query index isn't uniform across
the subgroup, which is something the spec allows. While there are some smart
ways to fix this and still avoid unnecessary spilling, it's not worth investing
the time until we find a realtime raytracing workload that actually needs to
use multiple live ray queries for something.
Fixes: 1f1de7eb ("anv,brw: Allow multiple ray queries without spilling to a shadow stack")
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39445>
This becomes more complex for gen8, as the lrz-status is per-slice.
Additionally, the lrz-status layout isn't "stable" between GPUs of a
given generation: the HW can change the layout, as it's not really
considered a SW interface.
Dropping HIC support for depth images removes one of two places in
the driver that reach into the lrz-status memory. The other is
tu_trace_end_render_pass(), but that is relatively safe: at the
point where it reads the status, all slices should be in the
same state.
Since HIC is not required for depth images, let's just delete some
code and avoid this problem.
Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39375>
This enables (some) shaders generated by vtn to successfully pass through
ntv and generate valid SPIR-V.
The majority of the plumbing is to handle deref casts, which are currently
assumed to originate solely from loading descriptors.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39488>
- The vkSetDebugMetadataAsyncGOOGLE command should
not have an entry in the function table, as it
leads to missing-prototype errors.
- Make gfxstream respect cpp_msvc_compat_args, since
it is a C++ project. -Wmissing-prototypes will be
made an error for C++ *eventually*.
Reviewed-by: Aaron Ruby <aruby@qnx.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39418>
Otherwise, the following error is observed:
gallium/auxiliary/gallivm/lp_bld_nir_soa.c:2394:7:
error: variable 'opname' is used uninitialized whenever switch default is taken
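For illustration only, a minimal sketch of the pattern behind this class of clang warning (-Wsometimes-uninitialized). The enum and function names are hypothetical stand-ins, not the actual gallivm code, and initializing the variable is just one common way to silence it, not necessarily how this fix was done: a variable assigned only in some switch cases is read after the switch, so the default path would read it uninitialized.

```c
#include <stddef.h>

/* Hypothetical reduction ops; not the real gallivm/NIR enums. */
enum example_reduce_op { EX_REDUCE_ADD, EX_REDUCE_MUL, EX_REDUCE_OTHER };

static const char *example_reduce_opname(enum example_reduce_op op)
{
   /* Without this initializer, clang warns that 'opname' is used
    * uninitialized whenever the switch default is taken. */
   const char *opname = NULL;

   switch (op) {
   case EX_REDUCE_ADD:
      opname = "add";
      break;
   case EX_REDUCE_MUL:
      opname = "mul";
      break;
   default:
      /* opname keeps its NULL initializer here. */
      break;
   }

   return opname;
}
```

Callers then have to handle the NULL (unsupported op) case explicitly instead of reading garbage.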
Reviewed-by: @LingMan
Fixes: 12bceb228a ("gallivm: let reduce ops use llvm intrinsics")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39418>
It turns out that the intent was to round down when dividing the
framebuffer size by the FDM size, and all other implementations of
VK_EXT_fragment_density_map do that. We followed the spec, which
doesn't say to round (which is equivalent to rounding up), but the spec
will be updated to reflect the intended behavior.
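To make the difference concrete, a sketch of the two divisions with hypothetical helper names (not turnip code). The arithmetic is the same whichever dimension "FDM size" refers to: for a 1000-pixel framebuffer dimension and an FDM size of 32, rounding down gives 31 while rounding up gives 32; when the size divides evenly (1920 / 32 = 60) both agree.

```c
/* Intended behavior: round down. Unsigned integer division in C
 * truncates, which is floor for non-negative operands. */
static unsigned fdm_div_round_down(unsigned fb_size, unsigned fdm_size)
{
   return fb_size / fdm_size;
}

/* Previous behavior (not rounding, i.e. effectively rounding up):
 * classic ceiling division for positive integers. */
static unsigned fdm_div_round_up(unsigned fb_size, unsigned fdm_size)
{
   return (fb_size + fdm_size - 1) / fdm_size;
}
```

The two only disagree when fb_size is not a multiple of fdm_size, which is why the mismatch only shows up with some framebuffer sizes.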
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39434>
There were a few missing things here:
- The max_waves can be odd even when wavesize_granularity = 2, unlike
with registers, so we should not multiply by wavesize_granularity.
This means we have to double branchstack_size to compensate.
- The actual limit was half what it should be on a6xx-a7xx, because when
I originally calculated this, computerator was using the wrong
branchstack units. We need to double branchstack_size again.
- We should limit the branchstack based on max_branchstack and align it
to 2 on a5xx+, as we do when programming the HW.
- On a8xx the limit is doubled compared to a7xx to compensate for losing
wave128.
Fix all of these.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39468>
The branchstack starts one bit lower, and we have to round to the next
even value instead of dividing by 2. This matches the actual HW
definition and will make the next commits simpler.
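A sketch of the two encodings, assuming the old field sat one bit higher and stored branchstack / 2, while the new one stores branchstack rounded up to the next even value at the lower bit position. The shift parameter and helper names are hypothetical, not the real register definition. For even values the encoded bits are identical; for odd values the old form effectively rounded down to the even value below, while the new one rounds up.

```c
/* Round v up to the next even value. Mesa has its own align helpers;
 * this local one keeps the sketch self-contained. */
static unsigned round_up_even(unsigned v)
{
   return (v + 1u) & ~1u;
}

/* Old scheme (as assumed here): field starts at bit shift + 1 and
 * holds v / 2, so an odd v loses its low bit. */
static unsigned branchstack_encode_old(unsigned v, unsigned shift)
{
   return (v / 2u) << (shift + 1u);
}

/* New scheme: field starts one bit lower and holds v rounded up to
 * the next even value. */
static unsigned branchstack_encode_new(unsigned v, unsigned shift)
{
   return round_up_even(v) << shift;
}
```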
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39468>
This moves the dispatching for each winsys function out to arch-specific
variants of the pvr_winsys_ops structure instead. This gets rid of some
needless complexity, and should make the code easier to maintain in the
long run.
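As a rough illustration of the pattern (not the actual powervr code; the struct, members, and functions below are hypothetical stand-ins, and the real pvr_winsys_ops has different members), each arch provides its own filled-in ops table, and the table is chosen once at winsys creation so later calls go straight through a function pointer instead of dispatching on the arch every time.

```c
#include <stddef.h>

/* Hypothetical subset of a winsys ops table. */
struct example_winsys_ops {
   const char *name;
   int (*submit)(int job_id);
};

/* Stand-in arch-specific implementations. The return value encodes
 * which variant ran, so the selection is observable. */
static int submit_arch_a(int job_id) { return 1000 + job_id; }
static int submit_arch_b(int job_id) { return 2000 + job_id; }

static const struct example_winsys_ops arch_a_ops = { "arch_a", submit_arch_a };
static const struct example_winsys_ops arch_b_ops = { "arch_b", submit_arch_b };

/* Select the ops table once, at winsys creation time. */
static const struct example_winsys_ops *example_select_winsys_ops(unsigned arch)
{
   switch (arch) {
   case 0:
      return &arch_a_ops;
   case 1:
      return &arch_b_ops;
   default:
      return NULL; /* unsupported arch */
   }
}
```

Keeping the tables const and selecting them in one place means adding an arch only touches its own table plus the selection switch.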
Reviewed-by: Ashish Chauhan <ashish.chauhan@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39348>
All uses of PVR_ARCH_DISPATCH in the powervr winsys were due to needing
to reach the kmd_stream.xml definitions. However, this isn't quite
enough to do this multi-arch; we also need to widen the interface to
pass extra context-switching information for future GPUs.
But doing this with the per-arch infrastructure isn't a huge gain,
because all of this code runs during context-init. So let's walk things
back a bit, and drop the dispatching here.
This does mean we need to stop using kmd_stream.xml definitions. I don't
think this is a huge loss, as we're mostly open-coding the firmware
interface here anyway.
Unfortunately, the same is not the case in the pvrsrvkm winsys, because
the kernel driver used there doesn't abstract away the same HW details,
so we'll need to set up a bunch of things based on HW definitions. So
let's take a different approach there.
Reviewed-by: Ashish Chauhan <ashish.chauhan@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39348>
Disable RHWO by default for single-sample draws and for MSAA
draws if a drirc key is set (to avoid a perf hit when not needed).
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39404>