Mesh and Task shaders can use workgroup memory, so generalize its
handling in anv by moving it from anv_pipeline_compile_cs() to
anv_pipeline_lower_nir().
Update Pipeline Statistics accordingly.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11230>
Move it out the "cs" sub-struct, since the bit will be used for other
shader stages in the future.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11225>
This drops
-0000000000011e90 R isl_format_layouts
+0000000000008f48 R isl_format_layouts
I think that's about 36k.
Thanks to Jason for suggesting PACKED
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11232>
Be consistent with other usages in Vulkan and SPIR-V, and the recently
added workgroup_size field.
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
Since 40e1d798c6, we are now using physical register numbers for
everything which makes it all simpler. In particular, we no longer need
the special case for setting up the payload for SIMD16 on Gen4-5. This
fixes a pile of piglit tests on ILK and similar.
Fixes: 40e1d798c6 "intel/fs: Use ra_alloc_contig_reg_class()..."
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11221>
The format enum space isn't necessarily contiguous so we can't assume
that if it's in the table it's valid. We need to check something.
Fixes: ed6e586562 "intel: properly constify isl_format_layouts"
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11191>
By using the new class type, we don't need to make 1928 different
registers to represent each contigous reg size starting from the actual
128 HW register, or have a mapping between RA regs and HW base regs. With
the number of regs reduced, and the fast q computation when using the new
classes, we no longer need to compute our own q.
This drops the FS RA initialization time on my CFL system from about 1ms to
50us.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
Putting a const char * in the struct means it's a pointer that has to be
resolved at rtld time, which means it can be in .data.rel.ro but not
.rodata like you'd hope. Fix this with the usual string table trick.
Cuts about 20k (-80k read-write +60k read-only) and ~280 relocations
from the gallium driver.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11168>
This avoids multiple copies as we will need this in multiple places.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10366>
In 2db8867943, we introduced a new meta-op MOV_FOR_SCRATCH which is
identical to MOV except it lets us identify MOVs emitted during spilling
so we know not to re-spill those instructions. We emit them from
shuffle_for_64bit_data whenever the new for_scratch parameter is true.
Unfortunately, we missed the one used for resolving swizzles.
Fixes: 2db8867943 "intel/vec4: Don't spill fp64 registers more..."
Tested-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11155>
Iris only runs on BDW+ and ANV already handles this by not even trying
on anything older than HSW. The only driver benefiting from this common
check is i965. Moving it out makes the pass more generic and if some
driver comes along which can push UBOs on IVB, it should work for that.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11145>
Fixes the following building error:
FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/i965_dri_intermediates/LINKED/i965_dri.so
...
ld.lld: error: undefined symbol: brw_compile_ff_gs_prog
>>> referenced by brw_ff_gs.c:56 (external/mesa/src/mesa/drivers/dri/i965/brw_ff_gs.c:56)
Fixes: 52e426fd8b ("intel/compiler: add support for compiling fixed function gs")
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10718>
Note that CCS isn't ambiguated during a HiZ ambiguate. Dumping the CCS
surface after a HiZ ambiguate shows that the CCS is unchanged.
Fixes: 98dc7f56b7 ("intel/isl: Add a separate ISL_AUX_USAGE_HIZ_CCS_WT")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9112>
There is an effort to drop the "Gen" prefix from much of our codebase.
This just applies this to the metrics.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10930>
Adding a register, similar to what was done for RenderBasic.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10930>
The availability of those counters depends on the topology.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10930>
A successful AHardwareBuffer_allocate itself will increase a refcount on
the newly allocated AHB. For the import case, the implementation must
acquire a reference on the AHB. So if we layer the exportable allocation
on top of AHB allocation and AHB import, we must release an AHB
reference to avoid leak.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10940>
On XeHP there are restrictions on types of source and destinations
with float types. As shuffle is implemented using MOV we need to make
sure we lower it to supported types.
This fixes tests like :
dEQP-VK.subgroups.arithmetic.framebuffer.subgroupexclusivemax_vec4_vertex
dEQP-VK.subgroups.arithmetic.framebuffer.subgroupexclusivemul_f16vec3_vertex
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10902>
Fixes perf regression introduced from tileY LID order for CS
shaders that access both textures and buffers. Walks LIDs in
X-major fashion, but with blocks of height 4. This maps LIDs per
HW thread for SIMD8/16/32 as (2x4/4x4/8x4), which is always good
for tileY resources and usually good for linear resources.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10733>
Computer shaders that access tileY resources (textures) benefit
from Y-locality accesses. Easiest way to implement this is walk
local ids in Y-major fashion, instead of X-major fashion. Y-major
local ids will reduce partial writes and increase cache locality
for tileY accesses since tileY resources cachelines progress in
Y direction.
Improves performance on TGL:
Borderlands3.dxvk-g2 +1.5%
Y-major can introduce a performance drop on CS that use mixture
of buffers and images. This should be fixed in next commit.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10733>