This allows, for example, fs_inst::components_read() without passing
devinfo as extra argument.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
For SIMD8 half float payload, each component takes a full register, so
we can use existing LOAD_PAYLOAD infrastruture for required padding by
alternating plain 8-wide half float vector and null vector.
Also this patch removes an unwanted assertion from
opt_copy_propagation_local for LOAD_PAYLOAD.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
To support SIMD8 half float payloads, each component takes one full
32bit wide register in both SIMD8H and SIMD16H mode. So we can make use
of existing LOAD_PAYLOAD infrastructure alternating a half float vector
and a null vector, in order to handle required padding.
v2: (Francisco)
- Skip header sources
- Fix comparision units
- Don't allocate VGRF for padded source
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
We are always aligning to REG_SIZE but when we have payload sources less
than REG_SIZE, size written is miscalculated.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
We can use LOAD_PAYLOAD infrastructure in order to handle 16bit float
payload. Let's rely on source type for padding sources, if not set
previously then default one would be 32-bit.
This patch will be used later in the series to handle 16-bit float
payloads.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
Update return format field and add SIMD Mode [2] field in sampler
descriptor. Now we can tell sampler to return data in either 32/16 bit
format precision.
v1:
- Drop unnecessary descriptor fields (Jason)
- Handle return format (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
on GFX8 onwards, we have only single bit to determine correct return
format.
v2:
- Define macro and use it instead of hardcoded value. (Lionel)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
This function is big and I don't think it will won't get meaningfully
constant-propagated during inlining without LTO. Move it to a .c file so
we just have one copy, saving 2.8MB from libnir.a on an amd64 release
build.
text data bss total filename
before:
18953406 7768312 687260 27408978 build-release/driver-symlinks/iris_dri.so
9734366 5542453 481692 15758511 build-release/lib/libvulkan_intel.so
28687772 13310765 1168952 43167489 (TOTALS)
after:
15478350 7767864 687260 23933474 build-release/driver-symlinks/iris_dri.so
6810366 5541685 481692 12833743 build-release/lib/libvulkan_intel.so
22288716 13309549 1168952 36767217 (TOTALS)
No statistically significant performance difference on iris shader-db, n=8.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13889>
For drivers that use this in fragment shaders, load_input is going to
produce incorrect results (flat-shaded values).
Fixes clipping tests on a4xx.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13900>
Drivers expect to know the number of clip distances irrespective of
whether compact arrays are used or not.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13900>
The float value may be out of range, so must be clamped to the allowed
range. Unclear if a3xx also has a SNORM factor that we're just missing
there, but that will be a separate investigation.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13903>
Unlike with regular textures, we really have to support all the formats
directly for TBOs to work properly. Add the missing formats to fix
arb_texture_buffer_object-formats piglit.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13906>
Otherwise some of these fall back to RGBA_SNORM, which can screw up
blend factors.
Fixes spec@ext_texture_snorm@fbo-blending-formats.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13904>
According to the spec the hardware locks the scoreboard on the first
or last thread switch (selected via shader state) and any TLB accesses
executed before this are not synchronized by hardware.
This change updates the logic to ensure we respect this requirement
and that we don't assume that the lock is acquired automatically
on the first TLB access, which is not valid at least since V3D 4.1+.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13910>
Writes to physical registers are not allowed after thread end. We
were checking this for ALU writes, but we need to check it for
signal writes too.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13910>
At high frequency sampling, this generates a lot of messages.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13831>
Otherwise we need to include intel headers in generic code.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13831>
The issue seems to be that without proper timestamps & clock_id, the
recording might discard some packets if they go backward in time.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13831>
On newer HW it will require more work than just reading a dword. It
could also vary depending on the report format.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13831>
For this each driver must :
- report its clock_id (if no particular clock just default to cpu
boottime one)
- be able to sample its clock (gpu_timestamp())
The PPSDataSource will then emit timestamp correlation events in the
trace ensuring perfetto is able to display GPU & CPU events
appropriately on its timeline.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13831>
This solves a case where a NIR geometry shader was storing the output in
a non-constant:
vec4 32 ssa_1 = load_const (0xc0800000 /* -4.000000 */, 0xc1100000 /* -9.000000 */, 0x40400000 /* 3.000000 */, 0x40e00000 /* 7.000000 */)
vec1 32 ssa_7 = load_const (0x00000000 /* 0.000000 */)
vec1 32 ssa_8 = load_const (0x00000001 /* 0.000000 */)
vec1 32 ssa_9 = iadd ssa_7, ssa_8
vec1 32 ssa_19 = mov ssa_1.x
intrinsic store_output (ssa_19, ssa_9) (1, 1, 0, 160, 288) /* base=1 */ /* wrmask=x */ /* component=0 */ /* src_type=float32 */ /* location=32 slots=2 gs_streams(x=0 y=0 z=0 w=0) */
When lowering the VPM output we check if the destination (ssa_9 in this
case) is a constant to add to the VPM offset. We run a constant folding
optimization in an earlier VS lowering, and we should do the same for
GS.
This fixes multiple dEQP-VK.pipeline.interface_matching.* failures.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13884>
While fragment and geometry shader were handling structs as inputs, they
weren't doing for it arrays of structures.
This fixes multiple dEQP-VK.pipeline.interface_matching.* failures and
assertions.
v2:
- Fix style (Iago).
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13884>
Now that we removed the intel intrinsic and just use the generic one,
we can skip it in the intel call lowering pass and just deal with it
in the intel rt intrinsic lowering.
v2: rewrite with nir_shader_instructions_pass() (Jason)
v3: handle everything in switch (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 423c47de99 ("nir: drop the btd_resume_intel intrinsic")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12113>
TLSDESC speeds up access to dynamic TLS. This is especially important
for non-glibc targets, but is also helpful for non-initial-exec TLS
variables.
The entry asm does not support TLSDESC, but it only accesses
initial-exec symbols, so it is not necessary to handle that separately.
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12722>
It is not portable to use initial-exec TLS in dlopened libraries. glibc
and FreeBSD allocate extra memory for extra initial-exec variables
specifically for libGL, but other libcs including musl do not.
Keep initial-exec disabled on FreeBSD since it is apparently broken for
some reason:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/966#note_39451281dbdb15d5
Enable TLS on OpenBSD and Haiku based on the u_thread.h comment that
emutls is better than pthread_getspecific, which seems plausible given
that emutls has strictly more information to work with.
Fixes#966.
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12722>
We have to validate the target before fetching the current texture
object. Move this so that it happens later.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13767>
libLLVM for Android is built without RTTI, but after commit ad86267
mesa inherits meson default RTTI enabled state
cpp_rtti=false is added to meson options in android/mesa3d_cross.mk
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13888>
It already does compaction, so we just need to load vertex positions
and cull. This was easier than expected.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13829>
si_llvm_translate_nir() changes ctx.stage, so the outside code shouldn't
use it. This hasn't caused any issues yet. Since ctx.stage starts as 0,
the first use in this commit was a tautology.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13829>