The hardware skips over unallocated slots, so we have to make sure those
registers are packed together.
Fixes KHR-GL45.enhanced_layouts.fragment_data_location_api
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
currently while insterting barriers, writes and reads to FILE_FLAGS aren't
considered. This can lead to WaR hazards in some situations.
With the previous commit fixes shaders with intstructions like this:
mad u32 $r2 $r4 $r11 $r2
mad u32 { $r5 $c0 } $r4 $r10 $r6
mad (SUBOP:1) u32 $r3 $r4 $r10 $r2 $c0
Affects OpenCL CTS tests on Maxwell+:
basic/test_basic intmath_long
basic/test_basic intmath_long2
basic/test_basic intmath_long4
v2: only put barriers on instructions which actually read flags
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
In the sched data calculator we have to track first use of defs by iterating
over all defs of an instruction, not just the first one.
v2: fix minGRP and maxGRP values
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
so that it can be removed and replaced with inline VBO descriptors,
and the pointer can be packed in unused bits of VBO descriptors.
This also removes the pointer from merged TES-GS where it's useless.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
On cayman this was hitting an assert later, which probably wasn't
see on non-cayman due to having the t slot.
Fixes: 9041730d1 (r600: add support for ARB_shader_clock.)
ALU_EXTENDED needs 4 DWORDS instead of the usual 2, hence if the last ALU
clause within a IF-JUMP or ELSE branch is ALU_EXTENDED the target jump
offset needs to be adjusted accordingly.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104654
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
TCS_OUT_LAYOUT has 13 unused bits. That's enough for a 32-bit address
aligned to 512KB. Hey, it's a 13-bit pointer!
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If 32-bit pointers are supported, both pointers can be moved into s[0:1]
and then ESGS has exactly the same user data SGPR declarations as VS.
If 32-bit pointers are not supported, only one pointer can be moved into
s[0:1]. In that case, the 2nd pointer is moved before TCS constants,
so that the location is the same in HS and GS.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cube maps are entire miptrees repeated, while 3D textures have each level
have all of its layers next to each other. Fixes tex3d and
tex-miplevel-selection GL2:texture() 3D.
Like for vc4, the new DISPLAY_TARGET flag ended up causing no formats to
match. Just drop the whole retval == usage thing and return early when we
hit a known unsupported case.
Fixes: f7604d8af5 ("st/dri: only expose config formats that are display targets")
LLVM requirement was bumped to 4.0.0 with earlier commit.
Hence any code tailored for older versions is now unreachable.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
When we set up the shadow resource we were copying the original resource
as the template, including its prsc->next field. When we shadowed the
first YUV plane's resource for linear-to-tiled conversion, we would end up
unbalancing the refcount on the shadow resource's destruction.
TS tiles map to a fixed amount of bytes in the color/depth surface,
so the blocksize of the format needs to be taken into account when
calculating the number of tiles to fill.
The simplest fix is to just use the layer stride, which is the surface
size in bytes.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Some of the 16bit formats misrender with missing tiles with the current
"2" state. As all the previously working formats also work with the "3"
state, just always use that one.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This feature has caused some trouble already. Add a debug switch to
allow users to quickly check if a specific issue is caused by this
feature.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reduces size of struct etna_specs from 100 to 94 bytes.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
We don't want filtering for integer textures, same as depth/stencil.
Fixes: KHR-GL45.direct_state_access.renderbuffers_storage_multisample
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
We need to mark the range as valid, and validate the resource using a
helper to ensure that the buffer status is marked properly.
Fixes some CTS pipeline stats query tests, and
KHR-GL45.direct_state_access.queries_functional
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
Two things were off:
- valid range was not updated, which could affect waiting for future
maps
- fencing was done manually instead of using the *_resource_validate
helper, which resulted in a missed dirty buffer flag being set
Fixes: KHR-GL45.direct_state_access.buffers_clear
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
This fixes a segfault exposed by a29d63ecf7 which occurs when swr is
used on an unsupported architecture.
v2: re-work to place logic in xmesa_init_display
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: George Kyriazis <george.kyriazis@intel.com>
Cc: Bruce Cherniak <bruce.cherniak@intel.com>
Fixes assert in the glsl-1.50-gs-max-output-components piglit test.
Note that the double handling will only work for doubles that
don't take up multiple slots i.e. double and dvec2. However
dual slot double handling is an existing bug which is made no
worse by this patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Delaying unrolling and allowing NIR to do it instead has been shown
to result in better code in drivers such as i965. shader-db results
appear to show the same is true for radeonsi.
The other advantage is that using NIR unrolling improves compile
times significantly.
Totals from affected shaders:
SGPRS: 9624 -> 10016 (4.07 %)
VGPRS: 6800 -> 6464 (-4.94 %)
Spilled SGPRs: 0 -> 2 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 359176 -> 332264 (-7.49 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 1355 -> 1432 (5.68 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes the following piglit tests:
tests/spec/arb_tessellation_shader/execution/double-array-vs-tcs-tes.shader_test
tests/spec/arb_tessellation_shader/execution/double-vs-tcs-tes.shader_test
Reviewed-by: Marek Olšák <marek.olsak@amd.com>