With the ssao demo from Vulkan demos:
radv/rx480: 440->440fps
anv/haswell: 24->34 fps
The demo does a 0->32 loop across a ubo with 32 members.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This assert was firing just running demos.
Jason said it should be this.
Fixes: 6c7720ed78 (anv/wsi: Allocate enough memory for the entire image)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This respects the size from the range setting like ssbo.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These paths are again 90% the same, consolidate them into
one.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are 90% the same code, consolidate them into a couple of
common codepaths.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
One binding to bind them all, these are all the same thing.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
this adds automatic size support to the atomic buffer code,
but also realigns the code to act like the ubo/ssbo code.
v1.1:
add missing blank lines.
reindent one block properly.
check for NullBufferObj.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks has a MOV that hits
this validation path. MOVs don't have a src1 file, but calling
brw_inst_src1_type() was tripping on src1.file being BRW_IMMEDIATE_VALUE
and the hw_type being something invalid for immediates.
To work around this, just pretend src1 is src0 if there isn't a src1.
Fixes: 2572c2771d (i965: Validate "Special
Requirements for Handling Double Precision Data Types")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102680
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Fixes 'KHR-GL45.copy_image.functional' on Nouveau and i965.
v2: (by Kenneth Graunke)
Rewrite patch according to Jason Ekstrand's review feedback.
This makes it handle differing strides, which i965 needed.
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Older kernels fail the va_op with this flag set. If the kernel
supports GFX9 usefully, it will also support this flag.
Fixes: e8d57802fe "radv/gfx9: allocate events from uncached VA space"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Jason and I investigated several OpenGL CTS failures where the tests
bind the same texture for rendering and texturing, at the same time.
This has defined results as long as the reads happen before writes,
or the regions are non-overlapping. Normally, this just works out.
However, CCS can cause problems. If the shader is reading one set of
pixels, and writing to different pixels that are adjacent, they may end
up being covered by the same CCS block. So rendering may be writing a
CCS block, while the sampler is trying to read it. Corruption ensues.
Disabling CCS is unfortunate, but safe.
Fixes several KHR-GL45.texture_barrier.* subtests.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This lowers ffma to a * b + c.
This seems like it should keep Marek happiest, so
we'd never get to the fma instruction emission code.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
So it appears the Vulkan SPIR-V fma opcode can be equivalent to a
mad operation, and the fma hw opcode on AMD hw is issued like a double
opcode so is slower. Also the radeonsi stack does this.
This appears to improve performance on a number of games from Feral,
and thanks to Feral for noticing the problem.
I'm reposting this one as Marek indicated he thinks this is what
we should be doing on AMD hw.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The HW will halt when you hit a HALT packet, or when you hit the end
address. Tell CLIF if there's an end address is so that it can stop
correctly. (There was usually a 0 byte after the CL, so it would stop
anyway).
In order to keep early-Z from writing early in a discard shader, you need
to set the "modifies Z" bit in the shader state (which the new
prog_data.discards will indicate). Then, in the shader we do a TLB write
to make Z passthrough happen (the QPU result is ignored, so we use a NULL
source).
I had base_vertex hacked into the shader state setup like in vc4, but it's
not correct for big offsets. Using the proper packet is easier and
hopefully means we can re-emit shader state setup less frequently.
These existed so I could unpack just the sub-id field to switch on in the
old manual CLIF dumper. The new codegen handles sub-id automatically, but
only if these stub packets aren't there with an implicit sub-id=0.
V3D 3.3 is a continuation of the 3D implementation in VC4 (v2.1 and v2.6).
V3D 3.3 introduces an MMU (no more CMA allocations) and support for
GLES3.1. This driver is not currently conformant, though that will be a
target as soon as possible.
V3D 3.x parts use a new texture tiling layout common across many Broadcom
graphics parts including and the HVS scanout engine. It also massively
changes the QPU instructions, introducing a common physical register file
(no more A/B split) and half-float instructions, while removing the 4x8
unorm instructions in favor of half-float for talking to fixed function
interfaces. Because so much has changed, vc5 is implemented in a separate
gallium driver, using only the XML code-generation support from vc4.
v2: Fix tile layout for 64bpp textures. Fix texture swizzling for 32-bit
returns. Fix up a bit of MRT setup. Sync the simulator to kernel
behavior a bit more. Improve uniform debugging code. Rebase on
QIR->VIR rename. Move texture state mostly to the CSOs. Improve
cache flushing on the simulator. Fix program deletion
use-after-frees.
Acked-by: Dave Airlie <airlied@gmail.com> (uabi plan)
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> (uabi plan)