Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
The previous iteration algorithm would advance the field pointer right
after we advance the group. This meant that you would end up with
skipping the first field of the group. In the common case, where the
only field is a struct (e.g. 3DSTATE_VERTEX_BUFFERS), it would get
skipped entirely.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
They serve no purpose other than to just fill empty space in the packet
so each dword has something. Just disallowing empty groups is a bit
easier on some of the tools. This does not change the generated packing
headers in any way.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gather operations in both GLSL and SPIR-V require a sampler. Fixes
gathers returning garbage when using separate texture/samplers (on AMD,
was using an invalid sampler descriptor).
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We should use the result type of the OpSampledImage opcode, rather than
the type of the underlying image/samplers.
This resolves an issue when using separate images and shadow samplers
with glslang. Example:
layout (...) uniform samplerShadow s0;
layout (...) uniform texture2D res0;
...
float result = textureLod(sampler2DShadow(res0, s0), uv, 0);
For this, for the combined OpSampledImage, the type of the base image
was being used (which does not have the Depth flag set, whereas the
result type does), therefore it was not being recognised as a shadow
sampler. This led to the wrong LLVM intrinsics being emitted by RADV.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It turned out that with recent changes that call into dri3 from glFinish(),
it appears like different thread end up waiting for X events simultaneously,
causing deadlocks since they steal events from eachoter and update the dri3
counters behind eachothers backs.
This patch intends to improve on that. It allows at most one thread at a
time to wait on events for a single drawable. If another thread intends to
do the same, it's put to sleep until the first thread finishes waiting, and
then it rechecks counters and optionally retries the waiting. Threads that
poll for X events never pulls X events off the event queue if there are
other threads waiting for events on that drawable. Counters in the
dri3 drawable structure are protected by a mutex. Finally, the mutex we
introduce is never held while waiting for the X server to avoid
unnecessary stalls.
This does not make dri3 drawables completely thread-safe but at least it's a
first step.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102358
Fixes: d5ba75f888 "st/dri2 Plumb the flush_swapbuffer functionality through to dri3"
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2: include the file also in the meson.build (Eric Engestrom).
Fixes: f1e1c60ff6 ("etnaviv: Update from rnndb")
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Patch adds support and capability to match with new surface attribute,
component type. Currently no configs with floating point type are exposed.
With this change, following dEQP test starts to pass:
dEQP-EGL.functional.choose_config.color_component_type_ext.dont_care
dEQP-EGL.functional.choose_config.color_component_type_ext.fixed
dEQP-EGL.functional.choose_config.color_component_type_ext.float
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
This should not be needed, if the allocation fails an error is
returned and the host should handle it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
All radv_fence fields are initialized here.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For consistency and it might help for debugging purposes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead just dirty RADV_CMD_DIRTY_FRAMEBUFFER and it will be
re-emitted if necessary before the next draw.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Just after the vertex shader.
This seems to give a minor boost for, at least, Serious Sam
Fusion 2017 and Dawn of War 3. I don't see any real impacts
with The Talos Principle.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Will be used for VBO descriptors prefetching.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For consistency because this function will also prefetch VBO
descriptors.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is what we do in the condition too, so it makes sense.
v2: Only compute without_array() once (Ilia).
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This register is the same on all gpus so far, so emit it in one
place and also for the pre-gfx9 gpus set the value in the pipeline
creation.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves some calculations of register values into the pipeline
construction, it saves looking at outinfo in the cmd buffer emit.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In case an instruction only writes one register, and it is .x, we can
skip the extra level of fanout indirection.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Seems to fix dEQP compute related tests.. and matches what i965 does, so
perhaps there is some assumption that std430 packing is on by default
somewhere in NIR?
Add a new pass that inserts additional dependencies, rather than simply
relying on SSA srcs added in the nir->ir3 frontend. This makes it
easier to deal with barriers, but the additional false deps also lets us
deal properly with ensuring a write depends on all previous reads.
Since conversion to barrier instructions is lossy (ie. just knowing the
instruction doesn't tell us enough about what other instructions the
barrier applies to), use barrier_class/barrier_conflict fields in the
ir3_instruction to retain this information.
This could probably be relaxed somewhat by considering *which* array/
buffer/image variable is being referenced. Ie. a write to buffer A
can overtake a read from buffer B, if B is not coherent. (right?)
Signed-off-by: Rob Clark <robdclark@gmail.com>
I want to add a growable array to ir3_instruction, so we can append
false dependencies for purposes of scheduling barriers, atomics, and
dealing with write after read hazards.
Just code motion preparing for next patch.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Atomic instructions take a different # of src args depending on .g or .l
variant, split these out into different helpers with INSTR*F() helper
macro that lets you specify instruction flag.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some instructions (like barriers) have no dst, which causes problems
with dereferencing a NULL dst. Flip the logic around to reject opc's
that can't be a type of move first, to filter out those instructions.
Signed-off-by: Rob Clark <robdclark@gmail.com>
User consts and driver consts such as UBO addresses and immediates are
handled the same for all shader stages, so split out a shared helper for
these, to make it easier to add more.
Signed-off-by: Rob Clark <robdclark@gmail.com>
It is unfortunate that image state isn't a real CSO, since (at least for
a4xx/a5xx) it is a combination of sampler and "SSBO" image state, and it
would be useful to pre-compute the state block "register" values rather
than doing it at emit time.
Signed-off-by: Rob Clark <robdclark@gmail.com>
In case the IR is NIR, the driver takes reference to the nir_shader.
Also, because there are no variants, we need to clone the shader,
instead of sharing the reference with gl_program, which would result
in a double free in _mesa_delete_program().
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Having both an ir3_compile (which was really context for compiling a
single shader variant) and ir3_compiler (which is the compiler object
that compiles all variants, ie. basically holds the RA regset) is a
bit confusing.
Signed-off-by: Rob Clark <robdclark@gmail.com>