I'm not sure why I thoughtt here was an off-by-one, other than maybe bad
data collection.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 66411521ea ("etnaviv: combine translate_ts_sampler_format/translate_msaa_format")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Gen11 SLM is not on L3 anymore, so now the hardware has two separate
fences. Add a way to control which fence types to use.
At this time, we don't have enough information in NIR to control the
visibility of the memory being fenced, so for now be conservative and
assume that fences will need a stall. With more information later
we'll be able to reduce those.
Fixes Vulkan CTS tests in ICL:
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.workgroup.guard_local.buffer.comp
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.buffer.guard_nonlocal.workgroup.comp
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.image.guard_nonlocal.workgroup.comp
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.buffer.guard_nonlocal.workgroup.comp
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.image.guard_nonlocal.workgroup.comp
The whole set of supported tests in dEQP-VK.memory_model.* group
should be passing in ICL now.
v2: Pass BTI around instead of having an enum. (Jason)
Emit two SHADER_OPCODE_MEMORY_FENCE instead of one that gets
transformed into two. (Jason)
List tests fixed. (Lionel)
v3: For clarity, split the decision of which fences to emit from the
emission code. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This reverts commit 812ce2ce9e.
We massively regress with the reverted patch. So in the meantime, take
it out.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Only Z24S8 is properly supported right now, so let's be careful. Fixes a
number of issues relating to improper Z/S handling. The most obvious is
depth buffers with incorrect strides, which manifests in truly bizarre
ways and can happen commonly with FBOs.
Fixes WebGL (Aquarium runs, etc).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
In emit_vertex we optimize storage if the output mask does not
have all bits set. Do the same in the epilogue so the indices
actually match up.
Fixes dEQP-VK.geometry.input.basic_primitive.points because it
outputs PSIZE with an output mask of 1, which cause the generic
attribute for the color to be loaded from the wrong indices.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The driver shouldn't set the copy shader bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The dimensions also have to be adjusted if the number of supported
mip levels is changed.
This fixes dEQP-VK.api.info.image_format_properties.3d.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For some reasons D32_SFLOAT is also affected on GFX10, it works
fine with previous generations.
This fixes some dEQP-VK.renderpass2.depth_stencil_resolve.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We were failing to flag the program dirty when it changed. Also, we
were unnecessarily setting key->input_vertices for SINGLE_PATCH mode,
which would reduce program cache hits. Only set it if needed.
There are two constructors for glsl_struct_field with different
parameters. Instead of repeating them for both constructors, this
patch adds a convenience macro. This will make it easier to add a
third constructor in a later patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
All of the builtin variables mentioned in the GLSL ES spec and the
extensions include a precision declaration which is different
depending on what the variable is used for. This patch makes it set
the corresponding precision when creating the variable. This will make
a difference once we start using the precision information for
optimisation. Previously all of the builtin variables ended up with a
precision of NONE.
v2: Made gl_PointSize and gl_FragCoord highp since GLSL ES 3.00. Fixed
gl_MaxViewPorts to always be highp. (Eric Anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
We were recompiling the softfp64 library of functions from GLSL to NIR
every time we compiled a shader that used fp64. Worse, we were ralloc
stealing it to the GL context. This meant that we'd accumulate lots of
copies for the lifetime of the context, which was a big space leak.
Instead, we can simply stash a single copy in the GL context, and use
it for subsequent compiles. Having a single copy should be fine from
a memory context point of view: nir_inline_function_impl already clones
the necessary nir_function_impl's as it inlines.
KHR-GL45.enhanced_layouts.ssb_member_align_non_power_of_2 was previously
OOM'ing a system with 16GB of RAM when using softfp64. Now it finishes
much more quickly and uses only ~200MB of RAM.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
It will help for GS as NGG on GFX10.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For selecting a different SQ_EXP_POS target.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When they are exported to the next stage.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Right now, all keys have two things in common: a program string ID and a
sampler_prog_key_data. I'd like to add another thing or two and need a
place to put it. This commit adds a new brw_base_prog_key struct which
contains those two common bits.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're about to change the type of key to be brw_base_prog_key and that
will mean blindly adding the key size without a cast will lead to the
wrong calculation. It's safer to cast to char * first anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm not 100% sure how this ever worked because gem_create usually shoots
you if the BO size isn't page-aligned.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is the MOCS setting used for the A64 stateless messages which we
sometimes use for SSBO operations.
Fixes: 48ed2a7bb0 "anv: Implement VK_EXT_buffer_device_address"
Fixes: 79fb0d27f3 "anv: Implement SSBOs bindings with GPU addr..."
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It's not clear the hardware really has a maximum which confuses dEQP;
clamp to whatever we report as our maximum.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
In preparation for a Panfrost-based non-Gallium driver (maybe
Vulkan...?), hoist everything except for the Gallium driver into a
shared src/panfrost. Practically, that means the compilers, the headers,
and pandecode.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The reason for doing this is two-fold:
1. These passes are likely to be shared with the Bifrost compiler
Therefore, we don't want to restrict them to Midgard
2. The coding style is different (NIR-style vs Panfrost-style)
The NIR passes are candidates for moving upstream into
compiler/nir, so don't block that off for stylistic reasons
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>