Only clang has this argument (at least as of clang 8 and gcc 9), which
errors when using the gcc empty initializer syntax in C:
```C
struct foo f = {};
```
GCC has a warning for this, but only when using -Wpedantic, which is a
lot of noise to lose useful warnings in.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Most of these will never actually be compiled by windows, but in the
interest of being able to make using struct foo = {}; an error and
avoiding breaking windows removing a handful of safe uses seems like a
good trade off.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
The vertex cache uses the full 48-bit address on Gen11+. See the
documentation for 3DSTATE_VERTEX_BUFFERS, which describes the
workaround and lists it as pre-Icelake.
Interestingly, the docs don't mention index buffers as needing a
workaround at all. So either we've been overzealous, or the docs
never got updated to record that. Which begs the question of whether
the issue there was fixed, if there was one...
Cuts 40% of the PIPE_CONTROLs from Civilization VI's benchmark; appears
that it improves performance by about 1-2% on Icelake 8x8 (not frequency
locked).
The slices table and most of the other layout fields in the
freedreno_resource moves into fdl_layout.
v2: Changes by anholt to not have duplicate fields, which was introducing
a surprising behavior change in resource layout (using the
level_linear helper before the setup of the shadowed fields)
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Rob Clark <robdclark@chromium.org>
This gets the worst of the sed required for shared resource layout out of
the way. The texture layout comment is dropped now that we're referencing
the shared header, which has a more complete description.
Acked-by: Rob Clark <robdclark@chromium.org>
This will be used for sharing resource layout code between freedreno and
tu. Mostly copied from a commit by Rob, with a new location and the slice
struct renamed for consistency.
Acked-by: Rob Clark <robdclark@chromium.org>
Multiple places were doing the same thing to get the tile mode of a level,
so refactor it out. This will make the shared resource helper transition
cleaner.
Acked-by: Rob Clark <robdclark@chromium.org>
This factors out a bit of duplicated code, but will also make the shared
resource layout transition process clearer.
Acked-by: Rob Clark <robdclark@chromium.org>
The algebraic pass was exhibiting O(n^2) behavior in
dEQP-GLES2.functional.uniform_api.random.3 and
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 (along with
other code-generated tests, and likely real-world loop-unroll cases).
In the process of using fmul(b2f(x), b2f(x)) -> b2f(iand(x, y)) to
transform:
result = b2f(a == b);
result *= b2f(c == d);
...
result *= b2f(z == w);
->
temp = (a == b)
temp = temp && (c == d)
...
temp = temp && (z == w)
result = b2f(temp);
nir_opt_algebraic, proceeding bottom-to-top, would match and convert
the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to
be matched by the next fmul down on the next time algebraic got run by
the optimization loop.
Back in 2016 in 7be8d07732 ("nir: Do opt_algebraic in reverse
order."), Matt changed algebraic to go bottom-to-top so that we would
match the biggest patterns first. This helped his cases, but I
believe introduced this failure mode. Instead of reverting that, now
that we've got the automaton, we can update the automaton's state
recursively and just re-process any instructions whose state has
changed (indicating that they might match new things). There's a
small chance that the state will hash to the same value and miss out
on this round of algebraic, but this seems to be good enough to fix
dEQP.
Effects with NIR_VALIDATE=0 (improvement is better with validation enabled):
Intel shader-db runtime -0.954712% +/- 0.333844% (n=44/46, obvious throttling
outliers removed)
dEQP-GLES2.functional.uniform_api.random.3 runtime
-65.3512% +/- 4.22369% (n=21, was 1.4s)
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime
-68.8066% +/- 6.49523% (was 4.8s)
v2: Use two worklists, suggested by @cwabbott, to cut out a bunch of
tricky code. Runtime of uniform_api.random.3 down -0.790299% +/-
0.244213% compred to v1.
v3: Re-add the nir_instr_remove() that I accidentally dropped in v2,
fixing infinite loops.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
My motivation was to clarify the changes in the following commit, but
incidentally, it reduces runtime of
dEQP-GLES2.functional.uniform_api.random.3 (an algebraic-heavy
testcase) by -5.39524% +/- 2.21179% (n=15)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
In order to have nir_opt_algebraic be able to do further algebraic
work on the output of a replacement, we need to maintain the
automaton's state.
Reviewed-by: Eric Anholt <eric@anholt.net>
If not all bits are cleared, then BLT needs to be given the current clear
value and not the new one.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Fixes:
Wolfenstein:Youngblood (w/o shader_ballot)
dEQP-VK.descriptor_indexing.combined_image_sampler_in_loop_with_lod
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
I don't think the bug applies for global/scratch instructions and
load_barycentric_at_sample selection expects this feature to work.
Fixes various dEQP-VK.pipeline.multisample_interpolation.* tests on GFX10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Commit 0899bf55 made some deqp-gles3 tests related to RGB8 PBOs fail
on R600 because it exposed PIPE_FORMAT_R8G8B8_UNORM and R600 doesn't
propely handle this. Disabling this format also for buffers fixes the
issue.
In addition, disabling also the related RGB8 integer formats for buffers
fixes some deqp-gles3 tests:
dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgb8ui_cube
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_2d
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_cube
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_2d
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_cube
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_2d_array
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_3d
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_2d_array
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_3d
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_2d_array
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_3d
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_2d_array
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_3d
Fixes: 0899bf55
st/mesa: Map MESA_FORMAT_RGB_UNORM8 <-> PIPE_FORMAT_R8G8B8_UNORM
Closes#2118
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
../src/amd/llvm/ac_llvm_build.c: In function ‘ac_build_canonicalize’:
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘intr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
4567 | return ac_build_intrinsic(ctx, intr, type, params, 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4568 | AC_FUNC_ATTR_READNONE);
| ~~~~~~~~~~~~~~~~~~~~~~
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘type’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
In the case of promoted extensions we can end up with an entrypoint that
we support being an alias of an entrypoint we do not support. For
instance, if an extension gets promoted from EXT to KHR, the EXT entry-
points may be aliases of the KHR ones. We want to leave everything as
EXT until we get around to advertising the KHR so that we don't break
things when we update the XML and headers.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The current code can only handle enum aliases if the original enum is
declared first followed by the alias as we walk the XML in a linear
fashion. This commit allows us to handle aliases where the alias
declaration comes before the thing it's aliasing.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We may have replaced the backing storage for a texture buffer while it
was unbound, at which point iris_rebind_buffer would not have caught it
and updated it. We need to ensure that the current resource's address
matches the one our SURFACE_STATE points at. If not, update addresses
and re-upload the SURFACE_STATE.
Shader images and buffers do not suffer from this problem because we
re-stream the surface state on every set call, since there isn't a
created CSO object for those with a saved SURFACE_STATE. Constant
buffers are also currently re-streamed (we pitch the SURFACE_STATE
on every set_constant_buffer call). Surfaces would need this
treatment (as they're created CSOs) except that we never swap out
their backing storage today (we only do it for buffers), so it's OK
for now.
Fixes misrendering in Unreal 4 demos (Elemental, Matinee Fight Scene).
Huge thanks to Andrii Simiklit for tracking down the problem - it was
quite difficult to find! Also fixes Andrii's new Piglit test for the
bug, 'arb_texture_buffer_object-re-init'.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1365
When replacing the backing storage for texture buffers, image buffers,
and so on, we may need to update the "Surface Base Address" field in
any corresponding SURFACE_STATE. This is easier to accomplish if we
have a copy on the CPU - we can just compare the current field, update
it, and re-upload.
This patch adds a CPU-side copy to the new iris_surface_state wrapper
struct, and reworks allocation and upload to fill things out on the
CPU copy first, then upload that to the GPU when finished.
This will be necessary to fix iris_invalidate_resource bugs shortly.
Technically, we never replace the backing storage for pipe_surfaces
(render targets), so we don't need to make this change there. However,
it's nice to have surfaces, sampler views, and image views handled
similarly. Plus, if we ever wanted to swap out backing storage for
busy textures, we'd need this infrastructure.
v2: Properly free memory (caught by Andrii Simiklit)
Today, we only have a state reference to the GPU buffer containing our
uploaded SURFACE_STATEs. However, we're going to want a CPU-side copy
soon. Making a wrapper struct means we can talk about both together,
and also put both in the field called "surface_state".
We can just compare the VERTEX_BUFFER_STATE address field to the
current BO's address. When calling rebind, we've already updated
the resource to the new buffer, but the state will have the old
address.
Mutating fields of global resources is generally not safe, and the only
reason we were doing it was to avoid passing an extra parameter to
the fill_surface_state helper.
This takes a noticable amount of time in piglit and some tests don't
need it.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
This is similar to a scheduler I've written for vc4 and i965, but this
time written at the NIR level so that hopefully it's reusable. A notable
new feature it has is Goodman/Hsu's heuristic of "once we've started
processing the uses of a value, prioritize processing the rest of their
uses", which should help avoid the heuristic otherwise making such
systematically bad choices around getting texture results consumed.
Results for v3d:
total instructions in shared programs: 6497588 -> 6518242 (0.32%)
total threads in shared programs: 154000 -> 152828 (-0.76%)
total uniforms in shared programs: 2119629 -> 2068681 (-2.40%)
total spills in shared programs: 4984 -> 472 (-90.53%)
total fills in shared programs: 6418 -> 1546 (-75.91%)
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v2)
v2: Use the DAG datastructure, fold in the scheduling-for-parallelism
patch, include SSA defs in live values so we can switch to bottom-up
if we want.
v3: Squash in improvements from Alejandro Piñeiro for getting V3D to
successfully register allocate on GLES3.1 dEQP. Make sure that
discards don't move after store_output. Comment spelling fix.
At the same time, update etna_clear_blit_pack_rgba to work with integer
formats.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
At the same time, this change allows using BLT for 8bpp formats
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Use the extended format if an such a format was passed.
v1 -> v2:
- set FORMAT_MASK bit when using ext PE format as suggested
by Wladimir J. van der Laan
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
No functional changes as the driver still uses the depth+stencil
pipeline.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This refactoring will help for creating more decompress pipelines.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>