This is only needed to fix rendering corruptions caused by not flushing
after doing a resolve operation. The resolve now does all the needed
flushing so this is unnecessary.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This also removes an unneeded brw_render_cache_set_check_flush() call.
We were calling it in the case where the surface got resolved to satisfy
the flushing requirements around resolves. However, blorp now does this
itself, so the extra is just redundant.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This commit adds a new unified interface for doing resolves. The basic
format is that, prior to any surface access such as texturing or
rendering, you call intel_miptree_prepare_access. If the surface was
written, you call intel_miptree_finish_write. These two functions take
parameters which tell them whether or not auxiliary compression and fast
clears are supported on the surface. Later commits will add wrappers
around these two functions for texturing, rendering, etc.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This enum describes all of the states that a auxiliary compressed
surface can have. All of the states as well as normative language for
referring to each of the compression operations is provided in the
truly colossal comment for the new isl_aux_state enum. There is also
a diagram showing how surfaces move between the different states.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
We have two different bits of resolve code for render targets: one in
brw_draw where it's always been and one in brw_context to deal with sRGB
on gen9. Let's pull them together.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
There are several places where we were resolving the entire miptree
when we really only needed to resolve a single slice. Let's avoid the
unneeded resolving.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This way it happens before we call get_aux_state.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Previously, we had two checks for can_fast_clear and a tiny bit of
shared code in between. This commit pulls all of the fast clear code
together and duplicates the tiny bit that declares some surface structs
and calls blorp_surf_for_miptree. The duplication is no real loss and
we're about to change the two in slightly different ways.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
None of the other methods such as blit work with CCS either so we need
to do the resolve for all maps. This change also makes us only resolve
the one slice we're mapping and not the entire image.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
There is exactly one caller so it's a bit pointless to have all of this
plumbing. Just inline it at the one place it's used.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
The new version now takes a range of levels as well as a range of
layers. It should also be a tiny bit faster because it only walks the
resolve_map list once instead of once per layer.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
For clang, enums are unsigned by default and gives the following warning:
external/mesa3d/src/mesa/main/buffers.c:764:21: warning: comparison of constant -1 with expression of type 'gl_buffer_index' is always false [-Wtautological-constant-out-of-range-compare]
if (srcBuffer == -1) {
~~~~~~~~~ ^ ~~
Replace -1 with an enum value to fix this.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
clang gives a warning in blob_overwrite_bytes because offset type is
size_t which is unsigned:
src/compiler/glsl/blob.c:110:15: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare]
if (offset < 0 || blob->size - offset < to_write)
~~~~~~ ^ ~
Remove the less than 0 check to fix this.
Additionally, if offset is greater than blob->size, the 2nd check would
be false due to unsigned math. Rewrite the check to avoid subtraction.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Rob Herring <robh@kernel.org>
This removes the linear search which is fail when number of variables
goes up to 30000 or so.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of having to search the whole array, just use the whole
thing and store a valid bit in there with the rename.
Removes this from the profile on some of the fp64 tests
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
with some of the fp64 emulation, we are seeing shaders coming in with
> 32K temps, they go out with 40 or so used, but while doing register
renumber we need to store a lot of them.
So bump this fields back up to 32-bit.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Also solve "outinfo may be used uninitialized" warning by putting in an
unreachable().
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Most functions are only inspecting nir, so nir related arguments can be
marked const. Some more can be done if/when some nir changes are
accepted.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: Add a func pointer to radeon_winsys to support radeon later.
Change-Id: I614ea71424f9e5c97e4ae68654315d28c89eaa5f
Signed-off-by: Samuel Li <Samuel.Li@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Also, prepare for the next commit by correcting some coding style
changes. This should be all non-functional changes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
These will need to be in place to avoid regressions when
removing these includes from the u_dynarray
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
When a stencil buffer is part of the framebuffer state, it is
decompressed but because it's bindless, all draw calls set
stencil_dirty_level_mask to 1.
v2: Marek - set the flags outside the loop
- also clear and set framebuffer.do_update_surf_dirtiness there
- do it in the DB->CB copy path too
v3: Marek - save and restore the do_update_surf_dirtiness flag
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
so that LLVM IR looks like CSE has been run on it. It's also recommended
by the instruction combining pass.
This also fixes:
- GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 (crash)
- piglit/spec/arb_shader_ballot/execution/fs-readFirstInvocation-uint-loop (fail)
The code size decrease is positive, the register usage isn't. There is
a decrease in VGPR spilling for Tomb Raider, but increase in DiRT Showdown
and GRID Autosport.
EarlyCSEMemSSA has a -0.01% change in code size compared EarlyCSE.
SGPRS: 1935420 -> 1938076 (0.14 %)
VGPRS: 1645504 -> 1645988 (0.03 %)
Spilled SGPRs: 2493 -> 2651 (6.34 %)
Spilled VGPRs: 107 -> 115 (7.48 %)
Private memory VGPRs: 1332 -> 1332 (0.00 %)
Scratch size: 1512 -> 1516 (0.26 %) dwords per thread
Code Size: 61981592 -> 61890012 (-0.15 %) bytes
Max Waves: 371847 -> 371798 (-0.01 %)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We can use a union in si_shader_key::mono.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Heaven LDS usage for LS+HS is below. The masks are "outputs_written"
for LS and HS. Note that 32K is the maximum size.
Before:
heaven_x64: ls=1f1 tcs=1f1, lds=32K
heaven_x64: ls=31 tcs=31, lds=24K
heaven_x64: ls=71 tcs=71, lds=28K
After:
heaven_x64: ls=3f tcs=3f, lds=24K
heaven_x64: ls=7 tcs=7, lds=13K
heaven_x64: ls=f tcs=f, lds=17K
All other apps have a similar decrease in LDS usage, because
the "outputs_written" masks are similar. Also, most apps don't write
POSITION in these shader stages, so there is room for improvement.
(tight per-component input/output packing might help even more)
It's unknown whether this improves performance.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If the XRGB view is sampling from an ARGB svga format, change
PIPE_SWIZZLE_W to PIPE_SWIZZLE_1 for all channels.
Previously we unconditionally set PIPE_SWIZZLE_1 on the alpha channel which
could be both insufficient and incorrect.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
When deciding to create a view with or without an alpha channel we need to
look at the SVGA3D format and not the PIPE format.
This fixes the glx-tfp piglit test for dri3/xa.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Gallium RGB textures may be backed by imported ARGB svga3d surfaces. In those
and similar cases we need to set the alpha value to 1 when sampling.
Fixes piglit glx::glx-tfp
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
For the purpose of surface sharing, treat SVGA3D_R5G6B5 and
SVGA3D_B5G6R5_UNORM as identical formats.
This fixes the following piglit tests with dri3/xa:
glx@glx-visuals-depth -pixmap
glx@glx-visuals-stencil -pixmap
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Deepak Singh Rawat <drawat@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
It appears like the GLX_EXT_buffer_age extension also prevents Compiz /
Ubuntu Unity from performing partial buffer swaps when it otherwise
feels like doing so. So try to get them back again. We also disable
GLX_OML_sync_control since it appears it had a favourable impact on
gnome-shell.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>