TCCNTLREG contains additional L3 cache write merging optimizations.
The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)
Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging". This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.
It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
TCCNTLREG contains additional L3 cache write merging optimizations.
The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)
Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging". This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.
It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.
Improves performance in Manhattan 3.0 by 6% on ICL 8x8 at a fixed
frequency, according to Felix Degrood. I didn't see any improvements
at out-of-the-box power management settings, however.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
TCCNTLREG contains additional cache programming settings. In
particular, there are several write combining controls we'd like to use.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This makes simple_mtx_destroy set the counter to an invalid canary
value and then makes lock/unlock assert that the value is legal.
That way, calling lock/unlock after destroy will assert fail,
rather than deadlocking or potentially even working.
This has caught real deadlocks in dEQP multithreaded tests (in st/mesa
shader variant zombie list handling), which have since been fixed.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
In particular, we need to invalidate the LRZ state when we cannot be
confident in what the Z state would be during rendering:
1) depth test modes not supported by LRZ
2) stencil test, which would require full rasterization and stencil
test in the binning pass (whereas LRZ normally just needs to
determine the min and max z value in an 8x8 quad)
Signed-off-by: Rob Clark <robdclark@chromium.org>
This was reverted needlessly because if was part of another series.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Maybe finer way of dealing with this requirement would be to increase
the number of pdevice->memory.types[] to add a category for special
alignment cases.
Meanwhile this fixes the problem of CCS surface alignment and it's
probably not going to cause issues given the size of our address
space.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6af8a4acc4 ("anv: Add aux-map translation for gen12+")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Again, no real effect, just the name of a temporary build file.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
This has no real effect other than the names of the temporary files in
the build folder.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
It either clears the whole HTILE buffer or part of it depending
on the HTILE mask parameter.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I don't think this makes much differences and a potential clear
following the initialization will overwrite HTILE anyways.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For depth+stencil images, the driver might use an optimized path
if only one aspect is cleared. It either clears the depth or the
stencil part of HTILE. Because the two separate aspects might use
the same HTILE memory we have to synchronize.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
PA_SC_AA_CONFIG might be updated when conversative rasterization is
enabled. Because the driver only re-emits the multisample state if
the number of samples is different, that register value might not
be updated correctly.
Found by inspection, doesn't fix anything known.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
They don't have to be updated dynamically.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Previously the flush was done before the call to st->flush but
could lead to problems as FLUSH_VERTICES could push some work
that would change the backbuffer (or modify it).
With this commit, all the backbuffer flushing code is executed
right before the call to st_flush.
Closes: https://gitlab.freedesktop.org/drm/amd/issues/842
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205049
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The new callback is called right before the flush is done to allow
users of st->flush to do some work after all the previous work has
been flushed.
This will be used by dri_flush in the next commit.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When using 3 planes, the sequence produces this chain:
plane0 -> plane2
This commit fixes this to produce:
plane0 -> plane1 -> plane2
Fixes: 86e60bc265 ("radeonsi: remove si_vid_join_surfaces and use combined planar allocations")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2193
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
instead of keeping the IR indefinitely in st_vp_variant.
This trivially fixes Selection/Feedback/RasterPos for NIR.
Reviewed-by: Dave Airlie <airlied@redhat.com>
gallivm receives these opcodes anyway because st_draw_feedback.c uses
shaders that were assembled for drivers, not llvmpipe.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
With 781a78 ("mesa: enable ARB_direct_state_access in compat for
GL3.1+), it's possible to have DSA with GL3.1+.
FTL creates a GL3.1 compat context, but fails the
_mesa_has_geometry_shaders(..) check in frame_buffer_texture.
Bump the compat version to pass the check.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Braces mismatch (flagged by CI, untested).
Fixes: 385d13f26d "util/atomic: Add a _return variant of p_atomic_add"
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>