Harvest the information gathered in the previous patch
inside of iris.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/308>
GEN:BUG:14010455700 (lineage 1808121037):
"To avoid sporadic corruptions “Set 0x7010[9] when Depth Buffer
Surface Format is D16_UNORM , surface type is not NULL & 1X_MSAA"
Required for fixing ttps://gitlab.freedesktop.org/mesa/mesa/issues/2501.
GEN:BUG:1806527549:
"Set HIZ_CHICKEN (7018h) bit 13 = 1 when depth buffer is D16_UNORM."
This one could fix a GPU hang in some workloads.
v2: Implement WA in isl and add another similar WA (Jason).
v3: Add flushes before changing chicken registers (Jason)
v4: Depth flush and stall + end of pipe sync when changing registers
(Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3801>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3801>
Even though the workaround description says:
"all the listed commands are non-pipelined and hence flush caused due
to pipeline mode change must not cause performance issues..."
My understanding is that we still need to have the flushes. Also, the
flushes are required not only to stall the pipeline, but also to clear
caches, so I don't think they can simply be discarded.
Additionally, while doing some testing that increased the number of
surface STATE_BASE_ADDRESS emitted, I got a lot more GPU hangs. Adding
these flushes fixes those hangs.
Fixes: b8fbb39a (iris: Implement Gen12 workaround for non pipelined
state)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3908>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3908>
Now that it uses ISL rather than genxml code, there's no need for it to
live as a vtable function inside the state module. We can just make it
a static inline helper in iris_resource.h so it's available throughout
the codebase.
Fixes: a4da6008b6 ("iris: Use mocs from isl_dev.")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3720>
Like Skylake, Gen12 requires a workaround for PIPE_CONTROLs using a
post-sync operation.
v2: Restrict to A0
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3405>
We only calculate them based on device info and never change them so
this seems like a reasonable place to put them. We could also put them
in the context, but that's not accessible from iris_init_*_context.
Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
SML is no longer in the L3$ on Gen11+. It's not incredibly clear from
the docs but no Gen11 platforms are in the list of platforms on which
this bit exists. Also, we've been always setting it false on Gen11 in
ANV and i965 thanks to GEN_L3P_SLM being zero with no ill effects.
Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
According to the BSpec, this should prevent hangs when using shaders
with large URB entries. A more precise fix can be done but it requires
re-arranging URB setup.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
Before flushing the instruction cache with a pipe control, we need to
use a CS Stall pipe control.
Ref: GEN:BUG:1409226450
Rework: Add stall-at-scoreboard (Lionel)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3457>
ALIGN() brilliantly uses uintptr_t, making it unsafe for use with 64-bit
GPU addresses in 32-bit builds of the driver. Use align64() instead,
which uses uint64_t.
Fixes assertion failures when running any 32-bit program on Tigerlake.
Fixes: 2e6a7ced4d ("iris/gen12: Write GFX_AUX_TABLE base address register")
Fixes: 0d0290bb3f ("intel/common: Add surface to aux map translation table support")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3507>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3507>
Having VERTEX_BUFFER_STATE.BufferSize greater than the size of
a bound vertex buffer allows shader to read uninitialized vertex
attributes from BO, instead of allowing hardware to return zeroes
on out-of-bounds access.
OpenGL spec "6.4 Effects of Accessing Outside Buffer Bounds" says:
"Robust buffer access can be enabled by creating a context with robust access
enabled through the window system binding APIs. When enabled, any command
unable to generate a GL error as described above, such as buffer object accesses
from the active program, will not read or modify memory outside of the data
store of the buffer object and will not result in GL interruption or termination.
Out-of-bounds reads may return values from within the buffer object or zero
values."
Fixes three webgl tests:
conformance/rendering/out-of-bounds-array-buffers.html
conformance2/rendering/out-of-bounds-index-buffers-after-copying.html
conformance2/rendering/element-index-uint.html
See #1996
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3427>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3427>
I copy and pasted some of the boilerplate but never the implementation.
For now, ASTC 5x5 is disabled and faked via uncompressed RGBA; let's
delete these remnants until such a time when we implement it properly.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
A lot of the brw_*_prog_key fields are for emulating features on legacy
hardware that iris doesn't support. In particular, all of the texture
swizzle fields take up a lot of space. These dead fields make hashing
the shader keys more expensive than it ought to be.
We introduce iris-specific keys with only the information we need, and
translate them to brw keys when actually compiling new variants. This
way, key comparisons can use the small keys. The size reductions are:
VS: 328 bytes -> 8 bytes
TCS: 312 bytes -> 24 bytes
TES: 304 bytes -> 24 bytes
GS: 284 bytes -> 8 bytes
FS: 304 bytes -> 16 bytes
CS: 280 bytes -> 4 bytes
Scores for the Piglit drawoverhead microbenchmark case with a shader
program change improve by roughly 30%.
Reviewed-by: Eric Anholt <eric@anholt.net>
i965 wants to use an offset from a base because everything is in a
single buffer whose address may be relocated, and all base addresses
are set to the start of that buffer.
iris wants to use a full 64-bit address, because state lives in separate
buffers which may be in the shader, surface, and dynamic memory zones,
where addresses grow downward from the top of a 4GB zone, So it's very
possible for a 32-bit offset to exist relative to multiple bases,
leading to the wrong state size.
TCCNTLREG contains additional L3 cache write merging optimizations.
The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)
Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging". This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.
It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.
Improves performance in Manhattan 3.0 by 6% on ICL 8x8 at a fixed
frequency, according to Felix Degrood. I didn't see any improvements
at out-of-the-box power management settings, however.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The following programming note shows up in all 3DSTATE_CONSTANT_*
packets:
"The sum of all four read length fields must be less than or equal to
the size of 64."
The backend compiler should guarantee this for us, so let's just add a
check here.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Use this new instruction introduced in Gen12. The instruction itself is
smaller, and it also allows us to emit a single instruction to all
stages that have the same push constant buffers (e.g. when they don't
have constant buffers).
There's one restriction to use this instruction, though: the length
field is only 5 bits long, so we need to check whether we can use it,
and fallback to the old 3DSTATE_CONSTANT_XS if that field is >= 32.
v2 (Suggestions from Caio):
- use max_length instead of large_buffers.
- remove UNUSED and use #if GEN_GEN >= 12 instead.
- inline "buffers" and drop BITSET_RANGE() usage.
- add assert(n <= max_pointers)
- move emit to outside of the loop.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Split into a function the logic to gather the push constant buffers,
which now stores them in struct push_bos. Another function is added to
emit the packet, using data from the push_bos struct.
This will be useful when adding a new function for emitting push
constants for newer platforms.
v2 (Suggestions from Caio):
- rename 'n' -> 'buffer_count'
- remove large_buffers (for now)
- initialize push_bos
- remove assert
- change for() condition (i <= 3 -> i < 4)
v3:
- Add comment about size limit.
- Rework "shift" logic and 'for' loop.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The vertex cache uses the full 48-bit address on Gen11+. See the
documentation for 3DSTATE_VERTEX_BUFFERS, which describes the
workaround and lists it as pre-Icelake.
Interestingly, the docs don't mention index buffers as needing a
workaround at all. So either we've been overzealous, or the docs
never got updated to record that. Which begs the question of whether
the issue there was fixed, if there was one...
Cuts 40% of the PIPE_CONTROLs from Civilization VI's benchmark; appears
that it improves performance by about 1-2% on Icelake 8x8 (not frequency
locked).
We may have replaced the backing storage for a texture buffer while it
was unbound, at which point iris_rebind_buffer would not have caught it
and updated it. We need to ensure that the current resource's address
matches the one our SURFACE_STATE points at. If not, update addresses
and re-upload the SURFACE_STATE.
Shader images and buffers do not suffer from this problem because we
re-stream the surface state on every set call, since there isn't a
created CSO object for those with a saved SURFACE_STATE. Constant
buffers are also currently re-streamed (we pitch the SURFACE_STATE
on every set_constant_buffer call). Surfaces would need this
treatment (as they're created CSOs) except that we never swap out
their backing storage today (we only do it for buffers), so it's OK
for now.
Fixes misrendering in Unreal 4 demos (Elemental, Matinee Fight Scene).
Huge thanks to Andrii Simiklit for tracking down the problem - it was
quite difficult to find! Also fixes Andrii's new Piglit test for the
bug, 'arb_texture_buffer_object-re-init'.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1365
When replacing the backing storage for texture buffers, image buffers,
and so on, we may need to update the "Surface Base Address" field in
any corresponding SURFACE_STATE. This is easier to accomplish if we
have a copy on the CPU - we can just compare the current field, update
it, and re-upload.
This patch adds a CPU-side copy to the new iris_surface_state wrapper
struct, and reworks allocation and upload to fill things out on the
CPU copy first, then upload that to the GPU when finished.
This will be necessary to fix iris_invalidate_resource bugs shortly.
Technically, we never replace the backing storage for pipe_surfaces
(render targets), so we don't need to make this change there. However,
it's nice to have surfaces, sampler views, and image views handled
similarly. Plus, if we ever wanted to swap out backing storage for
busy textures, we'd need this infrastructure.
v2: Properly free memory (caught by Andrii Simiklit)
Today, we only have a state reference to the GPU buffer containing our
uploaded SURFACE_STATEs. However, we're going to want a CPU-side copy
soon. Making a wrapper struct means we can talk about both together,
and also put both in the field called "surface_state".
We can just compare the VERTEX_BUFFER_STATE address field to the
current BO's address. When calling rebind, we've already updated
the resource to the new buffer, but the state will have the old
address.
Mutating fields of global resources is generally not safe, and the only
reason we were doing it was to avoid passing an extra parameter to
the fill_surface_state helper.
To make PIPE_FORMATs usable from non-gallium parts of Mesa, I want to
move their helpers out of gallium. Since u_format used
util_copy_rect(), I moved that in there, too.
I've put it in a separate directory in util/ because it's a big chunk
of related code, and it's not clear to me whether we might want it as
a separate library from libmesa_util at some point.
Closes: #1905
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
In 2ca0d913ea, we began updating cso_fb->layers to the actual layer
count, rather than 0. This fixed cases where we were setting "Force
Zero RTA Index Enable" even when doing layered rendering. Sadly, it
also broke the check entirely: cso_fb->layers is now 1 for non-layered
cases, but the Force Zero RTA Index check was still comparing for 0.
Fixes: 2ca0d913ea ("iris: Fix framebuffer layer count")
When starting a BLORP operation, we do the BTI-change flush. However,
when ending it and transitioning back to regular drawing, we change the
render target again - without a set_framebuffer_state() call. We need
to do the BTI flush there too. BLORP flags IRIS_DIRTY_RENDER_BUFFER
now, which will cause the next draw to get the BTI flush again.
(explanation of fix by Ken)
Fixes: 2b956a093a ("iris: totally untested icelake support")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>