This workaround disables batch level preemption for Polygon,
Trifan and Lineloop primitive topologies.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18456>
SOL unit issues, wa is to send PC with CS stall after SO_DECL.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18409>
Also while switching to GPGPU pipeline, make sure to flush the untyped
dataport cache. HDC pipeline flush bit must be set if we are flushing
untyped dataport L1 data cache.
v2: Add utrace support (Lionel)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16905>
Set write back L1 cache policy in STATE_BASE_ADDRESS instruction for A64
messages.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16905>
Wa_14010455700 is dependent on the format and sample count, but our
code to track whether or not it had been applied was only dependent on
the format.
As a result, we failed to enable the workaround when an app used a D16
2xMSAA buffer, then a D16 1xMSAA buffer right afterwards.
Make the workaround tracking code sample-dependent to fix this.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17859>
In iris_create_surface, use the fill_surface_states helper function instead of
an open-coded solution for compressed resources.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17598>
Before this patch, we were leaking compressed resources in iris_create_surface.
Specifically, when we failed to create an uncompressed ISL surface and view for
a compressed resource, we didn't unreference the resource pointer we referenced
into the pipe_surface.
Fix this by delaying the pipe_surface initialization code to after attempting
to create the uncompressed surface and view.
Cc: 22.1 <mesa-stable>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17598>
Before this patch, we were leaking surface states in iris_create_surface.
Specifically, when we failed to create an uncompressed ISL surface and view for
a compressed resource, we didn't free surface states we allocated for it.
Fix this by attempting to create the uncompressed surface and view before we
allocate the surface states.
Cc: 22.1 <mesa-stable>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17598>
[anholt: changed to make all drivers do the right thing by moving the
payload barycentric check into the compiler]
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17381>
Correct the max thread number for DG2+ platforms according
to below bspec.
Ref: Bspec: 47202
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17506>
Gallium drivers shouldn't be including src/mesa/main headers, but we're
picking up a rogue main/config.h via the compiler, so this code I ported
over from i965 kept compiling. Use the PIPE_* defines instead so that
we can stop including that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
For cases with lots of very small primitives, this may improve
performance because we're not executing those dead channels all the
time.
Shader-db reports no instruction or cycle-count changes. However, by
hacking up the driver to report when this optimization triggers, it
appears to affect about 10% of shader-db.
v2 (Kenneth Graunke): Always enable VMask prior to XeHP for now,
because using VMask on those platforms allows us to perform the
eliminate_find_live_channel() optimization. However, XeHP doesn't
seem to have packed fragment shader dispatch, so we lose that
optimization regardless, and there's no reason not to avoid vmask.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1054>
Production Tigerlake and DG1 hardware shouldn't need this workaround.
It was only needed on the very first steppings which never went public.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16575>
This may slightly increase perf somewhere because the hardware can now
pre-cache binding tables. The real feature is that INTEL_DEBUG=bat now
dumps out surface states for compute.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15759>
anv sets the default EDSC flag, do the same for iris too
Fixes: 5ae278da18 ("iris: use vtbl to avoid multiple symbols, fix state base address")
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15905>
FLUSH_HDC is sufficient to flush things out to L3, so we'd rather
use that where possible. It's also emulated via DATA_CACHE_FLUSH
on platforms where it isn't supported, so we can use it unconditionally.
We still use DATA_CACHE_FLUSH for invalidating the data cache, and to
flush the DC-tagged cachelines in L3 to be globally-observable.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
Push constant loading is not coherent with L3 according to the document
that describes the hardware change for the vertex buffer L3 Bypass
Disable field.
If we've updated a push constant buffer with say, a blorp_buffer_copy,
we may need to flush both the render cache and the tile cache.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
We should be using the cache tracker for this. We can consider
this access IRIS_DOMAIN_OTHER_READ now that it's the catch-all
non-L3-coherent read-only access domain.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
Most clients are L3-coherent these days. However, there are some
notable exceptions, such as push constants, stream output, and command
streamer memory reads and writes.
With the advent of the tile cache, flushing the render or depth caches
alone are no longer sufficient for memory to become globally-observable.
For those, we need to flush the tile cache as well. However, we'd like
to avoid that for L3-coherent clients, as it shouldn't be necessary,
and is expensive.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
On Tigerlake, we use the data cache for reading indirect UBOs instead
of the sampler. But we still use the constant cache for direct UBO
access, so unfortunately we may access it through two different domains.
To work around this, we add a new domain for pull constants (UBOs),
which will be either constant+texture or constant+data.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
The bulk of IRIS_DOMAIN_OTHER_READ domain usage was the 3D sampler, but
there were also a few oddball cases like command streamer reads, blitter
access, and so on. The sampler is definitely L3 coherent, but some off
the more esoteric reads may not be, so I'd like to separate them, so
that OTHER_READ can become a non-L3-coherent kitchen-sink domain.
The sampler cases only need TEXTURE_CACHE_INVALIDATE, and can skip the
CONSTANT_CACHE_INVALIDATE we had on IRIS_DOMAIN_OTHER_READ.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
We were using IRIS_DOMAIN_OTHER_READ for read-only depth/stencil access
in an attempt to avoid unnecessary flushing; IRIS_DOMAIN_DEPTH_WRITE
could indicate read-write access.
However, IRIS_DOMAIN_OTHER_READ is clearly the wrong domain. Depth and
stencil data is read via the depth cache, while IRIS_DOMAIN_OTHER_READ
currently corresponds to the sampler cache and constant cache together
(although this will change in future patches).
It's unclear whether this hack was useful. For now, just drop it and
use the correct depth cache domain, even if it's marked as read-write.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15275>
This removes a couple of remaining history flushes which were
open-coded instead of using the iris_flush_and_dirty_for_history()
helper.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15738>
v2: Use CS_STALL instead of FLUSH_ENABLE in Iris (Lionel)
Add missing CS_STALL after SO_BUFFER change in Anv (Lionel)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: 22.0 <mesa-stable>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14947>
On Icelake and later, we can use a new 3DSTATE_BINDING_TABLE_POOL_ALLOC
command to update the location of the binder (buffer containing binding
table entries), rather than having to move Surface State Base Address
via a STATE_BASE_ADDRESS command. This has less stalling and also means
our surface addresses can remain relative to a fixed 4GB address range,
meaning we don't have to re-stream them any time the binder changes.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
This workaround is needed on all Gfx12.0 parts, but doesn't appear to be
necessary on XeHP. The other drivers do not appear to be applying this
workaround on those parts. As further evidence, we accidentally added
the 3DSTATE_BINDING_TABLE_POOL_ALLOC commands after switching back to
GPGPU mode, which would be an incorrect way to implement the workaround,
and things seem to be working.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
On Gfx11+, we're going to stop changing Surface State Base Address
and instead start changing the Binding Table Pool address instead.
So, rename a few things to track the last binder address, which is
what we're actually changing, regardless of how we program it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
Skylake and older use a 15:5 binding table pointer format, which means
our binder can be at most 64kB in size. Each binding table within the
binder must be aligned to 32B.
XeHP uses a new 20:5 binding table format, which allows us to increase
the binder size to 1MB while retaining the nice 32B alignment. Larger
binders mean fewer stalls as we update the base address for the binder.
Icelake and Tigerlake can either use the 15:5 format or an 18:8 format.
18:8 mode requires the base of each binding table to be aligned to 256B
instead of 32B, but it gives us a maximum binder size of 512kB.
We can store 64 binding table entries in a 256B chunk (256B / 4B = 64),
but only 8 entries in a 32B chunk (32B / 4B = 8). Assuming that most
binding tables have fewer than 64 entries, this means that with the 18:8
format, we're likely to be able to fit 2048 (512KB / 256B) tables into a
a buffer before needing to allocate a new one and stall.
Technically, the old format could also store 2048 binding tables per
buffer as well (64KB / 32B = 2048). However, tables that needed more
than 8 entries would need multiple 32B chunks. A single table would
take multiple aligned chunks, while with the larger 256B format, it
could fit in a single one.
This cuts binder resets by 6.3% on a Shadow of Mordor benchmark trace.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
The MI_FLUSH_DW post-sync operation uses the same encoding as the
PIPE_CONTROL one so we can use the same helper. Write PS Depth Count
is not supported, of course, as the blitter has no depth pipeline.
This means that we can write the timestamp register from the blitter.
Fixes: 604d97671b ("iris: Add support for flushing the blitter (hackily)")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15157>
XeHP scratch space is handled differently. Commit ae18e1e707
implemented support for it, but handled it differently between render
and compute shaders: it calculates scratch_addr differently and
doesn't pin the buffer on compute. Make it work on compute shaders by
calling pin_scratch_space() from iris_compute_walker(), which fixes
both the address and the pinning.
This commit can be verified by the two-year-old-but-still-unreviewed
Piglit MR 234. You can also verify this by running a very simple
compute shader with INTEL_DEBUG=spill_fs.
References: https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/234
Fixes: ae18e1e707 ("iris: Add support for scratch on XeHP")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15070>
Have a single border color pool per bufmgr instead of per context.
We want to have a single VM shared among every context and the border
color pool is the last feature preventing us from having that.
Previously we had 1024 colors per context but once the buffer was full
we just waited for the buffer to be unused and restarted it. After
this patch we have 4096 colors for every single context and we can't
just flush buffers if they are full, so we simply return black.
There are many strategies we could try to implement to help alleviate
this new 4096 limit, none of which are implemented by this patch:
- We could just expand the buffer to the full 16MB we can use,
allowing 262144 colors.
- We could use multiple buffers and make the contexts refcount them,
so eventually older buffers would reach zero references and be
recycled, moving us to a working set maximum from a lifetime
maximum.
- We could also make the border color pool be a standard memzone and
then give smaller buffers to each context when they need, so the
limit would be in the number of contexts that can use border color
pools. This was my first implementation but Ken suggested I switch
to the one provided by this patch, which is simpler.
Keep it like this since border colors don't seem to be used very much
and other Mesa drivers such as radeonsi also seem to employ the
"return black once we reach the limit" strategy.
As a last note, we could also move the contents of iris_border_color.c
to iris_bufmgr.c in order to avoid breaking some abstractions we have
in Iris, like we do with iris_bufmgr_get_border_color_pool(). I can do
this in case we want it.
v2: Switch from standard memzone to a per-screen thing (see above).
v3: Actually make it per bufmgr. Just making it per screen is not
enough, since screens can share the same VM, an example being the
gputest benchmark suite.
v4: Rebase.
v5: Remove dead code, lock around hash table lookup (Ken).
v6: Simple rebase.
v7: Another rebase (for_each_batch).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12028>
A big reason we had these fields was to help create a set of surface
states for a resource. That's largely being handled through other means
now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14806>
Although a resource may support CCS with its original format, a texture
view of that resource may have a format that doesn't support
compression. Don't create CCS surface states for such texture views.
This change affects iris' behavior when running piglit's
arb_texture_view-rendering-formats_gles3 test on SKL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14806>