Commit graph

684 commits

Author SHA1 Message Date
Caio Marcelo de Oliveira Filho
33c61eb2f1 iris: Implement ARB_compute_variable_group_size
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
2020-05-01 12:50:37 -07:00
D Scott Phillips
65b05ebdda anv,iris: Fix input vertex max for tcs on gen12
gen12 does away with the single patch dispatch mode for tcs, and
increases some limits so that 8_patch mode can always work. Make the
necessary changes so we don't try to fall back to single patch mode.

Fixes KHR-GL46.tessellation_shader.single.max_patch_vertices and others

Fixes: 44754279ac ("intel/fs/gen12: Use TCS 8_PATCH mode.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4843>
2020-05-01 16:49:11 +00:00
Mike Blumenkrantz
91375f13ce iris: move iris_vtable to iris_screen
instead of inlining this into every context, now a struct is used in the screen
struct to reduce memory usage and simplify a couple of the methods

Closes: https://gitlab.freedesktop.org/kwg/mesa/-/issues/6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4376>
2020-04-29 16:59:45 +00:00
Kenneth Graunke
506414e837 iris: Fix downcast of bound_vertex_buffers from uint64_t to int
This is the wrong data type, the original field - and the values we're
adding in - are both 64-bit unsigned.  Keep the original data type.

Thanks to Dave Airlie for finding this while reading the code.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4802>
2020-04-29 06:50:54 +00:00
Dylan Baker
c495c3af26 replace imports memory functions with utils memory functions
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>
2020-04-21 11:09:03 -07:00
Caio Marcelo de Oliveira Filho
9ff55621d9 iris: Stop using cs_prog_data->threads
This is a preparation for dropping this field since this value is
expected to be calculated by the drivers now for variable group size
case.  And also the field would get in the way of brw_compile_cs
producing multiple SIMD variants (like FS).

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
2020-04-09 19:23:12 -07:00
Caio Marcelo de Oliveira Filho
c54fc0d07b intel/compiler: Replace cs_prog_data->push.total with a helper
The push.total field had three values but only one was directly
used (size).  Replace it with a helper function that explicitly takes
the cs_prog_data and the number of threads -- and use that in the
drivers.

This is a preparation for ARB_compute_variable_group_size where the
number of threads (hence the total size for push constants) is not
defined at compile time (not cs_prog_data->threads).

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
2020-04-09 19:23:12 -07:00
Sagar Ghuge
c40acdef52 iris: Set patch count threshold in 3DSTATE_HS
Lets specifiy maximum number of patches that will be accumulated before
a thread is dispatched.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3563>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3563>
2020-03-23 17:57:57 +00:00
Tapani Pälli
cd132a8eed iris: determine aux usage during predraw and state setup
Patch changes surface state setup to alloc/fill states for all possible
usages for image resource on gen12. Also predraw and binding table
population is changed to determine correct aux usage with the new
iris_image_view_aux_usage.

v2: alloc always all states independent on current image
    aux state on gen >= 12 , code cleanups (Nanley)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4080>
2020-03-16 10:34:21 +00:00
Tapani Pälli
d4c879e69e iris: move existing image format fallback as a helper function
Patch adds a helper function for determining image format and changes
iris_set_shader_images to use it.

v2: pass iris_context instead of pipe one, rename function,
    code cleanup (Nanley)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4080>
2020-03-16 10:34:21 +00:00
Mathias Fröhlich
0ea3ca3eca iris: Move down iris_emit_sbe_swiz in profiles.
Harvest the information gathered in the previous patch
inside of iris.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/308>
2020-03-10 14:28:36 +00:00
Rafael Antognolli
5f13996262 intel/gen12+: Disable mid thread preemption.
Fixes a GPU hang in Car Chase.

Cc: mesa-stable@lists.freedesktop.org

v2: Add comment explaining why (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4035>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4035>
2020-03-03 19:52:06 +00:00
Rafael Antognolli
cd40110420 intel/isl: Implement D16_UNORM workarounds.
GEN:BUG:14010455700 (lineage 1808121037):

   "To avoid sporadic corruptions “Set 0x7010[9] when Depth Buffer
   Surface Format is D16_UNORM , surface type is not NULL & 1X_MSAA"

Required for fixing ttps://gitlab.freedesktop.org/mesa/mesa/issues/2501.

GEN:BUG:1806527549:

   "Set HIZ_CHICKEN (7018h) bit 13 = 1 when depth buffer is D16_UNORM."

This one could fix a GPU hang in some workloads.

v2: Implement WA in isl and add another similar WA (Jason).
v3: Add flushes before changing chicken registers (Jason)
v4: Depth flush and stall + end of pipe sync when changing registers
(Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3801>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3801>
2020-03-03 16:25:54 +00:00
Rafael Antognolli
b4ddc6139b iris: Wait for the GPU to be idle before invalidating the aux table.
An end of pipe sync seems to satisfy this restriction. It takes care of
GPU hangs seen in dEQP-GLES31.functional.copy_image.* tests.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4005>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4005>
2020-03-02 22:28:11 +00:00
Rafael Antognolli
a7de6f1321 iris: Split aux map initialization from invalidation.
We can write the aux map address only once during the batch
initialization, and then only invalidate it once we modify it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4005>
2020-03-02 22:28:11 +00:00
Rafael Antognolli
a70a605ad6 iris: Apply the flushes when switching pipelines.
Even though the workaround description says:

   "all the listed commands are non-pipelined and hence flush caused due
   to pipeline mode change must not cause performance issues..."

My understanding is that we still need to have the flushes. Also, the
flushes are required not only to stall the pipeline, but also to clear
caches, so I don't think they can simply be discarded.

Additionally, while doing some testing that increased the number of
surface STATE_BASE_ADDRESS emitted, I got a lot more GPU hangs. Adding
these flushes fixes those hangs.

Fixes: b8fbb39a (iris: Implement Gen12 workaround for non pipelined
                 state)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3908>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3908>
2020-02-26 21:16:24 +00:00
Kenneth Graunke
4d57a27504 iris: Set MOCS for constant packets on Gen12+
It seems to be back, and we shouldn't use 0, as that's now considered
an error.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3720>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3720>
2020-02-21 16:44:55 -08:00
Kenneth Graunke
1cdf5abdfa iris: Make mocs an inline helper in iris_resource.h
Now that it uses ISL rather than genxml code, there's no need for it to
live as a vtable function inside the state module.  We can just make it
a static inline helper in iris_resource.h so it's available throughout
the codebase.

Fixes: a4da6008b6 ("iris: Use mocs from isl_dev.")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3720>
2020-02-21 16:44:38 -08:00
Lionel Landwerlin
4151d84323 iris: add support INTEL_blackhole_render
v2: Use a software mechanism to manage blackhole state

v3: s/iris_batchbuffer/iris_batch/ (Ken)

v4: Fixup state transition mistake (Ken/Lionel)

v5: Cleanup iris_batch_flush (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2964>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2964>
2020-02-13 17:05:05 +00:00
Lionel Landwerlin
19e7bcee17 iris: implement gen12 post sync pipe control workaround
Like Skylake, Gen12 requires a workaround for PIPE_CONTROLs using a
post-sync operation.

v2: Restrict to A0

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3405>
2020-02-05 00:25:48 +00:00
Jason Ekstrand
2ccdf881ab iris: Plumb deref block size through to 3DSTATE_SF
Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
2020-01-30 18:46:30 -06:00
Jason Ekstrand
fdc0c19328 intel/common: Return the block size from get_urb_config
Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
2020-01-30 18:46:26 -06:00
Jason Ekstrand
e928676b69 iris: Consolodate URB emit
Now that we don't have to carry a URB state emit function for BLORP we
can roll some stuff together and drop a genX helper.

Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
2020-01-30 18:46:22 -06:00
Jason Ekstrand
73a684964b intel: Take a gen_l3_config in gen_get_urb_config
Instead of making each driver pass in the same push constant size and do
it's own L3$ config URB size calculation, just make them pass in their
L3$ configuration.

Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
2020-01-30 18:46:18 -06:00
Jason Ekstrand
bff7b3c7bd iris: Use the URB size from the L3$ config
Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
2020-01-30 18:46:14 -06:00
Jason Ekstrand
99f3178a24 iris: Store the L3$ configs in the screen
We only calculate them based on device info and never change them so
this seems like a reasonable place to put them.  We could also put them
in the context, but that's not accessible from iris_init_*_context.

Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
2020-01-30 18:46:13 -06:00
Jason Ekstrand
6471bac99e iris: Set SLMEnable based on the L3$ config
Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
2020-01-30 18:46:10 -06:00
Jason Ekstrand
73434b665b intel/genxml: Drop SLMEnable from L3CNTLREG on Gen11
SML is no longer in the L3$ on Gen11+.  It's not incredibly clear from
the docs but no Gen11 platforms are in the list of platforms on which
this bit exists.  Also, we've been always setting it false on Gen11 in
ANV and i965 thanks to GEN_L3P_SLM being zero with no ill effects.

Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
2020-01-30 18:45:53 -06:00
Jason Ekstrand
e1bdb127b6 anv,iris: Set 3DSTATE_SF::DerefBlockSize to per-poly on Gen12+
According to the BSpec, this should prevent hangs when using shaders
with large URB entries.  A more precise fix can be done but it requires
re-arranging URB setup.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
2020-01-30 18:45:52 -06:00
Kenneth Graunke
94f9c5fff6 iris: Make iris_emit_default_l3_config pull devinfo from the batch
No need to pass it, we can just use batch->screen->devinfo.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3613>
2020-01-29 19:53:22 +00:00
Jordan Justen
da03e07cc2 iris: Emit CS Stall before Instruction Cache flush for gen12 WA
Before flushing the instruction cache with a pipe control, we need to
use a CS Stall pipe control.

Ref: GEN:BUG:1409226450
Rework: Add stall-at-scoreboard (Lionel)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3457>
2020-01-28 21:57:17 +00:00
Lionel Landwerlin
d101907de9 anv/iris: warn gen12 3DSTATE_HS restriction
This should never happen but better off documenting it in case someone
plays with max threads numbers.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3489>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3489>
2020-01-23 15:06:59 +02:00
Kenneth Graunke
8dc0540a17 intel: Fix aux map alignments on 32-bit builds.
ALIGN() brilliantly uses uintptr_t, making it unsafe for use with 64-bit
GPU addresses in 32-bit builds of the driver.  Use align64() instead,
which uses uint64_t.

Fixes assertion failures when running any 32-bit program on Tigerlake.

Fixes: 2e6a7ced4d ("iris/gen12: Write GFX_AUX_TABLE base address register")
Fixes: 0d0290bb3f ("intel/common: Add surface to aux map translation table support")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3507>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3507>
2020-01-23 02:16:50 +00:00
Kenneth Graunke
311cab27e2 iris: Drop some workarounds which are no longer necessary
These workarounds are no longer required by 10th Gen hardware.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3495>
2020-01-21 13:58:40 -08:00
Danylo Piliaiev
3f9a6011a6 iris: Fix value of out-of-bounds accesses for vertex attributes
Having VERTEX_BUFFER_STATE.BufferSize greater than the size of
a bound vertex buffer allows shader to read uninitialized vertex
attributes from BO, instead of allowing hardware to return zeroes
on out-of-bounds access.

OpenGL spec "6.4 Effects of Accessing Outside Buffer Bounds" says:

"Robust buffer access can be enabled by creating a context with robust access
 enabled through the window system binding APIs. When enabled, any command
 unable to generate a GL error as described above, such as buffer object accesses
 from the active program, will not read or modify memory outside of the data
 store of the buffer object and will not result in GL interruption or termination.
 Out-of-bounds reads may return values from within the buffer object or zero
 values."

Fixes three webgl tests:
 conformance/rendering/out-of-bounds-array-buffers.html
 conformance2/rendering/out-of-bounds-index-buffers-after-copying.html
 conformance2/rendering/element-index-uint.html

See #1996

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3427>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3427>
2020-01-21 09:52:40 +00:00
Jordan Justen
5d7381c645
iris: Fix some indentation in iris_init_render_context
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2020-01-17 15:28:07 -08:00
Tapani Pälli
3cec148455 iris: set depth stall enabled when depth flush enabled on gen12
This implements HW workaround #1409600907 for iris driver.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3378>
2020-01-16 14:05:54 +02:00
Lionel Landwerlin
9eca823cce iris: implement another workaround for non pipelined states
v2: add comment (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3408>
2020-01-16 11:51:22 +02:00
Lionel Landwerlin
e6e5cbac04 iris: handle new PIPE_CONTROL field
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3408>
2020-01-16 11:48:11 +02:00
Lionel Landwerlin
b8fbb39ab2 iris: Implement Gen12 workaround for non pipelined state
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3365>
2020-01-14 11:52:36 +02:00
Kenneth Graunke
645b195312 iris: Delete remnants of the unimplemented ASTC 5x5 workaround
I copy and pasted some of the boilerplate but never the implementation.
For now, ASTC 5x5 is disabled and faked via uncompressed RGBA; let's
delete these remnants until such a time when we implement it properly.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2020-01-03 18:06:38 -08:00
Kenneth Graunke
7a9c0fc0d7 intel: Drop Gen11 WaBTPPrefetchDisable workaround
This isn't needed on production Icelake hardware.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3250>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3250>
2020-01-03 00:20:17 +00:00
Rafael Antognolli
59de5d9b6a iris: Implement WA for push constants.
v2: Apply WA to gen11+ instead of gen12+ (Jordan).

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-12-13 14:15:04 -08:00
Kenneth Graunke
2e654db27a iris: Create smaller program keys without legacy features
A lot of the brw_*_prog_key fields are for emulating features on legacy
hardware that iris doesn't support.  In particular, all of the texture
swizzle fields take up a lot of space.  These dead fields make hashing
the shader keys more expensive than it ought to be.

We introduce iris-specific keys with only the information we need, and
translate them to brw keys when actually compiling new variants.  This
way, key comparisons can use the small keys.  The size reductions are:

   VS:  328 bytes ->  8 bytes
   TCS: 312 bytes -> 24 bytes
   TES: 304 bytes -> 24 bytes
   GS:  284 bytes ->  8 bytes
   FS:  304 bytes -> 16 bytes
   CS:  280 bytes ->  4 bytes

Scores for the Piglit drawoverhead microbenchmark case with a shader
program change improve by roughly 30%.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-12-10 22:25:41 -08:00
Kenneth Graunke
69d7782b15 intel/decoder: Make get_state_size take a full 64-bit address and a base
i965 wants to use an offset from a base because everything is in a
single buffer whose address may be relocated, and all base addresses
are set to the start of that buffer.

iris wants to use a full 64-bit address, because state lives in separate
buffers which may be in the shader, surface, and dynamic memory zones,
where addresses grow downward from the top of a 4GB zone,  So it's very
possible for a 32-bit offset to exist relative to multiple bases,
leading to the wrong state size.
2019-12-10 19:10:49 -08:00
Kenneth Graunke
5cc7636993 iris: Enable Gen11 Color/Z write merging optimization
TCCNTLREG contains additional L3 cache write merging optimizations.

The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)

Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging".  This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.

It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.

Improves performance in Manhattan 3.0 by 6% on ICL 8x8 at a fixed
frequency, according to Felix Degrood.  I didn't see any improvements
at out-of-the-box power management settings, however.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:19:43 -08:00
Rafael Antognolli
50f60d69e4 iris: Add restriction to 3DSTATE_CONSTANT_ packets.
The following programming note shows up in all 3DSTATE_CONSTANT_*
packets:

   "The sum of all four read length fields must be less than or equal to
   the size of 64."

The backend compiler should guarantee this for us, so let's just add a
check here.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-04 20:48:25 +00:00
Rafael Antognolli
06438ea7fa iris: Use 3DSTATE_CONSTANT_ALL when possible.
Use this new instruction introduced in Gen12. The instruction itself is
smaller, and it also allows us to emit a single instruction to all
stages that have the same push constant buffers (e.g. when they don't
have constant buffers).

There's one restriction to use this instruction, though: the length
field is only 5 bits long, so we need to check whether we can use it,
and fallback to the old 3DSTATE_CONSTANT_XS if that field is >= 32.

v2 (Suggestions from Caio):
 - use max_length instead of large_buffers.
 - remove UNUSED and use #if GEN_GEN >= 12 instead.
 - inline "buffers" and drop BITSET_RANGE() usage.
 - add assert(n <= max_pointers)
 - move emit to outside of the loop.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-04 20:48:25 +00:00
Rafael Antognolli
1ba9a18911 iris: Rework push constants emitting code.
Split into a function the logic to gather the push constant buffers,
which now stores them in struct push_bos. Another function is added to
emit the packet, using data from the push_bos struct.

This will be useful when adding a new function for emitting push
constants for newer platforms.

v2 (Suggestions from Caio):
   - rename 'n' -> 'buffer_count'
   - remove large_buffers (for now)
   - initialize push_bos
   - remove assert
   - change for() condition (i <= 3 -> i < 4)
v3:
   - Add comment about size limit.
   - Rework "shift" logic and 'for' loop.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-04 20:48:25 +00:00
Kenneth Graunke
3fdf2bb313 iris: Disable VF cache partial address workaround on Gen11+
The vertex cache uses the full 48-bit address on Gen11+.  See the
documentation for 3DSTATE_VERTEX_BUFFERS, which describes the
workaround and lists it as pre-Icelake.

Interestingly, the docs don't mention index buffers as needing a
workaround at all.  So either we've been overzealous, or the docs
never got updated to record that.  Which begs the question of whether
the issue there was fixed, if there was one...

Cuts 40% of the PIPE_CONTROLs from Civilization VI's benchmark; appears
that it improves performance by about 1-2% on Icelake 8x8 (not frequency
locked).
2019-11-26 12:13:34 -08:00