Commit graph

3167 commits

Author SHA1 Message Date
Jason Ekstrand
d1c778b362 anv: Be more careful about hashing pipeline layouts
Previously, we just hashed the entire descriptor set layout verbatim.
This meant that a bunch of extra stuff such as pointers and reference
counts made its way into the cache.  It also meant that we weren't
properly hashing in the Y'CbCr conversion information information from
bound immutable samplers.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-07-02 13:07:06 -07:00
Jason Ekstrand
06412bfc98 anv,intel: Enable nir_opt_large_constants for Vulkan
According to RenderDoc, this shaves 99.6% of the run time off of the
ambient occlusion pass in Skyrim Special Edition when running under DXVK
and shaves 92% off the runtime for a reasonably representative frame.
When running the actual game, Skyrim goes from being a slide-show to a
very stable and playable framerate on my SKL GT4e machine.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-02 12:09:50 -07:00
Jason Ekstrand
70ce880434 anv: Add state setup support for shader constants
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-02 12:09:49 -07:00
Jason Ekstrand
3a5ed18c51 anv: Add support for shader constant data to the pipeline cache
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-02 12:09:47 -07:00
Iago Toral Quiroga
1b54824687 anv/cmd_buffer: make descriptors dirty when emitting base state address
Every time we emit a new state base address we will need to re-emit our
binding tables, since they might have been emitted with a different base
state adress.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
CC: <mesa-stable@lists.freedesktop.org>
2018-07-02 08:31:20 +02:00
Iago Toral Quiroga
6a1d8350c9 anv/cmd_buffer: clean dirty push constants flag after emitting push constants
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
CC: <mesa-stable@lists.freedesktop.org>
2018-07-02 08:31:02 +02:00
Iago Toral Quiroga
198a72220b anv/cmd_buffer: never shrink the push constant buffer size
If we have to re-emit push constant data, we need to re-emit all
of it.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
CC: <mesa-stable@lists.freedesktop.org>
2018-07-02 08:30:40 +02:00
Jose Maria Casanova Crespo
a99c9e63a0 anv: finish the binding_table_pool on destroyDevice when use_softpin
Running VK-CTS in batch execution mode was raising the
VK_ERROR_INITIALIZATION_FAILED error in multiple tests. But when the
same failing tests were run isolated they always passed.

createDevice and destroyDevice were called before and after every
tests. Because the binding_table_pool was never closed, we reached the
maximum number of open file descriptors (ulimit -n) and when that
happened every call to createDevice implied a
VK_ERROR_INITIALIZATION_FAILED error.

Fixes: c7db0ed4e9
      ("anv: Use a separate pool for binding tables when soft pinning")

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-29 21:49:31 +02:00
Francisco Jerez
c2c803be7b intel/fs: Build 32-wide FS shaders.
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-28 13:25:21 -07:00
Jason Ekstrand
b95b0e2918 intel/anv,blorp,i965: Implement the SKL 16x MSAA SIMD32 workaround
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-28 13:25:18 -07:00
Jason Ekstrand
d5e028a57b intel/fs: Add fields to wm_prog_data for SIMD32 dispatch
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
bcbc7d3a17 intel/fs: Fix nir_intrinsic_load_helper_invocation for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
7144247c2c intel/fs: Fix fs_builder::sample_mask_reg() for 32-wide FS dispatch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
37c1df28c9 intel/fs: Fix Gen6+ interpolation setup for SIMD32
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
e208bc3bb7 intel/fs: Get rid of MOV_DISPATCH_TO_FLAGS
We can just emit the MOV in the two places where we use this.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
5e3028d826 intel/fs: Emit MOV_DISPATCH_TO_FLAGS once for the centroid workaround
There's no reason for us to emit it a pile of times and then have a
whole pass to clean it up.  Just emit it once like we really want.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
40fe108e2b intel/fs: Generalize the unlit centroid workaround
This generalizes the unlit centroid workaround so it's less code and now
supports SIMD32.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
1d381731e0 intel/fs: Fix sample id setup for SIMD32.
v2 (Jason Ekstrand):
 - Disallow gl_SampleId in SIMD32 on gen7

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
2fd0aed89a intel/fs: Fix Gen7 compressed source region alignment restriction for SIMD32
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
6909aed90e intel/fs: Implement 32-wide FS payload setup on Gen6+
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
f6c4aace22 intel/fs: Extend thread payload layout to SIMD32
And handle 32-wide payload register reads in fetch_payload_reg().

v2 (Jason Ekstrand);
 - Fix some whitespace and brace placement

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
8f143f70d6 intel/fs: Wrap FS payload register look-up in a helper function.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
d996e5b812 intel/fs: Use fs_regs instead of brw_regs in the unlit centroid workaround
While we're here, we change to using horiz_offset() instead of abusing
half().

v2 (Jason Ekstrand):
 - Use horiz_offset() instead of half()

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
38aee1a06d intel/fs: Simplify fs_visitor::emit_samplepos_setup
The original code manually handled splitting the MOVs to 8-wide to
handle various regioning restrictions.  Now that we have a SIMD width
splitting pass that handles these things, we can just emit everything at
the full width and let the SIMD splitting pass handle it.  We also now
have a useful "subscript" helper which is designed exactly for the case
where you want to take a W type and read it as a vector of Bs so we may
as well use that too.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
244a0ff3a8 i965: Add plumbing for shader time in 32-wide FS dispatch mode.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
2d7d652d5c intel/fs: Disable opt_sampler_eot() in 32-wide dispatch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
db6ca13efc intel/fs: Emit LINE+MAC for LINTERP with unaligned coordinates
On g4x through Sandy Bridge, src1 (the coordinates) of the PLN
instruction is required to be an even register number.  When it's odd
(which can happen with SIMD32), we have to emit a LINE+MAC combination
instead.  Unfortunately, we can't just fall through to the gen4 case
because the input registers are still set up for PLN which lays out the
four src1 registers differently in SIMD16 than LINE.

v2 (Jason Ekstrand):
 - Take advantage of both accumulators and emit LINE LINE MAC MAC
   (Based on a patch from Francisco Jerez)
 - Unify the gen4 and gen4x-6 cases using a loop

v3 (Jason Ekstrand):
 - Don't unify gen4 with gen4x-6 as this turns out to be more fragile
   than first thought without reworking the gen4 barycentric coordinate
   layout.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
566e6abd6d intel/fs: Mark LINTERP opcode as writing accumulator on platforms without PLN
When we don't have PLN (gen4 and gen11+), we implement LINTERP as either
LINE+MAC or a pair of MADs.  In both cases, the accumulator is written
by the first of the two instructions and read by the second.  Even
though the accumulator value isn't actually ever used from a logical
instruction perspective, it is trashed so we need to make the scheduler
aware.  Otherwise, the scheduler could end up re-ordering instructions
and putting a LINTERP between another an instruction which writes the
accumulator and another which tries to use that result.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
73d60455e9 intel/fs: Rework INTERPOLATE_AT_PER_SLOT_OFFSET
This reworks INTERPOLATE_AT_PER_SLOT_OFFSET to work more like an ALU
operation and less like a send.  This is less code over-all and, as a
side-effect, it now properly handles execution groups and lowering so
SIMD32 support just falls out.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
74b477039d intel/fs: Add the group to the flag subreg number on SNB and older
We want consistent behavior in the meaning of the flag_subreg field
between SNB and IVB+.

v2 (Jason Ekstrand):
 - Add some extra commentary

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
2aefa5e19f intel/fs: Fix FB read header setup for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
e06f5b30cc intel/fs: Fix logical FB write lowering for SIMD32
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
ce370902d4 intel/fs: Fix FB write message control codegen for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
8b788069fb intel/fs: Don't enable dual source blend if no outputs are written
This prevents a crash in some arb_enhanced_layouts tests that would be
caused by the next commit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
48241c780a intel/fs: Fix codegen of FS_OPCODE_SET_SAMPLE_ID for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
789d20df36 intel/eu: Fix pixel interpolator queries for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
1650442026 intel/fs: Disable SIMD32 dispatch for fragment shaders with discard.
Current discard handling requires dedicating the second flag register to
discard.  However, control-flow in SIMD32 requires both flag registers
so it's incompatible with the current discard handling.  Just don't
support SIMD32+discard for now.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
1811cbdc25 intel/fs: Disable SIMD32 dispatch on Gen4-6 with control flow
The hardware's control flow logic is 16-wide so we're out of luck
here.  We could, in theory, support SIMD32 if we know the control-flow
is uniform but we don't have that information at this point.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
d5b617a28e intel/fs: Split instructions low to high in lower_simd_width
Commit 0d905597f fixed an issue with the placement of the zip and unzip
instructions.  However, as a side-effect, it reversed the order in which
we were emitting the split instructions so that they went from high
group to low instead of low to high.  This is fine for most things like
texture instructions and the like but certain render target writes
really want to be emitted low to high.  This commit just switches the
order back around to be low to high.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 0d905597f "intel/fs: Be more explicit about our placement of [un]zip"
2018-06-28 13:19:38 -07:00
Jason Ekstrand
0b830081f0 intel/fs: Rework KSP data to be SIMD width-based
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
9d78abbef8 intel/compiler: Add and use helpers for working with KSP indices
The pixel shader dispatch table is kind-of a confusing mess.  This adds
some helpers for dealing with it and for easily extracting the correct
data from wm_prog_data.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
5b6e91dd35 intel/fs: Remove program key argument from generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
a14fb0184a intel/fs: Set up FB write message headers in the visitor
Doing instruction header setup in the generator is awful for a number
of reasons.  For one, we can't schedule the header setup at all.  For
another, it means lots of implied writes which the instruction scheduler
and other passes can't properly read about.  The second isn't a huge
problem for FB writes since they always happen at the end.  We made a
similar change to sampler handling in ff4726077d.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
dda31a7bbc intel/fs: Fix implied_mrf_writes() for headerless FB writes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
90643689aa intel/fs: Fix fs_inst::flags_written() for Gen4-5 FB writes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
ed09e78023 intel/eu: Return new instruction to caller from brw_fb_WRITE().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
c0a1c248b8 intel/fs: Pull FB write implied headers from src[0]
Now that we have the implied header in src[0] for tracking purposes, we
may as well use it in the generator.  This makes things a tiny bit more
general.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
b1cc9a9ae1 intel/fs: Properly track implied header regs read by FB writes
The FB write opcode on gen4-5 does implied copies from g0 and g1 to the
message payload.  With this commit, we start tracking that as part of
the IR by having the FB write read from g0-1.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
d91fa20655 intel/fs: FS_OPCODE_REP_FB_WRITE has side effects
It doesn't matter since we don't ever run replicated write shaders
through the optimizer but it's good to be complete.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
e8eb182ec5 Revert "anv: Print the actual enum for ignored structure types"
This reverts commit fda7014c35.  It was
hitting an unreachable when the sType was unknown.
2018-06-27 14:10:37 -07:00