Commit graph

391 commits

Author SHA1 Message Date
Kenneth Graunke
c58f52f0ef iris: Only set key->flat_shade if COL0/COL1 are written.
This was just laziness on my part, we already added similar checks in
the VS key handling.  Just need to do it here too.  Should improve cache
hits.
2019-07-11 00:12:50 -07:00
Kenneth Graunke
cb82d534a0 iris: Drop comment about var->data.binding not being set.
I refactored the sampler lowering passes a long time ago to ensure
that gl_nir_lower_samplers_as_deref is run and var->data.binding is set.
2019-07-11 00:12:00 -07:00
Kenneth Graunke
38f9954208 iris: Drop comments about missing NOS
These stages don't need NOS.  If they do, we can add it - the
infrastructure is there if we need it someday.
2019-07-11 00:12:00 -07:00
Jason Ekstrand
14781e2122 intel/compiler: Add a "base class" for program keys
Right now, all keys have two things in common: a program string ID and a
sampler_prog_key_data.  I'd like to add another thing or two and need a
place to put it.  This commit adds a new brw_base_prog_key struct which
contains those two common bits.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-10 19:35:55 +00:00
Kenneth Graunke
10560f8506 iris: Minor tidying 2019-07-03 22:24:44 -07:00
Timur Kristóf
3b6d787e40 iris: move sysvals to their own constant buffer
This commit moves the sysvals to a separate, new constant buffer
at the end (before the shader constants). It also allows us to
remove the special handling we had for cbuf0, and enables all
constant buffers to support user-specified resources and user
buffers.

v2: (by Kenneth Graunke)
- Rebase on the previous patch to fix system value uploading.
- Fix disk cache num_cbufs calculation
- Fix passthrough TCS to report num_cbufs = 1 so upload actually occurs
- Change upload_sysvals to assert that num_cbufs > 0 when
  num_system_values > 0.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-23 18:33:23 +02:00
Kenneth Graunke
ebc8c20b3e iris: Mark cbuf0 as not needing uploading every single time
I neglected to mark cbuf0_needs_upload = false after uploading it.
The obvious fix regressed user clip plane tests, because of a second
bug: we also forgot to mark that they may need re-uploading when
changing shader programs (which may have more or less system values).

Thanks to Timur Kristóf for catching the original issue.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
2019-06-23 18:32:11 +02:00
Caio Marcelo de Oliveira Filho
f346b277d1 iris: Create binding table slot for num_work_groups only when needed
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 17:57:37 -07:00
Kenneth Graunke
30314270d4 iris: Zero shs->cbuf0 when binding a passthrough TCS
Fixes valgrind errors when running two CTS tests back to back:
- KHR-GL45.shader_image_load_store.basic-allTargets-loadStoreT*
(The first test has an actual TCS, the second uses passthrough.)
2019-06-07 15:13:42 -07:00
Kenneth Graunke
cd796120c9 iris: Rename bind_state to bind_shader_state.
bind_state is possibly the worst name ever.  For create, we used
create_shader_state, which is more descriptive.  Put shader in the name.
2019-06-07 11:26:20 -07:00
Kenneth Graunke
22025595f3 iris: Sweep the NIR in iris_create_uncompiled_shader().
We run a ton of backend specific passes here (mostly brw_preprocess_nir)
and ought to sweep up any unused memory at this point, since we're going
to hang on to this NIR for as long as the linked program lives.
2019-06-07 01:29:38 -07:00
Jason Ekstrand
bb67a99a2d intel/nir: Stop returning the shader from helpers
Now that NIR_TEST_* doesn't swap the shader out from under us, it's
sufficient to just modify the shader rather than having to return in
case we're testing serialization or cloning.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-05 20:07:28 +00:00
Kenneth Graunke
34d3103dee iris: Fix SO stride units for DrawTransformFeedback
Mesa measures in DWords.  The hardware also claims to measure in DWords.
Except the SO_WRITE_OFFSET field is actually bits 31:2, with 1:0 MBZ.
Which means that it really measures in bytes.  So, convert to bytes.

Without this, our offset / stride denominator was 1/4th the size it
should be, leading to 4x the vertex count that we should have had.

Fixes GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_two_buffers
2019-06-03 22:51:18 -07:00
Caio Marcelo de Oliveira Filho
045aeccf0e iris: Always reserve binding table space for NIR constants
Don't have a separate mechanism for NIR constants to be removed from
the table.  If unused, we will compact it away.  The use_null_surface
is needed when INTEL_DISABLE_COMPACT_BINDING_TABLE is set.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho
5611444809 iris: Print binding tables when INTEL_DEBUG=bt
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho
97cd865be2 iris: Compact binding tables
Change the iris_binding_table to keep track of what surfaces are
actually going to be used, then assign binding table indices just for
those.  Reducing unused bytes on those are valuable because we use a
reduced space for those tables in Iris.

The rest of the driver can go from "group indices" (i.e. UBO #2) to
BTI and vice-versa using helper functions.  The value
IRIS_SURFACE_NOT_USED is returned to indicate a certain group index is
not used or a certain BTI is not valid.

The environment variable INTEL_DISABLE_COMPACT_BINDING_TABLE can be
set to skip compacting binding table.

v2: (all from Ken)
    Use BITFIELD64_MASK helper. Improve comments.
    Assert all group is marked as used when we have indirects.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho
79f1529ae0 iris: Create an enum for the surface groups
This will make convenient to handle compacting and printing the
binding table.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho
1c8ea8b300 iris: Handle binding table in the driver
Stop using brw_compiler to lower the final binding table indices for
surface access.  This is done by simply not setting the
'prog_data->binding_table.*_start' fields.  Then make the driver
perform this lowering.

This is a better place to perfom the binding table assignments, since
the driver has more information and will also later consume those
assignments to upload resources.

This also prepares us for two changes: use ibc without having to
implement binding table logic there; and remove unused entries from
the binding table.

Since the `block` field in brw_ubo_range now refers to the final
binding table index, we need to adjust it before using to index
shs->constbuf.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho
518f83236b iris: Pull brw_nir_analyze_ubo_ranges() call out setup_uniforms
We'll change iris to perform lowering of the binding table indices
earlier (before the backend kick in), but the backend compiler uses
the result of the analysis to identify load_ubo intrinsics, so we do
the analysis after the lowering to have the right indices.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Jason Ekstrand
e459d6d6df iris: Enable nir_opt_large_constants
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15306230 -> 15304726 (<.01%)
    instructions in affected programs: 4570 -> 3066 (-32.91%)
    helped: 16
    HURT: 0

    total cycles in shared programs: 361703436 -> 361680041 (<.01%)
    cycles in affected programs: 129388 -> 105993 (-18.08%)
    helped: 16
    HURT: 0

    LOST:   0
    GAINED: 2

The helped programs were in XCom 2, Deus Ex: Mankind Divided, and Kerbal
Space Program

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-29 21:09:16 +00:00
Jason Ekstrand
9dc57eebd5 iris: Don't assume UBO indices are constant
It will be true for the constant/system value buffer because they use a
constant zero but it's not true in general.  If we ever got here when
the source wasn't constant, nir_src_as_uint would assert.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2019-05-29 21:09:16 +00:00
Jason Ekstrand
744f93f5c1 iris: Move upload_ubo_ssbo_surf_state to iris_program.c
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-29 21:09:16 +00:00
Kenneth Graunke
6892d2b94a iris: Clone before calling nir_strip and serializing
This is non-destructive and leaves the debugging information in place.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-29 18:16:32 +00:00
Kenneth Graunke
e1409aead5 iris: Only store the SHA1 of the NIR in iris_uncompiled_shader
Jason pointed out that we don't need to keep an entire copy of the
serialized NIR around, we just need the SHA1.  This does change our
disk cache key to be taking a SHA1 of a SHA1, which is a bit odd,
but should work out and be faster and use less memory.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-29 18:16:32 +00:00
Kenneth Graunke
6dc1c2d8bd iris: Fix ALT mode regressions from shader cache
We were checking this based on nir->info.name, but with the shader
cache enabled, nir_strip throws out the name, causing us to use IEEE
mode for ARB programs.

gl-1.0-spot-light regressed because it wants ALT mode for 0^0 behavior.

Fixes: dc5dc727d5 iris: Serialize the NIR to a blob we can use for shader cache purposes.
2019-05-21 16:58:54 -07:00
Dylan Baker
601c9bc135 iris: Cache assembly shaders in the on-disk shader cache
This implements storing and retrieving iris_compiled_shader objects
from the on-disk shader cache.

(by Dylan Baker and Kenneth Graunke)
2019-05-21 15:05:38 -07:00
Kenneth Graunke
dc5dc727d5 iris: Serialize the NIR to a blob we can use for shader cache purposes.
We will use a hash of the serialized NIR together with brw_prog_*_key
(for NOS) as the disk cache key, where the disk cache contains actual
assembly shaders.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-05-21 15:05:38 -07:00
Kenneth Graunke
6ae2caf201 iris: Move iris_uncompiled_shader definition to iris_context.h
It had been internal to iris_program.c, but with the upcoming disk cache
code, the "program module" is going to be spread across a couple source
files.  Into a header it goes!

Now it lives alongside iris_compiled_shader, which makes sense.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-05-21 15:05:38 -07:00
Kenneth Graunke
646924cfa1 intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8
Our tessellation control shaders can be dispatched in several modes.

- SINGLE_PATCH (Gen7+) processes a single patch per thread, with each
  channel corresponding to a different patch vertex.  PATCHLIST_N will
  launch (N / 8) threads.  If N is less than 8, some channels will be
  disabled, leaving some untapped hardware capabilities.  Conditionals
  based on gl_InvocationID are non-uniform, which means that they'll
  often have to execute both paths.  However, if there are fewer than
  8 vertices, all invocations will happen within a single thread, so
  barriers can become no-ops, which is nice.  We also burn a maximum
  of 4 registers for ICP handles, so we can compile without regard for
  the value of N.  It also works in all cases.

- DUAL_PATCH mode processes up to two patches at a time, where the first
  four channels come from patch 1, and the second group of four come
  from patch 2.  This tries to provide better EU utilization for small
  patches (N <= 4).  It cannot be used in all cases.

- 8_PATCH mode processes 8 patches at a time, with a thread launched per
  vertex in the patch.  Each channel corresponds to the same vertex, but
  in each of the 8 patches.  This utilizes all channels even for small
  patches.  It also makes conditions on gl_InvocationID uniform, leading
  to proper jumps.  Barriers, unfortunately, become real.  Worse, for
  PATCHLIST_N, the thread payload burns N registers for ICP handles.
  This can burn up to 32 registers, or 1/4 of our register file, for
  URB handles.  For Vulkan (and DX), we know the number of vertices at
  compile time, so we can limit the amount of waste.  In GL, the patch
  dimension is dynamic state, so we either would have to waste all 32
  (not reasonable) or guess (badly) and recompile.  This is unfortunate.
  Because we can only spawn 16 thread instances, we can only use this
  mode for PATCHLIST_16 and smaller.  The rest must use SINGLE_PATCH.

This patch implements the new 8_PATCH TCS mode, but leaves us using
SINGLE_PATCH by default.  A new INTEL_DEBUG=tcs8 flag will switch to
using 8_PATCH mode for testing and benchmarking purposes.  We may
want to consider using 8_PATCH mode in Vulkan in some cases.

The data I've seen shows that 8_PATCH mode can be more efficient in
some cases, but SINGLE_PATCH mode (the one we use today) is faster
in other cases.  Ultimately, the TES matters much more than the TCS
for performance, so the decision may not matter much.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:30 -07:00
Kenneth Graunke
dcfca0af7c iris: Set XY Clipping correctly.
I was setting it based off a pipe_rasterizer_state field that appears
to be entirely dead outside of the draw module respecting it.

I should be setting it when the primitive type reaching the SF is
neither points nor lines.  This is, unfortunately, rather dirty,
as we have to look at the rasterizer state, the geometry shader state,
the tessellation evaluation shader state, and the primitive type...
2019-04-29 10:53:23 -07:00
Kenneth Graunke
4c3c417b00 iris: Move iris_debug_recompile calls before uploading.
Order of operations is important, otherwise we'll find the program we
just uploaded as the "old" compile and get confused why nothing is
different between the two keys.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-04-16 09:01:20 -07:00
Kenneth Graunke
04f97eefa3 iris: Print the reason for shader recompiles.
I was lazy earlier and hadn't bothered typing / refactoring this.
Now I'm hitting some extra recompiles and would like to see why.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-04-16 09:01:18 -07:00
Karol Herbst
4a3c04a11f glsl/nir: add support for lowering bindless images_derefs
v2: handle atomics as well
    make use of nir_rewrite_image_intrinsic
v3: remove call to nir_remove_dead_derefs
v4: (Timothy Arceri) dont actually call lowering yet

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Karol Herbst
3b2a9ffd60 nir: move brw_nir_rewrite_image_intrinsic into common code
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Kenneth Graunke
9c46046f79 iris: Silence unused variable warnings in release mode 2019-04-06 15:58:16 -07:00
Dave Airlie
0ea386128b iris: avoid use after free in shader destruction
While playing with compute shaders, I was getting a random crash,
noticed that bind_state was using the old shader info for comparision,
but gallium allows the shader to be deleted while bound, so this could
lead to a use after free.

This can't happen using the cso cache. As it tracks all of this.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-05 09:57:44 +10:00
Tapani Pälli
03cbfbd913 iris: initialize num_cbufs
Currently initialized only if 'ish' is non-NULL.

CID: 1444106
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 08:20:09 +02:00
Kenneth Graunke
04ff2e3fbb iris: Fix TES gl_PatchVerticesIn handling.
1. If we switch the TCS for one with a different number of output
   vertices, then the TES's gl_PatchVerticesIn value will change.
   We need to re-upload in this case.  For now, re-emit constants
   whenever the TCS/TES are swapped out.

2. If there is no TCS, then we can't grab gl_PatchVerticesIn from
   the TCS info.  Since it's a passthrough, we can just use the
   primitive's patch count (like the TCS gl_PatchVerticesIn does).

Fixes KHR-GL45.tessellation_shader.single.max_patch_vertices and
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-11 14:07:16 -07:00
Kenneth Graunke
2f51cb5e67 iris: Rework default tessellation level uploads
Now that we've added a system value uploading mechanism, we may as well
reuse the same system for default tessellation levels.  This simplifies
the state upload code a bit.

Also fixes:
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_tessLevel

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-11 14:07:12 -07:00
Kenneth Graunke
fbc51c4c95 iris: Defer uploading sampler state tables until draw time
Gallium might call us multiple times to bind subsets of the samplers,
at which point we'd recreate the table a bunch of times.  It doesn't
really buy us anything to do it here - even if we defer to draw time,
the dirty tracking ensures we'll only do it on the first draw after a
bind_sampler_states() call.

We now use the number of samplers specified by the shader instead of
the binding count.  If this number changes, we flag sampler state as
dirty so we re-upload a table with the right number of entries.

This also fixes a bug where ice->state.need_border_colors was never
unset, so once something needed border colors, the pool would always
be pinned in all future batches.

v2: Explicitly flag sampler states as dirty, rather than assuming that
    bind_sampler_states() will be called if the program texture count
    changes.  While this may be true for st/mesa, it isn't the case for
    Gallium HUD.

Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-07 11:39:27 -08:00
Andre Heider
a4324dcefb iris: add support for tgsi_to_nir
The Gallium Nine state tracker now works on iris.

Also tested with GALLIUM_HUD and Star Wars: Knights of the Old
Republic on WINE (GL_ATI_fragment_shader).

Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-07 00:38:13 -08:00
Jose Maria Casanova Crespo
ffa9082c40 iris: setup EdgeFlag Vertex Element when needed.
If Vertex Shader uses EdgeFlag the hardware request that it is setup
as the last VERTEX_ELEMENT_STATE. If SGVS are add at draw time we
need to also reconfigure the last 3DSTATE_VF_INSTANCING so its
VertexElementIndex points to the new Vertex Element that contains
the EdgeFlag.

So if draw parameters or edgeflag are not used the CSO generated at
iris_create_vertex_element is sent directly in the batches. But if
edge flag is used we adjust last VERTEX_ELEMENT_STATE and
last 3DSTATE_VF_INSTANCING using their alternative edge flag version
we generate at iris_create_vertex_element and store at the CSO.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 22:19:08 +00:00
Jason Ekstrand
e02959f442 nir/lower_doubles: Inline functions directly in lower_doubles
Instead of trusting the caller to already have created a softfp64
function shader and added all its functions to our shader, we simply
take the softfp64 shader as an argument and do the function inlining
ouselves.  This means that there's no more nasty functions lying around
that the caller needs to worry about cleaning up.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jose Maria Casanova Crespo
4122665dd9 iris: Enable ARB_shader_draw_parameters support
Additional VERTEX_ELEMENT_STATE are used to store basevertex and
baseinstance and drawid updating the DWordLength of the
3DSTATE_VERTEX_ELEMENTS command.

This passes all piglit tests for spec.*draw_parameters.* tests
and VK-GL-CTS KHR-GL45.shader_draw_parameters_tests.* tests.

Now we only mark a dirty_update when parameters are changed or
when we have an indirect draw.

We enable PIPE_CAP_DRAW_PARAMETERS on Iris.

There is no edge flag support in the Vertex Elements setup.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-26 13:28:38 -08:00
Rafael Antognolli
689b590069 iris: Skip msaa16 on gen < 9.
Also needed to add gen information to KEY_INIT.
2019-02-21 10:26:12 -08:00
Kenneth Graunke
fd2038b22a iris: Set program key fields for MCS 2019-02-21 10:26:12 -08:00
Kenneth Graunke
973f01d55a iris: Move create and bind driver hooks to the end of iris_program.c
This just moves the code for dealing with pipe_shader_state /
pipe_compute_state / iris_uncompiled_shader to the end of the file.
Now that those do precompiles, they want to call the actual compile
functions.  Putting them at the end eliminates the need for a bunch
of prototypes.
2019-02-21 10:26:12 -08:00
Kenneth Graunke
bf23e79629 iris: Set HasWriteableRT correctly
A bit of irritating state cross dependency here, but nothing too hard
2019-02-21 10:26:12 -08:00
Kenneth Graunke
15341778ba iris: rework num textures to util_lastbit 2019-02-21 10:26:12 -08:00
Kenneth Graunke
a1ebac3750 iris: Implement ALT mode for ARB_{vertex,fragment}_shader
Fixes gl-1.0-spot-light
2019-02-21 10:26:11 -08:00