Commit graph

115447 commits

Author SHA1 Message Date
Lionel Landwerlin
cdab19fa57 intel/error2aub: annotate buffer with their address space
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-08 11:01:14 +00:00
Lionel Landwerlin
630a72827a intel/error2aub: parse other buffer types
We don't write them in the aub file yet.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-08 11:01:14 +00:00
Lionel Landwerlin
c0ea043888 intel/error2aub: strenghten batchbuffer identifier marker
Found out that some base64 data matched the '---' identifier. We can
avoid this by adding the surrounding spaces.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-08 11:01:14 +00:00
Lionel Landwerlin
650e6e5d33 intel/error2aub: identify buffers by engine
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-08 11:01:14 +00:00
Lionel Landwerlin
a07f5262f0 intel/error2aub: build a list of BOs before writing them
The error state contains several kind of BOs, including the context
image which we will want to write in a later commit. Because it can
come later in the error state than the user buffers and because we
need to write it first in the aub file, we have to first build a list
of BOs and then write them in the appropriate order.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-08 11:01:14 +00:00
Chris Wilson
04ddff1aa4 iris: Wire up EGL_IMG_context_priority
Add the missing PIPE_CAP_CONTEXT_PRIORITY_MASK and parsing of the context
construction flags.

Testcase: piglit/egl-context-priority

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-07 20:27:10 -08:00
Kenneth Graunke
2993088500 iris: Export a copy_region helper that doesn't flush
I'll want to use this for transfer maps, which already do their own
flushing.  This lets us avoid a double flush, and also gives us more
control over the batch which is selected.
2019-03-07 17:08:19 -08:00
Kenneth Graunke
335726fdac iris: Spruce up "are we using this engine?" checks for flushing
We were using batch->contains_draw as a proxy for "are we even using
this engine?"  That isn't quite right, because it only counts regular
draws.  BLORP operations may have also rendered to a resource, which
needs to trigger flushing.  To check for this, we also see if the
render and sometimes depth caches are non-empty.

We can also drop the "but there might already be stale data in the
cache even if we haven't emitted any commands yet" concern in the
comments.  The kernel flushes caches between batches.

This may not be great but it's at least better than what was there.
2019-03-07 17:08:07 -08:00
Timur Kristóf
b0c214ccee radeonsi/nir: Only set window_space_position for vertex shaders.
By mistake, this was previously set for all shaders.
It is a vertex shader property so only makes sense to
set it for vertex shaders.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-By: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-03-08 00:39:45 +00:00
Jason Ekstrand
1664de5924 nir/builder: Add a build_deref_array_imm helper
Unlike most of the cases in which we do this by hand, the new helper
properly handles non-32-bit pointers.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-07 21:20:30 +00:00
Jason Ekstrand
fcf2a0122e nir/builder: Cast array indices in build_deref_follower
There's no guarantee when build_deref_follower is called that the two
derefs have the same bit size destination.  Insert a cast on the array
index in case we have differing bit sizes.  While we're here, insert
some asserts in build_deref_array and build_deref_ptr_as_array.  The
validator will catch violations here but they're easier to debug if we
catch them while building.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-07 21:20:30 +00:00
Jason Ekstrand
cd4c1458ba nir/builder: Emit better code for iadd/imul_imm
Because we already know the immediate right-hand parameter, we can
potentially save the optimizer a bit of work.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-07 21:20:30 +00:00
Rob Clark
ebbb6b8eaa freedreno/a6xx: perfcntrs
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-07 15:33:42 -05:00
Rob Clark
40d8ed5ef3 freedreno/a6xx: fix border-color swizzles
Fixes nearly all of the remaining
dEQP-GLES31.functional.texture.border_clamp.formats.* fails

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-03-07 15:33:42 -05:00
Rob Clark
f5d80ff2db freedreno/a6xx: refactor fd6_tex_swiz()
We need a version of fd6_tex_swiz() that just returns the composed
swizzle without building part of the TEX_CONST_0 state.  So just
refactor the existing function to build more of the TEX_CONST_0 state,
and leave fd6_tex_swiz() simply composing swizzles.

The small IBO state change (to use LINEAR for smaller sizes/levels) is
to match the state in fd6_tex_const_0().  It seems like maybe tiled
actually works at the smaller sizes but not if minification is in play,
so best just to make images match what we do for textures.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-03-07 15:33:42 -05:00
Rob Clark
8dc47490c8 freedreno/a6xx: remove astc_srgb workaround
Not used on a6xx, so remove some of the related plumbing that was copied
over from older gens.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-07 15:33:42 -05:00
Rob Clark
45271702ec freedreno: fix ir3_cmdline build
Fixes: 7530d4abfc glsl/freedreno/panfrost: pass gl_context to the standalone compiler
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-07 15:33:20 -05:00
Kenneth Graunke
d53b1b6215 iris: Drop PIPE_CAP_BUFFER_SAMPLER_VIEW_RGBA_ONLY
This cap is mainly for working around a r600 texture swizzle issue,
but it also controls whether ARB_texture_buffer_object (with legacy
formats) is enabled.  I suspect the missing I/L/A/LA faking is why
I had it set in the first place.

Thanks to Ilia for pointing out that I shouldn't be setting this.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-07 11:39:27 -08:00
Kenneth Graunke
809a81ec3a iris: Properly support alpha and luminance-alpha formats
For texturing, we map alpha formats to the corresponding red format,
as many alpha formats are outright missing, and red is more efficient
when sampling anyway.

When rendering to A8_UNORM, we use that format directly, so the image
gets the shader output's .a/.w channel, rather than the .r/.x channel.

All other A* formats are non-renderable, so we can't do much and just
mark them as unsupported for rendering.  Fortunately, GL only requires
rendering to A8_UNORM, so that works out.

According to Andre Heider and Timur Kristóf, this fixes font rendering
in Witcher 1 (via nine).  Andre also reported that it fixes Unigine
Heaven (presumably via nine).

v2: Use the same swizzle for both sampler views and "render targets".
    BLORP expects the read swizzle, and will take the inverse when
    setting up the destination swizzle (and actually applying it in
    the shaders).  We ignore the format swizzle when setting up normal
    rendering SURFACE_STATEs, which is necessary because it would be
    an illegal shader channel select combination.  Thanks to Jason
    Ekstrand for pointing out that BLORP took an inverse swizzle.

Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-07 11:39:27 -08:00
Kenneth Graunke
fbc51c4c95 iris: Defer uploading sampler state tables until draw time
Gallium might call us multiple times to bind subsets of the samplers,
at which point we'd recreate the table a bunch of times.  It doesn't
really buy us anything to do it here - even if we defer to draw time,
the dirty tracking ensures we'll only do it on the first draw after a
bind_sampler_states() call.

We now use the number of samplers specified by the shader instead of
the binding count.  If this number changes, we flag sampler state as
dirty so we re-upload a table with the right number of entries.

This also fixes a bug where ice->state.need_border_colors was never
unset, so once something needed border colors, the pool would always
be pinned in all future batches.

v2: Explicitly flag sampler states as dirty, rather than assuming that
    bind_sampler_states() will be called if the program texture count
    changes.  While this may be true for st/mesa, it isn't the case for
    Gallium HUD.

Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-07 11:39:27 -08:00
Kenneth Graunke
9caabd6c5f iris: Plumb through ISL_SWIZZLE_IDENTITY in buffer surface emitters
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-07 11:39:27 -08:00
Kenneth Graunke
4787bc944a isl: Add a swizzle parameter to isl_buffer_fill_state()
This is necessary for legacy texture buffer object formats, where we'll
need to use a swizzle to fake e.g. luminance.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-07 11:39:27 -08:00
Lionel Landwerlin
575f8e8b60 iris: fix decode_get_bo callback
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: acb50d6b1f ("intel/decoders: handle decoding MI_BBS from ring")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-07 17:39:07 +00:00
Erik Faye-Lund
55e4759c8d virgl: remove unused variable
This variable is now unused, so let's remove it.

Fixes: 9c4930946a (virgl: add encoder functions for new protocol)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-03-07 17:24:54 +00:00
Erik Faye-Lund
44620d4ef7 virgl: remove unused variable
This variable is now unused, so let's remove it.

Fixes: db77573d7b (virgl: modify how we handle GL_MAP_FLUSH_EXPLICIT_BIT)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-03-07 17:24:54 +00:00
Erik Faye-Lund
524934586b virgl: remove unused variable
This variable is now unused, so let's remove it.

Fixes: c19aedcf1a (virgl: don't mark unclean after a flush)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-03-07 17:24:54 +00:00
Erik Faye-Lund
af29c93f22 virgl: remove unused variables
These variables are now unused, let's remove them to get rif of a few
warnings.

Fixes: f0e71b1088 (virgl: use transfer queue)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-03-07 17:24:54 +00:00
Lionel Landwerlin
0e269c0ac2 iris: fix decoder call
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: acb50d6b1f ("intel/decoders: handle decoding MI_BBS from ring")
2019-03-07 16:15:03 +00:00
Lionel Landwerlin
0b3871bc7f intel/aub_write: factorize context image/pphwsp/ring creation
We allocate GGTT entries and physical addresses are we create engines
rather than having a fixed layout.

Context images now receive a parameter argument which is used to setup
pml4 & ring buffer addresses.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-07 15:08:32 +00:00
Lionel Landwerlin
c1a2c72e76 intel/aub_write: turn context images arrays into functions
We'll make them more parameterized in a later commit.

As this is just a transitional commit, we allow ourself to leak the
context images allocated in get_context_init(). We'll fix this in the
next commit.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-07 15:08:32 +00:00
Lionel Landwerlin
8e14c9b7db intel/aub_write: store the physical page allocator in struct
We want to use this allocator in the next commit for GGTT pages.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-07 15:08:32 +00:00
Lionel Landwerlin
0343a3b42b intel/aub_write: log mmio writes
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-07 15:08:32 +00:00
Lionel Landwerlin
6ef46972d9 intel/aub_write: switch to use i915_drm engine classes
Prepare aub write to deal with multiple engine instances. We don't
pass the instance number yet this could be done in the future by
having a 2 dimensional array of struct engine.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-07 15:08:32 +00:00
Lionel Landwerlin
8a81f5c255 intel/aub_write: break execlist write in 2
We want to reuse the execlist submission, but won't need the ring
buffer update.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-07 15:08:32 +00:00
Lionel Landwerlin
69ee5bde4e intel/aub_write: write header in init
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-07 15:08:31 +00:00
Lionel Landwerlin
01443f34b4 intel/aub_write: split comment section from HW setup
In the future we'll want error2aub to reuse the context image saved by
i915 instead of the default one we write in intel_dump_gpu.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-07 15:08:31 +00:00
Lionel Landwerlin
2b42adff14 intel/aub_read: reuse defines from gen_context
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-07 15:08:31 +00:00
Lionel Landwerlin
bf93084f44 intel/decoders: limit number of decoded batchbuffers
IGT has a test to hang the GPU that works by having a batch buffer
jump back into itself, trigger an infinite loop on the command stream.
As our implementation of the decoding is "perfectly" mimicking the
hardware, our decoder also "hangs". This change limits the number of
batch buffer we'll decode before we bail to 100.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-07 15:08:31 +00:00
Lionel Landwerlin
acb50d6b1f intel/decoders: handle decoding MI_BBS from ring
An MI_BATCH_BUFFER_START in the ring buffer acts as a second level
batchbuffer (aka jump back to ring buffer when running into a
MI_BATCH_BUFFER_END).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-07 15:08:31 +00:00
Lionel Landwerlin
ec526d6ba0 intel/decoders: add address space indicator to get BOs
Some commands like MI_BATCH_BUFFER_START have this indicator.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-03-07 15:08:31 +00:00
Eric Engestrom
3e8d5b5ed4 vulkan/overlay: fix missing var rename in previous commit
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-07 13:45:14 +00:00
Eric Engestrom
d141472d0e vulkan/util: use the platform defines in vk.xml instead of hard-coding them
See also: 3d4238d26c "anv: use the platform defines in vk.xml
                                instead of hard-coding them"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-07 11:49:44 +00:00
Andre Heider
a4324dcefb iris: add support for tgsi_to_nir
The Gallium Nine state tracker now works on iris.

Also tested with GALLIUM_HUD and Star Wars: Knights of the Old
Republic on WINE (GL_ATI_fragment_shader).

Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-07 00:38:13 -08:00
Tapani Pälli
8b010f3557 nir: free dead_ctx in case of no progress
Fixes a leak:

  ==7576== 320 (48 direct, 272 indirect) bytes in 1 blocks are definitely lost in loss record 26 of 26
  ==7576==    at 0x4C2EE3B: malloc (vg_replace_malloc.c:309)
  ==7576==    by 0x53EF0E4: ralloc_size (ralloc.c:119)
  ==7576==    by 0x53EF0C2: ralloc_context (ralloc.c:113)
  ==7576==    by 0x5471F64: nir_split_per_member_structs (nir_split_per_member_structs.c:176)
  ==7576==    by 0x51288CF: anv_shader_compile_to_nir (anv_pipeline.c:216)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-03-07 07:40:19 +02:00
Tapani Pälli
4900c0cff4 anv: call blob_finish when done with it
Fixes leaks from anv_device_upload_nir:

  ==7345== 8,192 bytes in 2 blocks are definitely lost in loss record 24 of 24
  ==7345==    at 0x4C2ED78: malloc (vg_replace_malloc.c:308)
  ==7345==    by 0x4C31393: realloc (vg_replace_malloc.c:836)
  ==7345==    by 0x54E0848: grow_to_fit (blob.c:67)
  ==7345==    by 0x54E0BE5: blob_reserve_bytes (blob.c:166)
  ==7345==    by 0x54E0C7C: blob_reserve_intptr (blob.c:186)
  ==7345==    by 0x54704A7: nir_serialize (nir_serialize.c:1091)
  ==7345==    by 0x512F97D: anv_device_upload_nir (anv_pipeline_cache.c:756)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-03-07 07:39:48 +02:00
Tapani Pälli
a9555f37d5 anv: use anv_gem_munmap in block pool cleanup
Use anv_gem_munmap for unmap when softpin in use, this corresponds to
anv_gem_mmap used in anv_block_pool_expand_range. This fixes valgrind
errors seen for each pool when softpin is in use:

  ==25581== 262,144 bytes in 1 blocks are definitely lost in loss record 31 of 31
  ==25581==    at 0x50E77E8: anv_gem_mmap (anv_gem.c:96)
  ==25581==    by 0x50EEE2B: anv_block_pool_expand_range (anv_allocator.c:543)
  ==25581==    by 0x50EEB51: anv_block_pool_init (anv_allocator.c:477)
  ==25581==    by 0x50EF7EF: anv_state_pool_init (anv_allocator.c:920)
  ==25581==    by 0x510B8EB: anv_CreateDevice (anv_device.c:2031)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-07 07:36:28 +02:00
Kenneth Graunke
744b8e1c12 iris: Fix MOCS for blits and clears
I915_MOCS_CACHED is the wrong value.  Expose mocs() and use that.
2019-03-06 18:04:53 -08:00
Timothy Arceri
ecceb076e5 st/glsl: start spilling out common st glsl conversion code
The NIR and TGSI paths are currently intertwined which makes it
not only hard to follow but also makes it hard to take advantage
of the differences in IR.

Here we take the first step to splitting that path apart. With
this we take the opportunity to no longer call the GLSL IR
optimisation passes after the final lowering calls for NIR. We
can instead just use the NIR passes which can produce better code
and should also result in faster compile times.

The speed-up can be measured in some dolphin uber shaders due to
no longer calling lower_if_to_cond_assign() for example
dolphin/ubershaders/120.shader_test goes from ~1.63 -> ~1.53
seconds on my machine.

There are some code changes as a result of not calling
lower_if_to_cond_assign(), this is because it flattens ifs that
contain UBOs where as NIR's peephole select doesn't. This is
were most of the regressions in Max Waves happens with shader-db.

shader-db results (VEGA):

Totals from affected shaders:
SGPRS: 2349056 -> 2349640 (0.02 %)
VGPRS: 1322160 -> 1323300 (0.09 %)
Spilled SGPRs: 21190 -> 21527 (1.59 %)
Spilled VGPRs: 99 -> 99 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 72 -> 72 (0.00 %) dwords per thread
Code Size: 57260904 -> 57270932 (0.02 %) bytes
Compile Time: 1107186 -> 1022942 (-7.61 %) milliseconds
LDS: 786 -> 786 (0.00 %) blocks
Max Waves: 391932 -> 391619 (-0.08 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-06 23:05:20 +00:00
Timothy Arceri
e2fd96a563 radeonsi/nir: stop calling nir_lower_returns()
We now call this for all drivers in glsl_to_nir() instead.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-06 23:05:20 +00:00
Timothy Arceri
673f4f69a8 i965: stop calling nir_lower_returns()
We now call this for all drivers in glsl_to_nir() instead.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-06 23:05:20 +00:00