There's no guarantee when build_deref_follower is called that the two
derefs have the same bit size destination. Insert a cast on the array
index in case we have differing bit sizes. While we're here, insert
some asserts in build_deref_array and build_deref_ptr_as_array. The
validator will catch violations here but they're easier to debug if we
catch them while building.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Because we already know the immediate right-hand parameter, we can
potentially save the optimizer a bit of work.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes nearly all of the remaining
dEQP-GLES31.functional.texture.border_clamp.formats.* fails
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
We need a version of fd6_tex_swiz() that just returns the composed
swizzle without building part of the TEX_CONST_0 state. So just
refactor the existing function to build more of the TEX_CONST_0 state,
and leave fd6_tex_swiz() simply composing swizzles.
The small IBO state change (to use LINEAR for smaller sizes/levels) is
to match the state in fd6_tex_const_0(). It seems like maybe tiled
actually works at the smaller sizes but not if minification is in play,
so best just to make images match what we do for textures.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
This cap is mainly for working around a r600 texture swizzle issue,
but it also controls whether ARB_texture_buffer_object (with legacy
formats) is enabled. I suspect the missing I/L/A/LA faking is why
I had it set in the first place.
Thanks to Ilia for pointing out that I shouldn't be setting this.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For texturing, we map alpha formats to the corresponding red format,
as many alpha formats are outright missing, and red is more efficient
when sampling anyway.
When rendering to A8_UNORM, we use that format directly, so the image
gets the shader output's .a/.w channel, rather than the .r/.x channel.
All other A* formats are non-renderable, so we can't do much and just
mark them as unsupported for rendering. Fortunately, GL only requires
rendering to A8_UNORM, so that works out.
According to Andre Heider and Timur Kristóf, this fixes font rendering
in Witcher 1 (via nine). Andre also reported that it fixes Unigine
Heaven (presumably via nine).
v2: Use the same swizzle for both sampler views and "render targets".
BLORP expects the read swizzle, and will take the inverse when
setting up the destination swizzle (and actually applying it in
the shaders). We ignore the format swizzle when setting up normal
rendering SURFACE_STATEs, which is necessary because it would be
an illegal shader channel select combination. Thanks to Jason
Ekstrand for pointing out that BLORP took an inverse swizzle.
Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Gallium might call us multiple times to bind subsets of the samplers,
at which point we'd recreate the table a bunch of times. It doesn't
really buy us anything to do it here - even if we defer to draw time,
the dirty tracking ensures we'll only do it on the first draw after a
bind_sampler_states() call.
We now use the number of samplers specified by the shader instead of
the binding count. If this number changes, we flag sampler state as
dirty so we re-upload a table with the right number of entries.
This also fixes a bug where ice->state.need_border_colors was never
unset, so once something needed border colors, the pool would always
be pinned in all future batches.
v2: Explicitly flag sampler states as dirty, rather than assuming that
bind_sampler_states() will be called if the program texture count
changes. While this may be true for st/mesa, it isn't the case for
Gallium HUD.
Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is necessary for legacy texture buffer object formats, where we'll
need to use a swizzle to fake e.g. luminance.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This variable is now unused, so let's remove it.
Fixes: 9c4930946a (virgl: add encoder functions for new protocol)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
This variable is now unused, so let's remove it.
Fixes: db77573d7b (virgl: modify how we handle GL_MAP_FLUSH_EXPLICIT_BIT)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
This variable is now unused, so let's remove it.
Fixes: c19aedcf1a (virgl: don't mark unclean after a flush)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
These variables are now unused, let's remove them to get rif of a few
warnings.
Fixes: f0e71b1088 (virgl: use transfer queue)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
We allocate GGTT entries and physical addresses are we create engines
rather than having a fixed layout.
Context images now receive a parameter argument which is used to setup
pml4 & ring buffer addresses.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We'll make them more parameterized in a later commit.
As this is just a transitional commit, we allow ourself to leak the
context images allocated in get_context_init(). We'll fix this in the
next commit.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We want to use this allocator in the next commit for GGTT pages.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Prepare aub write to deal with multiple engine instances. We don't
pass the instance number yet this could be done in the future by
having a 2 dimensional array of struct engine.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>
We want to reuse the execlist submission, but won't need the ring
buffer update.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
In the future we'll want error2aub to reuse the context image saved by
i915 instead of the default one we write in intel_dump_gpu.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
IGT has a test to hang the GPU that works by having a batch buffer
jump back into itself, trigger an infinite loop on the command stream.
As our implementation of the decoding is "perfectly" mimicking the
hardware, our decoder also "hangs". This change limits the number of
batch buffer we'll decode before we bail to 100.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
An MI_BATCH_BUFFER_START in the ring buffer acts as a second level
batchbuffer (aka jump back to ring buffer when running into a
MI_BATCH_BUFFER_END).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Some commands like MI_BATCH_BUFFER_START have this indicator.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
See also: 3d4238d26c "anv: use the platform defines in vk.xml
instead of hard-coding them"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The Gallium Nine state tracker now works on iris.
Also tested with GALLIUM_HUD and Star Wars: Knights of the Old
Republic on WINE (GL_ATI_fragment_shader).
Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes a leak:
==7576== 320 (48 direct, 272 indirect) bytes in 1 blocks are definitely lost in loss record 26 of 26
==7576== at 0x4C2EE3B: malloc (vg_replace_malloc.c:309)
==7576== by 0x53EF0E4: ralloc_size (ralloc.c:119)
==7576== by 0x53EF0C2: ralloc_context (ralloc.c:113)
==7576== by 0x5471F64: nir_split_per_member_structs (nir_split_per_member_structs.c:176)
==7576== by 0x51288CF: anv_shader_compile_to_nir (anv_pipeline.c:216)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes leaks from anv_device_upload_nir:
==7345== 8,192 bytes in 2 blocks are definitely lost in loss record 24 of 24
==7345== at 0x4C2ED78: malloc (vg_replace_malloc.c:308)
==7345== by 0x4C31393: realloc (vg_replace_malloc.c:836)
==7345== by 0x54E0848: grow_to_fit (blob.c:67)
==7345== by 0x54E0BE5: blob_reserve_bytes (blob.c:166)
==7345== by 0x54E0C7C: blob_reserve_intptr (blob.c:186)
==7345== by 0x54704A7: nir_serialize (nir_serialize.c:1091)
==7345== by 0x512F97D: anv_device_upload_nir (anv_pipeline_cache.c:756)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Use anv_gem_munmap for unmap when softpin in use, this corresponds to
anv_gem_mmap used in anv_block_pool_expand_range. This fixes valgrind
errors seen for each pool when softpin is in use:
==25581== 262,144 bytes in 1 blocks are definitely lost in loss record 31 of 31
==25581== at 0x50E77E8: anv_gem_mmap (anv_gem.c:96)
==25581== by 0x50EEE2B: anv_block_pool_expand_range (anv_allocator.c:543)
==25581== by 0x50EEB51: anv_block_pool_init (anv_allocator.c:477)
==25581== by 0x50EF7EF: anv_state_pool_init (anv_allocator.c:920)
==25581== by 0x510B8EB: anv_CreateDevice (anv_device.c:2031)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The NIR and TGSI paths are currently intertwined which makes it
not only hard to follow but also makes it hard to take advantage
of the differences in IR.
Here we take the first step to splitting that path apart. With
this we take the opportunity to no longer call the GLSL IR
optimisation passes after the final lowering calls for NIR. We
can instead just use the NIR passes which can produce better code
and should also result in faster compile times.
The speed-up can be measured in some dolphin uber shaders due to
no longer calling lower_if_to_cond_assign() for example
dolphin/ubershaders/120.shader_test goes from ~1.63 -> ~1.53
seconds on my machine.
There are some code changes as a result of not calling
lower_if_to_cond_assign(), this is because it flattens ifs that
contain UBOs where as NIR's peephole select doesn't. This is
were most of the regressions in Max Waves happens with shader-db.
shader-db results (VEGA):
Totals from affected shaders:
SGPRS: 2349056 -> 2349640 (0.02 %)
VGPRS: 1322160 -> 1323300 (0.09 %)
Spilled SGPRs: 21190 -> 21527 (1.59 %)
Spilled VGPRs: 99 -> 99 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 72 -> 72 (0.00 %) dwords per thread
Code Size: 57260904 -> 57270932 (0.02 %) bytes
Compile Time: 1107186 -> 1022942 (-7.61 %) milliseconds
LDS: 786 -> 786 (0.00 %) blocks
Max Waves: 391932 -> 391619 (-0.08 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Eric Anholt <eric@anholt.net>
glsl_to_nir() is still missing support for converting certain
functions to NIR, so for those we use the GLSL IR optimisations
to remove the functions.
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: reimplement layer chain info getters (Eric)
v3: make it compile.. (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This will be used by the overlay instead of system installed
validation layers helpers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
If Vertex Shader uses EdgeFlag the hardware request that it is setup
as the last VERTEX_ELEMENT_STATE. If SGVS are add at draw time we
need to also reconfigure the last 3DSTATE_VF_INSTANCING so its
VertexElementIndex points to the new Vertex Element that contains
the EdgeFlag.
So if draw parameters or edgeflag are not used the CSO generated at
iris_create_vertex_element is sent directly in the batches. But if
edge flag is used we adjust last VERTEX_ELEMENT_STATE and
last 3DSTATE_VF_INSTANCING using their alternative edge flag version
we generate at iris_create_vertex_element and store at the CSO.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
You usually want to go find the highest pressure and figure out why you
couldn't spill or what pattern led to a bunch of pressure leading to that
point.
Fixes: 58bcebd987 ("spirv: Allow [i/u]mulExtended to use new nir opcode")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Now that we have a loop unrolling cost function and loop unrolling isn't
going to kill us the moment we have a 64-bit op in a loop, we can go
ahead and move 64-bit lowering later. This gives us the opportunity to
do more optimizations and actually let the full optimizer run even on
64-bit ops rather than hoping one round of opt_algebraic will fix
everything. This substantially reduces both fp64 shader compile times
and the resulting code size.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we have a loop unrolling cost function and loop unrolling isn't
going to kill us the moment we have a 64-bit op in a loop, we can go
ahead and move 64-bit lowering later. This gives us the opportunity to
do more optimizations and actually let the full optimizer run even on
64-bit ops rather than hoping one round of opt_algebraic will fix
everything. This substantially reduces both fp64 shader compile times
and the resulting code size. On the vs-isnan-dvec test from piglit:
Before this commit:
1684.63s user 17.29s system 99% cpu 28:28.24 total
101479 instructions. 0 loops. 802452 cycles. 79:369 spills:fills.
Peak memory usage (according to massif): 1.435 GB
After this commit:
179.64s user 7.75s system 99% cpu 3:07.92 total
57316 instructions. 0 loops. 459287 cycles. 0:0 spills:fills.
Peak memory usage (according to massif): 531.0 MB
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>