This avoids the code duplication that caused me to put things in the
wrong place in the previous commit. One used to have extra flushes,
but we moved those out so now these are identical and can be easily
shared.
Commit d6dd57d43c (iris: Add missing depth cache flushes) added the
depth/stencil flushes to the wrong place. I meant to add them to the
iris_upload_dirty_render_state code that emits the packets, but I
accidentally added them to the nearly identical looking code in
iris_restore_render_saved_bos. This meant we missed the actual flushing
at draw time, but instead did pointless flushing on the first draw in a
batch where things are already flushed anyway.
This commit moves them to iris_resolve.c, next to the depth prepares,
similar to what we do for color buffers. i965 does them elsewhere, but
I'm not sure why - this seems like the most consistent place.
1. If we switch the TCS for one with a different number of output
vertices, then the TES's gl_PatchVerticesIn value will change.
We need to re-upload in this case. For now, re-emit constants
whenever the TCS/TES are swapped out.
2. If there is no TCS, then we can't grab gl_PatchVerticesIn from
the TCS info. Since it's a passthrough, we can just use the
primitive's patch count (like the TCS gl_PatchVerticesIn does).
Fixes KHR-GL45.tessellation_shader.single.max_patch_vertices and
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Now that we've added a system value uploading mechanism, we may as well
reuse the same system for default tessellation levels. This simplifies
the state upload code a bit.
Also fixes:
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_tessLevel
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Return immediately if last VS URB entry size is good enough for BLORP
operation
v2: Fix comments (Caio)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Kenneth Graunke<kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: 1) Set IRIS_DIRTY_URB bit (Caio)
2) Get rid of unnecessary function (Caio)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For texturing, we map alpha formats to the corresponding red format,
as many alpha formats are outright missing, and red is more efficient
when sampling anyway.
When rendering to A8_UNORM, we use that format directly, so the image
gets the shader output's .a/.w channel, rather than the .r/.x channel.
All other A* formats are non-renderable, so we can't do much and just
mark them as unsupported for rendering. Fortunately, GL only requires
rendering to A8_UNORM, so that works out.
According to Andre Heider and Timur Kristóf, this fixes font rendering
in Witcher 1 (via nine). Andre also reported that it fixes Unigine
Heaven (presumably via nine).
v2: Use the same swizzle for both sampler views and "render targets".
BLORP expects the read swizzle, and will take the inverse when
setting up the destination swizzle (and actually applying it in
the shaders). We ignore the format swizzle when setting up normal
rendering SURFACE_STATEs, which is necessary because it would be
an illegal shader channel select combination. Thanks to Jason
Ekstrand for pointing out that BLORP took an inverse swizzle.
Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Gallium might call us multiple times to bind subsets of the samplers,
at which point we'd recreate the table a bunch of times. It doesn't
really buy us anything to do it here - even if we defer to draw time,
the dirty tracking ensures we'll only do it on the first draw after a
bind_sampler_states() call.
We now use the number of samplers specified by the shader instead of
the binding count. If this number changes, we flag sampler state as
dirty so we re-upload a table with the right number of entries.
This also fixes a bug where ice->state.need_border_colors was never
unset, so once something needed border colors, the pool would always
be pinned in all future batches.
v2: Explicitly flag sampler states as dirty, rather than assuming that
bind_sampler_states() will be called if the program texture count
changes. While this may be true for st/mesa, it isn't the case for
Gallium HUD.
Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is necessary for legacy texture buffer object formats, where we'll
need to use a swizzle to fake e.g. luminance.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If Vertex Shader uses EdgeFlag the hardware request that it is setup
as the last VERTEX_ELEMENT_STATE. If SGVS are add at draw time we
need to also reconfigure the last 3DSTATE_VF_INSTANCING so its
VertexElementIndex points to the new Vertex Element that contains
the EdgeFlag.
So if draw parameters or edgeflag are not used the CSO generated at
iris_create_vertex_element is sent directly in the batches. But if
edge flag is used we adjust last VERTEX_ELEMENT_STATE and
last 3DSTATE_VF_INSTANCING using their alternative edge flag version
we generate at iris_create_vertex_element and store at the CSO.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Additional VERTEX_ELEMENT_STATE are used to store basevertex and
baseinstance and drawid updating the DWordLength of the
3DSTATE_VERTEX_ELEMENTS command.
This passes all piglit tests for spec.*draw_parameters.* tests
and VK-GL-CTS KHR-GL45.shader_draw_parameters_tests.* tests.
Now we only mark a dirty_update when parameters are changed or
when we have an indirect draw.
We enable PIPE_CAP_DRAW_PARAMETERS on Iris.
There is no edge flag support in the Vertex Elements setup.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ref: f1374805a8 "drm-uapi: use local files, not system libdrm"
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The Vulkan driver only sets this if color writes are disabled, which
is more conservative - but would require us to inspect blend state.
(If color writes are enabled, we don't need to force anything, because
the internal signal is already correct. But it shouldn't hurt to do so.)
I was misreading i965 - the 3DSTATE_WM::PixelShaderKillsPixel bit from
Gen < 8 needed all of this, but the 3DSTATE_PS_EXTRA bit only needs
prog_data->uses_kill.
In st_nir_lower_uniforms_to_ubo() all UBO access in the shader have
its index incremented to open room for uniforms in constbuf0. So if
we use UBOs, we always need to include the extra binding entry in the
table.
To avoid doing this checks both when compiling the shader and when
assigning binding tables, store the num_cbufs in iris_compiled_shader.
Fixes a bunch of tests from Piglit and CTS that use UBOs but don't use
uniforms or system values. Note that some tests fitting this criteria
were passing because the UBOs were moved to be push
constants (avoiding the problem).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I think this was an attempt to work around various sample mask bugs I
had early on. It's not correct. A sample mask of 0 is legal and means
to disable all samples.
Fixes dEQP-GLES31.functional.texture.multisample.*.*sample_mask*
System values are built-in uniforms. We set them up as UBO values, and
might pull or push them. UBO push analysis will take care of that. We
only want to enable push constants if there's an actual range being
pushed. Otherwise, we might get into a scenario where 3DSTATE_PS
enables push constants but 3DSTATE_CONSTANT_PS isn't pushing anything.
This fixes GPU hangs in Broadwell image load store tests which have
unused image param system values but no other uniforms. (We shouldn't
be making those anyway, but that's a separate fix...)
cso_fb->layers is only valid for no-attachment framebuffers. Use the
helper function to get the real value, then stash it so we don't have
to call the helper function on the old value for comparison, or at draw
time for Force Zero RTA Index setting.
This fixes Force Zero RTA Index being set even when attempting layered
rendering.