With merged ESGS shaders, the GS part of a wave may be empty, and the
hardware gets confused if any GS messages are sent from that wave. Since
S_SENDMSG is executed even when EXEC = 0, we have to wrap even
non-monolithic GS shaders in an if-block, so that the entire shader and
hence the S_SENDMSG instructions are skipped in empty waves.
This change is not required for TCS/HS, but applying it there as well
simplifies the logic a bit.
Fixes GL45-CTS.geometry_shader.rendering.rendering.*
v2: ensure that the TCS epilog doesn't run for non-existing patches
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The shader that is used to copy vertex data out of the vs/gs shaders to
the user-specified buffer (streamout or SO shader) was not using the
correct offsets.
Adjust the offsets that are used just for the SO shader:
- Make sure that position is handled in the same special way
as in the vs/gs shaders
- Use the correct offset to be passed in the core
- consolidate register slot mapping logic into one function, since it's
been calculated in 2 different places (one for calcuating the slot mask,
and one for the register offsets themselves
Also make room for all attibutes in the backend vertex area.
Fixes:
- all vtk GL2PS tests
- 18 piglit tests (16 ext_transform_feedback tests,
arb-quads-follow-provoking-vertex and primitive-type gl_points
v2:
- take care of more SGV slots in slot mapping logic
- trim feState.vsVertexSize
- fix GS interface and incorporate GS while calculating vsVertexSize
Note that vsVertexSize is used in the core as the one parameter that
controls vertex size between all stages, so it has to be adjusted appropriately
for the whole vs/gs/fs pipeline.
Also note that GS and SO is not fully implemented. This will be addressed
later.
fixes:
- fixes total of 20 piglit tests
CC: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
gcc prior to 4.9 didn't implement <regex>, causing a startup crash
in the swr knob parameter reading code.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Fixes a drawable leak.
Fixes: bbc29393d3 ("st/mesa: create framebuffer iface hash table per
st manager")
Bugzilla: https://bugs.freedesktop.org/101930
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This ports 72e46c988 to radv.
radeonsi: apply a TC L1 write corruption workaround for SI
Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We were adding pad to size after creating the object, so we could
submit a CS bigger than the bo created for it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This ports: da7453666a
radeonsi: don't apply the Z export bug workaround to Hainan
to radv.
Just noticed in passing.
Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We already have this little optimization for color clears. Now that
we're actually tracking whether or not a slice has any fast-clear
blocks, it's easy enough to add for depth clears too.
Improves performance of GFXBench 4 TRex at 1920x1080 by:
- Skylake GT4: 0.905932% +/- 0.0620197% (n = 30)
- Apollolake: 0.382434% +/- 0.1134730% (n = 25)
v2: (by Ken) Rebase and drop intel_mipmap_tree.c changes, as they're
no longer necessary (other patches already landed to do that part)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When changing the clear value, we need to resolve any fast cleared data.
Previously, we were performing resolves on every slice with HiZ enabled.
We only need to resolve slices that a) have fast clear data, and b)
aren't about to be cleared to the new color. In the latter case, we
were actually doing a resolve, and then a fast clear - when we could
skip both, causing the existing fast cleared area to be updated to the
new clear value for no additional work.
This patch stops using intel_miptree_prepare_access in favor of a more
optimal open coded loop that knows about our clear operation.
v2: (by Ken) Rebase on islification, write a real commit message.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This fixes a bug uncovered by:
2412c4c81e
util: Make CLAMP turn NaN into MIN.
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We'll fail to flag an error if the context flags appear after the
no-error attribute in the context attribute list.
Delay the check to after attribute parsing to fix this.
Fixes: 4909519a66 ("egl: Add EGL_KHR_create_context_no_error support")
Cc: mesa-stable@lists.freedesktop.org
[Emil Velikov: add fixes/stable tags, commit message polish]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We have a few UUIDs, so lets be more specific.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The number of supported waves per thread group has been reduced to 16
with gfx9. Trying to use 32 waves causes hangs, and barriers might
not work correctly with > 16 waves.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The firmware version numbers for SI were wrong. The new numbers are probably
too conservative (we don't have a definitive answer by the firmware team),
but DRAW_INDIRECT_MULTI has been confirmed to work with these versions on
Tahiti (by Gustaw) and on Verde (by myself).
While this is technically adding a feature, it's a feature we thought we had
for a long time. The change is small enough and we're early enough in the 17.2
release cycle that it should still go in.
Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The EU limit of 128 GRFs should allow 32 vertex elements of 4 GRFs.
However, the maximum allowed value of "Vertex URB Entry Read Length"
in SIMD8 is 15. And 15 * 8 = 120 gives us a limit of 30 vertex elements.
Because we also need to reserve a vertex buffer to upload
VertexIndex/InstanceIndex and another to upload DrawID when needed,
we can only expose 28.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Here the AUX_USAGE_* mode indicates that we have HiZ, so we will have
a HiZ buffer. But Coverity doesn't know that, so it thinks it might
be NULL because we checked hiz_buf != NULL earlier.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Increase the value, not the pointer to the stack variable.
Caught by Coverity (CID 1415574). Not shipped in a real release.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
NewBufferObj() is called when the shared state is allocated so we
wouldn't get this far if it was NULL.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This keeps the flags out of v3d_decode.c's output. In the generated code,
only the unpack functions see any change (where they now get the
restricted start value), and vc4 doesn't use the unpack functions yet.
I was writing the XML such that the address field overlapped various flags
in the alignment bits, which caused pain when trying to unpack for decode.
Instead, keep the XML matching the docs (address fields don't overlap),
and just infer the appropriate shift value during decode.
During pack, the address is just applied to the appropriate bits
already, ignoring the sub-byte start/end fields.
We simply pick r4 if available (anything else would force a MOV), then
round-robin through accumulators (avoids physical regfile RAW delay
slots), then round-robin through the physical regfile.
The effect on instruction count is pretty impressive:
total instructions in shared programs: 76563 -> 74526 (-2.66%)
instructions in affected programs: 66463 -> 64426 (-3.06%)
and we could probably do better with a little heuristic of "if we're going
to choose a physical reg, and other operands of instructions using this as
a src have the same physical regfile, then use the other regfile".
VC4 has had a tension, similar to pre-Sandybridge Intel, where we want to
use low-numbered registers (more parallelism on Intel, fewer delay slots
on vc4), but in order to give instruction scheduling the most freedom to
avoid delays we want to round-robin between registers of the same cost.
Our two heuristics so far have chosen one end or the other of that
tradeoff.
The callback, instead, hands the driver the set of registers that are
available, and the driver gets to make its own choice. This will be used
in vc4 to round-robin between registers of the same cost, and might be
used in the future for improving bank selection.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
All the paths looping over adjacency had guards against considering
themselves (the non-obvious one was ra_any_neighbors_conflict(), which has
in_stack set).
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
I was going to indent this code another level, and decided it would be
easier to read as a helper.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Without this, a BlitFramebuffer would mark the whole framebuffer as being
changed (so we emit loads/stores of all of it) rather than just the
modified subset.
I don't know how I managed to leave this here for so long. Found when
working on a 1:1 overlapping blit extension for X11.
Cc: mesa-stable@lists.freedesktop.org
This gets us automatic CL decoding to a floating-point value, and drops a
magic number from the emit code. 250x250 shader runner tests now say they
have a center of 125.0 instead of 2000.