I'm not sure how I managed to write the SF merge code
(7d8b79f398) without allowing merges with
NOPs. *Everything* we try to merge with will have a NOP on one or the
other side of the instruction, and that's why that commit showed no
benefit.
total instructions in shared programs: 99347 -> 95128 (-4.25%)
instructions in affected programs: 91906 -> 87687 (-4.59%)
3DMMES performance +2.57105% +/- 0.135276% (n=6,8)
This should be a win for most loops, which tend to have uniform control
flow.
More importantly, it exposes important information to live variables: that
the break/continue here means that our jump target may have access to
values that were live on our input. Previously, we were just setting the
exec mask and letting control flow fall through, so an intervening def
between the break and the end of the loop would appear to live variables
as if it screened off the variable, when it didn't actually.
Fixes a regression in glsl-vs-loop-redundant-condition.shader_test when a
perturbing of register allocation caused a live variable to get stomped.
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
This is already supported in genX_state.c, expose the extension string.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
In an attempt to fix 3DSTATE_DEPTH_BUFFER for stencil-only cases, I
accidentally kept setting the SurfaceType to 2D in the stencil-only case
thanks to a copy+paste error.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The buffer_size does not take the offset into account. Just add the
offset into the pointer which lines up the structures much better.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
The number has to be less than or equal to the max, not just less than.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
The components count the number of individual values, not the number of
slots.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
There is no support for resuming streamout. Furthermore, this also
controls glDrawTransformFeedback functionality which requires the same
ability to query how many primitives were sent out of TF.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
We need to take the instance divisor and number of instances into
account for instanced client-side arrays, rather than the vertex
parameters.
Loosely based on the comparable nvc0 logic.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
We now support clearing these, and actually rendering to multiple layers
would require GS support, which will fail in much more spectacular ways
for now. Once that is hooked up, there won't be anything else to do
here.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Since we don't pass a renderTargetArrayIndex in, and the current hot
tile may be for a different index, we may end up loading the RTAI=0 into
the hot tile for no reason.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Internal docs don't mention it, but they also don't mention that the bug
has been fixed (like other CI bugs fixed in VI).
Vulkan does this too.
v2: also update r600_gfx_write_fence_dwords
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
34953f8907 added this bitmask but it wasn't being reset when
a program was relinked. If a stage was removed from the new
program then it could case a crash as we expect the linked shader
for that stage to not be null.
Fixes crashes in:
ESEXT-CTS.tessellation_shader.single.xfb_captures_data_from_correct_stage
ES31-CTS.core.tessellation_shader.single.xfb_captures_data_from_correct_stage
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98917
Looks like immed branch offset size increased again.. making what we
think is a small negative number look to hw like a huge positive number.
And things go badly when shader tries to jump to hyperspace.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Android doesn't build all the files that normal linux/autotools build
does (mainly standalond ir3_compiler).. but possibly we should pull
C_SOURCES + aNxx_SOURCES into a single variable picked up by both
Android.mk and Makefile.am? (Suggested by Rob H.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Set the include paths to consider in-tree headers before out-of-tree
headers.
Avoids the build failing due to stale headers being present in
$prefix. Previosuly 'make -ki install' or something similar was required
to update the out-of-tree headers to allow the build to succeed.
Also avoids having to rebuild the entire thing after every 'make
install'.
Cc: Rob Clark <robdclark@gmail.com>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
On a3xx/a4xx, the SP_VS_VPC_DST_REG.OUTLOCn is offset by 8, so we used
to add this offset into fs->inputs[n].inloc. But a5xx drops this extra
offset-by-8. So instead make inloc zero based and add the offset when
we emit OUTLOCn values (for the gen's that need the offset).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Helps simplify things on a5xx, where pos/psize get added to the vs-out
map. And anyways, simplifies a3xx and a4xx.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Drivers that support this benefit by saving one lowering pass in the
GLSL-to-TGSI conversion.
radeonsi already supports this because all outputs are stored in temporary
variables before the export (except for TCS outputs, which have always
been readable in TGSI anyway due to their special semantics).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
With nir_intrinsic_ssbo_atomic_comp_swap we run out of params.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This one was split across two dwords as "Kernel Start Pointer" and
"Kernel Start Pointer High", which looks like it works when the driver
only accesses "Kernel Start Pointer". This breaks, of course, with BO
offsets > 4G.
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When the state fields where shuffled around for gen8, the compare
function enums were downgraded to just uints. Change them to enum
3D_Compare_Function.
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The previous commits got rid of any clashes between #defines and enum
values and we can now emit the genxml enums as debugger friendly C
enums.
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These values were defined both as an enum and as inline values. Remove
the inline values and reference the 3D_Compare_Function enum instead.
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This lets us reference enums in the type attribute of a field.
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>