Otherwise we will end up with an extra instruction to compare the
result of the inot.
On BDW:
total instructions in shared programs: 13060620 -> 13060481 (-0.00%)
instructions in affected programs: 103379 -> 103240 (-0.13%)
helped: 127
HURT: 0
total cycles in shared programs: 256590950 -> 256587408 (-0.00%)
cycles in affected programs: 11324730 -> 11321188 (-0.03%)
helped: 114
HURT: 21
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We turn these from bcsel into inot/b2f combos in order for other
optimisation passes to get further. Once we have finished turn
the ones that remain and are used in more than a single expression
back into a bcsel.
On BDW:
total instructions in shared programs: 13060965 -> 13060297 (-0.01%)
instructions in affected programs: 835701 -> 835033 (-0.08%)
helped: 670
HURT: 2
total cycles in shared programs: 256599536 -> 256598006 (-0.00%)
cycles in affected programs: 114655488 -> 114653958 (-0.00%)
helped: 419
HURT: 240
LOST: 0
GAINED: 1
The 2 HURT is because inserting bcsel creates the only use of
const 1.0 in two shaders from tri-of-friendship-and-madness.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
While the below stats are encouraging this pass will also become
very usefull for avoiding regression once
brw_do_channel_expressions() and brw_do_vector_splitting() are
disabled.
On Broadwell:
total instructions in shared programs: 13078787 -> 13060898 (-0.14%)
instructions in affected programs: 1809827 -> 1791938 (-0.99%)
helped: 4527
HURT: 157
total cycles in shared programs: 256562762 -> 256590424 (0.01%)
cycles in affected programs: 159749392 -> 159777054 (0.02%)
helped: 5583
HURT: 2289
total spills in shared programs: 14929 -> 14923 (-0.04%)
spills in affected programs: 62 -> 56 (-9.68%)
helped: 1
HURT: 0
total fills in shared programs: 20144 -> 20141 (-0.01%)
fills in affected programs: 253 -> 250 (-1.19%)
helped: 1
HURT: 3
LOST: 0
GAINED: 2
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I'm going to add a boolean scheduling pass that I want run late, but
after copy propagation and dead code elimination. Yet, I don't want
to have to think about registers. So, move the register conversion
a little later.
No impact on shader-db. Suggested by Jason Ekstrand.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This tries to move comparisons (a common source of boolean values)
closer to their first use. For GPUs which use condition codes,
this can eliminate a lot of temporary booleans and comparisons
which reload the condition code register based on a boolean.
V2: (Timothy Arceri)
- fix move comparision for phis so we dont end up with:
vec1 32 ssa_227 = phi block_34: ssa_1, block_38: ssa_240
vec1 32 ssa_235 = feq ssa_227, ssa_1
vec1 32 ssa_230 = phi block_34: ssa_221, block_38: ssa_235
- add nir_op_i2b/nir_op_f2b to the list of comparisons.
V3: (Timothy Arceri)
- tidy up suggested by Jason.
- add inot/fnot to move comparison list
V4: (Jason Ekstrand)
- clean up move_comparison_source
- get rid of the tuple
- rework phi handling
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is more correct and should also be a tiny bit faster since we're
just comparing pointers instead of calling nir_src_equal.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
before commit f871946594
image_loader_extension was always present in dri2_dpy->extensions,
after that commit it is only present for render nodes.
Its removal broke partial render based on buffer age on (at least)
raspberry pi.
Fixes: f871946594 "egl/dri2: rework dri2_egl_display::extensions storage"
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
In parse_identifier, it doesn't stop copying '*pcur'
untill encounter the NULL. As the 'ret' has a
fixed-size buffer, if the '*pcur' has a long string,
there will be a buffer overflow. This patch avoid this.
Signed-off-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Fixes building error due to dependency on nir generated headers
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This set context req seq was in the wrong place.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
this is to avoid following compilation error on Android:
error: control may reach end of non-void function [-Werror,-Wreturn-type]
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Geometry and Tessellation stages do handle this as a system value instead.
Fixes:
dEQP-VK.geometry.basic.primitive_id
Reviewed-by: Dave Airlie <ailried@redhat.com>
Sometimes it is useful to disable the "growable" cmdstream buffers for
debugging. (See 419a154d in libdrm)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Now that issues glamor was hitting w/ glsl>=130 (aka missing INSTANCED
bit in vertex attribute state) is fixed, remove hack.
Signed-off-by: Rob Clark <robdclark@gmail.com>
When there are no framebuffer attachments, fb->width and fb->height will
be 0. Subtracting 1 results in 4294967295 which is too large for the
field, causing genxml assertions when trying to create the packet.
In this case, we can just program it to 1.
Caught by dEQP-VK.tessellation.tesscoord.triangles_equal_spacing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In Vulkan, we always have both the TCS and TES available in the same
pipeline, so we can simply use the TCS OutputVertices execution mode
value as the TES PatchVertices built-in.
For GLSL, we handle this in the linker. But we could use this pass
in the case when both TCS and TES are linked together, if we wanted.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
...when the capability bit is set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We need to:
- handle the extra array level for per-vertex varyings
- handle the patch qualifier correctly
- assign varying locations
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Use info->tess.
v3: Handle more things in either TCS/TES.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com> [v1]
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Annoyingly, SPIR-V lets you specify all of these fields in either the
TCS or TES, which means that we need to be able to store all of them
for either shader stage. Putting them in a union won't work.
Combining both is an easy solution, and given that the TCS struct only
had a single field, it's pretty inexpensive.
This patch renames the combined struct to "tess" to indicate that it's
for tessellation in general, not one of the two stages.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
"Function Enable" is what the other stages use.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With shaders using a lot of inputs/outputs, like this (from Gtk+) :
layout(location = 0) in vec2 inPos;
layout(location = 1) in float inGradientPos;
layout(location = 2) in flat int inRepeating;
layout(location = 3) in flat int inStopCount;
layout(location = 4) in flat vec4 inClipBounds;
layout(location = 5) in flat vec4 inClipWidths;
layout(location = 6) in flat ColorStop inStops[8];
layout(location = 0) out vec4 outColor;
we're missing the programming of the input_slots_valid field leading
to an assert further down the backend code.
v2: Use valid slots of the geometry or vertex stage (Jason)
v3: Use helper to find correct vue map (Jason)
v4: Set the valid slots off the previous stages (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fix this build error with GCC 4.4.7.
CC nir/nir_opt_copy_prop_vars.lo
nir/nir_opt_copy_prop_vars.c: In function ‘copy_prop_vars_block’:
nir/nir_opt_copy_prop_vars.c:765: error: unknown field ‘deref’ specified in initializer
nir/nir_opt_copy_prop_vars.c:765: warning: missing braces around initializer
nir/nir_opt_copy_prop_vars.c:765: warning: (near initialization for ‘(anonymous).<anonymous>’)
nir/nir_opt_copy_prop_vars.c:765: warning: initialization from incompatible pointer type
Fixes: 62332d139c ("nir: Add a local variable-based copy propagation pass")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This just adds the nir->llvm support, enabling
the extension causes some failures on llvm 3.9 at least,
but this code seems fine.
NIR passes the sampler in src[1].x, and we LLVM/SI requires
it as the last parameters in the coords (coord[2] for 2D,
coord[3] for 2DArray).
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
According to OpenGL Shading Language 4.50 spec, Section 8.7 "Vector
Relational Functions", functions of this type do not operate on scalar
types, so remove scalar types from signature definitions to make the
behavior consistent with glslangValidator and other drivers.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
The foreach loop was called both in the else case and right after. The
indentation seems to indicate that the extra call was from a previous
version with an else section with out curly brackets.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We want vue_map->num_slots to be one more than the final slot.
When assigning fixed slots, built-in slots, and non-SSO user varyings,
we do slot++. This leaves "slot" as one past the most recently assigned
slot. But for SSO user varyings, we computed slot based on the varying
location value...and left it at that slot value.
To work around this inconsistency, I made num_slots be "slot + 1" if
separate and "slot" otherwise. The problem is...if there are no user
varyings in SSO mode...then we would have done slot++ when assigning
built-ins, so it would be off by one. This resulted in loops from 0
to vue_map->num_slots hitting a bonus BRW_VARYING_SLOT_PAD at the end.
This used to break the SIMD8 VS/TES backends, but I fixed that in
commit 480d6c1653. It's probably safe
at this point, but we should fix it anyway.
To fix this, do slot++ in all cases. For SSO mode, we overwrite slot
for every varying, so this increment only matters on the last varying.
Because we process varyings in order, this will set slot to 1 more
than the highest assigned slot.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
vtn_ssa_value() can produce variable loads, and the cursor might
be after a return statement, causing nir_builder assert failures
about not inserting instructions after a jump.
This fixes:
dEQP-VK.spirv_assembly.instruction.graphics.barrier.in_if
dEQP-VK.spirv_assembly.instruction.graphics.barrier.in_switch
Cc: "13.0 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fixes broken depth texturing after:
commit 22639a6e19
Author: Timothy Arceri <timothy.arceri@collabora.com>
Date: Mon Nov 21 00:29:29 2016 +1100
st/mesa: get Version from gl_program rather than gl_shader_program
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Also changed RADV_SHOW_QUEUES to a no compute queue option. That
would make more sense later when the compute queue is established,
but the transfer queue still experimental.
v2: Don't include the trace flag.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>