This patch fixes a classic "confuse the enemy" bug.
_mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the
arguments are opposite.
_mm_andnot_si128 performs "r = (~a) & b" while
vec_andc performs "r = a & (~b)"
To make sure this error won't return in another place, I added a wrapper
function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
In fad158a0 ("freedreno/ir3: array rework") the src # (n) shifted by
one, but missed updating delay-slot calc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
It at least happens with some piglit tests, like
$piglit/bin/vp-address-01
VERT
DCL IN[0]
DCL IN[1]
DCL OUT[0], POSITION
DCL OUT[1], COLOR
DCL CONST[0..7]
DCL ADDR[0]
0: ARL ADDR[0].x, IN[1].xxxx
1: MOV_SAT OUT[1], CONST[ADDR[0].x-1]
2: DP4 OUT[0].x, CONST[4], IN[0]
3: DP4 OUT[0].y, CONST[5], IN[0]
4: DP4 OUT[0].z, CONST[6], IN[0]
5: DP4 OUT[0].w, CONST[7], IN[0]
6: END
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Seems like in certain cases, we cannot use c<a0.x+0> as the third src to
cat3 instructions. This may be slightly conservative, we may only have
this restriction when the first src is also const.
This fixes, for example, +24/-0 of the variable-indexing piglit tests.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If we handle separately the special case of eliminating output mov
(which includes keeps and various other cases where we don't have a
consuming instruction's src register to collapse things into), we
can simplify the logic.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Shuffle things slightly, passing instr-data to ra_name() to reduce the
number of places where we need to add support for array names.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
nir.h is a bit inconsistent about 'typedef struct {} nir_foo' vs
'typedef struct nir_foo {} nir_foo'. But missing struct name tags is
inconvenient when you need a fwd declaration without pulling in all
of nir.
So add missing struct name tag for nir_variable, and a couple other
spots where it would likely be useful.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
This adds code that is basically the same as the code in umod, udiv and idiv.
However, unlike idiv we return -1.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The ARB has decided that implicit conversions should be performed for
bitwise operators in future language revisions. Implementations of
current language revisions may or may not perform them.
This patch makes Mesa apply implicti conversions even on current
language versions. Applications appear to expect this behavior,
and there's really no downside to doing so.
Fixes shader compilation in Shadow of Mordor.
Bugzilla: https://www.khronos.org/bugzilla/show_bug.cgi?id=1405
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
In the vertex and fragment stages, the hardware is nice to us and leaves
g0.2 zerod out for us so we can use it for headers. However, in compute,
geometry, and tessellation stages, the hardware is not so nice. In
particular, for compute shaders on BDW, the hardware places some debug bits
in 23:15. As it happens, bit 15 is interpreted by the sampler as the alpha
channel mask. This means that if you use a texturing instruction with a
header in a compute shader, you may randomly get the alpha channel
disabled. Since channel masks affect the return length of the sampler
message, this can lead the GPU to expect a different mlen to the one you
specified in the shader and this, in turn, hangs your GPU.
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
BDW adds the following restriction: "When multiplying DW x DW, the dst
cannot be accumulator."
Cc: "11.1,11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The cleaning up was quite a performance hog (making pipe_resource_reference
the number two in profilers on the vertex path, and 3rd overall, with its
cousin pipe_reference_described not far behind) if there were lots
of tiny draw calls (ipers). Now the reason was really that it was blindly
calling this for all potential shader views (so 32 each for vs and gs) even
though the app never touched a single one which could have been fixed,
however I can't come up with a good reason why we refcount these. We've got
references, of course, in the sampler views, which should be quite sufficient
as we do all vertex and geometry shader execution fully synchronous.
(Calling prepare_shader_sampling for all draw calls even if there were no
changes looks quite suboptimal too, but generally we don't really expect vs/gs
shader sampling to be used much with llvmpipe, and there's even an early exit
if there aren't any views to avoid the "null loop" albeit it's now no longer
always trying to loop through all 32 slots. Maybe improve another time...).
Of course, if we manage to make vertex loads run asynchronously some day,
we need references again, but adding that back would be the least of the
problems...
Also only set LP_NEW_SAMPLER_VIEW for fragment sampler views. Nothing on the
vertex side depends on it (I suppose we'd really wanted a separate flag in
any case).
(Good for a 3% improvement or so in ipers under the right conditions.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was not really a leak per se, but we were referencing the textures for
longer than intended. If textures were set via llvmpipe_set_sampler_views()
(for fs) and then picked up by lp_setup_set_fragment_sampler_views(), they
were referenced in the setup state. However, the only way to unreference them
was by replacing them with another texture, and not when the texture slot
was replaced with a NULL sampler view. (They were then further also referenced
by the scene too which might have additional minor side effects as we limit
the memory size which is allowed to be referenced by a scene in a rather crude
way.) Only setup destruction (at context destruction time) then finally would
get rid of the references.
Fix this by noting the number of textures the last time, and unreference
things if the new view is NULL (avoiding having to unreference things
always up to PIPE_MAX_SHADER_SAMPLER_VIEWS which would also have worked).
Found by code inspection, no test...
v2: rename var
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
genX_image_view_init allocates up to 3 separate SURFACE_STATE structures,
and populates each from a single template. Stop mutating the template
between each final SURFACE_STATE.
SF_CLIP_VIEWPORT does not clamp Z values. It only scales and shifts
them. Clamping to VkViewport::minDepth,maxDepth is instead handled by
CC_VIEWPORT.
Fixes dEQP-VK.renderpass.simple.depth on Broadwell.
vkCmdBeginRenderPass, vkCmdNextSubpass, and vkBeginCommandBuffer with
VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT, all *setup* the
command buffer for recording commands for some subpass. But only the
first two, vkCmdBeginRenderPass and vkCmdNextSubpass, can *start*
a subpass.
Therefore, calling anv_cmd_buffer_begin_subpass() inside
vkCmdBeginCommandBuffer is misleading. Clarify its purpose by renaming
it to anv_cmd_buffer_set_subpass() and adding comments.
This should improve cache residency for render targets.
Pre-patch, vkCmdBeginRenderPass emitted all the meta clears for
VK_ATTACHMENT_LOAD_OP_CLEAR before any subpass began. Post-patch,
vCmdBeginRenderPass and vkCmdNextSubpass emit only the clears needed for
that current subpass.
This prepares for moving the clear ops from the start of the render pass
into each subpass.
Pipeline N will be used to clear color attachment N of the current
subpass. Currently meta color clears still create a throwaway subpass
with exactly one attachment, so currently only pipeline 0 is used.
This is an ugly hack to workaround the compiler's current inability to
dynamically set the render target index in the render target write
message.
Add anv_graphics_pipeline_create_info::color_attachment_count. If
non-negative, then it overrides the color attachment count in the
pipeline's subpass. Useful for meta. (All the hacks for meta!)
The functions will soon handle clears unrelated to
VK_ATTACHMENT_LOAD_OP_CLEAR, namely vkCmdClearAttachments. So remove
"load" from their name:
emit_load_color_clear -> emit_color_clear
emit_load_depthstencil_clear -> emit_depthstencil_clear
Rewrite anv_cmd_buffer_clear_attachments, which emits the top-of-pass
clears, to use the data provided in anv_cmd_state::attachments. This
prepares for deferring each attachment clear to the first subpass that
uses the attachment.
This array contains attachment state when recording a renderpass instance.
It's populated on each call to anv_cmd_buffer_set_pass.
The data is currently set but unused. We'll use it later to defer each
attachment clear to the subpass that first uses the attachment.
Only modify interpolation type for integer-based varyings or when the
consumer is known and different than fragment shader.
If we are linking separate shader programs and the consumer is unknown,
the consumer could be added later and be a fragment shader. If we
modify the interpolation type in this case, we could read wrong
values in the fragment shader inputs, as shown in bug 93320.
Fixes the following CTS test:
ES31-CTS.vertex_attrib_binding.advanced-bindingUpdate
Fixes the following dEQP tests:
dEQP-GLES31.functional.separate_shader.random.102
dEQP-GLES31.functional.separate_shader.random.111
dEQP-GLES31.functional.separate_shader.random.115
dEQP-GLES31.functional.separate_shader.random.17
dEQP-GLES31.functional.separate_shader.random.22
dEQP-GLES31.functional.separate_shader.random.23
dEQP-GLES31.functional.separate_shader.random.3
dEQP-GLES31.functional.separate_shader.random.32
dEQP-GLES31.functional.separate_shader.random.39
dEQP-GLES31.functional.separate_shader.random.64
dEQP-GLES31.functional.separate_shader.random.73
dEQP-GLES31.functional.separate_shader.random.91
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93320
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This shouldn't hurt anything, and I'm about to introduce a pass that
will want it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This makes it a pass, hiding the parameter structs and block callbacks
so it's simpler to work with.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Shorter than compiler->scalar_stage[MESA_SHADER_VERTEX], which can
help with line-wrapping.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
nir_build_ivec4 is more readable and succinct than using nir_build_imm
directly, even if you have C99.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>