Commit graph

82384 commits

Author SHA1 Message Date
Oded Gabbay
529aa8249a llvmpipe: fix arguments order given to vec_andc
This patch fixes a classic "confuse the enemy" bug.

_mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the
arguments are opposite.

_mm_andnot_si128 performs "r = (~a) & b" while
vec_andc performs "r = a & (~b)"

To make sure this error won't return in another place, I added a wrapper
function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-17 21:07:27 +02:00
Rob Clark
02ac91d717 freedreno/ir3: fix mad 3rd src delay calc
In fad158a0 ("freedreno/ir3: array rework") the src # (n) shifted by
one, but missed updating delay-slot calc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-17 12:21:45 -05:00
Rob Clark
2a6ec1e061 freedreno/ir3: better array register allocation
Detect arrays which don't conflict with each other and allow overlapping
register allocation.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:23:52 -05:00
Rob Clark
6a33c5c0df freedreno/ir3: array offset can be negative
It at least happens with some piglit tests, like
$piglit/bin/vp-address-01

  VERT
  DCL IN[0]
  DCL IN[1]
  DCL OUT[0], POSITION
  DCL OUT[1], COLOR
  DCL CONST[0..7]
  DCL ADDR[0]
    0: ARL ADDR[0].x, IN[1].xxxx
    1: MOV_SAT OUT[1], CONST[ADDR[0].x-1]
    2: DP4 OUT[0].x, CONST[4], IN[0]
    3: DP4 OUT[0].y, CONST[5], IN[0]
    4: DP4 OUT[0].z, CONST[6], IN[0]
    5: DP4 OUT[0].w, CONST[7], IN[0]
    6: END

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:23:20 -05:00
Rob Clark
ddede497b8 freedreno/ir3: workaround bug/feature
Seems like in certain cases, we cannot use c<a0.x+0> as the third src to
cat3 instructions.  This may be slightly conservative, we may only have
this restriction when the first src is also const.

This fixes, for example, +24/-0 of the variable-indexing piglit tests.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:22:43 -05:00
Rob Clark
ebd3a1fc17 ttn: use writemask for store_var
Only user is freedreno, and after array-rework it can cope.  Avoids
generating loads for a store.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:21:52 -05:00
Rob Clark
fad158a0e0 freedreno/ir3: array rework
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:21:08 -05:00
Rob Clark
cc7ed34df9 freedreno/ir3: refactor/simplify cp
If we handle separately the special case of eliminating output mov
(which includes keeps and various other cases where we don't have a
consuming instruction's src register to collapse things into), we
can simplify the logic.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:20:46 -05:00
Rob Clark
680664dff9 freedreno/ir3: fix incorrect decoding of mov instructions
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:20:37 -05:00
Rob Clark
2809c87f90 freedreno/ir3: remove unused tgsi tokens ptr
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:59 -05:00
Rob Clark
fc0d2f7e02 freedreno/ir3: bit of ra refactor
Shuffle things slightly, passing instr-data to ra_name() to reduce the
number of places where we need to add support for array names.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:47 -05:00
Rob Clark
d430f443de freedreno/ir3: cosmetic de-indent
Collapse two nested if's into one to reduce indent level.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:33 -05:00
Rob Clark
6f0377d651 ttn: add missing writemask on store_output
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-16 13:35:44 -05:00
Rob Clark
683794fd60 nir/print: const_index is signed
Noticed this with $piglit/bin/vp-address-01

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-01-16 13:35:44 -05:00
Rob Clark
211b0644e6 nir: few missing struct names
nir.h is a bit inconsistent about 'typedef struct {} nir_foo' vs
'typedef struct nir_foo {} nir_foo'.  But missing struct name tags is
inconvenient when you need a fwd declaration without pulling in all
of nir.

So add missing struct name tag for nir_variable, and a couple other
spots where it would likely be useful.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-16 13:35:43 -05:00
Ilia Mirkin
32a9fe013b nv50/ir: add saturate support on ex2
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-16 00:10:56 -05:00
Jeff Muizelaar
e5fefe49f2 gallivm: avoid crashing in mod by 0 with llvmpipe
This adds code that is basically the same as the code in umod, udiv and idiv.
However, unlike idiv we return -1.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-16 03:36:29 +01:00
Kenneth Graunke
d54a70aa18 glsl: Allow implicit int -> uint conversions for bitwise operators (&, ^, |).
The ARB has decided that implicit conversions should be performed for
bitwise operators in future language revisions.  Implementations of
current language revisions may or may not perform them.

This patch makes Mesa apply implicti conversions even on current
language versions.  Applications appear to expect this behavior,
and there's really no downside to doing so.

Fixes shader compilation in Shadow of Mordor.

Bugzilla: https://www.khronos.org/bugzilla/show_bug.cgi?id=1405
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2016-01-15 17:53:44 -08:00
Jason Ekstrand
61b0cfd84e i965/fs: Always set channel 2 of texture headers in some stages
In the vertex and fragment stages, the hardware is nice to us and leaves
g0.2 zerod out for us so we can use it for headers.  However, in compute,
geometry, and tessellation stages, the hardware is not so nice.  In
particular, for compute shaders on BDW, the hardware places some debug bits
in 23:15.  As it happens, bit 15 is interpreted by the sampler as the alpha
channel mask.  This means that if you use a texturing instruction with a
header in a compute shader, you may randomly get the alpha channel
disabled.  Since channel masks affect the return length of the sampler
message, this can lead the GPU to expect a different mlen to the one you
specified in the shader and this, in turn, hangs your GPU.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-15 16:44:02 -08:00
Jason Ekstrand
9870f798be i965/fs/generator: Take an actual shader stage rather than a string
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-15 16:44:02 -08:00
Jason Ekstrand
0a6811207f i965/vec4: Use UW type for multiply into accumulator on GEN8+
BDW adds the following restriction: "When multiplying DW x DW, the dst
cannot be accumulator."

Cc: "11.1,11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-15 16:44:02 -08:00
Jason Ekstrand
f509a89082 nir/lower_system_values: Lower vertexID to id+base if needed 2016-01-15 16:15:50 -08:00
Jason Ekstrand
6b64dddd71 anv/batch_chain: Remove padding from the BO before emitting BUFFER_END 2016-01-15 15:59:58 -08:00
Jason Ekstrand
67bf74f020 anv/batch_chain: Don't call current_batch_bo() again
We call it once at the top of the function and then hold on to the pointer.
It shouldn't have changed, so there's no reason to query for it again.
2016-01-15 15:49:32 -08:00
Jason Ekstrand
117cac75d0 nir/spirv: Stop trusting the SPIR-V for the number of texture coordinates 2016-01-15 11:13:51 -08:00
Roland Scheidegger
03f66dfb4b llvmpipe: ditch additional ref counting for vertex/geometry sampler views
The cleaning up was quite a performance hog (making pipe_resource_reference
the number two in profilers on the vertex path, and 3rd overall, with its
cousin pipe_reference_described not far behind) if there were lots
of tiny draw calls (ipers). Now the reason was really that it was blindly
calling this for all potential shader views (so 32 each for vs and gs) even
though the app never touched a single one which could have been fixed,
however I can't come up with a good reason why we refcount these. We've got
references, of course, in the sampler views, which should be quite sufficient
as we do all vertex and geometry shader execution fully synchronous.
(Calling prepare_shader_sampling for all draw calls even if there were no
changes looks quite suboptimal too, but generally we don't really expect vs/gs
shader sampling to be used much with llvmpipe, and there's even an early exit
if there aren't any views to avoid the "null loop" albeit it's now no longer
always trying to loop through all 32 slots. Maybe improve another time...).
Of course, if we manage to make vertex loads run asynchronously some day,
we need references again, but adding that back would be the least of the
problems...
Also only set LP_NEW_SAMPLER_VIEW for fragment sampler views. Nothing on the
vertex side depends on it (I suppose we'd really wanted a separate flag in
any case).
(Good for a 3% improvement or so in ipers under the right conditions.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-15 20:13:45 +01:00
Roland Scheidegger
2f9a325b6a llvmpipe: fix "leaking" textures
This was not really a leak per se, but we were referencing the textures for
longer than intended. If textures were set via llvmpipe_set_sampler_views()
(for fs) and then picked up by lp_setup_set_fragment_sampler_views(), they
were referenced in the setup state. However, the only way to unreference them
was by replacing them with another texture, and not when the texture slot
was replaced with a NULL sampler view. (They were then further also referenced
by the scene too which might have additional minor side effects as we limit
the memory size which is allowed to be referenced by a scene in a rather crude
way.) Only setup destruction (at context destruction time) then finally would
get rid of the references.
Fix this by noting the number of textures the last time, and unreference
things if the new view is NULL (avoiding having to unreference things
always up to PIPE_MAX_SHADER_SAMPLER_VIEWS which would also have worked).
Found by code inspection, no test...

v2: rename var

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-15 20:13:45 +01:00
Chad Versace
0e420cb67f anv: Populate SURFACE_STATE more safely
genX_image_view_init allocates up to 3 separate SURFACE_STATE structures,
and populates each from a single template. Stop mutating the template
between each final SURFACE_STATE.
2016-01-15 11:00:22 -08:00
Chad Versace
eab6212efd anv/meta: Stop leaking renderpass and framebuffer 2016-01-15 10:14:07 -08:00
Chad Versace
482a1f5eab anv/meta: Reuse code for vkCmdClear{Color,DepthStencil}Image
The two function bodies were very similar. Move common code to
anv_cmd_clear_image().

Fixes all 'dEQP-VK.renderpass.formats.*' on Skylake.
2016-01-15 07:46:10 -08:00
Chad Versace
1afe33f8b3 anv/gen8: Fix SF_CLIP_VIEWPORT's Z elements
SF_CLIP_VIEWPORT does not clamp Z values. It only scales and shifts
them. Clamping to VkViewport::minDepth,maxDepth is instead handled by
CC_VIEWPORT.

Fixes dEQP-VK.renderpass.simple.depth on Broadwell.
2016-01-14 22:53:05 -08:00
Chad Versace
842b424d3b anv/meta: Implement vkCmdClearDepthStencilImage 2016-01-14 22:53:05 -08:00
Chad Versace
e4b17a2e1a anv/meta: Implement vkCmdClearAttachments 2016-01-14 22:53:05 -08:00
Chad Versace
0038ae2e4a anv/meta: Add VkClearRect param to emit_clear()
Prepares for vkCmdClearAttachments.
2016-01-14 22:53:05 -08:00
Chad Versace
11f5433715 anv: Distinguish between subpass setup and subpass start
vkCmdBeginRenderPass, vkCmdNextSubpass, and vkBeginCommandBuffer with
VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT, all *setup* the
command buffer for recording commands for some subpass.  But only the
first two, vkCmdBeginRenderPass and vkCmdNextSubpass, can *start*
a subpass.

Therefore, calling anv_cmd_buffer_begin_subpass() inside
vkCmdBeginCommandBuffer is misleading. Clarify its purpose by renaming
it to anv_cmd_buffer_set_subpass() and adding comments.
2016-01-14 22:53:05 -08:00
Chad Versace
deb8dd89b5 anv: Emit load clears at start of each subpass
This should improve cache residency for render targets.

Pre-patch, vkCmdBeginRenderPass emitted all the meta clears for
VK_ATTACHMENT_LOAD_OP_CLEAR before any subpass began. Post-patch,
vCmdBeginRenderPass and vkCmdNextSubpass emit only the clears needed for
that current subpass.
2016-01-14 22:53:05 -08:00
Chad Versace
0679bef49f anv/meta: Create 8 pipelines for color clears
This prepares for moving the clear ops from the start of the render pass
into each subpass.

Pipeline N will be used to clear color attachment N of the current
subpass. Currently meta color clears still create a throwaway subpass
with exactly one attachment, so currently only pipeline 0 is used.

This is an ugly hack to workaround the compiler's current inability to
dynamically set the render target index in the render target write
message.
2016-01-14 22:53:05 -08:00
Chad Versace
2997b0da4a anv: Allow override of pipeline color attachment count
Add anv_graphics_pipeline_create_info::color_attachment_count. If
non-negative, then it overrides the color attachment count in the
pipeline's subpass. Useful for meta. (All the hacks for meta!)
2016-01-14 22:53:05 -08:00
Chad Versace
13610c03a7 anv/meta: Name the nir shaders
The names appear in debug output.
2016-01-14 22:53:05 -08:00
Chad Versace
6a1a760e3c anv: Move MAX_* defs to top of anv_private.h
Because I need to use MAX_RTS in struct anv_meta_state.
2016-01-14 22:53:05 -08:00
Chad Versace
4c2bafb9bf anv: Define zero() macro
zero(x) memsets x to zero. Eliminates bugs due to errors in memset's
size param.
2016-01-14 22:53:05 -08:00
Chad Versace
f2700d665c anv/meta: Rename emit_load_*_clear funcs
The functions will soon handle clears unrelated to
VK_ATTACHMENT_LOAD_OP_CLEAR, namely vkCmdClearAttachments. So remove
"load" from their name:

    emit_load_color_clear -> emit_color_clear
    emit_load_depthstencil_clear -> emit_depthstencil_clear
2016-01-14 22:53:05 -08:00
Chad Versace
356f952f87 anv/meta: Use anv_cmd_state::attachments for clears
Rewrite anv_cmd_buffer_clear_attachments, which emits the top-of-pass
clears, to use the data provided in anv_cmd_state::attachments. This
prepares for deferring each attachment clear to the first subpass that
uses the attachment.
2016-01-14 22:53:05 -08:00
Chad Versace
a4b045ca44 anv: Add anv_cmd_state::attachments
This array contains attachment state when recording a renderpass instance.
It's populated on each call to anv_cmd_buffer_set_pass.

The data is currently set but unused. We'll use it later to defer each
attachment clear to the subpass that first uses the attachment.
2016-01-14 22:53:05 -08:00
Samuel Iglesias Gonsálvez
781d2787bc glsl: restrict consumer stage condition to modify interpolation type
Only modify interpolation type for integer-based varyings or when the
consumer is known and different than fragment shader.

If we are linking separate shader programs and the consumer is unknown,
the consumer could be added later and be a fragment shader. If we
modify the interpolation type in this case, we could read wrong
values in the fragment shader inputs, as shown in bug 93320.

Fixes the following CTS test:
   ES31-CTS.vertex_attrib_binding.advanced-bindingUpdate

Fixes the following dEQP tests:

dEQP-GLES31.functional.separate_shader.random.102
dEQP-GLES31.functional.separate_shader.random.111
dEQP-GLES31.functional.separate_shader.random.115
dEQP-GLES31.functional.separate_shader.random.17
dEQP-GLES31.functional.separate_shader.random.22
dEQP-GLES31.functional.separate_shader.random.23
dEQP-GLES31.functional.separate_shader.random.3
dEQP-GLES31.functional.separate_shader.random.32
dEQP-GLES31.functional.separate_shader.random.39
dEQP-GLES31.functional.separate_shader.random.64
dEQP-GLES31.functional.separate_shader.random.73
dEQP-GLES31.functional.separate_shader.random.91

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93320
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-01-15 07:06:41 +01:00
Jason Ekstrand
5d1c2736b6 i965/fs/generator: Change a comment as per jordan's suggestion 2016-01-14 22:03:15 -08:00
Kenneth Graunke
3657cbf24f i965: Apply add_const_offset_to_base for vec4 VS inputs too.
This shouldn't hurt anything, and I'm about to introduce a pass that
will want it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-14 21:32:59 -08:00
Kenneth Graunke
a3500f943e i965: Make add_const_offset_to_base() work at the shader level.
This makes it a pass, hiding the parameter structs and block callbacks
so it's simpler to work with.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-14 21:32:59 -08:00
Kenneth Graunke
824d82025d i965: Make an is_scalar boolean in brw_compile_vs().
Shorter than compiler->scalar_stage[MESA_SHADER_VERTEX], which can
help with line-wrapping.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-14 21:32:59 -08:00
Kenneth Graunke
bb6612f06b nir/builder: Add a nir_build_ivec4() convenience helper.
nir_build_ivec4 is more readable and succinct than using nir_build_imm
directly, even if you have C99.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-14 21:32:59 -08:00