Commit graph

77353 commits

Author SHA1 Message Date
Jason Ekstrand
0079523a0d nir/spirv: Add support for OpSpecConstantOp 2016-01-13 15:18:36 -08:00
Jason Ekstrand
8c408b9b81 nir/spirv/alu: Factor out the opcode table 2016-01-13 15:18:36 -08:00
Jason Ekstrand
9b7e08118b anv/pipeline: Pass through specialization constants 2016-01-13 15:18:36 -08:00
Jason Ekstrand
c95c3b2c21 nir/spirv: Add initial support for specialization constants 2016-01-13 15:18:36 -08:00
Matt Turner
74cff779eb nir: Change bfm's semantics to match Intel/AMD/SM5.
Intel/AMD's hardware instructions do not handle arguments of 32.
Constant evaluation should not produce a result different from the
hardware instruction.

The s/1ull/1u/ change is intentional: previously we wanted defined
behavior for the "1 << 32" case, but we're making this case undefined so
we can make it 1u and save ourselves a 64-bit operation.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-13 11:22:40 -08:00
Matt Turner
a5fcff6628 glsl: Fix undefined shifts.
Shifting into the sign bit is undefined, as is shifting by 32.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-13 11:22:11 -08:00
Matt Turner
966a0dd720 glsl: Handle failure of Python codegen scripts.
If a Python codegen script failed, it would write a zero-byte file,
which on subsequent invocations of make would trick it into thinking the
file was appropriately generated.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-13 10:35:12 -08:00
Kenneth Graunke
84d6130c21 glsl, nir: Make ir_triop_bitfield_extract a vectorized operation.
We would like to be able to combine

   result.x = bitfieldExtract(src0.x, src1.x, src2.x);
   result.y = bitfieldExtract(src0.y, src1.y, src2.y);
   result.z = bitfieldExtract(src0.z, src1.z, src2.z);
   result.w = bitfieldExtract(src0.w, src1.w, src2.w);

into a single ivec4 bitfieldInsert operation.  This should be possible
with most drivers.

This patch changes the offset and bits parameters from scalar ints
to ivecN or uvecN.  The type of all three operands will be the same,
for simplicity.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-13 10:35:12 -08:00
Kenneth Graunke
b4e198f47f glsl, nir: Make ir_quadop_bitfield_insert a vectorized operation.
We would like to be able to combine

   result.x = bitfieldInsert(src0.x, src1.x, src2.x, src3.x);
   result.y = bitfieldInsert(src0.y, src1.y, src2.y, src3.y);
   result.z = bitfieldInsert(src0.z, src1.z, src2.z, src3.z);
   result.w = bitfieldInsert(src0.w, src1.w, src2.w, src3.w);

into a single ivec4 bitfieldInsert operation.  This should be possible
with most drivers.

This patch changes the offset and bits parameters from scalar ints
to ivecN or uvecN.  The type of all four operands will be the same,
for simplicity.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-13 10:35:12 -08:00
Kenneth Graunke
b85a229e1f glsl: Delete the ir_binop_bfm and ir_triop_bfi opcodes.
TGSI doesn't use these - it just translates ir_quadop_bitfield_insert
directly.  NIR can handle ir_quadop_bitfield_insert as well.

These opcodes were only used for i965, and with Jason's recent patches,
we can do this lowering in NIR (which also gains us SPIR-V handling).
So there's not much point to retaining this GLSL IR lowering code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-01-13 10:35:12 -08:00
Matt Turner
92f1773869 nir: Fix constant evaluation of bfm.
NIR's bfm, like Intel/AMD's hardware instructions and GLSL IR's
ir_binop_bfm takes <bits> as src0 and <offset> as src1.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-01-13 10:35:12 -08:00
Matt Turner
7dc2e5f940 i965/fs: Skip assertion on NaN.
A shader in Unreal4 uses the result of divide by zero in its color
output, producing NaN and triggering this assertion since NaN is not
equal to itself.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93560
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-01-13 10:32:53 -08:00
Matt Turner
64800933b8 i965/fs: Add debugging to constant combining pass.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-01-13 10:32:53 -08:00
Brian Paul
9638c03a4e meta: remove const qualifier on _mesa_meta_fb_tex_blit_begin()
To silence a compiler warning about a const/non-const mismatch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-13 08:02:25 -07:00
Brian Paul
235a299534 st/mesa: fix incorrect buffer token passed to _mesa_BindFramebuffer()
I added this code right at the end, and got it wrong.
Only used by the WGL_ARB_render_texture code.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-13 08:01:56 -07:00
Emil Velikov
2065ffb4cf docs: add news item and link release notes for 11.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-13 15:27:50 +02:00
Emil Velikov
183b5ff109 docs: add sha256 checksums for 11.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 4b2d9f29e9)
2016-01-13 15:25:32 +02:00
Emil Velikov
8f16739528 docs: add release notes for 11.1.1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 330aa44a0d)
2016-01-13 15:25:31 +02:00
Neil Roberts
cda886a485 i965/gen9: Don't allow the RGBX formats for texturing/rendering
The RGBX surface formats aren't renderable so we internally remap them
to RGBA when rendering. They are retained as RGBX when used as
textures. However since the previous patch fast clears are disabled
for surfaces that use a different format for rendering than for
texturing. To avoid this situation we can just pretend not to support
RGBX formats at all. This will cause the upper layers of mesa to pick
an RGBA format internally instead. This should be safe because we
always override the alpha component to 1.0 for RGBX in the texture
swizzle anyway. We could also do this for all gens except that it's a
bit more difficult when the hardware doesn't support texture
swizzling. Gens using the blorp have further problems because that
doesn't implement this swizzle override.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2016-01-13 12:16:31 +00:00
Marek Olšák
4ea0febcb0 radeonsi: move POSITION and FACE fragment shader inputs to system values
And FACE becomes integer instead of float.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-13 12:27:28 +01:00
Marek Olšák
caf3c2abea radeonsi: simplify gl_FragCoord behavior
It will become a system value, not an input.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-13 12:27:28 +01:00
Samuel Iglesias Gonsálvez
69c4c75264 glsl: add image_format check in cross_validate_globals()
Fixes CTS test:

ES31-CTS.shader_image_load_store.negative-linkErrors

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93410

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-01-13 07:01:55 +01:00
Tapani Pälli
e937fd779f mesa: do not validate io of non-compute and compute stage
Fixes regression on SSO tests that have both non-compute and
compute programs in a program pipeline.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93532
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
2016-01-13 07:31:57 +02:00
Tapani Pälli
6b0706b2aa glsl: add packed varyings for outputs with single stage program
Commit 8926dc8 added a check where we add packed varyings of output
stage only when we have multiple stages,  however duplicates are already
handled by changes in commit 0508d950 and we want to add outputs also in
case where we have only one stage.

Fixes regression caused by 8926dc8 for following test:
   ES31-CTS.program_interface_query.separate-programs-vertex

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-01-13 07:30:46 +02:00
Roland Scheidegger
38cdcb000d llvmpipe: (trivial) use cast wrapper for __m128d to __m128 casts
some compiler was unhappy.
2016-01-13 04:48:41 +01:00
Roland Scheidegger
49ec647c3b llvmpipe: avoid most 64 bit math in rasterization
The trick here is to recognize that in the c + n * dcdx calculations,
not only can the lower FIXED_ORDER bits not change (as the dcdx values
have those all zero) but that this means the sign bit of the calculations
cannot be different as well, that is
sign(c + n*dcdx) == sign((c >> FIXED_ORDER) + n*(dcdx >> FIXED_ORDER)).
That shaves off more than enough bits to never require 64bit masks.
A shifted plane c value could still easily exceed 32 bits, however since we
throw out planes which are trivial accept even before binning (and similarly
don't even get to see tris for which there was a trivial reject plane)) this
is never a problem.
The idea isnt't all that revolutionary, in fact something similar was tried
ages ago (9773722c2b) back when the values were
only 32 bit anyway. I believe now it didn't quite work then because the
adjustment needed for testing trivial reject / partial masks wasn't handled
correctly.
This still keeps the separate 32/64 bit paths for now, as the 32 bit one still
looks minimally simpler (and also because if we'd pass in dcdx/dcdy/eo unscaled
from setup which would be a good reason to ditch the 32 bit path, we'd need to
change the special-purpose rasterization functions for small tris).

This passes piglit triangle-rasterization (-fbo -auto -max_size
-subpixelbits 8) and triangle-rasterization-overdraw (with some hacks
to make it work correctly with large sizes) easily (full piglit as
well of course, but most tests wouldn't use triangles large enough to
be affected, that is tris with a bounding box over 128x128).
The profiler says indeed time spent in rast_tri functions is reduced
substantially, BUT of course only if the tris are large. I measured a 3%
improvement in mesa gloss demo when supersized to twice the screen size...

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-13 03:50:57 +01:00
Roland Scheidegger
16530fdc82 llvmpipe: scale up bounding box planes to subpixel precision
Otherwise some planes we get in rasterization have subpixel precision, others
not. Doesn't matter so far, but will soon. (OpenGL actually supports viewports
with subpixel accuracy, so could even do bounding box calcs with that).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-13 03:34:59 +01:00
Roland Scheidegger
0298f5aca7 llvmpipe: add sse code for fixed position calculation
This is quite a few less instructions, albeit still do the 2 64bit muls
with scalar c code (they'd need way more shuffles, plus fixup for the signed
mul so it totally doesn't seem worth it - x86 can do 32x32->64bit signed
scalar muls natively just fine after all (even on 32bit).

(This still doesn't have a very measurable performance impact in reality,
although profiler seems to say time spent in setup indeed has gone down by
10% or so overall. Maybe good for a 3% or so improvement in openarena.)

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-13 03:34:09 +01:00
Roland Scheidegger
9422999e40 draw: fix key comparison with uninitialized value
Discovered by accident, valgrind was complaining (could have possibly caused
us to create redundant geometry shader variants).

v2: convinced by Brian and Jose, just use memset for both gs and vs keys,
just as easy and less error prone.
2016-01-13 02:43:04 +01:00
Jason Ekstrand
610aa00cdf nir/spirv: Add support for OpQuantize 2016-01-12 15:36:38 -08:00
Jason Ekstrand
282a837317 i965: Implement nir_op_fquantize2f16 2016-01-12 15:35:00 -08:00
Jason Ekstrand
15a56459d7 nir: Add a fquantize2f16 opcode
This opcode simply takes a 32-bit floating-point value and reduces its
effective precision to 16 bits.
2016-01-12 15:33:02 -08:00
Timothy Arceri
6143e2d651 mesa: print the invalid enum when CreateShader fails
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-01-13 09:46:56 +11:00
Jason Ekstrand
aee970c844 anv/device: Bump the max program size again
No one will ever need more than 128K, right?
2016-01-12 13:49:05 -08:00
Kenneth Graunke
c034dbeda8 glsl: Make read_from_write_only_variable_visitor ignore .length().
.length() on an unsized SSBO variable doesn't actually read any data
from the SSBO, and is allowed on variables marked 'writeonly'.

Fixes compute shader compilation in Shadow of Mordor.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-12 12:20:02 -08:00
Kenneth Graunke
9095847c25 i965: Mark TCS URB writes as having side effects.
This adds barrier dependencies around TCS_OPCODE_URB_WRITE, preventing
reads and writes from being incorrectly scheduled.

Fixes rendering in GFXBench 4.0's tessellation demo.

For some reason, we haven't ever listed URB writes as having
side-effects.  This hasn't been a problem because in most stages, we
never read from the URB, and only write to each location once.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93526
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-01-12 12:19:47 -08:00
Kristian Høgsberg Kristensen
d7a193327b vk: Implement workaround for occlusion queries
We have an issue with occlusion queries (PIPE_CONTROL depth writes)
after using the pipeline with the VS disabled. We work around it by
using a depth cache flush PIPE_CONTROL before doing a depth write.

Fixes dEQP-VK.query_pool.*
2016-01-12 11:50:36 -08:00
Jason Ekstrand
6fc278ae4f anv/UpdateDescriptorSets: Respect write.dstArrayElement 2016-01-12 11:45:12 -08:00
Kristian Høgsberg Kristensen
af422fe9b3 Merge ../mesa into vulkan
Merge master again to get the brw_device_info with the
correct slice counts for KBL.
2016-01-12 10:54:26 -08:00
Kristian Høgsberg Kristensen
7df20f0c14 vk: Support SpvBuiltInViewportIndex 2016-01-12 10:53:59 -08:00
Kristian Høgsberg Kristensen
2b4bacb84b vk: Use the correct stride for CC_VIEWPORT structs 2016-01-12 10:53:59 -08:00
Tom St Denis
56fc2986d5 st/omx: Avoid segfault in deconstructor if constructor fails
If the constructor fails before the LIST_INIT calls the pointers
will be null and the deconstructor will segfault.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-01-12 19:13:19 +01:00
Christian König
6f898f740c vl: use preferred format for deinterlacing
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:42 +01:00
Christian König
5fdd4a5aef vl: improve motion adaptive deinterlacer
Handle other formats than YV12 as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:39 +01:00
Christian König
e945235aed st/va: add BOB deinterlacing v2
Tested with MPV.

v2: correctly handle compositor deinterlacing as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:35 +01:00
Christian König
3949cf0e02 st/va: add NV12 -> NV12 post processing v2
Usefull for mpv and GStreamer.

v2: use common functionality for size adjustment.

Signed-off-by: Indrajit-kumar Das <Indrajit-kumar.Das@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:28 +01:00
Christian König
9f644295dc st/va: use vl_video_buffer_adjust_size
Use the new helper function instead of open coding it.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:24 +01:00
Christian König
da39637764 st/vdpau: use vl_video_buffer_adjust_size
Use the new helper function instead of open coding it.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:21 +01:00
Christian König
52ca9a9b8b vl/buffers: extract vl_video_buffer_adjust_size helper
Useful for the state trackers as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2016-01-12 13:28:16 +01:00
Christian König
8479782361 st/va: make the implementation thread safe v2
Otherwise we might crash with MPV.

v2: minor cleanups suggested on the list.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
2016-01-12 13:26:24 +01:00