Commit graph

73893 commits

Author SHA1 Message Date
Kenneth Graunke
b94cdcdada i965/fs: Properly check for PAD in fragment shaders with > 16 varyings.
Commit 268008f98c changed unused VUE map
slots to be initialized with BRW_VARYING_SLOT_PAD, not COUNT.  I missed
updating this.  It also means that commit message was wrong, as some
code *did* rely slots being initialized to COUNT.

This may fix a bug with SSO programs with > 16 FS input varyings.
I think we probably just emitted extra pointless code, but probably
didn't break anything.  We might also just have no tests for that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-10-28 22:05:08 -07:00
Kenneth Graunke
6ae47a3eb4 i965: Update stale comment about unused VUE map slots.
I changed this from COUNT to PAD in commit 268008f98c.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-10-28 22:05:08 -07:00
Ilia Mirkin
5227e91580 nv50/ir: adapt to new method for passing in cull/clip distance masks
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 00:44:22 -04:00
Ilia Mirkin
a5bae7b31d nvc0: share shaders between contexts and build immediately
Avoid deferring building shaders until draw time, should hopefully
reduce any stuttering, as well as enable shader-db style analysis.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 00:44:22 -04:00
Ilia Mirkin
b75fff70d8 nvc0: do upload-time fixups for interpolation parameters
Unfortunately flatshading is an all-or-nothing proposition on nvc0,
while GL 3.0 calls for the ability to selectively specify explicit
interpolation parameters on gl_Color/gl_SecondaryColor which would
override the flatshading setting. This allows us to fix up the
interpolation settings after shader generation based on rasterizer
settings.

While we're at it, we can add support for dynamically forcing all
(non-flat) shader inputs to be interpolated per-sample, which allows
st/mesa to not generate variants for these.

Fixes the remaining failing glsl-1.30/execution/interpolation piglits.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-29 00:44:22 -04:00
Kenneth Graunke
77f58c04cc nir: Copy "patch" flag from ir_variable to nir_variable.
This was introduced in GLSL IR after NIR development had branched.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-28 21:15:29 -07:00
Kenneth Graunke
9c8208f2c1 nir: Add intrinsics for tessellation shader system values.
nir_intrinsic_load_patch_vertices_in corresponds to gl_PatchVerticesIn,
a special input in both the TCS and TES stages.

nir_intrinsic_load_tess_coord corresponds to gl_TessCoord, a special
tessellation evaluation shader input.

nir_intrinsic_load_tess_level_outer/inner correspond to the
gl_TessLevelOuter[] and gl_TessLevelInner[] evaluation shader inputs,
which we treat as system values because they're stored specially.
(These intrinsics are only for the TES - the TCS uses output variables.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-10-28 21:14:53 -07:00
Kenneth Graunke
bf05af3f0e i965: Fix missing BRW_NEW_*_PROG_DATA flagging caused by cache reuse.
Consider the case of two nearly identical GLSL fragment shaders:

   out vec4 color;
   void main() { color = vec4(1); }

and

   layout(early_fragment_tests) in;
   out vec4 color;
   void main() { color = vec4(1); }

These shaders compile to the exact same assembly, but have distinct
values for brw_wm_prog_data::early_fragment_tests.

Since these are two independent GLSL shaders, they have different
program keys - notably, brw_wm_prog_key::program_string_id differs.

When uploading the second, brw_upload_cache will find an existing copy
of the assembly in the cache BO, which means matching_data will be
non-NULL.  Although we create a second cache item (with the new key
and prog_data), we set item->offset to the existing copy and avoid
re-uploading duplicate assembly.

However, brw_search_cache() would only flag BRW_NEW_*_PROG_DATA if
item->offset differed from the supplied offset.  With reuse, both
programs have the same offset, but prog_data changed.  We have to
flag it, but failed to.

To fix this, we simply need to check if the aux (prog_data) pointer
changed.  If either the assembly or the prog_data differs, flag it.

This fixes a regression since 1bba29ed40,
where Topi fixed brw_upload_cache() to actually reuse identical
assembly.  Prior to that, reuse basically never happened due to bugs.
Unfortunately, this code apparently wasn't prepared to handle reuse!

Fixes GPU hangs in Dolphin on Broadwell.

Huge thanks to Pierre Bourdon and Ilia Mirkin for debugging this
and helping track down the real issue.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92623
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Pierre Bourdon <delroth@gmail.com>
2015-10-28 21:13:54 -07:00
Laurent Carlier
37402014e8 clover: fix building fix clang-3.8
https://bugs.freedesktop.org/show_bug.cgi?id=92705

v2.1: use Linker::Flags::None instead of 0 and emplace_back()

Signed-off-by: Laurent Carlier <lordheavym@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-10-29 12:35:37 +09:00
Ilia Mirkin
d0693d7515 nv50: add ARB_copy_image support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-28 20:53:30 -04:00
Ilia Mirkin
ebbd7b41c0 nvc0: add ARB_copy_image support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-28 20:42:59 -04:00
Julien Isorce
3bbb8715ac nvc0: fix crash when nv50_miptree_from_handle fails
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-10-28 18:26:20 +01:00
Brian Paul
2bf224b3f9 vbo: replace assertion with conditional in vbo_compute_max_verts()
With just the right sequence of per-vertex commands and state changes,
it's possible for this assertion to fail (such as with viewperf11's
lightwave-06-1 test).  Instead of asserting, return 0 so that the
caller knows the VBO is full and needs to be flushed.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2015-10-28 11:03:27 -06:00
Brian Paul
8e9c3070bf mesa: minor formatting fix in get_tex_rgba_compressed() 2015-10-28 11:03:21 -06:00
Marek Olšák
f04f13622f st/mesa: implement ARB_copy_image
I wonder if the craziness was worth it.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-28 11:52:17 +01:00
Marek Olšák
ce9db16e1c gallium: add PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS
For ARB_copy_image.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-28 11:52:17 +01:00
Marek Olšák
e82c527f1f radeonsi: allow copying between compatible compressed and uncompressed formats
which is where a block in src maps to a pixel in dst and vice versa.
e.g. DXT1 <-> R32G32_UINT
     DXT5 <-> R32G32B32A32_UINT

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-28 11:52:17 +01:00
Marek Olšák
6a4dc1ad49 mesa: set TargetIndex in VDPAURegister*SurfaceNV (v2)
We initialized Target, but not TargetIndex.
This is required since 7d7dd18711.

v2: do it in the right place. Noticed by Brian Paul.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92645

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-28 11:52:17 +01:00
Emil Velikov
bfc73ff10e i965: remove unneeded src_reg copy in emit_shader_time_write
The variable is already of type src_reg. creating a new instance only to
destroy it seems unnecessary.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-28 02:28:38 -07:00
Emil Velikov
0325f68228 i965: remove cache_aux_free_func array
There is only one function that can be called, which is well known at
compilation time.

The abstraction used here seems unnecessary, so let's use a direct call
to brw_stage_prog_data_free() when appropriate, cut down the size of
struct brw_cache.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-28 02:28:38 -07:00
Samuel Iglesias Gonsalvez
74fcc4c41f main: fix GL_MAX_NUM_ACTIVE_VARIABLES value for shader storage blocks
The maximum number of active variables for shader storage blocks should
take into account the specific rules for shader storage blocks, i.e. for
an active shader storage block member declared as an array, an entry
will be generated only for the first array element, regardless of its type.

Fixes 3 dEQP-GLES31.functional.* tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.named_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.unnamed_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.block_array

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-28 08:40:51 +01:00
Boyuan Zhang
03c92ffbf6 st/vdpau: disable RefPicList for Vdpau HEVC
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-10-27 19:09:55 -04:00
Boyuan Zhang
ad2752e94b st/va: add VAAPI HEVC decode support
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-10-27 19:09:55 -04:00
Boyuan Zhang
38c3d7cfc4 radeon/uvd: implement and add flag for VAAPI HEVC decode
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-10-27 19:09:55 -04:00
Boyuan Zhang
231605d14d vl: add RefPicList defines for VAAPI HEVC decode
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2015-10-27 19:09:55 -04:00
Marta Lofstedt
16c49da63a mesa: Draw indirect is not allowed if the default VAO is bound.
From OpenGL ES 3.1 specification, section 10.5:
"DrawArraysIndirect requires that all data sourced for the
command, including the DrawArraysIndirectCommand
structure,  be in buffer objects,  and may not be called when
the default vertex array object is bound."

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-27 12:16:23 +01:00
Marek Olšák
93eb4f9287 winsys/amdgpu: remove the dcc_enable surface flag
dcc_size is sufficient and doesn't need a further comment in my opinion.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-10-27 10:49:24 +01:00
Marek Olšák
3aebc596b3 radeonsi: add debug flags that disable DCC and DCC fast clear
For debugging, bug reports, etc.
This is not in the radeonsi directory, but it is about radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-10-27 10:49:24 +01:00
Marek Olšák
235d38584c radeonsi: properly check if DCC is enabled and allocated
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-10-27 10:49:24 +01:00
Marek Olšák
5bc5dca0cb radeonsi: simplify DCC handling in si_initialize_color_surface
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-10-27 10:49:24 +01:00
Marta Lofstedt
3daa7e5147 mesa: Draw indirect is not allowed when xfb is active and unpaused
OpenGL ES 3.1 specification, section 10.5:
"An INVALID_OPERATION error is generated if
transform feedback is active and not paused."

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-27 08:49:21 +01:00
Marta Lofstedt
2c91e08656 mesa: Draw Indirect return wrong error code on unalinged
From OpenGL 4.4 specification, section 10.4 and
Open GL Es 3.1 section 10.5:
"An INVALID_VALUE error is generated if indirect is not a multiple
of the size, in basic machine units, of uint."

However, the current code follow the ARB_draw_indirect:
https://www.opengl.org/registry/specs/ARB/draw_indirect.txt
"INVALID_OPERATION is generated by DrawArraysIndirect and
DrawElementsIndirect if commands source data beyond the end
of a buffer object or if <indirect> is not word aligned."

V2: After discussions on the list, it was suggested to
only keep the INVALID_VALUE error.

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-10-27 08:49:21 +01:00
Samuel Iglesias Gonsalvez
4565b6f4fb main: Remove interface block array index for doing the name comparison
From ARB_program_query_interface spec:

"uint GetProgramResourceIndex(uint program, enum programInterface,
                                   const char *name);
 [...]
 If <name> exactly matches the name string of one of the active resources
 for <programInterface>, the index of the matched resource is returned.
 Additionally, if <name> would exactly match the name string of an active
 resource if "[0]" were appended to <name>, the index of the matched
 resource is returned. [...]"

"A string provided to GetProgramResourceLocation or
 GetProgramResourceLocationIndex is considered to match an active variable
 if:
[...]
   * if the string identifies the base name of an active array, where the
     string would exactly match the name of the variable if the suffix
     "[0]" were appended to the string;
[...]
"

Fixes the following two dEQP-GLES31 tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array
dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array_single_element

v2:
- Add AoA support (Timothy)
- Apply it too for GetUniformLocation(), GetUniformName() and others
  because ARB_program_interface_query says that they are equivalent
  to GetProgramResourceLocation() and GetProgramResourceName() (Tapani)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-10-27 08:10:04 +01:00
Eric Anholt
3359ad6cda vc4: Add support for copy propagation with unpack flags present.
total instructions in shared programs: 89251 -> 87862 (-1.56%)
instructions in affected programs:     52971 -> 51582 (-2.62%)
2015-10-26 16:48:34 -07:00
Eric Anholt
01ca4f207e vc4: Rewrite the pack instructions as a MOV with a dst pack flag
Another step in reducing the special-casing of instructions.
2015-10-26 16:48:34 -07:00
Eric Anholt
72fa2ae20b vc4: Move dst pack setup out to a helper function with more asserts. 2015-10-26 16:48:34 -07:00
Eric Anholt
99a9a5a345 vc4: Switch the unpack ops to being unpack flags on a mov.
This paves the way for copy propagating our unpacks.  We end up with a
small change on shader-db:

total instructions in shared programs: 89390 -> 89251 (-0.16%)
instructions in affected programs:     19041 -> 18902 (-0.73%)

which appears to be because we no longer convert MOVs for an FMAX dst,
r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and
this ends up with a slightly better schedule.
2015-10-26 16:48:34 -07:00
Eric Anholt
548b05d53f vc4: Drop some confused code about pack/unpack handling.
At one point I thought packs and unpacks were in the same field of the
instruction.  They aren't.  These instructions therefore never cause a
pack.

total instructions in shared programs: 89472 -> 89390 (-0.09%)
instructions in affected programs:     15261 -> 15179 (-0.54%)
2015-10-26 16:48:34 -07:00
Eric Anholt
a7b424e835 vc4: Reduce MOV special-casing in QIR-to-QPU.
I'm going to introduce some more types of MOV, which also want the elision
of raw MOVs.
2015-10-26 16:48:34 -07:00
Eric Anholt
652a864b25 vc4: Fix up the test for whether the unpack can be from r4.
We can do 16a/16b from float as well.  No difference on shader-db.
2015-10-26 16:48:34 -07:00
Eric Anholt
3d7a088608 vc4: Don't try to follow MOVs across a pack. 2015-10-26 16:48:34 -07:00
Eric Anholt
6eb0760f48 vc4: Only copy propagate raw MOVs.
No problems being fixed, but needed for the new unpack changes.
2015-10-26 16:48:34 -07:00
Eric Anholt
0ccacfa017 vc4: If a QIR source has an unpack set, print it.
Not used yet, but will be.
2015-10-26 16:48:34 -07:00
Kenneth Graunke
8034e7d6f1 glsl: Convert TES gl_PatchVerticesIn into a constant when using a TCS.
When a TCS is present, the TES input gl_PatchVerticesIn is actually a
constant - it's simply the # of output vertices specified by the TCS
layout qualifiers.  So, we can replace the system value with a constant,
which may allow further optimization, and will likely be more efficient.

If the TCS is absent, we can't do this optimization.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-10-26 16:37:07 -07:00
Ian Romanick
8f84a8e257 i965: Add missing close-parenthesis in error messages
Trivial.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2015-10-26 16:15:55 -07:00
Ian Romanick
7070c8879a i965: Fix is-renderable check in intel_image_target_renderbuffer_storage
Previously we could create a renderbuffer with format
MESA_FORMAT_R8G8B8A8_UNORM, convert that renderbuffer to an EGLImage,
then FAIL to convert the EGLImage back to a renderbuffer because
reasons.  Just use the same check in
intel_image_target_renderbuffer_storage that brw_render_target_supported
uses.

There are more checks in brw_render_target_supported, but I don't think
they are necessary here.  A different approach would be to refactor
brw_render_target_supported to take rb->Format and rb->NumSamples as
parameters (instead of a gl_renderbuffer) and use the new function here.

Fixes:

    ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92476
Cc: "10.3 10.4 10.5 10.6 11.0" <mesa-stable@lists.freedesktop.org>
2015-10-26 16:15:55 -07:00
Timothy Arceri
a3d0359aff glsl: keep track of intra-stage indices for atomics
This is more optimal as it means we no longer have to upload the same set
of ABO surfaces to all stages in the program.

This also fixes a bug where since commit c0cd5b var->data.binding was
being used as a replacement for atomic buffer index, but they don't have
to be the same value they just happened to end up the same when binding is 0.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Alejandro Piñeiro <apinheiro@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90175
2015-10-27 07:03:05 +11:00
Roland Scheidegger
711489648b gallivm: disable f16c when not using AVX
f16c intrinsic can only be emitted when AVX is used. So when we disable AVX
due to forcing 128bit vectors we must not use this intrinsic (depending on
llvm version, this worked previously because llvm used AVX even when we didn't
tell it to, however I've seen this fail with llvm 3.3 since
718249843b which seems to have the side effect
of disabling avx in llvm albeit it only touches sse flags really, but
with ea421e919a it's now really disabled).
Albeit being able to use AVX with 128bit vectors also would have its uses, the
code as is really was meant to emulate jit code creation for less capable cpus.
v2: add some (ifdefed out) missing de-featuring options for simulating
less capable cpus.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-26 16:45:49 +01:00
Julien Isorce
a61be1a798 st/va: pass picture desc to begin and decode
At least vl_mpeg12_decoder uses the picture
desc in begin_frame and decode_bitstream.

https://bugs.freedesktop.org/show_bug.cgi?id=92634

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-10-26 13:53:10 +01:00
Tapani Pälli
8ae4317c36 mesa: add additional checks for uniform location query
Patch adds additional check to make sure we don't return locations for
structures or arrays of structures.

From page 79 of the OpenGL 4.2 spec:
    "A valid name cannot be a structure, an array of structures, or any
    portion of a single vector or a matrix."

v2: use without-array() to simplify code (Timothy)

No Piglit or CTS regressions observed.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-10-26 12:52:17 +02:00