Commit graph

82384 commits

Author SHA1 Message Date
Brian Paul
5c85c3be26 tgsi: simplify tgsi_shader_info::is_msaa_sampler checking
We assert that fullinst->Instruction.Texture != 0 above so no need to
check it in the conditional.  We also have the fullinst->Texture.Texture
value in a local variable, so use it.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-03-29 18:13:46 -06:00
Brian Paul
86e1768c13 tgsi: collect texture sampler target info in tgsi_scan_shader()
Texture sample instructions specify a sampler unit and texture target
such as "1D", "2D", "CUBE", etc.  Sampler view declarations also specify
the sampler unit and texture target.

This patch checks that the texture instructions agree with the declarations
and collects the texture target type for each sampler unit.

v2: only compare instruction's texture target to the sampler view declaration
target if the instruction is a TEX instruction, not a SAMPLE instruction.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-03-29 18:13:46 -06:00
Brian Paul
6775268b61 gallium/docs: s/gven/given/
2016-03-29 18:13:46 -06:00
Brian Paul
75b713455c xlib: add support for GLX_ARB_create_context
This adds the glXCreateContextAttribsARB() function for the xlib/swrast
driver.  This allows more piglit tests to run with this driver.

For example, without this patch we get:
$ bin/fbo-generatemipmap-1d -auto
piglit: error: waffle_config_choose failed due to WAFFLE_ERROR_UNSUPPORTED_ON_PLATFORM:
GLX_ARB_create_context is required in order to request an OpenGL
version not equal to the default value 1.0
piglit: error: Failed to create waffle_config for OpenGL 2.0 Compatibility Context
piglit: info: Failed to create any GL context
PIGLIT: {"result": "skip" }

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
2016-03-29 18:13:45 -06:00
Brian Paul
d8d029f22b st/mesa: simplify st_generate_mipmap()
The whole st_generate_mipmap() function was overly complicated.  Now
we just call the new _mesa_prepare_mipmap_levels() function to prepare
the texture mipmap memory, then call the generate function which fills
in the texture images.

This fixes a failed assertion in llvmpipe/softpipe which is hit with the
new piglit generatemipmap-base-change test.  Also fixes some device errors
(format mismatches) with the VMware svga driver.

v2: fix a comment typo, per Sinclair

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-03-29 18:13:45 -06:00
Brian Paul
105fe52784 mesa: new _mesa_prepare_mipmap_levels() function for mipmap generation
Simplifies the loops in generate_mipmap_uncompressed() and
generate_mipmap_compressed().  Will be used in the state tracker too.
Could probably be used in the meta code.  If so, some additional
clean-ups can be done after that.

v2: use unsigned types instead of GLuint, per Ian

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-03-29 18:13:45 -06:00
Kenneth Graunke
d4a5a61d44 i965: Don't use CUBE wrap modes for integer formats on IVB/BYT.
There is no linear filtering for integer formats, so we should always
be using CLAMP_TO_EDGE mode.

Fixes 46 dEQP cases on Ivybridge (which were likely broken by commit
0faf26e6a0).

This workaround doesn't appear to be necessary on any other hardware;
I haven't found any documentation mentioning errata in this area.

v2: Only apply on Ivybridge/Baytrail to avoid regressing GLES3.1 tests.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v1]
2016-03-29 15:43:18 -07:00
Kenneth Graunke
f8c69fbb54 Revert "i965: Set address rounding bits for GL_NEAREST filtering as well."
This reverts commit 60d6a8989a.

It's pretty sketchy, and apparently regressed a bunch of dEQP tests
on Sandybridge.
2016-03-29 15:35:07 -07:00
Rovanion Luckey
7087e0ab27 gallium: Format code in pb_buffer_fenced.c according to style guide.
This is a tiny housekeeping patch which does the following:

  * Replaced tabs with three spaces.
  * Formatted oneline and multiline code comments. Some doxygen
    comments weren't marked as such and some code comments were marked
    as doxygen comments.
  * Added spaces between if- and while-statements and their parentheses.

According to the mesa coding style guidelines.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-29 13:44:11 -06:00
Charmaine Lee
2d8df0306b svga: emit sampler declarations in the helper function for non vgpu10
With commit dc9ecf58c0,
we now get the sampler target from the sampler view
declaration. But since a sampler view declaration can be defined
after a sampler declaration, we need to emit the
sampler declarations in the pre-helpers function; otherwise,
the sampler target might not have been defined yet for the sampler declaration.

Fixes viewperf maya-03 and various gl trace regressions in hwv11.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-29 13:35:09 -06:00
Brian Paul
96e0894106 svga: avoid freeing non-malloced memory
svga_shader_expand() will fall back to using non-malloced memory for
emit.buf if malloc fails. We should check if the memory is malloced
before freeing it in the error path of svga_tgsi_vgpu9_translate.
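
The pattern described above can be sketched in isolation. This is a minimal
illustration, not the svga driver's code: the names emitter, emitter_expand,
emitter_destroy, fallback_buf and always_fail are all hypothetical, the
allocator is passed in only so that a failure can be simulated, and buffer
contents are not copied on expansion to keep the sketch short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical static buffer used when heap allocation fails. */
static char fallback_buf[64];

struct emitter {
   char *buf;
   size_t size;
};

/* Replace the emitter's buffer with a new one of new_size bytes.
 * On allocation failure, fall back to the static buffer so that
 * later writes still have somewhere to go, and report the failure. */
bool emitter_expand(struct emitter *e, size_t new_size,
                    void *(*alloc)(size_t))
{
   char *new_buf = alloc(new_size);

   /* Only release memory that actually came from the heap. */
   if (e->buf != fallback_buf)
      free(e->buf);

   if (new_buf == NULL) {
      e->buf = fallback_buf;
      e->size = sizeof(fallback_buf);
      return false;
   }

   e->buf = new_buf;
   e->size = new_size;
   return true;
}

bool emitter_uses_fallback(const struct emitter *e)
{
   return e->buf == fallback_buf;
}

/* Error/cleanup path: freeing the static buffer would be undefined
 * behavior, so check where the memory came from first. */
void emitter_destroy(struct emitter *e)
{
   if (e->buf != fallback_buf)
      free(e->buf);
   e->buf = NULL;
   e->size = 0;
}

/* Allocator that always fails, to exercise the fallback path. */
void *always_fail(size_t n)
{
   (void)n;
   return NULL;
}
```

Calling emitter_expand() with always_fail and then emitter_destroy() exercises
exactly the error path the commit is concerned with: the buffer pointer refers
to static storage, so the destroy function must not pass it to free().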

Original patch by Thomas Hindoe Paaboel Andersen <phomes@gmail.com>.
Remove trivial svga_destroy_shader_emitter() function, by BrianP.

Signed-off-by: Brian Paul <brianp@vmware.com>
2016-03-29 13:35:08 -06:00
Samuel Pitoiset
9d57c84994 nvc0/ir: move load/store lowering pass to handleLDST()
Having all this code in a big switch is not really a good practice.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-29 19:55:51 +02:00
Christian König
cc68dc2b5e st/mesa: implement new DMA-buf based VDPAU interop v2
Avoid using internal structures from another API.

v2: rebased and moved includes so they don't cause problems when VDPAU isn't installed.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-03-29 17:29:22 +02:00
Christian König
bdeb22b7b6 st/vdpau: implement the new DMA-buf based interop v2
That should allow us to get away from passing internal structures around.

v2: rebased

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-03-29 17:29:18 +02:00
Christian König
0042aa508e st/vdpau: move FormatRGBAToPipe into the interop
We are going to need that in the Mesa state tracker as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-03-29 17:29:14 +02:00
Christian König
faba96bc60 st/vdpau: add new interop interface
Use DMA-buf for the VDPAU interop interface instead of using
internal structures.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-03-29 17:29:10 +02:00
Christian König
d180de3532 st/vdpau: use linear layout for output surfaces
Works around a bug in radeonsi; tiling is not very beneficial
in this use case anyway.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-03-29 17:28:43 +02:00
Christian König
7eb5e5b8b4 radeonsi: ignore PIPE_BIND_LINEAR in si_is_format_supported v2
Linear layout should work for all formats that are neither compressed nor depth/stencil.

v2: restrict it a bit more

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-29 17:28:35 +02:00
Ilia Mirkin
9286cbdd1e st/mesa: enable OES_texture_buffer when all components available
OES_texture_buffer combines bits from a number of desktop extensions.
When they're all available, turn it on.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-29 10:15:21 -04:00
Adam Jackson
5e1aec6db0 glapi/glx: Mark the indirect swapped dispatch functions _X_COLD
A modest size savings:

   text	   data	    bss	    dec	    hex	filename
 264143	  15608	    232	 279983	  445af libglx.so.before
 254303	  15608	    232	 270143	  41f3f libglx.so.after

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-03-29 10:10:57 -04:00
Adam Jackson
ea0f62e45e glapi/glx: Sync some additional error checking from xserver
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2016-03-29 10:10:57 -04:00
Jordan Justen
f56f538ce4 anv/gen7: Fix command parser version test with indirect dispatch
Caught-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-28 22:30:33 -07:00
Alejandro Piñeiro
dcd41ca87a glsl: raise warning when using uninitialized variables
v2:
 * Take into account out varyings too (Timothy Arceri)
 * Fix style (Timothy Arceri)
 * Use a new ast_expression variable, instead of an
   ast_expression::hir new parameter (Timothy Arceri)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94129

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-03-29 07:28:57 +02:00
Alejandro Piñeiro
8568d02498 glsl: add is_lhs bool on ast_expression
Useful to know whether an expression is the recipient of an assignment
or not. That can be used to (for example) raise "use of uninitialized
variable" warnings without getting a false positive
when a variable is first assigned.

By default the value is false, and it is set to true in
the following cases:
 * The lhs subexpression of an assignment.
 * At ast_array_index, on the array itself.
 * While handling a method call on an array, to avoid the warning
   when calling array.length
 * When computing the cached test expression at test_to_hir, to
   avoid a duplicate warning on the test expression of a switch.

A set_is_lhs setter is added, because in some cases (like ast_field_selection)
the value needs to be propagated through the expression tree. To avoid doing the
propagation when it is not needed, it is skipped if no primary_expression.identifier is
available.
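
The idea can be sketched with a toy expression tree. This is only an
illustration in C under simplifying assumptions, not the GLSL compiler's
C++ AST: the types expr and variable and the functions set_is_lhs,
root_variable and count_uninitialized_uses are hypothetical, and assigning
to any field marks the whole variable as assigned.

```c
#include <stdbool.h>
#include <stddef.h>

enum expr_kind { EXPR_IDENTIFIER, EXPR_FIELD_SELECTION, EXPR_ASSIGN };

struct variable {
   const char *name;
   bool assigned;
};

struct expr {
   enum expr_kind kind;
   bool is_lhs;            /* false by default */
   struct variable *var;   /* EXPR_IDENTIFIER only */
   struct expr *sub[2];    /* FIELD_SELECTION: sub[0]; ASSIGN: lhs, rhs */
};

/* Mark an expression as an assignment target, propagating the flag
 * down through field selections until the identifier is reached. */
void set_is_lhs(struct expr *e, bool value)
{
   while (e != NULL) {
      e->is_lhs = value;
      e = (e->kind == EXPR_FIELD_SELECTION) ? e->sub[0] : NULL;
   }
}

/* Find the variable at the bottom of a chain of field selections. */
struct variable *root_variable(struct expr *e)
{
   while (e != NULL && e->kind == EXPR_FIELD_SELECTION)
      e = e->sub[0];
   return (e != NULL && e->kind == EXPR_IDENTIFIER) ? e->var : NULL;
}

/* Walk the tree and count reads of variables that were never assigned.
 * Identifiers flagged is_lhs are being written, so they are skipped. */
int count_uninitialized_uses(struct expr *e)
{
   int warnings = 0;
   struct variable *target;

   if (e == NULL)
      return 0;

   switch (e->kind) {
   case EXPR_IDENTIFIER:
      if (!e->is_lhs && !e->var->assigned)
         warnings++;
      break;
   case EXPR_FIELD_SELECTION:
      warnings += count_uninitialized_uses(e->sub[0]);
      break;
   case EXPR_ASSIGN:
      set_is_lhs(e->sub[0], true);
      warnings += count_uninitialized_uses(e->sub[1]);
      warnings += count_uninitialized_uses(e->sub[0]);
      target = root_variable(e->sub[0]);
      if (target != NULL)
         target->assigned = true;
      break;
   }
   return warnings;
}
```

Without the is_lhs flag, the first assignment to a variable would itself be
reported, since the identifier on the left-hand side is visited before the
variable has been marked as assigned.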

v2: use a new bool on ast_expression, instead of a new parameter
    on ast_expression::hir (Timothy Arceri)

v3: fix style and some typos on comments, initialize is_lhs default value
    on constructor, to avoid a c++11 feature (Ian Romanick)

v4: some tweaks on comments (Timothy Arceri)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94129

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-03-29 07:28:57 +02:00
Jason Ekstrand
35e2e96b30 nir: Add a helper for getting the current block from a cursor
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
be98c47528 nir/lower_out_to_temp: Add an "entrypoint" parameter
Previously, the pass assumed that the entrypoint would be whatever function
happened to have the name "main".  We really shouldn't trust in the
function names.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
31a5bec93f nir/lower_out_to_temp: Steal the output's constant initializer
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
38de85f9a5 nir: Add a helper for getting the unique function in a shader
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
49be812be6 nir/sweep: Sweep function parameters
They are no longer in the list of local variables so we need to explicitly
sweep them.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
1be4c61c95 nir/builder: Add a helper for creating undefs
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
6a2479d618 nir/builder: Add a helper for storing to variable derefs
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
77e2ac1da7 nir/builder: Add a helper for building fdot instructions
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
da422663a6 nir: Add a variable_foreach_safe helper
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand
731870fbe3 nir/Makefile: Fix alphabetization
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Ilia Mirkin
b4c0c514b1 mesa: add OES_texture_buffer and EXT_texture_buffer support
Allow ES 3.1 contexts to access the texture buffer functionality.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-28 20:29:29 -04:00
Ilia Mirkin
720670a615 glsl: add OES_texture_buffer and EXT_texture_buffer support
Expose the samplerBuffer/imageBuffer types, and allow the various
functions to operate on them.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-28 20:20:49 -04:00
Ilia Mirkin
74b76c08a3 mesa: add OES_texture_buffer and EXT_texture_buffer extension to table
We need to add a new bit since the GL ES exts require functionality from
a combination of texture buffer extensions as well as images (for
imageBuffer) support. Additionally, not all GPUs support all the texture
buffer functionality (e.g. rgb32 isn't supported by nv50).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-03-28 20:19:14 -04:00
Ilia Mirkin
659beca666 mesa: properly return GetTexLevelParameter queries for buffer textures
This fixes all failures with dEQP tests in this area. While
ARB_texture_buffer_object explicitly says that GetTexLevelParameter & co
should not be supported, GL 3.1 reverses this decision and allows all of
these queries there.

Conversely, there is no text that forbids the buffer-specific queries
from being used with non-buffer images.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-28 20:18:46 -04:00
Kenneth Graunke
4ed4a2af86 glsl: Delete initialized field from uniform storage test.
Timothy deleted this field.  Fixes "make check".

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-03-28 17:02:00 -07:00
Jordan Justen
8dbfa265a4 anv/gen7: DispatchIndirect requires cmd parser 5
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-28 17:01:35 -07:00
Jordan Justen
1a3adae84a anv/gen7: Save kernel command parser version
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-28 17:01:35 -07:00
Jordan Justen
f60683b32a anv: Invalidate state cache before L3 partitioning set-up.
Port 10d84ba9f0 to anv.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-28 17:01:35 -07:00
Jordan Justen
5879cb0251 anv: Fix cache pollution race during L3 partitioning set-up.
Port 0aa4f99f56 to anv.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-28 17:01:35 -07:00
Timothy Arceri
86d87d1047 mesa: remove initialized field from uniform storage
The only place this was used was in a gallium debug function that
had to be manually enabled.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-29 09:59:03 +11:00
Samuel Pitoiset
b8b3af2932 nvc0: use a different offset for buffers and surfaces
To not overwrite buffers and surfaces information, we need to use
a different offset in the driver constant buffer. Currently, OP_SUQ
is only supported for buffers but this will be slightly updated for
images support.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-29 00:47:28 +02:00
Kenneth Graunke
60d6a8989a i965: Set address rounding bits for GL_NEAREST filtering as well.
Yuanhan Liu decided these were useful for linear filtering in
commit 76669381 (circa 2011).  Prior to that, we never set them;
it seems he tried to preserve that behavior for nearest filtering.

It turns out they're useful for nearest filtering, too: setting
these fixes the following dEQP-GLES3 tests:

functional.fbo.blit.rect.nearest_consistency_mag
functional.fbo.blit.rect.nearest_consistency_mag_reverse_src_x
functional.fbo.blit.rect.nearest_consistency_mag_reverse_src_y
functional.fbo.blit.rect.nearest_consistency_mag_reverse_dst_x
functional.fbo.blit.rect.nearest_consistency_mag_reverse_dst_y
functional.fbo.blit.rect.nearest_consistency_mag_reverse_src_dst_x
functional.fbo.blit.rect.nearest_consistency_mag_reverse_src_dst_y
functional.fbo.blit.rect.nearest_consistency_min
functional.fbo.blit.rect.nearest_consistency_min_reverse_src_x
functional.fbo.blit.rect.nearest_consistency_min_reverse_src_y
functional.fbo.blit.rect.nearest_consistency_min_reverse_dst_x
functional.fbo.blit.rect.nearest_consistency_min_reverse_dst_y
functional.fbo.blit.rect.nearest_consistency_min_reverse_src_dst_x
functional.fbo.blit.rect.nearest_consistency_min_reverse_src_dst_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_src_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_src_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_dst_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_dst_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_src_dst_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_src_dst_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_src_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_src_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_dst_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_dst_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_src_dst_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_src_dst_y

Apparently, BLORP has always set these bits unconditionally.

However, setting them unconditionally appears to regress tests using
texture projection, 3D samplers, integer formats, and vertex shaders,
all in combination, such as:

functional.shaders.texture_functions.textureprojlod.isampler3d_vertex

Setting them on Gen4-5 appears to regress Piglit's
tests/spec/arb_sampler_objects/framebufferblit.

Honestly, it looks like the real problem here is a lack of precision.
I'm just hacking around problems here (as embarrassing as it is).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-28 15:28:58 -07:00
Kenneth Graunke
0faf26e6a0 i965: Always use BRW_TEXCOORDMODE_CUBE when seamless filtering.
When using seamless cube map mode and NEAREST filtering, we explicitly
overrode the wrap modes to CLAMP_TO_EDGE.  This was to implement the
following spec text:

   "If NEAREST filtering is done within a miplevel, always apply
    wrap mode CLAMP_TO_EDGE."

However, textureGather() ignores the sampler's filtering mode, and
instead returns the four pixels that would be blended by LINEAR
filtering.  This implies that we should do proper seamless filtering,
and include pixels from adjacent cube faces.

It turns out that we can simply delete the NEAREST -> CLAMP_TO_EDGE
overrides.  Normal cube map sampling works by first selecting the
face, and then nearest filtering fetches the closest texel.  If the
nearest texel was on a different face, then that face would have been
chosen.  So it should always be within the face anyway, which
effectively performs CLAMP_TO_EDGE.
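
The face-selection argument can be illustrated with a small software model.
This is a sketch, not the hardware's or the driver's implementation: it
follows the major-axis rule from the OpenGL specification, and the names
cube_coord, cube_project and nearest_texel are hypothetical. Which face wins
when two components tie is an arbitrary choice here, and a non-zero direction
vector is assumed.

```c
struct cube_coord {
   int face;     /* 0..5 = +X, -X, +Y, -Y, +Z, -Z */
   float s, t;   /* coordinates within the face, in [0, 1] */
};

/* Select the face from the component with the largest magnitude and
 * project the other two components onto it.  Because the divisor is
 * the largest magnitude, sc/ma and tc/ma always lie in [-1, 1]. */
struct cube_coord cube_project(float rx, float ry, float rz)
{
   float ax = rx < 0.0f ? -rx : rx;
   float ay = ry < 0.0f ? -ry : ry;
   float az = rz < 0.0f ? -rz : rz;
   float sc, tc, ma;
   struct cube_coord out;

   if (ax >= ay && ax >= az) {
      out.face = rx >= 0.0f ? 0 : 1;
      sc = rx >= 0.0f ? -rz : rz;
      tc = -ry;
      ma = ax;
   } else if (ay >= az) {
      out.face = ry >= 0.0f ? 2 : 3;
      sc = rx;
      tc = ry >= 0.0f ? rz : -rz;
      ma = ay;
   } else {
      out.face = rz >= 0.0f ? 4 : 5;
      sc = rz >= 0.0f ? rx : -rx;
      tc = -ry;
      ma = az;
   }

   out.s = 0.5f * (sc / ma + 1.0f);
   out.t = 0.5f * (tc / ma + 1.0f);
   return out;
}

/* Nearest texel index for a coordinate in [0, 1].  Truncation equals
 * floor for non-negative values; only coord == 1.0 needs the clamp. */
int nearest_texel(float coord, int size)
{
   int i = (int)(coord * (float)size);
   return i < size ? i : size - 1;
}
```

Since s and t never leave [0, 1], the nearest texel index stays within the
selected face apart from the exact boundary value, which is the sense in which
nearest filtering already behaves like CLAMP_TO_EDGE.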

Fixes 86 dEQP-GLES31.texture.gather.basic.cube.* tests.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-28 15:25:04 -07:00
Kenneth Graunke
72473658c5 i965: Fix brw_render_cache_set_check_flush's PIPE_CONTROLs.
Our driver uses the brw_render_cache mechanism to track buffers we've
rendered to and are about to sample from.

Previously, we did a single PIPE_CONTROL with the following bits set:
- Render Target Flush
- Depth Cache Flush
- Texture Cache Invalidate
- VF Cache Invalidate
- Instruction Cache Invalidate
- CS Stall

This combined both "top of pipe" invalidations and "bottom of pipe"
flushes, which isn't how the hardware is intended to be programmed.

The "top of pipe" invalidations may happen right away, without any
guarantees that rendering using those caches has completed.  That
rendering may continue altering the caches.  The "bottom of pipe"
flushes do wait for the rendering to complete.  The CS stall also
prevents further work from happening until data is flushed out.

What we wanted to do was wait for rendering to complete, flush the new
data out of the render and depth caches, wait, then invalidate any
stale data in read-only caches.  We can accomplish this by doing the
"bottom of pipe" flushes with a CS stall, then the "top of pipe"
invalidations as a second PIPE_CONTROL.  The flushes will wait until the
rendering is complete, and the CS stall will prevent the second
PIPE_CONTROL with the invalidations from executing until the first
is done.
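
The split can be sketched as two flag words. This is a minimal illustration
of the ordering only: the EX_* flag names and bit values are hypothetical
and do not match the hardware encoding, and build_render_cache_flush is not
a function from the driver.

```c
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the real encoding. */
enum {
   EX_RENDER_TARGET_FLUSH      = 1 << 0,
   EX_DEPTH_CACHE_FLUSH        = 1 << 1,
   EX_TEXTURE_CACHE_INVALIDATE = 1 << 2,
   EX_VF_CACHE_INVALIDATE      = 1 << 3,
   EX_INSTRUCTION_INVALIDATE   = 1 << 4,
   EX_CS_STALL                 = 1 << 5
};

/* "Bottom of pipe": write-back of caches that rendering wrote to. */
#define EX_FLUSH_BITS \
   (EX_RENDER_TARGET_FLUSH | EX_DEPTH_CACHE_FLUSH)

/* "Top of pipe": discard stale data in read-only caches. */
#define EX_INVALIDATE_BITS \
   (EX_TEXTURE_CACHE_INVALIDATE | EX_VF_CACHE_INVALIDATE | \
    EX_INSTRUCTION_INVALIDATE)

/* Fill cmds with the two commands in submission order and return
 * how many were written.  The first flushes and stalls; the second
 * carries only the invalidations, so it cannot run ahead of the
 * flushes. */
int build_render_cache_flush(uint32_t cmds[2])
{
   cmds[0] = EX_FLUSH_BITS | EX_CS_STALL;
   cmds[1] = EX_INVALIDATE_BITS;
   return 2;
}
```

The previous behavior described earlier in this message corresponds to a
single word containing all six bits.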

Fixes dEQP-GLES3.functional.texture.specification.teximage2d_pbo
subtests on Braswell and Skylake.  These tests hit the meta PBO
texture upload path, which binds the PBO as a texture and samples
from it, while rendering to the destination texture.  The tests
then sample from the texture.

For now, we leave Gen4-5 alone.  It probably needs work too, but
apparently it hasn't even been setting the (G45+) TC invalidation
bit at all...

v2: Add Sandybridge post-sync non-zero workaround, for safety.

Cc: mesa-stable@lists.freedesktop.org
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-28 15:23:56 -07:00
Kenneth Graunke
de505f7d7b i965: Whack UAV bit when FS discards and there are no color writes.
dEQP-GLES31.functional.fbo.no_attachments.* draws a quad with no
framebuffer attachments, using a shader that discards based on
gl_FragCoord.  It uses occlusion queries to inspect whether pixels
are rendered or not.

Unfortunately, the hardware is not dispatching any pixel shaders,
so discards never happen, and the full quad of pixels increments
PS_DEPTH_COUNT, making the occlusion query results bogus.

To understand why, we have to delve into the WM_INT internal
signalling mechanism's formulas.

The "WM_INT::Pixel Shader Kill Pixel" signal is defined as:

    3DSTATE_WM::ForceKillPixel == ON ||
    (3DSTATE_WM::ForceKillPixel != Off &&
     !WM_INT::WM_HZ_OP &&
     3DSTATE_WM::EDSC_Mode != PREPS &&
     (WM_INT::Depth Write Enable || WM_INT::Stencil Write Enable) &&
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     (3DSTATE_PS_EXTRA::PixelShaderKillsPixels ||
      3DSTATE_PS_EXTRA:: oMask Present to RenderTarget ||
      3DSTATE_PS_BLEND::AlphaToCoverageEnable ||
      3DSTATE_PS_BLEND::AlphaTestEnable ||
      3DSTATE_WM_CHROMAKEY::ChromaKeyKillEnable))

Because there is no depth or stencil buffer, writes to those buffers
are disabled.  So the highlighted condition is false, making the whole
"Kill Pixel" condition false.  This then feeds into the following
"WM_INT::ThreadDispatchEnable" condition:

    3DSTATE_WM::ForceThreadDispatch != OFF &&
    !WM_INT::WM_HZ_OP &&
    3DSTATE_PS_EXTRA::PixelShaderValid &&
    (3DSTATE_PS_EXTRA::PixelShaderHasUAV ||
     WM_INT::Pixel Shader Kill Pixel ||
     WM_INT::RTIndependentRasterizationEnable ||
     (!3DSTATE_PS_EXTRA::PixelShaderDoesNotWriteRT &&
      3DSTATE_PS_BLEND::HasWriteableRT) ||
     (WM_INT::Pixel Shader Computed Depth Mode != PSCDEPTH_OFF &&
      (WM_INT::Depth Test Enable || WM_INT::Depth Write Enable)) ||
     (3DSTATE_PS_EXTRA::Computed Stencil && WM_INT::Stencil Test Enable) ||
     (3DSTATE_WM::EDSC_Mode == 1 && (WM_INT::Depth Test Enable ||
                                     WM_INT::Depth Write Enable ||
                                     WM_INT::Stencil Test Enable)))

Given that there's no depth/stencil testing, no writeable render target,
and the hardware thinks kill pixel doesn't happen, all of these
conditions are false.  We have to whack some bit to make PS invocations
happen.  There are many options.

Curro suggested using the UAV bit.  There's some precedent for doing
that - we set it for fragment shaders that do SSBO/image/atomic writes
when no color buffer writes are enabled.  We can simply include discard
here too.
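
The two formulas above can be evaluated directly to see the effect. This is
a sketch that transcribes them into C over a flattened set of booleans; the
struct wm_state, its field names, and the helper no_attachments_discard_state
are hypothetical, and each multi-valued hardware field is reduced to the one
comparison the formulas use.

```c
#include <stdbool.h>

/* Hypothetical flattened view of the state named in the formulas. */
struct wm_state {
   bool force_kill_pixel_on;       /* 3DSTATE_WM::ForceKillPixel == ON */
   bool force_kill_pixel_off;      /* 3DSTATE_WM::ForceKillPixel == Off */
   bool force_thread_dispatch_off; /* ForceThreadDispatch == OFF */
   bool wm_hz_op;
   bool edsc_mode_preps;           /* EDSC_Mode == PREPS */
   bool edsc_mode_1;               /* EDSC_Mode == 1 */
   bool depth_test, depth_write;
   bool stencil_test, stencil_write;
   bool ps_valid, ps_has_uav, ps_kills_pixels;
   bool omask_present, alpha_to_coverage, alpha_test, chroma_key_kill;
   bool rt_independent_raster;
   bool ps_does_not_write_rt, has_writeable_rt;
   bool computed_depth;            /* Computed Depth Mode != PSCDEPTH_OFF */
   bool computed_stencil;
};

/* "WM_INT::Pixel Shader Kill Pixel" */
bool kill_pixel(const struct wm_state *s)
{
   return s->force_kill_pixel_on ||
          (!s->force_kill_pixel_off &&
           !s->wm_hz_op &&
           !s->edsc_mode_preps &&
           (s->depth_write || s->stencil_write) &&
           (s->ps_kills_pixels || s->omask_present ||
            s->alpha_to_coverage || s->alpha_test ||
            s->chroma_key_kill));
}

/* "WM_INT::ThreadDispatchEnable" */
bool thread_dispatch(const struct wm_state *s)
{
   return !s->force_thread_dispatch_off &&
          !s->wm_hz_op &&
          s->ps_valid &&
          (s->ps_has_uav ||
           kill_pixel(s) ||
           s->rt_independent_raster ||
           (!s->ps_does_not_write_rt && s->has_writeable_rt) ||
           (s->computed_depth && (s->depth_test || s->depth_write)) ||
           (s->computed_stencil && s->stencil_test) ||
           (s->edsc_mode_1 && (s->depth_test || s->depth_write ||
                               s->stencil_test)));
}

/* The case from this commit: a valid shader that discards, with no
 * depth, stencil or color attachments.  Unset fields are false. */
struct wm_state no_attachments_discard_state(bool set_uav_bit)
{
   struct wm_state s = {
      .ps_valid = true,
      .ps_kills_pixels = true,
      .ps_has_uav = set_uav_bit,
   };
   return s;
}
```

With set_uav_bit false, both kill_pixel() and thread_dispatch() evaluate to
false even though the shader discards; with it true, thread_dispatch()
becomes true while kill_pixel() stays false.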

Fixes 64 dEQP-GLES31.functional.fbo.no_attachments.* tests.

v2: Add a comment suggested and written by Jason Ekstrand.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-28 14:36:47 -07:00
Jason Ekstrand
433cf90650 nir/spirv: Remove the NoContraction hack
NIR now just handles this for us by not fusing if the multiply is marked as
exact.
2016-03-28 13:07:39 -07:00