74032 commits

Author SHA1 Message Date
Bas Nieuwenhuizen
6529daca39 radeonsi: Implement DCC fast clear.
Uses the DCC buffer instead of the CMASK buffer. The ELIMINATE_FAST_CLEAR
still works. Furthermore, with DCC compression we can directly clear
to a limited set of colors such that we do not need a postprocessing step.

v2 Marek: check dcc_buffer && dirty_level_mask in set_sampler_view

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 17:46:08 +02:00
Roland Scheidegger
205a3ce5c1 gallivm: fix tex offsets with mirror repeat linear
Can't see why anyone would ever want to use this, but it was clearly broken.
This fixes the piglit texwrap offset test using this combination.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-24 03:00:33 +02:00
Roland Scheidegger
71ff5af5dd gallivm: fix sampling with texture offsets in SoA path
When using nearest filtering and clamp / clamp to edge wrapping, results could
be wrong for negative offsets. Fix this by adding the offset before doing
the conversion to int coords (using floor instead of trunc for the int
conversion would also work, but is probably more complex on a "typical" CPU).

This fixes the piglit texwrap offset failures with this filter/wrap combo
(which only leaves the linear/mirror repeat combination broken).

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-24 03:00:33 +02:00
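The rounding issue behind the fix above (71ff5af5dd) can be sketched in a few lines. This is a toy model, not gallivm code: `nearest_texel` and its parameters are hypothetical names, and it assumes unnormalized texel coordinates with clamp-to-edge wrapping. Truncation and floor disagree only for negative non-integer values, and once the offset has been added before the conversion, every such value is clamped to texel 0 anyway.

```python
import math

def nearest_texel(u, offset, width, to_int=math.trunc):
    """Pick a texel for nearest filtering with clamp-to-edge wrapping.

    u is an unnormalized texel coordinate and offset an integer texel
    offset. The offset is added before the float-to-int conversion.
    """
    i = to_int(u + offset)
    return max(0, min(width - 1, i))

# Truncation rounds toward zero, floor rounds down, so they disagree
# for negative non-integer inputs.
print(math.trunc(-0.5), math.floor(-0.5))  # 0 -1

# After the clamp, both conversions select the same texel.
for u in (0.25, 0.5, 1.75, 3.5):
    for offset in (-3, -1, 0, 2):
        assert nearest_texel(u, offset, 4) == nearest_texel(u, offset, 4, math.floor)
```

With the offset folded in first, the cheaper truncating conversion is sufficient in this model, which matches the trade-off the commit message mentions.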
Roland Scheidegger
fb586e1edb softpipe: fix using non-zero layer in non-array view from array resource
For vertex/geometry shader sampling, this is the same as for llvmpipe - just
use the original resource target.
For fragment shader sampling though (which does not use first-layer based mip
offsets) adjust the sampling code to use first_layer in the non-array cases.
While here also fix up some code which looked wrong wrt buffer texel fetch
(no piglit change).

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-24 03:00:33 +02:00
Roland Scheidegger
fe707c0373 llvmpipe: fix using non-zero layer in non-array view from array resource
Just need to use the resource target, not the view target, when calculating
first-layer based mip offsets. (This is a GL-specific problem, since
d3d10 does not distinguish between non-array and array resources at either
the resource or the view level, only at the shader level.)
Fixes the new piglit arb_texture_view sampling-2d-array-as-2d-layer test.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-10-24 03:00:33 +02:00
Alex Deucher
830e57b82d radeonsi: add Stoney to si_init_gs_info()
This patch was originally written before Stoney support
was merged. Add Stoney.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-10-23 18:56:45 -04:00
Bas Nieuwenhuizen
48b5f104ac radeonsi: Enable DCC.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:30 +02:00
Bas Nieuwenhuizen
81ebd6a882 radeonsi: Add FLUSH_AND_INV_CB_DATA_TS for DCC.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:28 +02:00
Bas Nieuwenhuizen
bb77467df9 radeonsi: Disable operations that do not work with DCC.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:24 +02:00
Bas Nieuwenhuizen
afa357c3b0 radeonsi: Allocate buffers for DCC.
As the alignment requirements can be 32 KiB or more, also add
an aligned buffer creation function.

DCC is disabled for textures that can be shared as sharing the
DCC buffers has not been implemented yet.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-10-24 00:42:01 +02:00
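The alignment arithmetic implied by the DCC buffer commit above (afa357c3b0) can be illustrated as follows. This is only a sketch of rounding up to a power-of-two alignment; `align_up` and `DCC_ALIGNMENT` are hypothetical names, and the driver's actual buffer creation function is not shown here.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of a power-of-two alignment."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return (value + alignment - 1) & ~(alignment - 1)

DCC_ALIGNMENT = 32 * 1024  # 32 KiB, the figure quoted in the commit message

print(align_up(1, DCC_ALIGNMENT))       # 32768
print(align_up(32768, DCC_ALIGNMENT))   # 32768
print(align_up(40000, DCC_ALIGNMENT))   # 65536
```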
Marek Olšák
edf6a4537c radeonsi: only apply the SNORM blit workaround to *8_SNORM
Like the comment says. This fixes DCC, which doesn't like blitting RG16
as RGBA8.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
e1c098f238 util/format: add helper util_format_is_snorm8
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
06083046a4 radeonsi: add another requirement for PARTIAL_ES_WAVE
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
0d2cb35f68 radeonsi: merge two ifs setting WD_SWITCH_ON_EOP
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
ca18f12dbb radeonsi: make PARTIAL_ES_WAVE globally dependent on SWITCH_ON_EOI
This catches the other cases that enable SWITCH_ON_EOI.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
2070af2fb1 radeonsi: add one more SWITCH_ON_EOI requirement for Hawaii and VI
The VI condition depends on geometry shaders and MAX_PRIMGRP_IN_WAVE.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
a6b5684e99 radeonsi: only apply the instancing bug workaround to Bonaire
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
96d5879d38 radeonsi: add SWITCH_ON_EOI requirement for 4 SE parts
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
7e056f872f radeonsi: remove unnecessary PARTIAL_VS_WAVE setting for streamout
The hardware does this automatically.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
3a157e6e68 radeonsi: allow unbinding vertex shaders
Draw calls without a vertex shader are skipped.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
07b3cc6ecf radeonsi: allow unbinding pixel shaders and remove the dummy shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
50bb2decf7 radeonsi: add draw_vbo check for a NULL pixel shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
ed95cb3a31 radeonsi: add checks for a NULL pixel shader
This will allow removing the dummy PS.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
d842d2f251 gallium/util: add a test for NULL fragment shaders
Just to validate that radeonsi doesn't crash.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-10-24 00:01:20 +02:00
Marek Olšák
dd05824b89 st/mesa: don't load state parameters if there are none
Out of 7063 shaders from my shader-db:
- 6564 (93%) shaders don't have any state parameters.
- 347 (5%) shaders have 1 state parameter for WPOS lowering.
- The remaining 2% have more state parameters, usually matrices.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-10-24 00:01:20 +02:00
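The shader-db figures quoted in the st/mesa commit above (dd05824b89) can be checked with a few lines of arithmetic. The counts come from the commit message; the variable names are illustrative only.

```python
total = 7063
no_state_params = 6564
one_state_param = 347
remaining = total - no_state_params - one_state_param

rows = (
    ("no state parameters", no_state_params),
    ("one state parameter", one_state_param),
    ("more state parameters", remaining),
)
for label, count in rows:
    print(f"{label}: {count} ({100 * count / total:.1f}%)")
```

Rounded to whole percentages, the three shares agree with the 93%, 5% and 2% given in the message.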
Samuel Li
98546bfd03 radeonsi: add Stoney pci ids
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Samuel Li <samuel.li@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-10-23 17:53:48 -04:00
Samuel Li
bf0d0ce0d5 radeonsi: add support for Stoney asics (v3)
v2 (agd): rebase on mesa master, split pci ids to
separate commit
v3 (agd): use carrizo for llvm processor name for
llvm 3.7 and older

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Samuel Li <samuel.li@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-10-23 17:53:14 -04:00
Ilia Mirkin
e05021ff72 nvc0: respect edgeflag attribute width
The edgeflag comes in as ubyte with glEdgeFlagPointer but as float with
plain immediate glEdgeFlag. Avoid reading bytes that weren't meant for
the edgeflag in the pointer case.

Fixes intermittent failures with gl-2.0-edgeflag piglit (and valgrind
complaints about reading uninitialized memory).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-10-23 16:43:06 -04:00
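The width problem described in the nvc0 commit above (e05021ff72) can be shown with a made-up vertex layout. This is an illustration, not driver code: the byte values are invented, and the sketch only demonstrates that reading a one-byte edge flag at float width pulls in three bytes that belong to something else.

```python
import struct

# A vertex where the edge flag is one unsigned byte followed by three
# bytes that belong to some other attribute (made-up values).
vertex = bytes([1, 0xAA, 0xBB, 0xCC])

as_ubyte = struct.unpack_from("<B", vertex, 0)[0]
as_float = struct.unpack_from("<f", vertex, 0)[0]

print(as_ubyte)  # 1
print(as_float)  # value depends on the three neighbouring bytes
```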
Jose Fonseca
ea421e919a gallivm: Explicitly disable unsupported CPU features.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92214
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-10-23 20:25:19 +01:00
Eric Anholt
70b06fb5d5 vc4: Convert blending to being done in 4x8 unorm normally.
We can't do this all the time, because you want blending to be done in
linear space, and sRGB would lose too much precision being done in 4x8.
The win on instructions is pretty huge when you can, though.

total uniforms in shared programs: 32065 -> 32168 (0.32%)
uniforms in affected programs:     327 -> 430 (31.50%)
total instructions in shared programs: 92644 -> 89830 (-3.04%)
instructions in affected programs:     15580 -> 12766 (-18.06%)

Improves openarena performance at 1920x1080 from 10.7fps to 11.2fps.
2015-10-23 18:11:21 +01:00
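The precision concern in the vc4 blending commit above (70b06fb5d5) can be made concrete with the standard sRGB decoding function. This sketch assumes, for illustration only, that linear values would be kept in 8-bit unorm channels; it says nothing about how the hardware or the driver actually handles sRGB targets.

```python
def srgb_to_linear(c):
    """Decode an sRGB-encoded value in [0, 1] to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

def to_unorm8(x):
    """Quantize a value in [0, 1] to an 8-bit unorm code."""
    return round(x * 255)

# Decode every 8-bit sRGB code and re-quantize the linear result to 8 bits.
linear_codes = {to_unorm8(srgb_to_linear(code / 255)) for code in range(256)}
print(len(linear_codes), "distinct 8-bit linear values for 256 sRGB codes")
```

Several dark sRGB codes collapse onto the same 8-bit linear value, which is the kind of loss the commit message refers to.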
Eric Anholt
8e701fda49 vc4: Add QIR/QPU support for the 8-bit vector instructions.
2015-10-23 18:11:21 +01:00
Eric Anholt
817a7eb588 vc4: Don't try to CSE non-SSA instructions.
This can happen when we're doing destination packing -- we don't know
what's in the rest of the register.

Signed-off-by: Eric Anholt <eric@anholt.net>
2015-10-23 18:11:21 +01:00
Eric Anholt
5b2fb138bc nir: Add opcodes for saturated vector math.
This corresponds to instructions used on vc4 for its blending inside of
shaders.  I've seen these opcodes on other architectures before, but I
think it's the first time these are needed in Mesa.

v2: Rename to 'u' instead of 'i', since they're all 'u'norm (from review
    by jekstrand)
2015-10-23 18:11:21 +01:00
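What "saturated vector math" on packed 8-bit lanes means, as introduced by the nir commit above (5b2fb138bc), can be sketched as below. The function name `usadd_4x8_sketch` is hypothetical and this is not the NIR opcode definition; it only shows a per-byte unsigned add that clamps at 0xFF instead of wrapping.

```python
def usadd_4x8_sketch(a, b):
    """Per-byte saturating add of two 32-bit values holding four unorm8 lanes."""
    result = 0
    for shift in range(0, 32, 8):
        lane = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= min(lane, 0xFF) << shift
    return result

print(hex(usadd_4x8_sketch(0x10F08001, 0x20208002)))  # 0x30ffff03
```

The two middle lanes overflow and are clamped to 0xff, while the outer lanes add normally.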
Eric Anholt
1066a372d8 vc4: Add dumping of VC4_PACKET_GL_INDEXED_PRIMITIVE.
2015-10-23 18:11:21 +01:00
Eric Anholt
7d7fbcdf4e vc4: Add a workaround for HW-2116 (state counter wrap fails).
I haven't proven that this happens (I've got other GPU hangs in the
way), but the closed driver also does this and it's documented as an
erratum.
2015-10-23 18:11:21 +01:00
Eric Anholt
73f6104532 vc4: Fix missing \n in a perf_debug().
2015-10-23 18:11:21 +01:00
Kristian Høgsberg Kristensen
8f60dc83f7 i965/fs: Allow copy propagating into new surface access opcodes
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
0cb7d7b4b7 i965/fs: Optimize ssbo stores
Write groups of enabled components together.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
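The grouping idea in the ssbo store commit above (0cb7d7b4b7) can be sketched as splitting a write mask into runs of consecutive enabled components, so that each run could be written together. `writemask_runs` is a hypothetical helper for illustration, not the i965 implementation.

```python
def writemask_runs(writemask, num_components=4):
    """Yield (first_component, length) for each run of consecutive set bits."""
    first = None
    for i in range(num_components + 1):
        enabled = i < num_components and (writemask >> i) & 1
        if enabled and first is None:
            first = i
        elif not enabled and first is not None:
            yield first, i - first
            first = None

print(list(writemask_runs(0b1011)))  # [(0, 2), (3, 1)]
```

A mask with x, y and w enabled yields two groups instead of three single-component writes.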
Kristian Høgsberg Kristensen
feff21d1a6 i965/fs: Drop offset_reg temporary in ssbo load
Now that we don't read each component one-by-one, we don't need the
temporary vgrf for the offset. More importantly, this register was type
UD while the nir source was type D. This broke copy propagation and left
a redundant MOV in the generated code.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
0a5a738252 i965/fs: Avoid scalar destinations in emit_uniformize()
The scalar destination registers break copy propagation. Instead compute
the results to a regular register and then reference a component when we
later use the result as a source.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
a19bf6d3cc i965/fs: Don't uniformize surface index twice
The emit_untyped_read and emit_untyped_write helpers already uniformize
the surface index argument. No need to do it before calling them.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
aedc0aab19 i965/fs: Use unsigned immediate 0 when eliminating SHADER_OPCODE_FIND_LIVE_CHANNEL
The destination for SHADER_OPCODE_FIND_LIVE_CHANNEL is always a UD
register.  When we replace the opcode with a MOV, make sure we use a UD
immediate 0 so copy propagation doesn't bail because of non-matching
types.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
24a3a697e5 i965/fs: Read all components of a SSBO field with one send
Instead of looping through single-component reads, read all components
in one go.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Kristian Høgsberg Kristensen
de5a450bd3 i965: Don't use message headers for untyped reads
We always set the mask to 0xffff, which is what it defaults to when no
header is present. Let's drop the header instead.

v2: Only remove header for untyped reads. Typed reads always need the
    header.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
2015-10-23 09:42:28 -07:00
Alejandro Piñeiro
2f1bc1da86 i965/vec4: check opcode on vec4_instruction::reads_flag(channel)
Commit f17b78 added an alternative reads_flag(channel) that returns
whether the instruction reads a specific channel of the flag. By mistake it
only took into account the predicate, but when the opcode is
VS_OPCODE_UNPACK_FLAGS_SIMD4X2 there isn't any predicate, yet the flags
are used.

That mistake caused some regressions on old hw. More information on
this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=92621

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-10-23 18:11:09 +02:00
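The oversight fixed by the vec4 commit above (2f1bc1da86) can be modelled with a toy instruction type. Everything here is a hypothetical stand-in: the string constants replace the driver's enums, the per-channel argument is dropped for brevity, and only the shape of the check (predicate or flag-unpacking opcode) mirrors the commit message.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the driver's enums; only the shape matters.
PREDICATE_NONE = "none"
PREDICATE_NORMAL = "normal"
OPCODE_MOV = "mov"
OPCODE_UNPACK_FLAGS = "unpack_flags"  # stands in for VS_OPCODE_UNPACK_FLAGS_SIMD4X2

@dataclass
class Instruction:
    opcode: str
    predicate: str = PREDICATE_NONE

    def reads_flag_predicate_only(self):
        # Considers only the predicate, as in the mistaken version.
        return self.predicate != PREDICATE_NONE

    def reads_flag(self):
        # Also treats the flag-unpacking opcode as a flag reader.
        return (self.opcode == OPCODE_UNPACK_FLAGS
                or self.predicate != PREDICATE_NONE)

unpack = Instruction(OPCODE_UNPACK_FLAGS)
print(unpack.reads_flag_predicate_only())  # False
print(unpack.reads_flag())                 # True
```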
Eric Anholt
fb064901e9 vc4: Use Rob's NIR-based user clip lowering.
2015-10-23 14:30:15 +01:00
Eric Anholt
b3797a8f88 vc4: Also dump the decimation mode for resolved stores.
2015-10-23 14:30:15 +01:00
Eric Anholt
7516cbd261 vc4: Use VC4_GET_FIELD and other defines in dumping VC4_RENDER_CONFIG.
2015-10-23 14:30:15 +01:00
Eric Anholt
b0963ce758 vc4: Add a sentinel after simulator buffers for buffer overflow detection.
This is a little bit like the mprotect-based fencing I've experimented
with, but it's simple and low overhead.  The downside is that it only catches
writes, not reads.

It didn't catch any bad writes on a current piglit run, but may be useful
in the future.
2015-10-23 14:29:07 +01:00
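The sentinel technique from the vc4 simulator commit above (b0963ce758) can be sketched as follows. The marker bytes and helper names are made up for illustration; the point is that a known pattern placed after the buffer reveals out-of-bounds writes when it is checked later, while out-of-bounds reads leave it untouched.

```python
SENTINEL = b"\xde\xad\xbe\xef" * 4  # arbitrary marker bytes

def alloc_with_sentinel(size):
    """Return a buffer of size bytes followed by the sentinel pattern."""
    return bytearray(size) + SENTINEL

def sentinel_intact(buf, size):
    """Check that the bytes just past the buffer still hold the pattern."""
    return bytes(buf[size:size + len(SENTINEL)]) == SENTINEL

buf = alloc_with_sentinel(16)
buf[15] = 0xFF                   # in-bounds write
print(sentinel_intact(buf, 16))  # True
buf[16] = 0xFF                   # one byte past the end
print(sentinel_intact(buf, 16))  # False
_ = buf[17]                      # an out-of-bounds read leaves no trace
```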
Samuel Iglesias Gonsalvez
f408a13dd3 glsl: fix shader storage block member rules when adding program resources
Commit f24e5e did not take into account arrays of named shader
storage blocks.

Fixes 20 dEQP-GLES31.functional.ssbo.* tests:

dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.per_block_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_struct_array.single_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.per_block_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.shared_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.packed_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.std140_instance_array
dEQP-GLES31.functional.ssbo.layout.single_nested_struct_array.single_buffer.std430_instance_array
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.2
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.29
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.33
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.3

V2:
- Rename some variables (Timothy)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-10-23 13:12:43 +02:00