Commit graph

11465 commits

Author SHA1 Message Date
Marek Olšák
a882067d74 Revert "r300g: allow HiZ with a 16-bit zbuffer"
This reverts commit 631c631cbf.

https://bugs.freedesktop.org/show_bug.cgi?id=66921

Cc: mesa-stable@lists.freedesktop.org
2013-07-15 23:46:01 +02:00
Marek Olšák
7969b567bd r300g/swtcl: fix a lockup in MSAA resolve
Cc: mesa-stable@lists.freedesktop.org
2013-07-15 23:45:22 +02:00
Marek Olšák
22427640b2 r300g/swtcl: fix geometry corruption by uploading indices to a buffer
The splitting of a draw call into several draw commands was broken, because
the split sometimes took place in the middle of a primitive. The splitting
was supposed to be dealing with the case when there are more indices than
the maximum size of a CS.

This commit throws that code away and uses a real index buffer instead.

https://bugs.freedesktop.org/show_bug.cgi?id=66558

Cc: mesa-stable@lists.freedesktop.org
2013-07-15 23:45:16 +02:00
Chia-I Wu
62c546bbf8 ilo: skip 3DSTATE_INDEX_BUFFER when possible
When only the offset to the index buffer is changed, we can skip the
3DSTATE_INDEX_BUFFER if we always use 0 for the offset, and add
(offset / index_size) to Start Vertex Location in 3DPRIMITIVE.
2013-07-14 05:59:52 +08:00
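The offset folding described in the ilo commit above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: the helper name `fold_index_offset` and the alignment assertion are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the ilo driver.
 * Instead of re-emitting the index buffer state with a new byte offset,
 * keep the buffer offset at 0 and move the draw's start location by the
 * same distance, measured in indices. */
static uint32_t
fold_index_offset(uint32_t start_location, uint32_t offset,
                  uint32_t index_size)
{
   /* Index sizes of 1, 2 or 4 bytes are assumed here. */
   assert(index_size == 1 || index_size == 2 || index_size == 4);
   /* Assumption: the fold is only exact for index-aligned offsets. */
   assert(offset % index_size == 0);

   return start_location + offset / index_size;
}
```

For example, a 12-byte offset into a buffer of 16-bit indices becomes a start location that is 6 indices further on.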
Vinson Lee
b0c3c955ae r600g/sb: Initialize ra_constraint::cost.
Fixes "Uninitialized scalar field" reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2013-07-13 06:57:26 +04:00
Chia-I Wu
8d4ac98549 ilo: move a sanity check into its assert()
The compiler does not know that ilo_3d_pipeline_estimate_size() is pure, so
it cannot eliminate the call in gen6_pipeline_end() in a release build.  Move
the call into the assert().
2013-07-13 07:27:28 +08:00
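The effect of the change above can be illustrated with a small sketch. The functions here are hypothetical stand-ins, not the ilo code; `estimate_size()` increments a counter only so that its evaluation is observable.

```c
#include <assert.h>

static int calls;

/* Hypothetical stand-in for an expensive size estimate. */
static int
estimate_size(void)
{
   calls++;
   return 64;
}

/* Before: the call sits outside the assert(), so it is evaluated even
 * when NDEBUG turns the assert() into a no-op. */
static void
pipeline_end_before(int used)
{
   int estimate = estimate_size();

   assert(used <= estimate);
   (void) used;
   (void) estimate;
}

/* After: the call is part of the assert() expression, so it is compiled
 * out together with the check when NDEBUG is defined. */
static void
pipeline_end_after(int used)
{
   assert(used <= estimate_size());
   (void) used;
}
```

With assertions enabled both variants call `estimate_size()` once. With `NDEBUG` defined only the first one still does.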
Chia-I Wu
bf9670270f ilo: mark some states dirty when they are really changed
The checks may seem redundant because cso_context handles them, but
util_blitter does not have access to cso_context.
2013-07-13 06:43:53 +08:00
Chia-I Wu
9047598a8d ilo: clean up ilo_blitter_pipe_begin()
Document why certain states need to be saved, and fix a bug when blitting with
scissor enabled.
2013-07-13 06:43:53 +08:00
Alex Deucher
e0a7565832 r600g: don't use the CB/DB CP COHER logic on r6xx
There are hw bugs.  Flush and inv event is sufficient.

Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=66837

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-12 18:07:56 -04:00
Brian Paul
bf86e0e050 nv30: fix KILL_IF breakage
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66858
2013-07-12 10:00:18 -06:00
Brian Paul
46205ab8cc tgsi: rename the TGSI fragment kill opcodes
TGSI_OPCODE_KIL and KILP had confusing names.  The former was conditional
kill (if any src component < 0).  The latter was unconditional kill.
At one time KILP was supposed to work with NV-style condition
codes/predicates but we never had that in TGSI.

This patch renames both opcodes:
  TGSI_OPCODE_KIL -> KILL_IF   (kill if src.xyzw < 0)
  TGSI_OPCODE_KILP -> KILL     (unconditional kill)

Note: I didn't just transpose the opcode names to help ensure that I
didn't miss updating any code anywhere.

I believe I've updated all the relevant code and comments but I'm
not 100% sure that some drivers had this right in the first place.
For example, the radeon driver might have llvm.AMDGPU.kill and
llvm.AMDGPU.kilp mixed up.  Driver authors should review their code.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-12 08:32:51 -06:00
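The semantics of the two renamed opcodes, as the commit message above describes them, can be sketched as plain predicates. The function names are illustrative only and are not part of TGSI or any driver.

```c
#include <stdbool.h>

/* KILL_IF (formerly KIL): discard the fragment if any of the four
 * source components is negative. */
static bool
kill_if(const float src[4])
{
   for (int i = 0; i < 4; i++) {
      if (src[i] < 0.0f)
         return true;
   }
   return false;
}

/* KILL (formerly KILP): discard the fragment unconditionally. */
static bool
kill(void)
{
   return true;
}
```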
Brian Paul
919236f3a2 softpipe: silence some MSVC warnings
2013-07-12 08:19:52 -06:00
Christian König
1681bd7f2b radeon/uvd: fall back to shader based decoding for MPEG2 on UVD 2.x v2
UVD 2.x doesn't support hardware decoding of MPEG2, just use shader
based decoding for those chipsets.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=66450

v2: fix interlacing as well

Signed-off-by: Christian König <christian.koenig@amd.com>
2013-07-12 10:52:27 +02:00
Christoph Bumiller
9974593dfb r600g: x/y coordinates must be divided by block dim in dma blit
Note: this is a candidate for the 9.1 branch.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-11 19:11:44 -04:00
Chih-Wei Huang
1d9271a95c r600g/sb: Fix Android build v2
Add the sb CXX files to the Android Makefile and also stop using some
c++11 features.

v2 (Vadim Girlin): use &bc[0] instead of bc.begin()
2013-07-12 01:11:04 +04:00
Vadim Girlin
758ac6f918 r600g/sb: improve math optimizations v2
This patch adds support for some math optimizations that are generally
considered unsafe, which is why they are currently disabled for compute
shaders.

GL requirements are less strict, so they are enabled for
GL shaders by default. In case of any issues with
applications that rely on higher precision than guaranteed by GL,
the 'sbsafemath' option in R600_DEBUG allows disabling them.

v2 - always set proper src vector size for transformed instructions
   - check for clamp modifier in the expr_handler::fold_assoc

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-07-11 23:01:01 +04:00
Chia-I Wu
79bc245c01 ilo: reduce PIPE_CAP_MAX_TEXTURE_CUBE_LEVELS to 12
So that there are at most (2^22 * 6) texels, lower than the 2^26 limit.
2013-07-11 08:03:27 +08:00
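The arithmetic behind the limit above can be checked with a short sketch. It assumes that a level count of N means the largest cube face is 2^(N-1) texels on each edge, which is the reading consistent with the 2^22 figure in the commit message. The helper name is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: texel count of the base level of a cube map,
 * assuming max_levels mip levels imply a base edge of 2^(max_levels - 1). */
static uint64_t
cube_base_level_texels(unsigned max_levels)
{
   assert(max_levels >= 1 && max_levels <= 30);

   uint64_t edge = UINT64_C(1) << (max_levels - 1);

   /* Six square faces. */
   return 6 * edge * edge;
}
```

With 12 levels the edge is 2048, so the six faces hold 6 * 2^22 = 25165824 texels, below 2^26 = 67108864. With 13 levels the count would be 6 * 2^24, which exceeds that limit.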
Chia-I Wu
29af29b8dc ilo: correctly initialize undefined registers in fs
Initialize all 4 channels of undefined registers (that is, TEMPs that are used
before being assigned) in FS.
2013-07-11 07:01:51 +08:00
Michel Dänzer
a06ee5a09e radeonsi: Handle TGSI_OPCODE_DDX/Y using local memory
16 more little piglits.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2013-07-10 18:40:32 +02:00
Michel Dänzer
a6b83c0f23 radeonsi: Handle TGSI_OPCODE_TXD
One more little piglit.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2013-07-10 12:16:38 +02:00
Chia-I Wu
045bf0db52 ilo: honor surface padding requirements
The PRM specifies several padding requirements that we failed to honor.
2013-07-10 12:40:22 +08:00
Zack Rusin
63386b2f66 util: treat denorm'ed floats like zero
The D3D10 spec is very explicit about treatment of denorm floats and
the behavior is exactly the same for them as it would be for -0 or
+0. This makes our shading code match that behavior, since OpenGL
doesn't care and on a few CPUs it's faster (worst case the same).
Float16 conversions will likely break but we'll fix them in a
follow-up commit.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-07-09 23:30:55 -04:00
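What treating a denorm like zero means for a single 32-bit float can be sketched in software as below. This is only an illustration of the behavior; it is not how the commit above implements it, and keeping the sign bit of the flushed value is an assumption of this example.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative helper: flush a denormal float to a zero of the same
 * sign, leaving every other value untouched. */
static float
flush_denorm_to_zero(float f)
{
   uint32_t bits;

   memcpy(&bits, &f, sizeof bits);

   /* An all-zero exponent field means the value is zero or denormal. */
   if ((bits & 0x7f800000u) == 0)
      bits &= 0x80000000u;   /* keep only the sign bit */

   memcpy(&f, &bits, sizeof f);
   return f;
}
```

A value such as 1e-40f, which is below the smallest normal float, comes back as zero, while 1.0f is returned unchanged.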
Marek Olšák
1faa375573 r600g: improve the mechanism for recognizing an empty CS
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-08 20:25:18 +02:00
Marek Olšák
287b2fa115 r600g: explicitly flush caches for streamout-based buffer copying & clearing
It's done automatically for vertex buffers, but not for constant buffers,
textures, and colorbuffers.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-08 20:25:18 +02:00
Marek Olšák
7948ed1250 r600g: only flush the caches that need to be flushed during CP DMA operations
This should increase performance if constant uploads are done with the CP DMA,
because only the cache that needs to be flushed is flushed.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-08 20:25:18 +02:00
Marek Olšák
1b40398d02 r600g: split INVAL_READ_CACHES into vertex, tex, and const cache flags
also flushing any cache in evergreen_emit_cs_shader seems to be superfluous
(we don't flush caches when changing the other shaders either)

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-08 20:25:18 +02:00
Alex Deucher
098316211c r600g: adjust flush flags (v3)
1. flush SH with read caches
2. add flag for DB flushes
3. add flag for CB flushes

v2: flush all CBs, remove redundant emit_state variable.
v3: Marek: also set the new flags in r600_context_flush, the CP dma functions,
    and texture_barrier, and rename them

Signed-off-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-08 20:25:18 +02:00
Marek Olšák
862f69fbe1 r600g: don't call buffer_wait in buffer_mmap_sync_with_rings
The winsys should do this, because it measures how much time we spend
in buffer_map doing synchronization, which can be viewed with the gallium
HUD.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-08 20:25:18 +02:00
Marek Olšák
94d294137e r600g: don't read back the MSAA depth buffer if the read flag is not set
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-08 20:25:18 +02:00
Marek Olšák
141b892620 r600g: don't flush the context in texture_transfer_map
the winsys does this automatically

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-08 20:25:18 +02:00
Marek Olšák
ae87aae0c4 r600g: fix texture offset computation for mapped MSAA depth buffers
It was wrong, because the offset shouldn't be applied to MSAA depth buffers.
This small cleanup should prevent such issues in the future.

This fixes a lockup in "piglit/fbo-depthstencil default_fb -samples=n".

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-08 20:25:18 +02:00
Marek Olšák
a3263cca59 r600g: fix color resolve for RGBX8 and RGBX16 integer formats
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-08 20:25:18 +02:00
Marek Olšák
b1a061b81e r600g: enable fast MSAA color clear for array/3D/cube textures
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-08 20:25:18 +02:00
Marek Olšák
87669c3654 r600g: implement fast MSAA color clear for integer textures
this also fixes the fast clear with multiple colorbuffers and each having
a different format

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-08 20:25:18 +02:00
Christian König
085c695488 r600/uvd: fix check for UVD 2.x
Signed-off-by: Christian König <christian.koenig@amd.com>
2013-07-08 19:51:20 +02:00
Ben Skeggs
c29c6b2b2e nvc0: enable very initial support for nvf0 (GK110)
Shaders need a lot of work still.  Basic stuff generally works, so this
is basically just fine for gnome-shell, OA etc at this point.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2013-07-05 14:15:04 +10:00
Roland Scheidegger
f3bbf65929 gallivm: do per-pixel lod calculations for explicit lod
d3d10 requires per-pixel lod calculations for explicit lod, lod bias and
explicit derivatives, and we should probably do it for OpenGL too - at least
if they are used from vertex or geometry shaders (so this doesn't apply to
lod bias) this doesn't just affect neighboring pixels.
Some code was already there to handle this, so fix it up and enable it.
There will no doubt be a performance hit, unfortunately. We could do better
if we knew we had a real vector shift instruction (with variable shift
count), but this requires AVX2 on x86 (or an AMD Bulldozer family cpu).
Don't do anything for lod bias and explicit derivatives yet, though
no special magic should be needed for them either.
Likewise, the size query is still broken just the same.

v2: Use information if lod is a (broadcast) scalar or not. The idea would be
to base this on the actual value, for now just pretend it's a scalar in fs
and not a scalar otherwise (so, per-pixel lod is only used in gs/vs but same
code is generated for fs as before).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-04 19:42:04 +02:00
Marek Olšák
30c3e8718d mesa,glsl,gallium: remove GLSLSkipStrictMaxVaryingLimitCheck and dependencies
Not needed with do_dead_builtin_varyings.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-07-02 17:02:14 +02:00
Zack Rusin
1c2e5c223d draw/translate: fix instancing
We were incorrectly computing the buffer offset when using
instancing. The buffer offset is always equal to:
  start_instance * stride + (instance_num / instance_divisor) * stride
We were completely ignoring the start instance, quite often
producing instances that were completely wrong. E.g. if
start instance = 5, instance divisor = 2, then on the first
iteration it should be:
5 * stride, not (5/2) * stride as we'd have currently, and if
start instance = 1, instance divisor = 3, then on the first
iteration it should be:
1 * stride, not 0 as we'd have.
This fixes it and adjusts all the code to the changes.

Signed-off-by: Zack Rusin <zackr@vmware.com>
2013-06-28 05:21:20 -04:00
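The offset formula from the commit message above can be written out as a small function. This is a sketch, not the draw module's code: the function name is made up, and `instance_num` is assumed to count instances from 0 on the first iteration.

```c
#include <assert.h>

/* Hypothetical helper: byte offset into an instanced vertex buffer.
 * The start instance is applied in full, and only the running instance
 * number is divided by the divisor (integer division). */
static unsigned
instance_buffer_offset(unsigned start_instance, unsigned instance_num,
                       unsigned instance_divisor, unsigned stride)
{
   assert(instance_divisor > 0);

   return start_instance * stride +
          (instance_num / instance_divisor) * stride;
}
```

With a stride of 16 bytes, the two cases from the message give 5 * 16 = 80 for start instance 5 with divisor 2, and 1 * 16 = 16 for start instance 1 with divisor 3, both on the first iteration.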
Maarten Lankhorst
bf95ca7de0 nvc0: allow frame dropping in h264
The only reason the checks existed was paranoia: when I first
wrote the code I wasn't sure it was correct. Now that I am, and since
the asserts triggered when XBMC was dropping frames, remove them.

NOTE: This is a candidate for the 9.1 branch.
2013-07-01 08:47:49 +02:00
Tom Stellard
24fa43675f r300g/compiler: Prevent regalloc from swizzling texture operands v2
https://bugs.freedesktop.org/show_bug.cgi?id=63520

NOTE: This is a candidate for the stable branches.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-06-30 21:38:57 -07:00
Tom Stellard
e2c3640540 r300g/compiler/tests: Add an assembly parser
The assembly parser can be used to load r300 assembly dumps
and run them through any of the r300 compiler passes.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-06-30 21:38:57 -07:00
Tom Stellard
ab40d8d56f r300g: Fix make check
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-06-30 21:24:55 -07:00
Grigori Goronzy
30004b20c2 r600g: implement fast color clears for MSAA on evergreen+
Allows MSAA colorbuffers, which have a CMASK automatically and don't
need any further special handling, to be fast cleared. Instead
of clearing the buffer, set the clear color and the CMASK to the
cleared state.

Fast clear is used only when all bound colorbuffers fulfill certain
conditions: a CMASK is required, we have to be able to create a clear
color value for the format and the texture mustn't contain multiple
images. Technically, it should be possible to support array textures
and cubemaps if all images are attached to the framebuffer,
but this does not appear to be common.

v2: fix fast clear check
v3: Marek: - disable fast clear with 128-bit formats, which are unsupported
           - set tex->dirty_level_mask in r600_clear, so that the driver knows
             the resource must be decompressed/expanded
           - return early from r600_clear if there's nothing else to do

Signed-off-by: Marek Olšák <maraeo@gmail.com>
2013-07-01 03:02:43 +02:00
Marek Olšák
b1693194ee r600g/compute: disable unused colorbuffer slots
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2013-07-01 03:02:43 +02:00
Roland Scheidegger
7d430bfab9 llvmpipe: fix timer query if there are no bins
b04a295a4a removed seemingly unnecessary
code in get_query. Turns out this code could in fact be reached - while
timestamps are always binned, if there are no bins (which happens if fb
size is 0) then the rasterization query code filling this in is still
never executed.
So fix this up by filling in some timestamp, but do it at EndQuery time
not GetQuery time which should be more appropriate.
Makes piglit arb_timer_query-timestamp-get happy again.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-29 16:58:02 +02:00
Alex Deucher
d669992e35 radeonsi: disable 2D tiling on CIK for now
Causes GPU hangs.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-06-28 15:17:10 -04:00
Alex Deucher
1357624abc radeonsi: add llvm processor names for CIK
Requires updated llvm.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-06-28 15:17:00 -04:00
Alex Deucher
234d81e6b2 radeonsi: emit PA_SC_RASTER_CONFIG[_1] on cik
Use the golden values for each asic.

Todo: update Kabini and Kaveri.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-06-28 15:16:53 -04:00
Alex Deucher
9d8ad222c6 radeonsi: PA_CL_ENHANCE is privileged on CIK
Needs to be and is set by the kernel.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-06-28 15:16:46 -04:00