Commit graph

27608 commits

Author SHA1 Message Date
Ilia Mirkin
c1e4a6bfbf nv50,nvc0: handle SQRT lowering inside the driver
First off, st/mesa lowers DSQRT incorrectly (it uses CMP to attempt to
find out whether the input is less than 0). Secondly the current
approach (x * rsq(x)) behaves poorly for x = inf - a NaN is produced
instead of inf.

Instead we switch to the less accurate rcp(rsq(x)) method - this behaves
nicely for all valid inputs. We still don't do this for DSQRT since the
RSQ/RCP ops are *really* inaccurate, and don't even have Newton-Raphson
steps right now. Eventually we should have a separate library function
for DSQRT that does it more precisely (and perhaps move this lowering to
the post-opt phase).

This fixes a number of dEQP precision tests that were expecting better
behavior for infinite inputs.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-13 13:17:24 -04:00
Ilia Mirkin
b3e7fb5234 nv50/ir: avoid folding mul + add if the mul has a dnz
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-13 13:17:24 -04:00
Ilia Mirkin
a651bc027d nvc0: fix blit triangle size to fully cover FB's > 8192x8192
The idea is that a single triangle will cover the whole area being
drawn, allowing the blit shader to do its work. However the max fb size
is 16384x16384, which means that the triangle we draw needs to be twice
that in order to cover the whole area fully. Increase the size of the
triangle to 32768x32768.

This fixes a number of dEQP tests that were failing because a blit was
involved which would miss some of the resulting texture.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-13 13:17:24 -04:00
Rob Clark
01b071d530 freedreno: OUT_RELOC vs OUT_RELOCW fixes
Make sure we use OUT_RELOCW() in cases where the buffer is written to.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
f68c6951b8 freedreno/a4xx: hw binning
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
b3fe196e21 freedreno/a4xx: use generated headers for draw initiator
No need to open-code this.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
2224ba5976 freedreno/a4xx: remove RB_RENDER_CONTROL patching
Bitfields where shuffled around for the better on a4xx, so we don't need
any patching on this one.  It appears to be something we set entirely in
the gmem code so no conflict between tiling and render state like we had
in a3xx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
8824a765a2 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
476551a21f freedreno/a3xx: move where we deal w/ binning FS
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
dd9135c452 freedreno/a4xx: move where we deal w/ binning FS
Move where we pick dummy FS for binning pass, so the whole driver sees
the same dummy/no-op FS stage.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
09b3447344 freedreno/a3xx: constify the shader variants
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:41 -04:00
Rob Clark
5b955f09f7 freedreno/a4xx: constify the shader variants
Most of the driver just needs read-only access, so constify..

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:40 -04:00
Rob Clark
d9395e4ed8 freedreno/a3xx: remove duplicate mark of end of binning cmds
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-03-13 12:23:40 -04:00
Nicolai Hähnle
28d2a7e67c radeonsi: avoid crash when a sampler state is bound for a buffer texture
Sampler states don't really make sense with buffer textures, but they
can be set anyway, so we need to be defensive here. This bug was lurking
for a while and was finally noticed due to PBO uploads setting sampler
states.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94284
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Laurent Carlier <lordheavym@gmail.com>
Tested-by: Shawn Starr <shawn.starr@rogers.com>
2016-03-13 09:37:23 -05:00
Boyuan Zhang
6cf120ec77 st/va: add HEVC main 10 profile
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-03-11 22:33:56 -05:00
Boyuan Zhang
06c862d67d radeon/video: enable HEVC main 10 decode
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-03-11 22:33:56 -05:00
Boyuan Zhang
8be9efcce7 radeon/uvd: handle HEVC main 10 decode
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-03-11 22:33:56 -05:00
Bas Nieuwenhuizen
417b6721a0 radeonsi: Lazily re-set sampler views after disabling DCC
Clear DCC flags if necessary when binding a new sampler view.

v2: Do not reset DCC flags of bound sampler views.
v3: Check that we have a real texture (Nicolai)

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-11 11:51:15 -05:00
Nicolai Hähnle
e502801d98 r600g: clear compressed_depthtex/colortex_mask when binding buffer texture
Found by inspection of the source based on a bisected bug report.

This bug has been in the code for a long time, but the more recent PBO upload
feature exposed it because it leads to more uses of buffer textures.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94388
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>
2016-03-11 08:00:15 -05:00
Ilia Mirkin
a8819fb1ff nvc0: add support for TGSI FMA ops
This will allow the nouveau backend to not try and split up ops that are
fused in GLSL.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-10 22:34:28 -05:00
Nicolai Hähnle
59c5508b9a radeonsi: update compressed_colortex_masks when a cmask is created or disabled
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-10 18:22:52 -05:00
Nicolai Hähnle
da68a9b215 radeonsi: move si_decompress_textures to si_blit.c
Since it is all about calling into blitter functions, it makes more
sense here. This change also reduces the size of the interfaces between
.c files.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-10 18:22:49 -05:00
Nicolai Hähnle
f03c9e5692 r600g: update compressed_colortex_masks when a cmask is created or disabled
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-10 18:22:46 -05:00
Nicolai Hähnle
784269aa40 gallium/radeon: notify all contexts when cmasks are enabled/disabled
There is an annoying corner case that I stumbled across while looking into
piglit's arb_shader_image_load_store/execution/load-from-cleared-image.shader_test
(which can be easily adapted to demonstrate the bug without the
ARB_shader_image_load_store extension)

When we bind a texture and then clear it using glClear (by attaching it
to the current framebuffer) for the first time, we allocate a separate
cmask for the texture to do fast clear, but the corresponding bit in
compressed_colortex_mask is not set. Subsequent rendering will use
incorrect data.

Conversely, when a currently bound texture with an existing cmask is
exported leading to that cmask being disabled, the compressed_colortex_mask
bit will remain set, leading to an assertion later on in debug builds.

Since iterating through all contexts and/or remembering where every
texture is bound would be costly, and cmask enable/disable should be
rare, we will maintain a global counter to signal contexts that they
must update their compressed_colortex_masks.

This patch introduces the global counter, and subsequent patches will
do the mask update.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-10 18:22:00 -05:00
Tim Rowley
84f857bef7 gallium/swr: remove use of BYTE from swr driver
Remove use of a win32-style type leaked from the swr rasterizer.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-03-10 11:20:58 -06:00
Samuel Pitoiset
dad3e5f4ef nvc0: expose SM35 perf counters to AMD_performance_monitor
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-10 18:20:40 +01:00
Samuel Pitoiset
0e511400de nvc0: add driver metrics for SM35 (GK110)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-10 18:20:38 +01:00
Samuel Pitoiset
bf840aa523 nvc0: add MP performance counters for SM35 (GK110)
Because compute support is not enabled by default for these chipsets,
NVF0_COMPUTE=1 needs to be used, along with GALLIUM_HUD to enable
performance counters.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-10 18:20:35 +01:00
Samuel Pitoiset
f289e99dee nvc0: explode config of Kepler hardware SM events
This is really verbose but most of the configuration will be reused
for SM35 (GK110).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-10 18:20:32 +01:00
Samuel Pitoiset
a0ce8536b3 nvc0: rework the driver metrics infrastructure
This follows the same design as MP perf counters.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-10 18:20:29 +01:00
Samuel Pitoiset
41fb87249a nvc0: rework the MP counters infrastructure
This mainly improves how we define the different list of queries.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-10 18:20:26 +01:00
Vinson Lee
d46feee697 nouveau: Fix clang reserved-user-defined-literal error.
CXX      codegen/nv50_ir.lo
In file included from codegen/nv50_ir.cpp:28:
./nouveau_debug.h:19:30: error: invalid suffix on literal; C++11 requires a space between literal and identifier
      [-Wreserved-user-defined-literal]
   fprintf(stderr, "%s:%d - "fmt, __FUNCTION__, __LINE__, ##args)
                             ^

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-03-09 23:00:45 -08:00
Emil Velikov
3dc2630e45 gallium/radeon: use explicit drm_major, drm_minor check
Just like everywhere else in the radeon codebase.

v2: Don't forget about drm_major == 3 (Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-09 17:25:22 +00:00
Emil Velikov
373f118c6c gallium: do not wrap header inclusion in
Add one missing extern C guard within include/pipe/p_video_enums.h, and
remove the wrapping throughout gallium.

On Haiku one could even use the gallium debug_printf() although
that's another topic.

v2: Leave dbghelp.h as is (Jose)

Cc: Jose Fonseca <jfonseca@vmware.com>
Cc: Brian Paul <brianp@vmware.com>
Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-03-09 17:21:39 +00:00
Emil Velikov
3ffab9a89c winsys/amdgpu/addrlib: do not wrap header inclusion in extern "C"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-09 17:16:51 +00:00
Nicolai Hähnle
4eb416bd9d gallivm: special case TGSI_OPCODE_STORE
This instruction has the resource (buffer or image) as a destination to
represent the writemask for SSBO writes. However, this is obviously not
a "real" destination for the purpose of emitting LLVM IR.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-09 11:39:55 -05:00
Nicolai Hähnle
10b2b584ee tgsi: set correct output mode for RESQ
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-03-09 11:39:43 -05:00
Marek Olšák
dcb2b77823 gallium: add CAPs returning PCI device location
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-09 15:02:28 +01:00
Marek Olšák
737b6ed13e winsys/amdgpu: get PCI info
This will be queried by the OpenCL stack using an interop call.

I have tested that the values match lspci.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:28 +01:00
Marek Olšák
ec74deeb24 radeonsi: set amdgpu metadata before exporting a texture
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:28 +01:00
Nicolai Hähnle
ff7e9412be radeonsi: extract the texture descriptor computation into its own function
This will allow this code to be re-used for shader images.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-09 15:02:27 +01:00
Nicolai Hähnle
1197c69bdd radeonsi: extract the buffer descriptor computation into its own function
This will allow it to be re-used for shader image descriptors.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-09 15:02:27 +01:00
Nicolai Hähnle
2bf8ee34b8 radeonsi: remove resource field from si_sampler_view
view->resource is redundant with view->base.texture, so get rid of it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák
2dec5e09e1 radeonsi: accept pipe_resource in si_sampler_view_add_buffer
and rename .._buffers -> .._buffer

Based loosely on Nicolai's patch. This will make it easier to cherry-pick
Nicolai's patches from his image support branch.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák
f18fc70d6f radeonsi: disable DCC on handle export if expecting write access
This should be okay except that sampler views and images are not re-set.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Bas Nieuwenhuizen
1e48ec7571 radeonsi: add DCC decompression (v2)
This is currently not needed but will be necessary when we have
features that do not work with DCC enabled, such as image stores
and sharing non-scanout surfaces.

v2: Marek: rebase, remove decompression from si_flush_resource (not needed)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák
b744ac9f44 radeonsi: allocate DCC in the same backing buffer as the texture
To allow sharing textures with DCC enabled.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák
60c08aa90b gallium/radeon: disable CMASK on handle export if sharing doesn't allow it (v2)
v2: remove the list of all contexts

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák
970b979da1 gallium/radeon: eliminate fast color clear before sharing
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00
Marek Olšák
abac6bf67a gallium/radeon: don't use fast color clear if sharing doesn't allow it
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-03-09 15:02:27 +01:00