Commit graph

27608 commits

Author SHA1 Message Date
Brian Paul
1a48326a84 svga: use more VGPU10 formats
We always want to prefer the VGPU10 formats over the VGPU9 ones when
we have VGPU10 support.

Original patch by Jose and updated by Brian.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-18 09:16:12 -07:00
Brian Paul
1a90e3e1e3 svga: add/use new svga_sampler_format() function
This is important for the case of sampling from a depth texture.  In
that case, we need to sample the texture as if it were a single-channel
color texture.  For other/color formats, we can use the format as-is.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2015-11-18 09:15:54 -07:00
Nicolai Hähnle
27ce75ed12 radeon: count cs dwords separately for query begin and end
This will be important for perfcounter queries.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-18 12:27:13 +01:00
Nicolai Hähnle
ffd01b7781 radeon: expose r600_query_hw functions for reuse
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
2015-11-18 12:27:13 +01:00
Nicolai Hähnle
50f0f938e3 radeon: implement r600_query_hw_get_result via function pointers
We will need the clear_result override for the batch query implementation.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-18 12:27:13 +01:00
Nicolai Hähnle
c207c55fc0 radeon: split hw query buffer handling from cs emit
The idea here is that driver queries implemented outside of common code
will use the same query buffer handling with different logic for starting
and stopping the corresponding counters.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
2015-11-18 12:27:13 +01:00
Nicolai Hähnle
1d10b3d01e radeon: convert hardware queries to the new style
Move r600_query and r600_query_hw into the header because we will want to
reuse the buffer handling and suspend/resume logic outside of the common
radeon code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
2015-11-18 12:27:12 +01:00
Nicolai Hähnle
019106760d radeon: convert software queries to the new style
Software queries are all queries that do not require suspend/resume
and explicit handling of result buffers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
2015-11-18 12:27:12 +01:00
Nicolai Hähnle
829a9808a9 radeon: add query handler function pointers
The goal here is to be able to move the implementation details of hardware-
specific queries (in particular, performance counters) out of the common code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
2015-11-18 12:27:12 +01:00
Nicolai Hähnle
50cab4788d radeon: move R600_QUERY_* constants into a new query header file
More query-related structures will have to be moved into their own
header file to support hardware-specific performance counters.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-18 12:27:12 +01:00
Nicolai Hähnle
c56e83e518 radeon: cleanup driver query list
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-18 12:27:12 +01:00
Nicolai Hähnle
e117e74baf radeon: move get_driver_query_info to r600_query.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-11-18 12:27:11 +01:00
Eric Anholt
dd05ffebfc vc4: Don't bother lowering uniforms when the same value is used twice.
DEQP likes to do math on uniforms, and the "fmaxabs dst, uni, uni" to get
the absolute value would get lowered.  The lowering doesn't bother to try
to restrict the lifetime of the lowered uniforms, so we'd end up register
allocation failng due to this on 5 of the tests (More tests still fail in
RA, which look like we'll need to reduce lowered uniform lifetimes to
fix).

No changes on shader-db, though fewer extra MOVs are generated on even
glxgears (MOVs pair well enough that it ends up being the same instruction
count).
2015-11-17 17:45:23 -08:00
Eric Anholt
dffe7260cd vc4: Fix uniform reordering to support reading the same uniform twice.
This does actually happen in the wild (particularly fabs of a uniform), so
we'd like to support it.
2015-11-17 17:45:23 -08:00
Eric Anholt
d18d1ba587 vc4: Fix documentation on vc4_qir_lower_uniforms.c. 2015-11-17 17:45:23 -08:00
Eric Anholt
a4bf28178f vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB.
It looks like nir_lower_idiv is going to use it soon, so add support.
With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with
GL 3.0 forced on).

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
2015-11-17 17:45:23 -08:00
Alex Deucher
00f554abba radeonsi: enable optimal raster config setting for fiji (v2)
Requires proper kernel tiling configuration so check the tiling
config registers.

v2: send the right version of the patch

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-16 10:09:47 -05:00
Alex Deucher
5b37d8b50c radeonsi: use proper GRBM_GFX_INDEX offset for CI+
The offset is different on CI and newer.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-11-16 10:09:34 -05:00
Emil Velikov
1780a562bc nv50: add missing header into the sources list
Otherwise it won't end up in the tarball.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2015-11-16 10:49:14 +00:00
Ilia Mirkin
ff17b3ccf4 nv50,nvc0: disable render condition around clear_* functions
Only the regular "clear" call is supposed to respect the render
condition. The rest should ignore it.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-14 20:15:22 -05:00
Samuel Pitoiset
848fa3101d nv50: add support for performance metrics on G84+
Currently only one metric is exposed but more will be added later.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-14 23:42:46 +01:00
Samuel Pitoiset
6a9c151dbb nv50: add compute-related MP perf counters on G84+
These compute-related MP performance counters have been reverse
engineered using CUPTI which is part of NVIDIA CUDA.

As for nvc0, we use a compute kernel to read out those performance
counters, and the command stream to configure them. Note that Tesla
only exposes 4 MP performance counters, while Fermi has 8.

Only G84+ is supported because G80 is an old and weird card.

Tested on G84, G96, G200, MCP79 and GT218 with glxgears, glxspheres64,
xonotic-glx, heaven and valley.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-14 23:42:42 +01:00
Samuel Pitoiset
ff72440b40 nv50: implement a basic compute support
This adds the ability to launch simple compute kernels like the one I
will use to read out MP performance counters in the upcoming patch.

This compute support is based on the work of Francisco Jerez (aka curro)
that he did as part of his EVoC project in 2011/2012 to get OpenCL
working on Tesla. His original work can be found here:
https://github.com/curro/mesa/commits/nv50-compute

I did some improvements on the original code, like fixing using both 3D
and COMPUTE simultaneously, improving global buffers binding, and making
the code closer to what nvc0 already does. This compute support has been
tested by Pierre Moreau and myself with some compute kernels. This is a
step towards OpenCL.

Speaking about this, it seems like compute programs overlap fragment
programs when they are used both. To fix this, we need to re-validate
fragment programs when binding compute programs and vice versa.

Note that, textures, samplers and surfaces still need to be implemented.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-14 23:42:15 +01:00
Samuel Pitoiset
7167a058ba nv50: free interpolation parameters in nv50_program_destroy()
As for nvc0, we need to free memory allocated by interpolation
parameters. This fixes a memory leak spotted by valgrind.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-11-14 23:16:12 +01:00
Samuel Pitoiset
69271bba06 nvc0: reduce the number of GPR used when reading MP perf counters
No need to allocate more GPR than used in the compute kernel which
reads MP performance counters on Fermi.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2015-11-14 17:38:57 +01:00
Ilia Mirkin
f94e1d9738 nouveau: don't expose HEVC decoding support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-11-14 10:32:10 -05:00
Marek Olšák
3694d58e6c radeonsi: remove dead code after ES-GS linkage change
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
d79a3449a7 radeonsi: link ES-GS just like LS-HS
This reduces the shader key for ES.

Use a fixed attrib location based on (semantic name,  index).

The ESGS item size is determined by the physical index of the highest ES
output, so it's almost always larger than before, but I think that
shouldn't matter as long as the ESGS ring buffer is large enough.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
b1c5f3faa9 radeonsi: calculate optimal GS ring sizes to fix GS hangs on Tonga
I discovered that increasing the ESGS ring size fixes GS hangs on Tonga,
so let's do it properly.

There is now a separate init_config_gs_rings state that is not immutable,
because GS rings are resized when needed.

This also saves some memory. Most apps won't need more than 1MB
per ring per shader engine.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
2f5d911ba2 radeonsi: rename si_update_gs_rings
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
4acd856088 radeonsi: calculate ESGS_RING_ITEMSIZE in create_shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
a0cf589961 radeonsi: move maximum gs stream calculation into create_shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
3ab0c49f04 radeonsi: clean up small duplication in si_shader_gs
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
eb0d3e8a90 gallium/radeon: shorten render_cond variable names
and ..._cond -> ..._invert

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
70c40cc989 gallium/radeon: remove predicate_drawing flag
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
12596cfd4c gallium/radeon: atomize render condition (SET_PREDICATION)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
3521907622 gallium/radeon: simplify restoring render condition after flush
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:42 +01:00
Marek Olšák
600e212d87 gallium/radeon: don't use PREDICATION_OP_CLEAR
Not setting the predication bit is sufficient.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
6eff5415e4 gallium/radeon: simplify disabling render condition for u_blitter
just disable it by not setting the predication bit

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
8dd1ee6ff3 r600g: don't set predication on non-draw packets
This has no effect.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
6cc8f6c6a7 gallium/radeon: inline the r600_rings structure
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
3d963abc81 radeonsi: prevent recursion in si_context_gfx_flush
The recursion can only occur if you modify need_cs_space to always flush.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
8569f9a87e gallium/radeon: remove the IB flushing flag
Not needed anymore. A similar flag will be introduced in the next commit,
which will be private in radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
81d412e02c gallium/radeon: move GFX/DMA flushing from add_to_buffer_list to need_cs_space
need_cs_space isn't invoked so often and is called before all commands too.
This is a lot cleaner. The code in radeon_add_to_buffer_list always seemed
dodgy to me.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
c6012a6650 radeonsi: rename cache flushing flags once more
KCACHE, TC L1 and TC L2 are renamed to:
- SMEM L1
- VMEM L1
- GLOBAL L2

You can easily tell what they are used for now.
Shaders must deal with coherency issues between both L1s manually,
e.g. by setting GLC=1 or by using s_dcache_*.

BOTH_ICACHE_KCACHE was an unused definition.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
10130ccd8c radeonsi: set the DISABLE_WR_CONFIRM flag on CI-VI as well
I missed this in commit c3e527f93d
    radeonsi: only enable write confirmation on the last CP DMA packet

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
40912dd91e radeonsi: initialize SX_PS_DOWNCONVERT to 0 on Stoney
otherwise the SX or CB blocks can go bananas

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-11-13 19:54:41 +01:00
Marek Olšák
f7757100f2 radeonsi: add glClearBufferSubData acceleration
8-bit and 16-bit clears which are not aligned to dwords are done in software.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
19773f9805 radeonsi: add SI_SAVE_FRAGMENT_STATE blitter flag
Buffer clears via transform feedback won't set this.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00
Marek Olšák
19a9c1ecc7 gallium/u_blitter: add support for multi-dword clear values in clear_buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2015-11-13 19:54:41 +01:00