Commit graph

31705 commits

Author SHA1 Message Date
Lucas Stach
ff490eb8fd etnaviv: use padded width/height for resource copies
When copying a resource fully we can just blit the whole level. This allows
to use the RS even for level sizes not aligned to the RS min alignment. This
is especially useful, as etna_copy_resource is part of the software fallback
paths (used in etna_transfer), that are used for doing unaligned copies.

Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-16 15:26:23 +02:00
Lucas Stach
2a6183d416 etnaviv: don't try RS blit if blit region is unaligned
If the blit region is not aligned to the RS min alignment don't try
to execute the blit, but fall back to the software path.

Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-16 15:26:23 +02:00
Jan Vesely
d96a210842 r600g,compute: provide local copy of functions from ac_binary.c
This is a verbatim copy of the code. The functions can be cleaned up since
r600 does not use all the stuff that gcn does.
The symbol names have been changed since we still use ac_binary.h header
(for struct definition)

v2: Add ifdef guard around r600_binary_clean call (Aaron)
    Remove stray comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Tested-By: Aaron Watry <awatry@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-16 12:41:44 +01:00
Jan Vesely
d41b7b0104 r600: android: amdgpu_common is only required when building OpenCL
v2: split off Android changes

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-16 12:41:44 +01:00
Thomas Hellstrom
9d81ab7376 svga: Relax the format checks for copy_region_vgpu10 somewhat
The new generic checks were actually more restrictive than the previous svga-
specific tests and not vice versa. So bypass the common format checks for
copy_region_vgpu10.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-06-16 08:40:26 +02:00
Thomas Hellstrom
a37eede540 svga: Fix incorrect format conversion blit destination
The blit.dst.resource member that was used as destination was
modified earlier in the function, effectively making us try to blit
the content onto itself. Fix this and also add a debug printout when the
format conversion blits fail.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2017-06-16 08:40:26 +02:00
Thomas Hellstrom
5732ac3ecc svga: Fix srgb copy_region regression
This fixes a tf2 srgb copy_region regression from
"svga: Rework the blit and resource_copy_region functionality v3"

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-06-16 08:40:26 +02:00
Thomas Hellstrom
14f888a2ba svga: Prefer accelerated blits over cpu copy region
This reduces the number of cpu copy_region fallbacks on a Nvidia system
running the piglit command

./publish/bin/piglit run  -1 -t copy -t blit tests/quick

from 64789 to 780

Previously this has caused a regression in piglit test
spec@!opengl 1.0@gl-1.0-scissor-copypixels, but I'm currently not able to
reproduce that regression.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-16 08:40:26 +02:00
Thomas Hellstrom
4c3e8f141b svga: Support accelerated conditional blitting
The blitter has functions to save and restore the conditional rendering state,
but we currently don't save the needed info.

Since also the copy_region_vgpu10 path supports conditional blitting,
we instead use the same function as the clearing routines and move
that function to svga_pipe_query.c

Note that we still haven't implemented conditional blitting with
the software fallbacks.

Fixes piglit nv_conditional_render::copyteximage

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-16 08:40:26 +02:00
Thomas Hellstrom
71f857d6ab svga: Use utility functions to help determine whether we can use copy_region
It seems like the SVGA tests are in general more stringent than the utility
tests, but they also miss some blitter features like filters and window
rectangles, and if new blitter features are added in the future, it might
be possible that we forget adding tests for those.

So in addition to the SVGA tests, use the utility tests to restrict the
situations where we can use copy_region.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-16 08:40:26 +02:00
Thomas Hellstrom
f4c2d4bd4a svga: Rework the blit and resource_copy_region functionality v3
This work was initially trigged by the fact that imported surfaces may
be backed by other SVGA3D formats than the default. Therefore some fixes were
needed to avoid using the copy_region_vgpu10() functionality for incompatible
SVGA3D formats where the pipe formats were OK. This situation happens when
using dri3.

Also in some situations, for example where a R8G8_UNORM surface is backed by
an SVGA3D_NV12 format, we can't use the copy_region functionality at all and
thus need to fall back to the quad blitter also for the resource_copy_region
function. This situation doesn't happen currently, but will if we start using
video textures.

The patch makes the blit- and copy_region paths similar and the decision whether
to use a certain gpu command should now be easy to locate. Probably the
resource_copy_region path will suffer from a minor additional cpu overhead,
but on the other hand there are more cases now that we accelerate, since
we try harder before falling back to cpu copies / blits.

v2: Addressed review comments and fixed up piglit failures by sometimes
preferring cpu_copy_region() over blit().
v3: Removed a stray test statement. Updated commit message.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-16 08:40:26 +02:00
Brian Paul
c8f344ed2d draw: check for line_width != 1.0f in validate_pipeline()
We shouldn't use the wide line stage if the line width is 1.
This check isn't strictly needed because all drivers are (now)
specifying a line wide threshold of at least 1.0 pixels, but
let's play it safe.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-15 13:53:00 -06:00
Brian Paul
c2b92dada0 svga: clamp device line width to at least 1 to fix HWv8 line stippling
The line stipple fallback code for virtual HW version 8 didn't work.

With HW version 8, we were getting zero when querying the max line
widths (AA and non-AA).  This means we were setting the draw module's
wide line threshold to zero.  This caused the wide line stage to always
get enabled.  That caused the line stipple module to fall because the
wide line stage was clobbering the rasterization state with a state
object setting the line stipple pattern to 0xffff.

Now the wide_lines variable in draw's validate_pipeline() will not
be incorrectly set.

Also improve debug output.

BTW, also this fixes several other piglit tests: polygon-mode,
primitive- restart-draw-mode, and line-flat-clip-color since they
all use the draw module fallback.

See VMware bug 1895811.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-06-15 13:53:00 -06:00
Brian Paul
c9f4e069ba draw: whitespace and formatting fixes
Trivial.
2017-06-15 13:53:00 -06:00
Eric Anholt
7029ec05e2 gallium: Add renderonly-based support for pl111+vc4.
This follows the model of imx (display) and etnaviv (render): pl111 is a
display-only device, so when asked to do GL for it, we see if we have a
vc4 renderer, make the vc4 screen, and have vc4 call back to pl111 to do
scanout allocations.

The difference from etnaviv is that we share the same BO between vc4 and
pl111, rather than having a vc4 bo and a pl11 bo and copies between the
two.  The only mismatch between their requirements is that vc4 requires
4-pixel (at 32bpp) stride alignment, while pl111 requires that stride
match width.  The kernel will reject any modesets to an incorrect stride,
so the 3D driver doesn't need to worry about that.

v2: Rebase on Android rework, drop unused include.
v3: Fix another Android bug, from Rob Herring's build-testing.

Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-15 11:41:22 -07:00
Eric Anholt
7a17191305 etnaviv: Only use renderonly_get_handle for GEM handles.
Note that for requests for Prime FDs or flink names, we return handles to
the etanviv BO, not the scanout BO.  This is at least better than previous
behavior of returning GEM handles for a request for an FD or flink name.

And add an assert that renderonly_get_handle is only used for getting the
GEM handle.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-06-15 11:41:22 -07:00
Mauro Rossi
d5a9608076 android: r600/eg: add support for tracing IBs after a hang.
The rules to generate egd_tables.h are added in Android makefile

Fixes: f42fb00 "r600/eg: add support for tracing IBs after a hang."
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-15 15:46:44 +01:00
Mauro Rossi
d5523d912c svga: fix git_sha1.h include path in Android.mk (v3)
Adds libmesa_git_sha1 static (dummy) library to generate git_sha1.h
with some polishing to header dependency on .git/HEAD and scripted rules.

The now redundant generation rules are removed from Android.gen.mk
libmesa_git_sha1 whole static depedency is added to libmesa_pipe_svga,
libmesa_dricore and libmesa_st_mesa modules

Fixes the following building error:

external/mesa/src/gallium/drivers/svga/svga_screen.c:26:10:
fatal error: 'git_sha1.h' file not found
         ^
1 error generated.

Fixes: 1ce3a27 ("svga: Add the ability to log messages to
vmware.log on the host.")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-15 15:46:44 +01:00
Samuel Pitoiset
e8df89d2c5 gallium/radeon: fix initialization of new resource bindless fields
r600_resource objects are not calloc'd.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-15 11:56:21 +02:00
Michel Dänzer
176e761513 gallium/util: Break recursion in pipe_resource_reference
It calling itself recursively prevented it from being inlined, resulting
in a copy being generated in every compilation unit referencing it. This
bloated the text segment of the Gallium mega-driver *_dri.so by ~4%,
and might also have impacted performance.

Fixes: ecd6fce261 ("mesa/st: support lowering multi-planar YUV")
v2:
* Add comment above pipe_resource_next_reference [Samuel Pitoiset]
v3:
* Use loop to unreference the full chain of resources referenced via
  the next members [Timothy Arceri]
v4:
* Stop chasing ->next chain at the first sub-resource which isn't
  destroyed [Nicolai Hähnle]

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-15 11:24:59 +09:00
Aaron Watry
e4d06e4c53 radeon/winsys: Limit max allocation size to 70% of VRAM
The CL CTS queries the max allocation size, and then attempts to
allocate buffers of that size. If not enough contiguous RAM/VRAM is
available, this causes errors in the radeon kernel module due to
inability to allocate the required memory.

It's a bit of a hack, but experimentally on my system, I can use ~3/4
of the card's VRAM for a single global/constant buffer allocation given
current GUI/compositor use.

For a 1GB Pitcairn (HD7850) this gets me from the reported clinfo values of:
Global memory size                              2143076352 (1.996GiB)
Max memory allocation                           1500153446 (1.397GiB)
Max constant buffer size                        1500153446 (1.397GiB)

To:
Global memory size                              2143076352 (1.996GiB)
Max memory allocation                           751619276 (716MiB)
Max constant buffer size                        751619276 (716MiB)

Fixes: OpenCL CTS test/conformance/api/min_max_mem_alloc_size,
       OpenCL CTS test/conformance/api/min_max_constant_buffer_size

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 19:38:55 -05:00
Samuel Pitoiset
5f8b654b47 tgsi/scan: add missing 'static' to tgsi_is_bindless_image_file()
This should fix compilation errors in some situations.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101418
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 15:30:39 +02:00
Samuel Pitoiset
65d1e4d1eb radeonsi: enable ARB_bindless_texture
This has only been tested on RX480.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
285ec4463b radeonsi: add support for loading bindless images
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
950b5ffa31 radeonsi: add support for loading bindless samplers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
0c2834c5b2 radeonsi: invalidate buffers which are made resident if needed
When a buffer becomes resident, check if it has been invalidated,
if so update the descriptor and the dirty flag.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
811756dfd0 radeonsi: upload new descriptors when resident buffers are invalidated
When texture buffers are invalidated the addr in the resident
descriptor has to be updated but we can't create a new descriptor
because the resident handle has to be the same.

Instead, use the WRITE_DATA packet which allows to update memory
directly but graphics/compute have to be idle in case the GPU is
reading the descriptor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
48fe8a6210 radeonsi: only decompress resident textures/images when used
When the current bound shaders don't use any bindless textures
or images, it's useless to decompress the resident resources.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
2c3a7d5840 radeonsi: track use of bindless samplers/images from tgsi_shader_info
This adds some new helper functions to know if the current draw
call (or dispatch compute) is using bindless samplers/images,
based on TGSI analysis.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
e1813a8635 radeonsi: decompress resident textures/images before graphics/compute
Similar to the existing decompression code path except that it
loops over the list of resident textures/images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
d7e1a66bb5 radeonsi: decompress DCC for resident textures/images
Analogous to bound textures/images. We should also update the
resident descriptors and disable COMPRESSION_EN for avoiding
useless DCC fetches, but I postpone this optimization for a
separate series.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
a45e198e2d radeonsi: only add descriptors in presence of resident handles
This won't help much except for applications that use a ton
of resident handles. Though, this will reduce the winsys
overhead a little bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
333c8f65cf radeonsi: add all resident buffers to the current CS
Resident buffers have to be added to every new command stream.
Though, this could be slightly improved when current shaders
don't use any bindless textures/images but usually applications
tend to use bindless for almost every draw call, and the winsys
thread might help when buffers are added early.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
9cc328eef6 radeonsi: implement ARB_bindless_texture
This implements the Gallium interface. Decompression of resident
textures/images will follow in the next patches.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
77bbdcdfcd radeonsi: add a slab allocator for bindless descriptors
For each texture/image handles, we need to allocate a new
buffer for the bindless descriptor. But when the number of
buffers added to the current CS becomes high, the overhead
in the winsys (and in the kernel) is important.

To reduce this bottleneck, the idea is to suballocate the
bindless descriptors using a slab similar to the one used
in the winsys.

Currently, a buffer can hold 1024 bindless descriptors but
this limit is arbitrary and could be changed in the future
for some reasons. Once a slab is allocated the "base" buffer
is added to a per-context list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
86d7b7f01a radeonsi: add si_set_shader_image_desc() helper
To share some common code between bound and bindless images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
410b4ec06d radeonsi: add si_set_sampler_view_desc() helper
To share some common code between bound and bindless textures.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
2ce04d7c1a radeonsi: add si_init_descriptor_list() helper
This will be used in order to initialize resident descriptors
for bindless textures/images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
6447abf373 tgsi/scan: record bindless samplers/images usage
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
556f70b404 tgsi/ureg: accept TGSI_FILE_{CONSTANT,INPUT} for dst registers
For example, TGSI_OPCODE_STORE for bindless images might use
a constant buffer or a shader input.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
ed4fbb84d1 tc: add ARB_bindless_texture support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
e53e374b26 trace: add ARB_bindless_texture support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
02743d63cc ddebug: add ARB_bindless_texture support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
8a68b4de08 gallium: add ARB_bindless_texture interface
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
973822bcee gallium: add PIPE_CAP_BINDLESS_TEXTURE
Whether bindless texture operations are supported by the
underlying driver.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-14 10:04:36 +02:00
Aaron Watry
d364ab4a61 clover/device: Get device/host unified memory from pipe driver
clinfo no longer reports my discrete GCN card as unified memory

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-06-13 21:26:09 -05:00
Henri Verbeet
1307ed430a gallium/radeon: Include the family name in the renderer string if it's not equal to the marketing name.
The "family" name is often more informative than the "marketing" name. More
importantly, applications, like for example Wine, may recognise GPUs based on
the existing "family" names.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
2017-06-13 19:23:18 +02:00
Brian Paul
def8d1d23f gallium/docs: clarify TGSI_SEMANTIC_SAMPLEMASK, again
I've since discovered the fragment shader sample mask system value (which
corresponds to gl_SampleMaskIn).

v2: It's a system value, not a shader input.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-06-13 08:02:43 -06:00
Brian Paul
e5eb9b4363 gallium/util: whitespace, formatting fixes in u_upload_mgr.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-13 07:52:54 -06:00
Jose Fonseca
f2d71df0ca softpipe: Match pipe_context::render_condition prototype.
To silence compiler warnings.  Trivial.
2017-06-13 11:53:16 +01:00