Commit graph

39979 commits

Author SHA1 Message Date
Alyssa Rosenzweig
4df80cab40 panfrost/midgard: Implement integer downsize ops
Oh, dear. No turning back now.

We begin implementing non-32-bit types, using downsizing integer type
conversions as the initial instructions. We implement them naively as
type-converting moves; substantially more efficient operation is
possible by copypropping the type conversion modifier, but this
optimization is not implemented here.

Size converting modifiers on Midgard allow an instruction to write to a
destination 1/2 the size, or to read from a source 1/2 the size. If we
need an extreme conversion (32-bit to 8-bit, for instance), multiple
type converting ops are chained together, which here is handled via an
algebraic pass.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig
dc69d3bf8f panfrost/midgard: Move scale from MIR to NIR
This begins the process of removing blend shader specific MIR into a
more general NIR lowering pass for formats.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig
d151319a3d panfrost/midgard: Passthrough nir_lower_framebuffer
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig
8e4e46794e panfrost: Extend clear colour packing
Eventually, this will allow packing clear colours for all formats,
including floating-point framebuffers, pure integer buffers, and special
formats. Currently, a few of these formats are supported, and many more
are handled through a generic Gallium colour packing path (which is not
a perfect fit for the hardware, but works for many formats and is a sane
default for the moment.)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig
21c863a695 panfrost/mfbd: Include codes for float framebuffers
We see the hardware doesn't actually support float framebuffers in the
native sense -- rather, it just allows higher bpp framebuffers and lets
a blend shader / additional clear_color fields sort out the formats.
This will be.. interesting.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig
36b3e7ea90 panfrost: Prepare some code for MRT
Full MRT support is a while away, but in the mean time, we can remove
code that explicitly assumes nr_cbufs <= 0, to minimize the obstacles
we'll face later when we add the whole thing.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig
7c82dfba8f panfrost: Use standard ALIGN_POT/INFINITY macros
We had vendored duplicates from pre-Mesa days; clean that up.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-07-10 06:12:03 -07:00
Karol Herbst
62362a4abb nv50/ir/nir: implement load/store_global
required by OpenCL

v2: fix setting globalAccess

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2019-07-10 13:23:00 +02:00
Karol Herbst
33a9b9fce5 nv50/ir/nir: handle kernel inputs
required by OpenCL

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2019-07-10 13:22:40 +02:00
Karol Herbst
2617c78fe2 nv50/ir/nir: don't assert on !main
required for OpenCL

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2019-07-10 13:22:21 +02:00
Karol Herbst
fa6bd3c639 nv50/ir/nir: parse system values first and stop for compute shaders
required by OpenCL

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2019-07-10 13:20:13 +02:00
Mauro Rossi
2434fb3e8e android: radeonsi/gfx10: generate gfx10_format_table.h (v2)
Fix Android building rules for gfx10_format_table.h generated header

(v2) Add LOCAL_C_INCLUDES += $(intermediates)/radeonsi to fix error:

external/mesa/src/gallium/drivers/radeonsi/si_state.c:46:10:
fatal error: 'gfx10_format_table.h' file not found
         ^~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Fixes: 0ffa229 ("radeonsi/gfx10: generate gfx10_format_table.h")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2019-07-10 09:03:46 +02:00
Chih-Wei Huang
0d394f1734 android: virgl: remove unnecessary LOCAL_C_INCLUDES
The path could be imported automatically.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>
2019-07-10 08:56:47 +02:00
Chia-I Wu
b44bb8bded virgl: remove virgl_transfer_queue_lists
COMPLETED_LIST is always empty.  We only need one list.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-07-09 14:26:55 -07:00
Chia-I Wu
48aefcbd6b virgl: simplify virgl_transfer_queue_extend
We can reuse virgl_transfer_queue_find_pending.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-07-09 14:26:55 -07:00
Chia-I Wu
eae4527551 virgl: remove transfer after transfer_write
Now that virgl_transfer_queue_is_queued does not search
COMPLETED_LIST, we don't need to move transfers to that list.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-07-09 14:26:55 -07:00
Chia-I Wu
bec2a85c48 virgl: improve virgl_transfer_queue_is_queued
Search only the pending list and return immediately on the first
hit.

When the transfer queue was introduced, the function was used to
deal with

  write transfer -> draw -> write transfer

sequence.  It was used to tell if the second transfer intersects
with the first transfer. If yes, the transfer queue avoided
reordering the second transfer to before the draw (by flushing) in
case the draw uses the transferred data.

With the recent changes to the transfer code, the function is used
to deal with

  write transfer -> readback transfer

We want to avoid reordering the readback transfer to before the
first transfer (also by flushing).

In the old code, we needed to track the compeleted transfers as well
to avoid reordering.  But in the new code, a readback transfer is
guaranteed to see the data from the completed transfers (in other
words, it cannot be reoderered to before the already completed
transfers).  We don't need to search the COMPLETED_LIST.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-07-09 14:26:55 -07:00
Chia-I Wu
5f6aab2ee2 virgl: fix transfers_intersect for mipmaps
We never use transfers_intersect with textures, but fix it anyway to
avoid confusion.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-07-09 14:26:55 -07:00
Chia-I Wu
6ca1bbabbe virgl: fix some false positives in transfers_overlap
Rewrite the function and check z/depth more carefully.  We
intentionally avoid u_box_test_intersection_2d because it returns
true when two boxes touch but do not intersect and can be confusing.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-07-09 14:26:55 -07:00
Marek Olšák
2b2093961e radeonsi/gfx10: enable primitive binning by default
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
9f68367d19 radeonsi/gfx10: implement primitive binning
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
4e56a2aaa8 radeonsi: simplify primitive binning enablement
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
3521297251 radeonsi: set primitive binning tunables for dGPUs
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
d7e80ba1e7 radeonsi: set FLUSH_ON_BINNING_TRANSITION when needed
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
9dbe63ceea radeonsi/gfx10: use the new scan converter when binning is disabled
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
80b3f4b4bd radeonsi/gfx9: fix an oversight in primitive binning code
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
1f53a3e766 radeonsi: use BREAK_BATCH instead of FLUSH_DFSM when CB_TARGET_MASK changes
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
605900d7dd radeonsi/gfx10: don't expose unimplemented PIPE_CAP_QUERY_SO_OVERFLOW
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
270a8ab648 radeonsi/gfx10: launch 2 compute waves per CU before going onto the next CU
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
ab1f36a1d3 radeonsi/gfx10: set more registers and fields
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
9b65f6618c radeonsi/gfx10: enable LATE_ALLOC_GS
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
4985c3ee22 radeonsi/gfx10: set HS/GS/CS.WGP_MODE
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
329406ec9c radeonsi/gfx10: set GE_PC_ALLOC
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
9d1483de3b radeonsi/gfx10: enable 1D textures
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
1d3bffaf9c radeonsi/gfx10: enable image stores with DCC
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
5b50fb9b7f radeonsi/gfx10: no need to invalidate L2 for framebuffer -> texture coherency
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
fbf781e401 radeonsi/gfx10: support pixel shaders without exports
It only works if there are not color and no Z exports.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
2adc8e2736 radeonsi/gfx10: enable vertex shaders without param space allocation
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
07fe51156d radeonsi: update DCC settings from PAL
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
4002913f8d radeonsi: reorder shader IO indices for better IO space usage for tess and GS
The highest used index determines the stride for shader outputs in shaders
that use LDS or memory for outputs.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
1c99a13f89 radeonsi: decrease maximum supported GENERIC varying index from 42 to 31
This can decrease LDS and/or memory usage for shader outputs when geometry
shaders or tessellation is used.

Only PS inputs support higher indices and those aren't eliminated by
kill_outputs.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
6335cc6a58 radeonsi: cosmetic cleanup in si_shader_io_get_unique_index
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
3be4ed2fe1 radeonsi: fix and clean up shader_type passing
- don't pass it via a parameter if it can be derived from other parameters
- set shader_type for ac_rtld_open
- use enum pipe_shader_type instead of unsigned

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák
37b26671a7 radeonsi: enable RB+ for pixel shaders with no/non-contiguous color outputs
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie airlied@redhat.com
2019-07-09 17:24:16 -04:00
Marek Olšák
5058d62b05 radeonsi: don't set READ_ONLY for const_uploader to fix bindless texture hangs
Bindless textures can update descriptors with WRITE_DATA.

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie airlied@redhat.com
2019-07-09 17:24:16 -04:00
Alyssa Rosenzweig
6074eae753 gallium: Add util_format_is_unorm8 check
Useful for formats that would work with the same driver code path as
RGBA8 UNORM but that don't meet the util_format_is_rgba8_variant
criteria due to a smaller channel count.

v2: Use simpler logic (suggested by Iago).

v3: Fix spelling erorr. boolean->bool (thank you airlied).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-07-09 21:17:47 +00:00
Pratik Vishwakarma
177a3df7b0 radeonsi: Expose support for 10-bit VP9 decode
Fix si_vid_is_format_supported to expose support
for 10-bit VP9 decode using P016 format. Without
this change, 10-bit decode will be exposed only
for HEVC even though newer hardware support
10-bit decode for VP9.

Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2019-07-09 15:26:54 -04:00
Karol Herbst
a110a8090d nvc0: remove nvc0_program.tp.input_patch_size
right now that's dead code

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-07-09 12:41:54 +02:00
Alejandro Piñeiro
71446bf8e3 v3d: Early return with handle 0 when getting a bo on the simulator
Until now we were just asking entries on the bo hash table, and don't
worry if the handle was NULL, as we were just expecting to get a NULL
in return. It seems that now the hash table assert with some reserverd
pointers, included NULL. This commit just early returns with handle 0.

This change fixes several crashes on vk-gl-cts GLES tests when using
the v3d simulator, like:
KHR-GLES3.core.internalformat.copy_tex_image.*

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-09 08:40:35 +02:00
Timothy Arceri
6b60cfd079 radeonsi: update function name in comment
This was missed in 2361558eb7

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-09 10:00:23 +10:00