Commit graph

32114 commits

Author SHA1 Message Date
Ben Crocker
57c8ead0cd llvmpipe: lp_build_gather_elem_vec BE fix for 3x16 load
Fix loading of a 3x16 vector as a single 48-bit load
on big-endian systems (PPC64, S390).

Roland Scheidegger's commit e827d91756
plus Ray Strode's patch reduce pre-Roland Piglit failures from ~4000 to ~2000.  This patch fixes
three of the four regressions observed by Ray:

- draw-vertices
- draw-vertices-half-float
- draw-vertices-half-float_gles2

One regression remains:
- draw-vertices-2101010

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100613
Cc: "17.2" "17.1" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-09-01 01:20:07 +02:00
Ray Strode
75cb6e3617 gallivm: correct channel shift logic on big endian
lp_build_fetch_rgba_soa fetches a texel from a texture.
Part of that process involves first gathering the element
together from memory into a packed format, and then breaking
out the individual color channels into separate, parallel
arrays.

The code fails to account for endianess when reading the packed
values.

This commit attempts to correct the problem by reversing the order
the packed values are read on big endian systems.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100613
Cc: "17.2" "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ray Strode <rstrode@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-09-01 01:19:13 +02:00
Christian König
214b565bc2 winsys/amdgpu: set AMDGPU_GEM_CREATE_VM_ALWAYS_VALID if possible v2
When the kernel supports it set the local flag and
stop adding those BOs to the BO list.

Can probably be optimized much more.

v2: rename new flag to AMDGPU_GEM_CREATE_VM_ALWAYS_VALID

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-31 14:55:38 +02:00
Marek Olšák
8b3a257851 radeonsi: set a per-buffer flag that disables inter-process sharing (v4)
For lower overhead in the CS ioctl.
Winsys allocators are not used with interprocess-sharable resources.

v2: It shouldn't crash anymore, but the kernel will reject the new flag.
v3 (christian): Rename the flag, avoid sending those buffers in the BO list.
v4 (christian): Remove setting the kernel flag for now

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-31 14:55:21 +02:00
Brian Paul
5610911fed svga: include sample count in surface_size() computation
Use MAX2() because sampleCount will be zero for non-MSAA surfaces.
No Piglit regressions.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-08-30 13:59:14 -06:00
Samuel Pitoiset
0d9117b7bd winsys/amdgpu: add BO to the global list only when RADEON_ALL_BOS is set
Only useful when that debug option is enabled.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-30 09:33:59 +02:00
Samuel Pitoiset
59101e771d radeonsi: update dirty_level_mask before dispatching
This fixes a rendering issue with Hitman when bindless textures
are enabled.

Fixes: 2263610827 ("radeonsi: flush DB caches only when transitioning from DB to texturing")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-30 09:33:55 +02:00
Brian Paul
88cdf16871 llvmpipe: initialize llvmpipe->dirty with LP_NEW_SCISSOR
If llvmpipe_set_scissor_states() is never called, we still need to be sure
that derived scissor/clip state is updated.  As of commit 743ad599a9
that function might not be called.

Fixes regressed Piglit gl-1.0-scissor-offscreen -fbo -auto test.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101709
Fixes: 743ad599a9 ("st/mesa: don't set 16 scissors and 16 viewports
if they're unused")
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
2017-08-29 20:49:36 -06:00
Bas Nieuwenhuizen
46dd30d08f ac/debug: Support multiple trace ids for nested IBs.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-08-29 23:05:59 +02:00
Timothy Arceri
0168d1f449 radeonsi: stop leaking nir
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-29 09:46:29 +10:00
Marek Olšák
9c92e82b32 radeonsi: rewrite late alloc VS limit computation
This is still very simple, but it's better than before.

Loosely ported from Vulkan.
2017-08-28 21:45:33 +02:00
Marek Olšák
39205f216e gallium/radeon: set EVENT_WRITE_EOP.INT_SEL = wait for write confirmation
Ported from Vulkan.
Not sure what this is good for.. maybe write confirmation from L2 flushes?

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-28 21:45:33 +02:00
Marek Olšák
61187c1689 gallium/u_threaded: rename IGNORE_VALID_RANGE -> NO_INFER_UNSYNCHRONIZED
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-28 21:45:33 +02:00
Marek Olšák
28c4c55810 gallium/u_threaded: disallow discard_range if map_buffer is unsynchronized
The discard range codepath takes precedence, so if we get both
unsynchronized and discard_range, choose unsynchronized.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-28 21:45:32 +02:00
Marek Olšák
f173efe916 radeonsi: correct maximum wave count per SIMD
v2: don't special-case Tonga and Iceland.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-28 16:33:24 +02:00
Gwan-gyeong Mun
c261bc11e6 gallium/docs: Fix an inequality sign of TGSI_SEMANTIC_SUBGROUP_LT_MASK
A previous expression presents same as TGSI_SEMANTIC_SUBGROUP_GT_MASK.
It fixes a direction of an inequality for TGSI_SEMANTIC_SUBGROUP_LT_MASK.

before:
  bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION

after:
  bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-28 12:05:44 +02:00
Gwan-gyeong Mun
db91b8536e gallium/docs: fix a typo
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-28 10:33:42 +02:00
Eduardo Lima Mitev
1d8111ebac i915g: Remove a few unused variables
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-08-28 08:59:50 +02:00
Marek Olšák
d500c9b060 Revert "radeonsi: get the raster config from AMDGPU on SI"
This reverts commit fc99cb3c9e.

"The performance went down from 64.7 to 51.4 fps in Valley and from 30.8 to
25.1 fps in Heaven on Radeon HD 7970. Other games seem to have also a 10-25%
performance decrease."

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102429

It looks like we can't use the raster config values from the kernel.
2017-08-27 22:27:23 +02:00
Christian Gmeiner
67fc3e37a7 etnaviv: use correct param for etna_compatible_rs_format(..)
Found by code inspection.

Fixes: c9e8b49b88 ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-08-26 17:20:39 +02:00
Brian Paul
d819b1fcec gallium/vbuf: fix buffer reference bugs
In two places we called pipe_resource_reference() to remove a reference
to a vertex buffer resource.  But we neglected to check if the buffer was
a user buffer and not a pipe_resource.  This caused us to pass an invalid
pipe_resource pointer to pipe_resource_reference().

Instead of calling pipe_resource_reference(&vbuf->resource, NULL), use
pipe_vertex_buffer_unreference(&vbuf) which checks the is_user_buffer
field and does the right thing.

Also, explicity set the is_user_buffer field to false after setting the
vbuf->resource pointer to out_buffer.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102377
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-08-25 20:26:52 -06:00
Marek Olšák
ddc9b4e823 gallium/u_threaded: fix a typo 2017-08-25 15:40:28 +02:00
Ilia Mirkin
f623e1742f a2xx: fix DST_ALPHA blending for non-alpha formats
If we're rendering to a format without alpha, convert DST_ALPHA blend to
a ONE so that factors are properly computed. This same workaround is
done on a3xx+ as well.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-08-25 00:18:34 -04:00
Ilia Mirkin
f3bde890cd a2xx: set constant blend color
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-08-25 00:18:33 -04:00
Timothy Arceri
f40908f2d1 radeonsi: set IF_THRESHOLD to 4
In 74e39de932 it was set to 3 and it was reported that 4 caused
tesseract to start spilling VGPRs. This no longer seems to be the
case.

Totals:
SGPRS: 2787844 -> 2787764 (-0.00 %)
VGPRS: 1713121 -> 1712717 (-0.02 %)
Spilled SGPRs: 7532 -> 7532 (0.00 %)
Spilled VGPRs: 49 -> 33 (-32.65 %)
Private memory VGPRs: 2060 -> 2060 (0.00 %)
Scratch size: 2200 -> 2180 (-0.91 %) dwords per thread
Code Size: 79265520 -> 79248360 (-0.02 %) bytes
LDS: 436 -> 436 (0.00 %) blocks
Max Waves: 670535 -> 670608 (0.01 %)
Wait states: 0 -> 0 (0.00 %)

Before:
 VGPR SPILLING APPS   Shaders SpillVGPR  PrivVGPR ScratchSize
 EffectsCaveDemo          301         0       256       264
 ReflectionsSubwayDemo    264         0       256       264
 VehicleGame              295         0       128       132
 bioshock-infinite       1140         0       448       516
 dirt-showdown            453        33         0        28
 gang-beasts              364         0       500       496
 kerbal-space-program    1228         0       472       480
 tomb-raider-ultra       1199        16         0        20

After:
 VGPR SPILLING APPS   Shaders SpillVGPR  PrivVGPR ScratchSize
 EffectsCaveDemo          301         0       256       264
 ReflectionsSubwayDemo    264         0       256       264
 VehicleGame              295         0       128       132
 bioshock-infinite       1140         0       448       516
 dirt-showdown            453        33         0        28
 gang-beasts              364         0       500       496
 kerbal-space-program    1228         0       472       480

The only change in VGPR spills is the elimination of all spills
in Tomb Raider at Ultra settings. Closer examination shows that
the shaders go over the limit because they contain three
expressions a mul, rcp and ubo load. The ubo load is actually
used elsewhere and is therefore stored in a temp already in IR
such as tgsi but glsl ir counts it agaist the if cost.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2017-08-25 14:09:32 +10:00
Timothy Arceri
ea2515d780 glsl: pass shader source keys to the disk cache
We don't actually write them to disk here. That will happen in the
following commit.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-25 13:20:29 +10:00
Marek Olšák
fc99cb3c9e radeonsi: get the raster config from AMDGPU on SI
Not sure yet if we wanna do this on CIK and VI too.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-24 23:54:55 +02:00
Marek Olšák
28d5c30179 radeonsi: clean up setting GRBM_GFX_INDEX
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-24 23:54:55 +02:00
Marek Olšák
0b50f0915b radeonsi: move PA_SC_RASTER_CONFIG emission into a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-24 23:54:55 +02:00
Brian Paul
a7e65a443f gallivm: remove unused variable
Trivial.
2017-08-24 07:36:10 -06:00
Brian Paul
f883ede949 pipe-loader: use MAYBE_UNUSED to silence warning
Trivial.
2017-08-24 07:30:22 -06:00
Ilia Mirkin
96be442b77 nv50/ir: properly set sType for TXF ops to U32
All of the coordinates and LOD args are integers for TXF. This mostly
doesn't matter, except for converting into a levelZero=true operation by
removing an explicit zero LOD. For the comparison against zero to work
properly, the sType of the instruction has to be set correctly.

Fixes: KHR-GL45.robust_buffer_access_behavior.texel_fetch
Reported-by: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2017-08-24 08:41:57 -04:00
Leo Liu
5ff97f2644 st/va: exclude the buffer reallocation for encode case
Since encoder only support de-interlaced buffers.

v2: move to parameter call to tell dec/enc

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-08-23 14:51:12 -04:00
Tim Rowley
f0602dc920 swr: limit pipe_draw_info->restart_index usage
Only copy this value when in restart drawing mode.

Eliminates valgrind errors when running trivial programs.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-08-23 11:37:50 -05:00
Samuel Pitoiset
7fb4b6f270 radeonsi: fix wrong assertion in si_init_bindless_descriptors()
Bad mistake, sorry.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-08-23 17:13:44 +02:00
Leo Liu
89f75c9483 radeon/video: Return false explicitly for HEVC if not the case
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-23 10:51:14 -04:00
Gwan-gyeong Mun
9649c6acce gallium/docs: Fix the math formula of U2I64
before:
  dst.xy = (uint64_t) src0.x
  dst.zw = (uint64_t) src0.y

after:
  dst.xy = (int64_t) src0.x
  dst.zw = (int64_t) src0.y

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-23 14:09:49 +02:00
Gwan-gyeong Mun
9aabf80ef3 gallium/docs: Add missing word "Not"
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-08-23 14:09:22 +02:00
Nicolai Hähnle
26996ec3b8 tgsi: store opcode mnemonics in a separate table
They are only used for debug info.

Together with making tgsi_opcode_info::opcode a bitfield, this reduces
the size of tgsi_opcode_info on 64-bit systems from 24 bytes to 4 bytes,
and makes the whole data structure a bit more linker friendly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-23 13:54:57 +02:00
Nicolai Hähnle
438177aa19 gallium: use tgsi_get_opcode_name instead of tgsi_opcode_info::mnemonic
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-23 13:54:55 +02:00
Nicolai Hähnle
2f7c55c23f tgsi: macro-ify the opcodes table
So we can easily re-arrange members of tgsi_opcode_info, and readers of
the code don't have to guess what all the 0s mean.

Mostly done with regex search&replace.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-23 13:54:53 +02:00
Nicolai Hähnle
48ef0a1ee4 tgsi: remove post_indent from some 64-bit opcodes
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-23 13:54:51 +02:00
Nicolai Hähnle
3f433e927c tgsi: reduce tgsi_opcode_info::pre_dedent and post_indent to 1 bit
It's not clear why they were ever 2 bits to begin with. Perhaps
the original intent was to use signed values, but that doesn't
seem to have ever been the case in master.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-23 13:54:47 +02:00
Nicolai Hähnle
83c5d12d9d gallium/radeon: fix saving multi-part command streams
Use the correct type to fix pointer arithmetic.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-23 13:54:09 +02:00
Nicolai Hähnle
6fdd7ba32e radeonsi: update comment describing indices into sctx->descriptors
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-23 13:54:01 +02:00
Nicolai Hähnle
556946f801 util: fix valgrind errors when dumping pipe_draw_info
Various index-related fields are only initialized when required, so
they should only be dumped in those cases.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-23 13:53:54 +02:00
Samuel Pitoiset
94cc01105e radeonsi: do not assert when reserving bindless slot 0
When assertions were disabled, the compiler removed
the call to util_idalloc_alloc() and the first allocated
bindless slot was 0 which is invalid per the spec.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-08-23 13:38:56 +02:00
Samuel Pitoiset
f4ec41ecc4 radeonsi: rename some bindless-related helper functions
I think it makes more sense.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-23 13:37:07 +02:00
Samuel Pitoiset
9141d13214 radeonsi: minor cleanups in si_make_{texture,image}_handle_resident()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-23 13:37:05 +02:00
Rob Herring
f8e4223728 Android: gallium_dri: pass dri.sym to linker
Pass the dri.sym version script to the linker. This ensures only
explicitly exported symbols are exported and shrinks the library by up
to 60KB.

HAVE_DLADDR also needs to be set so that __driDriverExtensions is defined.

We need to pass "--undefined-version" because the Android build system
sets --no-undefined-version by default and we get an error on
driver specific symbols if those drivers are disabled without the option.

Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2017-08-22 19:02:12 -05:00