Commit graph

34855 commits

Author SHA1 Message Date
Eric Anholt
9bf9a6d6a1 v3d: Drop the VG support from the XML.
This reflects a change on the HW/closed SW side to drop this unused HW.
With it dropped on their side, the CLIF parser no longer expects to find
VG fields.
2018-07-27 12:56:36 -07:00
Eric Anholt
1c8e4632a7 v3d: Stop using spaces in the names of our buffers.
For CLIF dumping, we need names to not have spaces.  Rather than rewriting
them after the fact, just change the two cases where I had put a space in.
2018-07-27 12:56:36 -07:00
Chad Versace
7953399e59 gallium/auxiliary: Fix Autotools on Android (v2)
Problem 1: u_debug_stack_android.cpp transitively included
"pipe/p_compiler.h", but src/gallium/include was missing from the C++
include path.

Problem 2: Add -std=c++11 to AM_CXXFLAGS. Android's libbacktrace headers
require C++11, but the Android toolchain (at least in the Chrome OS SDK)
does not enable C++11 by default.

v2: Add -std=c++11.

Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Cc: Eric Engestrom <eric.engestrom@intel.com>
2018-07-27 11:35:56 -07:00
Jan Vesely
1e8b8e0878 clover: Reduce wait_count in abort path.
Trigger waiter condition variable.
Passes 'events' CTS on carrizo and turks.
v2: reduce to 0

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2018-07-26 15:38:22 -04:00
Jan Vesely
c2942141ae clover: Don't extend illegal integer types.
It's OK to pass them in memory, which is what kernel invocation needs.
Fixes regressions since llvm r337535 ("Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering"):
	scalar-arithmetic-char
	scalar-arithmetic-uchar
	scalar-arithemtic-short
	scalar-arithmetic-ushort
	scalar-comparison-char
	scalar-comparison-uchar
	scalar-comparison-short
	scalar-comparison-ushort

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2018-07-26 15:38:22 -04:00
Eric Anholt
deecc1ef86 v3d: Avoid the GFXH-1461 workaround if we have only Z or only S.
This seems like a sensible precaution to avoid extra draws.  It doesn't
deal with the case of a Z24S8 buffer created by the window system for an
application that happens to never use S.
2018-07-26 11:02:25 -07:00
Eric Anholt
301c32caf4 v3d: Rework the ordering of how we clear things.
First, figure out if we can just sneak the clear into the TLB clear, even
if drawing has already happened (since we have job->load and job->clear to
tell us), taking into account GFXH-1461.  For any pieces we can't TLB
clear, fall back to drawing a quad without flushing the scene.

Fixes extra scene flushes in glmark2 due to GFXH-1461.
2018-07-26 11:02:25 -07:00
Eric Anholt
ceecddfe77 v3d: Only store buffers that have been written to.
I've seen cases where a color buffer is bound, but only Z is written, and
we end up storing color.
2018-07-26 11:02:25 -07:00
Eric Anholt
d29435e7cb v3d: Track the buffers being loaded separately.
We were computing this at RCL generation time, but that means you can't
unflag the store for an invalidate_resource, or not flag the store if
writmasking is disabled.
2018-07-26 11:02:20 -07:00
Eric Anholt
47f5d158ae v3d: Rename cleared/resolve to clear/store.
These describe what the fields mean in RCL generation.  "resolve" is left
over from VC4, and sounds like MSAA resolves (which may or may not be
involved in the store we generate).
2018-07-26 11:00:34 -07:00
Erik Faye-Lund
e68fe445f5 gallium: initialize ureg_dst::Invariant bit
When this bit was added, it seems the some initialization code
was omitted by mistake.

Since stack-variables have kinda random contents, and we don't
zero initialize the whole struct in these code-paths, we end up
getting random-ish values for this bit.

Spotted by Coverity in the following CIDs:
- 1438115
- 1438123
- 1438130

Fixes: 70425bcfe6 ("gallium: plumb
invariant output attrib thru TGSI")

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-07-26 09:01:33 +02:00
Marek Olšák
ce8e6b970b ac: fix typo DSL_SEL -> DST_SEL 2018-07-26 01:45:47 -04:00
Marek Olšák
7039d9299e radeonsi: update a comment about cache behavior 2018-07-26 01:45:47 -04:00
Gert Wollny
82fc6bdebf r600: Scale integer valued texture border colors to float (v2)
It seems the hardware always expects floating point border color values
[0,1] for unsigned, and [-1,1] for signed texture component, regardless
of pixel type, but the border colors are passed according to texture
component type. Hence, before submitting the border color, convert and
scale it these ranges accordingly.

This doesn't seem to work for textures with 32 bit integer components
though, here, it seems that the border color is always set to zero,
regardless of the BORDER_COLOR_TYPE state set in Q_TEX_SAMPLER_WORD0_0.

v2: Simplyfy logic as suggested by Roland Schneidegger

Fixes:
  dEQP-GLES31.functional.texture.border_clamp.formats.compressed*
  dEQP-GLES31.functional.texture.border_clamp.formats.r* (non 32 bit integer)
  dEQP-GLES31.functional.texture.border_clamp.per_axis_wrap_mode.texture_2d*
 and a number of piglits out of
  piglit run gpu -t texture -t gather -t formats

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-07-25 08:58:33 +02:00
Karol Herbst
7f95564a22 nir: rename f2f16_undef to f2f16
we need rounding modes on other conversions involving floats and it is easier
to rename f2f16_undef than renaming all the other ones.

v2: rebased on master

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
2018-07-24 20:40:05 +02:00
Marek Olšák
98ab24fdab radeonsi: handle SI_FORCE_FAMILY early
before LLVM target machines are created
2018-07-24 14:21:29 -04:00
Jose Fonseca
04d77d53aa gallium/tests: Don't ignore S3TC errors.
Now we do full S3TC decompression they should no longer fail.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-07-24 15:58:14 +01:00
Erik Faye-Lund
c3eaf8fe57 forward precise-flag if supported
New versions of virglrenderer supports the precise-flag, so let's
forward it from TGSI if that's the case.

This fixes a few dEQP-GLES31 tests:
- dEQP-GLES31.functional.tessellation.common_edge.quads_equal_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_even_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_odd_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.triangles_equal_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_even_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_odd_spacing_precise

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-07-24 10:27:27 +02:00
Marek Olšák
6853862a58 radeonsi: fix pk2h breakage 2018-07-23 22:29:59 -04:00
Marek Olšák
86b52d4236 radeonsi: reduce LDS stalls by 40% for tessellation
40% is the decrease in the LGKM counter (which includes SMEM too)
for the GFX9 LSHS stage.

This will make the LDS size slightly larger, but I wasn't able to increase
the patch stride without corruption, so I'm increasing the vertex stride.
2018-07-23 20:23:52 -04:00
Tom Stellard
0866edede0 radeonsi: Add debug option to enable LLVM GlobalISel (v2)
R600_DEBUG=gisel will tell LLVM to use GlobalISel rather than
SelectionDAG for instruction selection.

v2: mareko: move the helper to src/amd/common

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
2018-07-23 20:23:48 -04:00
Dave Airlie
d73f1026b4 r600: enable tess_input_info for TES
There might be a nicer way to do this, but this is at least correct.

This fixes:
KHR-GL44.tessellation_shader.single.max_patch_vertices
KHR-GL44.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn

Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2018-07-23 21:11:35 +01:00
Roland Scheidegger
09828feab0 draw: force draw pipeline if there's more than 65535 vertices
The pt emit path can only handle 65535 - the number of vertices is
truncated to a ushort, resulting in a too small buffer allocation, which
will crash.

Forcing the pipeline path looks suboptimal, then again this bug is
probably there ever since GS is supported, so it seems it's not
happening often. (Note that the vertex_id in the vertex header is 16
bit too, however this is only used by the draw pipeline, and it denotes
the emit vertex nr, and that uses vbuf code, which will only emit smaller
chunks, so should be fine I think.)
Other solutions would be to simply allow 32bit counts for vertex
allocation, however 65535 is already larger than this was intended for
(the idea being it should be more cache friendly). Or could try to teach
the pt emit path to split the emit in smaller chunks (only the non-index
path can be affected, since gs output is always linear), but it's a bit
tricky (we don't know the primitive boundaries up-front).

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=107295

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2018-07-23 22:07:07 +02:00
Dave Airlie
83332618c1 Revert "virgl: remove unused stride-arguments"
This reverts commit dc938b8398.

This adds warnings in vtest, and possibly breaks it.
2018-07-24 06:03:20 +10:00
Dave Airlie
958b57ac82 virgl: add initial shader_storage_buffer_object support. (v2)
This adds the guest side support for ARB_shader_storage_buffer_object.

Co-authors: Gurchetan Singh <gurchetansingh@chromium.org>

v2: move to using separate maximums
(fixup macros)

Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2018-07-24 05:54:21 +10:00
Erik Faye-Lund
dc938b8398 virgl: remove unused stride-arguments
The IOCTLs doesn't pass this along, so computing them in the first
place is kinda pointless.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2018-07-23 11:21:09 +01:00
Timothy Arceri
78f391d343 radeonsi/nir: make use of nir_lower_load_const_to_scalar()
This allows NIR to CSE more operations. LLVM does this also so the
impact is limited, however doing this in NIR allows other opts to
make progress. For example some loops in Civilization Beyond Earth
shaders are unrolled.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-23 09:48:51 +10:00
Chih-Wei Huang
e7ffd3fb08 Android: fix a missing nir_intrinsics.h error
The commit 76dfed8ae2 changed nir_intrinsics.h to be a generated
header, but the corresponding dependency was not updated for Android.
It causes the error:

[  0% 19/4336] target  C: libmesa_pipe_radeonsi <= external/mesa/src/gallium/drivers/radeonsi/si_debug.c
...
In file included from external/mesa/src/gallium/drivers/radeonsi/si_debug.c:25:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.h:28:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_shader.h:140:
In file included from external/mesa/src/amd/common/ac_llvm_build.h:30:
external/mesa/src/compiler/nir/nir.h:966:10: fatal error: 'nir_intrinsics.h' file not found
         ^~~~~~~~~~~~~~~~~~
1 error generated.

Fixes: 76dfed8ae2 ("nir: mako all the intrinsics")
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>
2018-07-21 08:50:23 +02:00
Eric Anholt
945524ba0e st/dri: Don't require a dri_format for image creation.
Nothing in EGL_KHR_gl_image.txt seems to let us deny creation based on
formats, and doing so causes many failures in
dEQP-EGL.functional.image.api.*

The NONE value we were protecting from only gets looked at in the
__DRI_IMAGE_ATTRIB_FORMAT and __DRI_IMAGE_ATTRIB_FOURCC queries, which are
used from wayland and gbm (which throw an error cleanly on unknown format)
and DMABUF export.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-20 11:26:12 -07:00
Eric Anholt
a221f9709e v3d: Fix incorrect handling of two fences created back-to-back.
Recreating our context's syncobj with ALREADY_SIGNALED meant that if you
created two fences in a row, then waiting on the second would succeed
immediately.  Instead, export a sync file in the gallium fence (since we
don't have a syncobj clone ioctl), and just create a new syncobj to wait
on whenever we need to.

Noticed while debugging
dEQP-GLES3.functional.fence_sync.client_wait_sync_finish
2018-07-20 11:11:29 -07:00
Eric Anholt
fc28692a5a v3d: Fix the timeout value passed to drmSyncobjWait().
The API wants an absolute time, so we need to go add gallium's argument to
CLOCK_MONOTONIC.
2018-07-20 11:11:29 -07:00
Eric Anholt
4f04bd68cf v3d: Fix drmSyncobjWait() return value checking even more.
It tends to return >0 in the success case (I think the value is something
like "how much of the timeout remained").  Fixes
dEQP-GLES3.functional.fence_sync.client_wait_sync_finish
2018-07-20 11:11:29 -07:00
Eric Anholt
2f90879a34 v3d: Use the list_first_entry/list_last_entry macros. 2018-07-20 11:11:29 -07:00
Eric Anholt
d0e53373e5 v3d: Move BO cache counting to dump time instead of cache management.
This is one less way to get the dump stats wrong.
2018-07-20 11:11:29 -07:00
Eric Anholt
7d6aef6fa5 v3d: Reduce the stale BO reclamation spam with dump_stats set.
This was obviously meant to be when we were actually freeing a BO, not
just when there was at least one BO in the list.
2018-07-20 11:11:29 -07:00
Eric Anholt
5d11094db1 v3d: Respect a sampler view's first_layer field.
Fixes texturing from EGL images created from cubemap faces, as in
dEQP-EGL.functional.image.create.gles2_cubemap_negative_x_rgba_texture
2018-07-20 11:11:29 -07:00
Sonny Jiang
c6737756ad radeonsi: emit_spi_map packets optimization
v2: marek: remove an empty line before break;
    rename reg_val_seq -> spi_ps_input_cntl
    "type * x" -> "type *x"

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-07-20 13:50:26 -04:00
Gert Wollny
4d094993c3 virgl: Expose GL_ARB_copy_image if host supports it
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2018-07-20 19:15:12 +02:00
Gert Wollny
0bde9739c0 virgl: Allow RGB32* textures only as buffer objects
When requesting a texture of the internal format GL_RGB32F Gallium will
try to allocate a renderable texture and returns RGBA32F or RGBX32F, but
when one requests GL_RGB32I or GL_RGB32UI the according 3-component
texture will be returned. This leads to problems later, when one wants
to use glCopyImageSubData to copy data between these textures that should
be compatible, but given the way virgl and Gallium  handle this the latter
fails with an assertion, because the per-texel bit size is different.

By allowing the GL_RGB32* only for texture buffers these problems are avoided
without losing the ARB_tbo_rgb32 extension (thanks Ilia Mirkin).

v2: Correct spelling (Gurchetan Singh)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2018-07-20 19:12:49 +02:00
Gert Wollny
016807161b r600: Correct evaluation of cube array index and face
The array index needs to be corrected and it must be insured that it is
rounded and its value is non-negative before it is combined with the
face id.

v5: Use RNDNE instead of ADD 0.5 and FLOOR (Ilia Mirkin)

v6: Fix type (Roland Scheidegger)

Fixes 182 from android/cts/master/gles31-master.txt:
  dEQP-GLES31.functional.texture.filtering.cube_array.formats.*
  dEQP-GLES31.functional.texture.filtering.cube_array.sizes.*
  dEQP-GLES31.functional.texture.filtering.cube_array.combinations.nearest_mipmap_*
  dEQP-GLES31.functional.texture.filtering.cube_array.combinations.linear_mipmap_*
  dEQP-GLES31.functional.texture.filtering.cube_array.no_edges_visible.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-07-20 14:55:12 +02:00
Gert Wollny
01766c1db6 r600: correct texture offset for array index lookup
Correct the array index for TEXTURE_*1D_ARRAY, and TEXTURE_*2D_ARRAY
The standard says the array index is evaluated according to

   floor(z + 0.5)

but RNDNE is sufficient also for the test cases were z is close to 1.5
and it is likely to hit 1.5, the corner case were RNDNE gives a result
different from above formula.

v5: - Use RNDNE instead of ADD 0.5 and FLOOR (Ilia Mirkin)
    - update commit message

Fixes 325 tests from android/cts/master/gles3-master.txt:
  dEQP-GLES3.functional.shaders.texture_functions.texture.*sampler2darray*
  dEQP-GLES3.functional.shaders.texture_functions.textureoffset.*sampler2darray*
  dEQP-GLES3.functional.shaders.texture_functions.texturelod.sampler2darray*
  dEQP-GLES3.functional.shaders.texture_functions.texturelodoffset.*sampler2darray*
  dEQP-GLES3.functional.shaders.texture_functions.texturegrad.*sampler2darray*
  dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.*sampler2darray*
  dEQP-GLES3.functional.texture.filtering.2d_array.formats.*
  dEQP-GLES3.functional.texture.filtering.2d_array.sizes.*
  dEQP-GLES3.functional.texture.filtering.2d_array.combinations.*
  dEQP-GLES3.functional.texture.shadow.2d_array.*
  dEQP-GLES3.functional.texture.vertex.2d_array.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-07-20 14:55:12 +02:00
Gert Wollny
626bd455d4 r600: Delay emission of texture gradients and lookup offsets
Gradients used in texture lookups and the offsets must reside in the
same fetch clause (the first is imposed by the hardware and the second
is expected by sb). In order to ensure that no ALU clause is inserted
between emission and use of these, delay the emission of these
instructions until the texture instruction using them is also emitted.

This is needed in preparation for the correction of the texture array
indices.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-07-20 14:55:12 +02:00
Rhys Perry
409a60df3b nv50/ir: move LateAlgebraicOpt back to right after ConstantFolding
total instructions in shared programs : 5480808 -> 5472107 (-0.16%)
total gprs used in shared programs    : 647530 -> 647532 (0.00%)
total shared used in shared programs  : 389120 -> 389120 (0.00%)
total local used in shared programs   : 21064 -> 21064 (0.00%)
total bytes used in shared programs   : 58551648 -> 58459352 (-0.16%)

                local     shared        gpr       inst      bytes
    helped           0           0          73        2609        2609
      hurt           0           0          71          34          34
2018-07-19 23:34:58 +02:00
Rhys Perry
2afef231db nv50/ir: handle SHLADD in IndirectPropagation
An alternative solution to the problem fixed in
0bd83d0 ("nv50/ir: move LateAlgebraicOpt to the very end").

total instructions in shared programs : 5481195 -> 5480808 (-0.01%)
total gprs used in shared programs    : 647535 -> 647530 (-0.00%)
total shared used in shared programs  : 389120 -> 389120 (0.00%)
total local used in shared programs   : 21064 -> 21064 (0.00%)
total bytes used in shared programs   : 58555784 -> 58551648 (-0.01%)

                local     shared        gpr       inst      bytes
    helped           0           0           2          34          34
      hurt           0           0           0           0           0
2018-07-19 23:34:58 +02:00
Rhys Perry
3b6edd0b59 gm107/ir: use CS2R for SV_CLOCK
This instruction seems to be faster than S2R and requires no barrier,
though the range of special registers it can read from is limited.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2018-07-19 23:34:58 +02:00
Marek Olšák
565dacc3d6 winsys/amdgpu: remove RADEON_SURF_FMASK leftover
RADEON_SURF_FMASK is never set.
2018-07-19 00:58:51 -04:00
Marek Olšák
fb049742d6 r600: silence the signed overflow warning like radeonsi
r600_gpu_load.c: In function ‘r600_gpu_load_thread’:
../../../../src/util/os_time.h:82:7: warning: assuming signed overflow does not occur when assuming that (X + c) >= X is always true [-Wstrict-overflow]
    if (start <= end)
2018-07-18 17:48:48 -04:00
Sonny Jiang
4bf7234061 radeonsi: emit_guardband packets optimization
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-07-18 15:04:27 -04:00
Sonny Jiang
80ade05b8d radeonsi: Save CLEAR_STATE initial values for optimization
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-07-18 15:04:27 -04:00
Jan Vesely
9baacf3fa7 radeonsi: Refuse to accept code with unhandled relocations
They might lead to unrecoverable GPU hang.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2018-07-18 13:56:56 -04:00