Commit graph

37284 commits

Author SHA1 Message Date
Alyssa Rosenzweig
89fdbb6707 panfrost/midgard: Add fcsel_i opcode
Whereas a normal fcsel acts on a boolean input in r31.w, the fcsel_i
variant acts on an integer input in r31.w, which can be preloaded with
an instruction like imov (with the appropriate negate flag on the
source).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-03-26 23:35:15 +00:00
Alyssa Rosenzweig
121417ef1d panfrost: Implement scissor test
This preliminary implementation should handle some basic cases. Future
work should scissor the FRAGMENT job as well for efficiency.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-03-26 23:35:14 +00:00
Alyssa Rosenzweig
bd9446e719 panfrost: Fix viewports
Our viewport code hardcoded a number of wrong assumptions, which sort of
sometimes worked but was definitely wrong (and broke most of dEQP). This
corrects the logic, accounting for flipped-Y framebuffers, which
fixes... most of dEQP.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-03-26 23:35:10 +00:00
Alyssa Rosenzweig
9da4603fb6 panfrost/midgard: Fix b2f32 swizzle for vectors
Fixes issues in most of dEQP-GLES2.functional.shaders.*

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-03-26 23:35:08 +00:00
Dave Airlie
e77013fb7f softpipe: fix clears to only clear specified color buffers.
This fixes piglit clearbuffer-mixed-format

Reviewed-by: Brian Paul <brianp@vmware.com>
2019-03-27 07:53:32 +10:00
Dave Airlie
7f7c9425a8 draw/vs: partly fix basevertex/vertex id
This gets the basevertex from the draw depending on whether
it's an indexed or non-indexed draw.

We still fail a transform feedback test for vertex id, as
the vertex id actually an index id, and isn't getting translated
properly to a vertex id, suggestions on how/where to fix that welcome.

Reviewed-by: Brian Paul <brianp@vmware.com>
2019-03-27 07:52:28 +10:00
Kristian H. Kristensen
a752422bd4 freedreno/ir3: Track whether shader needs derivatives
In 1088b788 ("freedreno/ir3: find # of samplers from uniform vars") we
started counting number of samplers based on the uniform vars instead
of number of cat5 instructions.  We used the number of samplers to
determine whether to enable derivatives, but when we only use
derivatives and no samplers, that now breaks.  Track whether we need
derivatives explicitly and use that to enable the state.

Fixes: 1088b788 ("freedreno/ir3: find # of samplers from uniform vars")
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-25 18:36:48 -07:00
Andre Heider
12f11e6fe6 st/nine: enable csmt per default on iris
iris is thread safe, enable csmt for a ~5% performace boost.

Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
2019-03-25 22:21:19 +01:00
Danylo Piliaiev
c8abe03f3b i965,iris,anv: Make alpha to coverage work with sample mask
From "Alpha Coverage" section of SKL PRM Volume 7:
 "If Pixel Shader outputs oMask, AlphaToCoverage is disabled in
  hardware, regardless of the state setting for this feature."

From OpenGL spec 4.6, "15.2 Shader Execution":
 "The built-in integer array gl_SampleMask can be used to change
 the sample coverage for a fragment from within the shader."

From OpenGL spec 4.6, "17.3.1 Alpha To Coverage":
 "If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value
  is generated where each bit is determined by the alpha value at the
  corresponding sample location. The temporary coverage value is then
  ANDed with the fragment coverage value to generate a new fragment
  coverage value."

Similar wording could be found in Vulkan spec 1.1.100
"25.6. Multisample Coverage"

Thus we need to compute alpha to coverage dithering manually in shader
and replace sample mask store with the bitwise-AND of sample mask and
alpha to coverage dithering.

The following formula is used to compute final sample mask:
  m = int(16.0 * clamp(src0_alpha, 0.0, 1.0))
  dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) |
     0x0808 * (m & 2) | 0x0100 * (m & 1)
  sample_mask = sample_mask & dither_mask
Credits to Francisco Jerez <currojerez@riseup.net> for creating it.

It gives a number of ones proportional to the alpha for 2, 4, 8 or 16
least significant bits of the result.

GEN6 hardware does not have issue with simultaneous usage of sample mask
and alpha to coverage however due to the wrong sending order of oMask
and src0_alpha it is still affected by it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-03-25 13:54:55 -07:00
Dave Airlie
551950cacd draw/gs: fix point size outputs from geometry shader.
If the geom shader emits a point size we failed to find it here,
use the correct API to look it up.

Fixes:
tests/spec/glsl-1.50/execution/geometry/point-size-out.shader_test

Reviewed-by: Brian Paul <brianp@vmware.com>
2019-03-26 05:17:06 +10:00
Dave Airlie
d3836510d2 draw: bail instead of assert on instance count (v2)
With indirect rendering it's fine to set the instance count
parameter to 0, and expect the rendering to be ignored.

Fixes assert in KHR-GLES31.core.compute_shader.pipeline-gen-draw-commands
on softpipe

v2: return earlier before changing fpstate

Reviewed-by: Brian Paul <brianp@vmware.com>
2019-03-26 05:16:56 +10:00
Leo Liu
382401aab7 vl/dri3: remove the wait before getting back buffer
The wait here is unnecessary since we got a pool of back buffers,
and the wait for swap buffer will happen before the present pixmap,
at the same time the previous back buffer will be put back to pool
for reuse after the check for PresentIdleNotify event

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2019-03-25 12:20:31 -04:00
Kishore Kadiyala
e1d8057160 android: static link with libexpat with Android O+
In Android O, MESA needs to statically link libexpat so that
it's in same VNDK namespace.

v2: apply change also to anv driver (Tapani)
v3: use += in anv change (Eric Engestrom)

Change-Id: I82b0be5c817c21e734dfdf5bfb6a9aa1d414ab33
Signed-off-by: Kishore Kadiyala <kishore.kadiyala@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-03-25 10:11:57 +02:00
Rob Clark
6fd5a7ff8c freedreno: add ESSL cap
Report 320 for a6xx, which isn't *quite* true (no geom/tess, in
particular), but other caps keep the reported GL and GLSL versions
correct (3.1 / 3.10 es).  But reporting 320 will switch on
EXT_gpu_shader5, which is the goal.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-22 16:39:14 -04:00
Rob Clark
de481947d9 gallium: add PIPE_CAP_ESSL_FEATURE_LEVEL
Adds a new cap to allow drivers to expose higher shading language
versions in GLES contexts, to avoid having to report an artificially
low version for the benefit of GL contexts.

The motivation is to expose EXT_gpu_shader5 even though a driver may
not support all the features needed for the corresponding GL extension
(ARB_gpu_shader5).

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-03-22 16:39:13 -04:00
Vinson Lee
93c81ca336 swr: Fix build with llvm-9.0.
Fix build error after llvm-9.0svn r352827 ("[opaque pointer types] Add a
FunctionCallee wrapper type, and use it.").

In file included from ./rasterizer/jitter/builder.h:158:0,
                 from swr_shader.cpp:35:
./rasterizer/jitter/gen_builder_meta.hpp: In member function ‘llvm::Value* SwrJit::Builder::VGATHERPD(llvm::Value*, llvm::Value*, llvm::Value*, llvm::Value*, llvm::Value*, const llvm:
:Twine&)’:
./rasterizer/jitter/gen_builder_meta.hpp:51:117: error: no matching function for call to ‘cast(llvm::FunctionCallee)’
     Function* pFunc = cast<Function>(JM()->mpCurrentModule->getOrInsertFunction("meta.intrinsic.VGATHERPD", pFuncTy));
                                                                                                                     ^

Suggested-by: Philip Meulengracht <the_meulengracht@hotmail.com>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-03-22 13:13:51 -07:00
Samuel Pitoiset
23d30f4099 spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass
This lowering isn't needed for RADV because AMDGCN has two
instructions. It will be disabled for RADV in an upcoming series.

While we are at it, factorize a little bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-22 19:41:46 +01:00
Chris Wilson
db99d02fce iris: Push heavy memchecker code to DEBUG
Invoking VALGRIND_CHECK_MEM_IS_DEFINED pulls in enough code to convince
gcc to not inline __gen_uint and results in a lot of packing code ending
up out-of-line with lots of stack copying. To ameliorate this, only
insert the check inside the packer if DEBUG is defined and instead
perform the validation checking before submitting the batch to the
kernel. This should give accurate results if --trace-origins=yes is
used, and failing that we can recompile in full debug mode to check on
insertion.

Improve drawoverhead baseline by 25% with a default build with
valgrind-dev installed (with effectively no loss of vg coverage).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-22 10:38:03 -07:00
Kenneth Graunke
87f865aab3 iris: Fix batch chaining map_next increment.
Caught by Chris Wilson; split out from his valgrind patch.
2019-03-22 09:31:15 -07:00
Rob Clark
dbac1a80d1 freedreno/ir3: rename has_kill to no_earlyz
There are other cases where we need to disable early-z, like image
writes.  So rename to something more generic.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-22 08:53:28 -04:00
Kenneth Graunke
66c100a8d6 iris: Skip resolves and flushes altogether if unnecessary
Improves drawoverhead baseline scores by 1.17x.
2019-03-21 20:28:17 -07:00
Kenneth Graunke
365886ebe1 iris: Skip framebuffer resolve tracking if framebuffer isn't dirty
Improves drawoverhead baseline score by 1.86x.
2019-03-21 20:28:17 -07:00
Kenneth Graunke
1d05d24b1d iris: Skip input resolve handling if bindings haven't changed
This brings the drawoverhead 16 Tex w/ no state change score from
22% of baseline to 97% of baseline.
2019-03-21 20:28:17 -07:00
Kenneth Graunke
a342f2deb1 iris: Fix util_vma_heap_init size for IRIS_MEMZONE_SHADER
Fixes assertions when disabling bucket allocators.
2019-03-21 19:07:17 -07:00
Dave Airlie
9dd92d08a5 softpipe: fix integer texture swizzling for 1 vs 1.0f
The swizzling was putting float one in not integer 1.

This fixes a lot of arb_texture_view-rendering-formats cases.

Reviewed-by: Brian Paul <brianp@vmware.com>
2019-03-22 09:30:35 +10:00
Dave Airlie
aae5ba72ab softpipe: remove shadow_ref assert.
I don't think this really buys us anything and TG4 with cubemap arrays
falls over because sampler == 2, but otherwise works fine.

Fixes:
./bin/textureGather fs shadow r  CubeArray repeat

on softpipe with ARB_gpu_shader5 enabled.

Reviewed-by: Brian Paul <brianp@vmware.com>
2019-03-22 09:30:29 +10:00
Dave Airlie
8dc8b1361a softpipe: handle 32-bit bitfield inserts
Fixes piglits if ARB_gpu_shader5 is enabled

Reviewed-by: Brian Paul <brianp@vmware.com>
2019-03-22 09:30:26 +10:00
Dave Airlie
7b7cb1bc35 softpipe: fix 32-bit bitfield extract
These didn't deal with the width == 32 case that TGSI is defined with.

Fixes piglit tests if ARB_gpu_shader5 is enabled.

Reviewed-by: Brian Paul <brianp@vmware.com>
2019-03-22 09:30:21 +10:00
Eric Anholt
16f2770eb4 v3d: Upload all of UBO[0] if any indirect load occurs.
The idea was that we could skip uploading the constant-indexed uniform
data and just upload the uniforms that are variably-indexed.  However,
since the VS bin and render shaders may have a different set of uniforms
used, this meant that we had to upload the UBO for each of them.  The
first case is generally a fairly small impact (usually the uniform array
is the most space, other than a couple of FSes in shader-db), while the
second is a larger impact: 3DMMES2 was uploading 38k/frame of uniforms
instead of 18k.

Given that the optimization is of dubious value, has a big downside, and
is quite a bit of code, just drop it.  No change in shader-db.  No change
on 3DMMES2 (n=15).
2019-03-21 14:20:50 -07:00
Eric Anholt
320e96bace v3d: Move constant offsets to UBO addresses into the main uniform stream.
We'd end up with the constant offset in the uniform stream anyway, since
they're bigger than small immediates.  Avoids the extra uniforms and adds
in the shader in favor of just adding once on the CPU.

shader-db:
total instructions in shared programs: 6496865 -> 6494851 (-0.03%)
total uniforms in shared programs: 2119511 -> 2117243 (-0.11%)
2019-03-21 14:20:50 -07:00
Eric Anholt
c36d2793ec v3d: Rename v3d_tmu_config_data to v3d_unit_data.
I want to reuse this for encoding small constant UBO/SSBO offsets into the
uniform stream to reduce the extra uniform loads and adds for the small
constant offsets.
2019-03-21 14:20:50 -07:00
Karol Herbst
99f202432b nv50/ir/nir: support gather offsets
v2: only emit offsets if those are !0

Signed-off-by: Karol Herbst <kherbst@redhat.com>
2019-03-21 02:58:41 +00:00
Rafael Antognolli
e7c8402163 iris: Let blorp update the clear color for us.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 16:46:26 -07:00
Rafael Antognolli
93123417dd iris: Track fast clear color.
v2: Update tracked clear color when we update the surface state.
v3: Update all aux surface states when updating the clear color.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 16:46:26 -07:00
Rafael Antognolli
5658c661de iris: Stall on the CPU and resolve predication during fast clears.
Only if the clear color/depth is changing. In those cases, it's hard to
keep track of the current clear color, and aux state of some layers,
when predication is enabled. So simplify everything by stalling on the
few cases where we would have a fast clear color change with
predication.

v2:
 - fix comment (Ken)
 - explicitly check for predicate state after resolving it (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 16:46:26 -07:00
Rafael Antognolli
ce830a364e iris: Add iris_resolve_conditional_render().
This function can be used to stall on the CPU and resolve the predicate
for the conditional render. It will convert ice->state.predicate from
IRIS_PREDICATE_STATE_USE_BIT to either IRIS_PREDICATE_STATE_RENDER or
IRIS_PREDICATE_STATE_DONT_RENDER, depending on the result of the query.

v2:
 - return void (Ken)
 - update the stored condition (Ken)
 - simplify the code leading to resolve the predicate (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 16:46:25 -07:00
Rafael Antognolli
131b42f0aa iris: Implement fast clear color.
If all the restrictions are satisfied, do a fast clear instead of
regular clear.

v2:
 - add perf_debug() when we can't fast clear (Ken)
 - improve comment: s/miptree/resource/ (Ken)
 - use swizzle_color_value from blorp (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 16:46:25 -07:00
Rafael Antognolli
7f6344a726 iris: Bring back check for srgb and fast clear color.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 16:46:25 -07:00
Rafael Antognolli
a8b5ea8ef0 iris: Add function to update clear color in surface state.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 16:46:25 -07:00
Rafael Antognolli
32c8fa6411 iris: Add helper to convert fast clear color.
It needs to be converted to a value that can be used by ISL (and our
hardware SURFACE_STATE structure).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 16:46:25 -07:00
Rafael Antognolli
51638cf18a iris: Fast clear depth buffers.
Check and do a fast clear instead of a regular clear on depth buffers.

v3:
 - remove swith with some cases that we shouldn't wory about (Ken)
 - more parens into the has_hiz check (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 16:46:25 -07:00
Rafael Antognolli
34d00b4410 iris: Use the clear depth when emitting 3DSTATE_CLEAR_PARAMS.
Take the clear depth into account when IRIS_DIRTY_DEPTH_BUFFER is marked
as dirty.

Also update the blorp surface clear color.

v2: Use a single if (zres && zres->aux.bo) (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 16:46:25 -07:00
Rafael Antognolli
37f2692591 iris: Allocate buffer space for the fast clear color.
Also store clear color in the iris_resource.

Always allocate clear color state buffer.

v2:
 - Make clear_color_offset be 64 bits (Ken).
 - Simplify the logic to decide when to memset the aux buffer (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 16:46:25 -07:00
Dave Airlie
04189565a0 softpipe: fix texture view crashes
I noticed we crashed piglit arb_texture_view-rendering-formats
when run on softpipe.

This fixes the clear tiles to use the surface format not the
underlying storage format.

This fixes a bunch of srgb piglits as well.

Fixes: 396ac41fc2 (softpipe: add integer support)

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-03-21 05:06:07 +10:00
Kenneth Graunke
3c3f250456 nvc0: Skip new update barrier bits
I added new barrier bits in 220c1dce1e
and made most drivers skip them.  I thought nvc0 was already skipping
those but missed the else case here, which does something.  So make it
explicitly skip like I did everywhere else.

Thanks to Ilia for catching this.

Fixes: 220c1dce1e gallium: Add PIPE_BARRIER_UPDATE_BUFFER and UPDATE_TEXTURE bits.
2019-03-20 10:30:32 -07:00
Kenneth Graunke
220c1dce1e gallium: Add PIPE_BARRIER_UPDATE_BUFFER and UPDATE_TEXTURE bits.
The glMemoryBarrier() function makes shader memory stores ordered with
respect to things specified by the given bits.  Until now, st/mesa has
ignored GL_TEXTURE_UPDATE_BARRIER_BIT and GL_BUFFER_UPDATE_BARRIER_BIT,
saying that drivers should implicitly perform the needed flushing.

This seems like a pretty big assumption to make.  Instead, this commit
opts to translate them to new PIPE_BARRIER bits, and adjusts existing
drivers to continue ignoring them (preserving the current behavior).

The i965 driver performs actions on these memory barriers.  Shader
memory stores go through a "data cache" which is separate from the
render cache and other read caches (like the texture cache).  All
memory barriers need to flush the data cache (to ensure shader memory
stores are visible), and possibly invalidate read caches (to ensure
stale data is no longer visible).  The driver implicitly flushes for
most caches, but not for data cache, since ARB_shader_image_load_store
introduced MemoryBarrier() precisely to order these explicitly.

I would like to follow i965's approach in iris, flushing the data cache
on any MemoryBarrier() call, so I need st/mesa to actually call the
pipe->memory_barrier() callback.

Fixes KHR-GL45.shader_image_load_store.advanced-sync-textureUpdate
and Piglit's spec/arb_shader_image_load_store/host-mem-barrier on
the iris driver.

Roland said this looks reasonable to him.
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-03-19 23:43:33 -07:00
Tapani Pälli
3e534489ec iris: mark switch case fallthrough
CID: 1444103
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 08:21:50 +02:00
Tapani Pälli
03cbfbd913 iris: initialize num_cbufs
Currently initialized only if 'ish' is non-NULL.

CID: 1444106
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 08:20:09 +02:00
Daniel Stone
d258b787fa panfrost: Properly align stride
Handle buffers whose width is not aligned to 16px by padding the stride
and storing it accordingly.

This does not reject imports for images whose stride is not sufficiently
aligned.

v2: make sure bo->stride is set on imported buffers, and add missing
variable definition. (Tomeu)

Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-03-20 04:20:42 +00:00
Eric Anholt
17115da6ad v3d: Expose the dma-buf modifiers query.
This allows DRI3 to pick between UIF and raster according to whether we're
pageflipping or not and whether the pageflipping display can do UIF,
avoiding copies for the windowed/composited case that previously was
forced to linear.

Improves windowed glmark2 -b build:use-vbo=false performance by 30.7783%
+/- 13.1719% (n=3)
2019-03-19 08:59:01 -07:00