Commit graph

101692 commits

Author SHA1 Message Date
Jonathan Marek
bce4f11dbc freedreno: a2xx: disable PIPE_CAP_PACKED_UNIFORMS
a2xx driver is currently broken when PIPE_CAP_PACKED_UNIFORMS is enabled,
disable it for now.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-04-23 17:13:32 +00:00
Jonathan Marek
418c3d9a4f freedreno: a2xx: fix builtin blit program compilation
tgsi_to_nir now requires a screen pointer and is used by fd2_prog_init.
fd2_prog_init is used before fd_context_init so set the pointer manually.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-04-23 17:13:32 +00:00
Jonathan Marek
33cafb41a2 svga: add new ATC formats to the format conversion table
Fixes the static assertion error.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-04-23 17:11:56 +00:00
Jonathan Marek
0e696416f9 freedreno: a2xx: add GL_AMD_compressed_ATC_texture support
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-04-23 17:11:56 +00:00
Jonathan Marek
734409096b freedreno: a3xx: add GL_AMD_compressed_ATC_texture support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-04-23 17:11:56 +00:00
Jonathan Marek
0719a5f646 st/mesa: add ATC support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-04-23 17:11:56 +00:00
Jonathan Marek
bfa72e4d52 llvmpipe, softpipe: no support for ATC textures
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-04-23 17:11:56 +00:00
Jonathan Marek
ea254fcd3c gallium: add ATC format support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-04-23 17:11:56 +00:00
Jonathan Marek
73c1d7e8c9 mesa: add GL_AMD_compressed_ATC_texture support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-04-23 17:11:56 +00:00
Marek Olšák
951d60f8cd radeonsi: delay adding BOs at the beginning of IBs until the first draw
so that bound compute shader resources won't be added when they are not
needed and same for graphics.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:36:36 -04:00
Marek Olšák
09bb8c8557 radeonsi: add helper si_get_minimum_num_gfx_cs_dwords
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:36:34 -04:00
Marek Olšák
c59d238bb0 radeonsi: add si_cp_copy_data
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:36:33 -04:00
Marek Olšák
694e320643 winsys/amdgpu: clean up and remove nonsensical assertion
The assertion considers max_dw from the current IB in the chain, but
big_ib_buffer is a buffer for the next IB, which can be smaller.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:36:31 -04:00
Marek Olšák
1807f6cfe9 winsys/amdgpu: enable chaining for compute IBs
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:36:06 -04:00
Marek Olšák
b99bed6246 winsys/amdgpu: reorder chunks, make BO_HANDLES first, IB and FENCE last
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:28:56 -04:00
Marek Olšák
437d032b7d winsys/amdgpu: make IBs writable and expose their address
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:28:56 -04:00
Marek Olšák
2313176817 ac: add REWIND and GDS registers to register headers
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:28:56 -04:00
Marek Olšák
35cd57df2e ac: add ac_get_i1_sgpr_mask
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:28:56 -04:00
Marek Olšák
bfb9287599 ac: add radeon_info::is_pro_graphics
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:28:56 -04:00
Marek Olšák
64d6cc982d ac: add radeon_info::marketing_name, replacing the winsys callback
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:28:56 -04:00
Marek Olšák
9b33465481 tgsi/scan: add uses_drawid
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-04-23 11:28:56 -04:00
Kenneth Graunke
77449d7c41 iris: Track valid data range and infer unsynchronized mappings.
Applications frequently call glBufferSubData() to consecutive regions
of a VBO to append new vertex data.  If no data exists there yet, we
can promote these to unsynchronized writes, even if the buffer is busy,
since the GPU can't be doing anything useful with undefined content.
This can avoid a bunch of unnecessary blitting on the GPU.

u_threaded_context would do this for us, and in fact prohibits us from
doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED).  But we haven't
hooked that up yet, and it may be useful to disable u_threaded_context
when debugging...at which point we'd still want this optimization.  At
the very least, it would let us measure the benefit of threading
independently from this optimization.  And it's not a lot of code.

Removes most stall avoidance blits in "Total War: WARHAMMER."

On my Skylake GT4e at 1920x1080, this appears to improve performance
in games by the following (but I did not do many runs for proper
statistics gathering):

   ----------------------------------------------
   | DiRT Rally        | +2% (avg) | + 2% (max) |
   | Bioshock Infinite | +3% (avg) | + 9% (max) |
   | Shadow of Mordor  | +7% (avg) | +20% (max) |
   ----------------------------------------------
2019-04-23 00:24:08 -07:00
Kenneth Graunke
768b17a7ad iris: Make a resource_is_busy() helper
This checks both "is it busy" and "do we have work queued up for it"?
2019-04-23 00:24:08 -07:00
Kenneth Graunke
5ad0c88dbe iris: Replace buffer backing storage and rebind to update addresses.
This implements PIPE_CAP_INVALIDATE_BUFFER and invalidate_resource(),
as well as the PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag.  When either
of these happen, we swap out the backing storage of the buffer for a
new idle BO, allowing us to write to it immediately without stalling
or queueing a blit.

On my Skylake GT4e at 1920x1080, this improves performance in games:

   -----------------------------------------------
   | DiRT Rally        | +25% (avg) | +17% (max) |
   | Bioshock Infinite | +22% (avg) | +11% (max) |
   | Shadow of Mordor  | +27% (avg) | +83% (max) |
   -----------------------------------------------
2019-04-23 00:24:08 -07:00
Kenneth Graunke
0a082b6560 iris: Make memzone_for_address non-static
I want to use this in iris_resource.c.
2019-04-23 00:24:08 -07:00
Kenneth Graunke
72277044e2 iris: Make a gl_shader_stage -> pipe_shader_stage helper function
This is probably not the best place for it, but I don't feel like moving
the one out of the TGSI translator today, and we already have the other
direction here, so...*shrug*
2019-04-23 00:24:08 -07:00
Kenneth Graunke
b45dff1da8 iris: Rework image views to store pipe_image_view.
This will be useful when rebinding images.
2019-04-23 00:24:08 -07:00
Kenneth Graunke
2f60850a3f iris: Rework UBOs and SSBOs to use pipe_shader_buffer
This unifies a bunch of the UBO and SSBO code to use common structures.
Beyond iris_state_ref, pipe_shader_buffer also gives us a buffer size,
which can be useful when filling out the surface state.
2019-04-23 00:24:08 -07:00
Kenneth Graunke
00d4019676 iris: Track bound constant buffers
This helps avoid having to iterate over [0, PIPE_MAX_CONSTANT_BUFFERS)
looking to see if any resources are bound.
2019-04-23 00:24:08 -07:00
Kenneth Graunke
4d12236072 iris: Mark constants dirty on transfer unmap even if no flushes occur
I have various conditions in place to try and avoid unnecessary
PIPE_CONTROL flushes, especially to batches which may have never
used the buffer being mapped.  But if we do a CPU map to a bound
constant buffer, we still need to mark push constants dirty, even
if there's nothing happening in batches that would warrant a flush.

Fixes obvious misrendering in the "XCOM 2: War of the Chosen" menus
(lots of rainbow colored triangles).  Fixes lots of blinking elements
in "Shadow of Mordor".  Fixes missing crowd rendering in "DiRT Rally".
2019-04-23 00:24:08 -07:00
Lionel Landwerlin
b1ba7ffdbd intel: workaround VS fixed function issue on Gen9 GT1 parts
The issue is noticeable in the
dEQP-GLES31.functional.geometry_shading.layered.render_with_default_layer_3d
test where a triangle goes missing when we use the maximum number of
URB entries as specified by the documentation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107505
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-23 13:41:20 +08:00
Matt Turner
4ec258ac3c intel/compiler: Improve fix_3src_operand()
Allow ATTR and IMM sources unconditionally (ATTR are just GRFs, IMM will
be handled by opt_combine_constants(). Both are already allowed by
opt_copy_propagation().

Also allow FIXED_GRF if the regioning is 8,8,1. Could also allow other
stride=1 regions (e.g., 4,4,1) and scalar regions but I don't think
those occur. This is sufficient to allow a pass added in a future commit
(fs_visitor::lower_linterp) to avoid emitting extra MOV instructions.

I removed the 'src.stride > 1' case because it seems wrong: 3-src
instructions on Gen6-9 are align16-only and can only do stride=1 or
stride=0. A run through Jenkins with an assert(src.stride <= 1) never
triggers, so it seems that it was dead code.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-04-22 16:54:31 -07:00
Matt Turner
8aae7a3998 intel/compiler: Add unit tests for sat prop for different exec sizes
The two new unit tests verify that propagating a saturate between
instructions of different exec sizes does not happen.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-04-22 16:54:21 -07:00
Matt Turner
54d4d34b96 intel/compiler: Use SIMD16 instructions in fs saturate prop unit test
Will allow us to test that propagation between instructions of different
exec sizes does not happen (in the next commit).

The stray-looking change in intervening_dest_write is to adjust the size
of the texture result to keep the test functioning identically when the
instructions' exec sizes are doubled. Without the change, the texture
does not overwrite the destination fully as the unit test intends.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-04-22 16:54:17 -07:00
Rafael Antognolli
70e03e220c intel/fs: Remove fs_generator::generate_linterp from gen11+.
We now have a lowering pass that will do this at the fs_visitor level,
so we can remove this code from gen11+.

v2: Reduce size of the "i" array from 4 to 2 (Matt).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-22 16:54:00 -07:00
Rafael Antognolli
9ea90aae1e intel/fs: Add a lowering pass for linear interpolation.
On gen11, instead of using a PLN instruction, we convert
FS_OPCODE_LINTERP to 2 or 4 multiply adds. That is done in the
fs_generator code.

This patch adds a lowering pass that does the same thing at the
fs_visitor. It also drops the usage of NF types, since we don't need the
extra precision and it lets us skip the accumulator. With all that, some
optimizations will still be run on the generated code, and we should get
better scheduling.

v2: Update comment about saturation and conditional mod (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-22 16:54:00 -07:00
Rafael Antognolli
c0504569ea intel/fs: Move the scalar-region conversion to the generator.
Move the scalar-region conversion from the IR to the generator, so it
doesn't affect the Gen11 path. We need the non-scalar regioning
for a later lowering pass that we are adding.

v2: Better commit message (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-22 16:54:00 -07:00
Rafael Antognolli
0778748eba intel/fs: Only propagate saturation if exec_size is the same.
Otherwise it could propagate the saturation from a SIMD16 instruction
into a SIMD8 instruction. With that, only part of the destination
register, which is the source of the move with saturation, would have
been updated.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-22 16:53:55 -07:00
Kenneth Graunke
087f92c59a i965: Tidy bogus indentation left by previous commit
I left code indented one level too far in the previous commit to make
the diff easier to review.  Drop that extra level now.

Fixes: 6981069fc8 i965: Ignore uniform storage for samplers or images, use binding info

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-22 15:41:56 -07:00
Kenneth Graunke
6981069fc8 i965: Ignore uniform storage for samplers or images, use binding info
gl_nir_lower_samplers_as_deref creates new top level sampler and image
uniforms which have been split from structure uniforms.  i965 assumed
that it could walk through gl_uniform_storage slots by starting at
var->data.location and walking forward based on a simple slot count.
This assumed that structure types were walked in a particular order.

With samplers and images split out of structures, it becomes impossible
to assign meaningful locations.  Consider:

   struct S {
      sampler2D a;
      sampler2D b;
   } s[2];

The gl_uniform_storage locations for these follow this map:

   0 => a[0], 1 => b[0], 2 => a[0], 3 => b[0].

But the new split variables look like:

   sampler2D lowered_a[2];
   sampler2D lowered_b[2];

and there is no way to know that there's effectively a stride to get to
the location for successive elements of a[] or b[].  So, working with
location becomes effectively impossible.

Ultimately, the point of looking at uniform storage was to pull out the
bindings from the opaque index fields.  gl_nir_lower_samplers_as_derefs
can obtain this information while doing the splitting, however, and sets
up var->data.binding to have the desired values.

We move gl_nir_lower_samplers before brw_nir_lower_image_load_store so
gl_nir_lower_samplers_as_derefs has the opportunity to set proper image
bindings.  Then, we make the uniform handling code skip sampler(-array)
variables, and handle image param setup based on var->data.binding.

Fixes Piglit tests/spec/glsl-1.10/execution/samplers/uniform-struct,
this time without regressing dEQP-GLES2.functional.uniform_api.random.3.

Fixes: f003859f97 nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-22 15:39:55 -07:00
Kenneth Graunke
47303b466c Revert "glsl: Set location on structure-split sampler uniform variables"
This reverts commit 9e0c744f07, which
regressed dEQP-GLES2.functional.uniform_api.random.3.  It turns out
that the newly produced location is meaningless and impossible to
consume by drivers that want to look at gl_uniform_storage, so it's
probably better to leave it unset (0) than a number that looks usable.

Leave a tombstone^Wcomment to discourage the next person from making
the obvious looking fix.

See the next commit for a longer description of the problem.

This breaks tests/spec/glsl-1.10/execution/samplers/uniform-struct
on i965, which was originally fixed by the revert.  The next commit
will fix it again.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-22 15:39:55 -07:00
Marek Olšák
b58e5fb6f3 radeonsi: use CP DMA for the null const buffer clear on CIK
This is a workaround for a thread deadlock that I have no idea
why it occurs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108879
Fixes: 9b331e462e

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-22 16:05:52 -04:00
Danylo Piliaiev
f280c36c08 drirc: Add workaround for Epic Games Launcher
Epic Games Launcher could be launched in opengl mode
with "-opengl" option. It creates 4.4 opengl core context
however it uses deprecated functionality e.g. default
vertex buffer object.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110462

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-04-22 16:04:19 -04:00
Kenneth Graunke
1566054459 iris: Track bound and writable SSBOs
Marek recently extended pipe->set_shader_buffers() to take an extra
writable_bitmask parameter, indicating which SSBOs are writable (some
may be bound read-only).  We can use this to decide whether to set
EXEC_OBJECT_WRITE when pinning.  Avoiding the write flag can save us
some cross-batch flushing if the SSBO is used for reading in both the
render and compute engines.
2019-04-22 11:31:14 -07:00
Chia-I Wu
e9c5e13344 virgl: clear vertex_array_dirty
Clear vertex_array_dirty after the state is emitted.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-04-22 10:19:47 -07:00
Lubomir Rintel
e983a975c6 gallivm: disable NEON instructions if they are not supported
The LLVM project made some questionable decisions about defaults for
armv7 (e.g. they enable NEON that is not there on NVIDIA and Marvell
platforms).

On top of that, getHostCPUFeatures() doesn't disable missing machine
attributes. Finally, -neon alone is not sufficient to disable emmision
of NEON instructions.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-22 09:47:49 -07:00
Lubomir Rintel
bc6bfc861f gallivm: guess CPU features also on ARM
getHostCPUFeatures() is also available on ARM, for even longer time than
for x86. Use it -- it potentially enables instructions that may speed
things up.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/518
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-22 09:47:39 -07:00
Kenneth Graunke
36478b9f77 iris: Enable the dual_color_blend_by_location driconf option.
This fixes rendering in Unigine Valley 1.0 and Heaven 4.0.
2019-04-22 09:36:36 -07:00
Kenneth Graunke
faa52e328e iris: Add mechanism for iris-specific driconf options
Based on Nicolai's 0f8c5de869.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-04-22 09:35:36 -07:00
Jason Ekstrand
ccb25aaeaf nir: Use the NIR_SRC_AS_ macro to define nir_src_as_deref
We have a macro for this now; no reason to hand-roll it for derefs.
While we're here, move the NIR_DEFINE_CAST for derefs down to where all
the other ones are.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-22 15:23:24 +00:00