Commit graph

96644 commits

Author SHA1 Message Date
Kenneth Graunke
78c404f106 i965: Use a separate state buffer, but avoid changing flushing behavior.
Previously, we emitted GPU commands and indirect state into the same
buffer, using a stack/heap like system where we filled in commands from
the start of the buffer, and state from the end of the buffer.  We then
flushed before the two met in the middle.

Meeting in the middle is fatal, so you have to be certain that you
reserve the correct amount of space before emitting commands or state
for a draw.  Currently, we will assert !no_batch_wrap and die if the
estimate is ever too small.  This has been mercifully obscure, but has
happened on a number of occasions, and could in theory happen to any
application that issues a large draw at just the wrong time.

Estimating the amount of batch space required is painful - it's hard to
get right, and getting it right involves a lot of code that would burn
CPU time, and also be painful to maintain.  Rolling back to a saved
state and retrying is also painful - failing to save/restore all the
required state will break things, and redoing state emission burns a
lot of CPU.  memcpy'ing to a new batch and continuing is painful,
because commands we issue for a draw depend on earlier commands as well
(such as STATE_BASE_ADDRESS, or the GPU being in a pirtacular state).

The best plan is to never run out of space, which is totally doable but
pretty wasteful - a pessimal draw requires a huge amount of space, and
rarely occurs.  Instead, we'd like to grow the batch buffer if we need
more space and can't safely flush.

We can't grow with a meet in the middle approach - we'd have to move the
state to the end, which would mean updating every offset from dynamic
state base address.  Using separate batch and state buffers, where both
fill starting at the beginning, makes it easy to grow either as needed.

This patch separates the two concepts.  We create a separate state
buffer, with a second relocation list, and use that for brw_state_batch.

However, this patch tries to retain the original flushing behavior - it
adds the amount of batch and state space together, as if they were still
co-existing in a single buffer.  The hope is to flush at the same time
as before.  This is necessary to avoid provoking bugs caused by broken
batch wrap handling (which we'll fix shortly).  It also avoids suddenly
increasing the size of the batch (due to state not taking up space),
which could have a significant performance impact.  We'll tune it later.

v2:
- Mark the statebuffer with EXEC_OBJECT_CAPTURE when supported (caught
  by Chris).  Unfortunately, we lose the ability to capture state data
  on older kernels.
- Continue to support the malloc'd shadow buffers.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-14 16:17:36 -07:00
Kenneth Graunke
0bf3fa4c53 i965: Pass screen to intel_batchbuffer_reset().
This will let us access screen->kernel_features in the next patch.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-14 16:17:36 -07:00
Kenneth Graunke
2e68c4e454 i965: Prepare INTEL_DEBUG=bat decoding for a separate statebuffer.
We'll need to read from both buffers when decoding state.

This also drops the "failed to map" fallback - it's completely useless
on LLC systems where we write directly to the mapped BO.  It's not that
useful on non-LLC systems either.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-14 16:17:36 -07:00
Kenneth Graunke
e723255901 i965: Split brw_emit_reloc into brw_batch_reloc and brw_state_reloc.
brw_batch_reloc emits a relocation from the batchbuffer to elsewhere.
brw_state_reloc emits a relocation from the statebuffer to elsewhere.

For now, they do the same thing, but when we actually split the two
buffers, we'll change brw_state_reloc to use the state buffer.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-14 16:17:36 -07:00
Kenneth Graunke
1674a0bcbc i965: Refactor relocs into a brw_reloc_list structure.
I'm planning on splitting batch and state into separate buffers, at
which point we'll need two relocation lists.  In preparation for that,
this patch refactors the relocation stuff into a structure we can
replicate...which looks a lot like anv_reloc_list.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-09-14 16:17:36 -07:00
Kenneth Graunke
1bc44e0e7f i965: Move brw_state_batch code to intel_batchbuffer.c
The batch buffer and state buffer code is fairly tied together,
and having it in one .c file will make refactoring easier.

Also, drop some commentary above brw_state_batch.  The "aperture
checking performance hacks" are long since gone, so that paragraph
makes little sense at this point.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-14 16:17:36 -07:00
Kenneth Graunke
3b812e62a1 i965: Drop a useless ret == 0 check.
Prior to the previous patch, we would pwrite the batchbuffer contents,
and wanted to skip the execbuffer if that failed.  Now that we memcpy,
we don't set ret != 0 on failure anymore, so it will always be 0.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-14 16:17:36 -07:00
Kenneth Graunke
717e753912 i965: Use a WC map and memcpy for the batch instead of pwrite.
We'd like to eliminate the malloc'd shadow copy eventually, but there
are still unresolved performance problems.  In the meantime, let's at
least get rid of pwrite.

On Apollolake, improves Synmark OglBatch6 performance by:
1.53581% +/- 0.269589% (n=108).

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-14 16:17:36 -07:00
Kenneth Graunke
343aa09a22 i965: Use batch->bo->size in brw_emit_reloc assertion.
This makes the assertion safe against batchbuffers growing.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-14 16:17:36 -07:00
Kenneth Graunke
d124521141 i965: Delete a batch size assertion that isn't very useful.
This assertion prevents you from doing intel_batchbuffer_require_space
with a size so huge it won't fit in the batchbuffer.  This doesn't seem
like a common mistake, and I've never seen the assert to be useful.

Soon, I hope to have batches grow, at which point this won't make sense.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-14 16:17:36 -07:00
Jason Ekstrand
939b53d332 i965/screen: Implement queryDmaBufFormatModifierAttirbs
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-09-14 14:47:42 -07:00
Jason Ekstrand
9c52aef7d7 i965/screen: Report the correct number of image planes
For non-CCS images, we were reporting just one plane even though they
may have multiple in the case of YUV.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-09-14 14:47:40 -07:00
Jason Ekstrand
8824141b8d gbm: Add a gbm_device_get_format_modifier_plane_count function
This allows the user to query the number of planes required by a given
format+modifier combination without having to create a bo or surface.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-09-14 14:47:39 -07:00
Jason Ekstrand
0a25a417ce dri/image: Add a format modifier attributes query
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-09-14 14:47:18 -07:00
Christoph Berliner
7ffd4d2a66 drirc: enable glthread for more games (Civ5, CivBE, Dreamfall, Hitman, SR3)
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-09-14 21:02:36 +02:00
Iago Toral Quiroga
98141366f9 glsl: avoid accessing invalid memory after get_variable_being_redeclared()
After get_variable_being_redeclared() has been called, it is no longer
safe to access the original variable pointer, since its memory might have
been freed.

Since callers of this function should only be accessing the variable pointer
returned by the function, avoid potential bugs by re-assigning the
original variable pointer to the result of the function call,
making it impossible for the remaining code to access an invalid variable
pointer.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-14 11:23:26 +02:00
Iago Toral Quiroga
a7017746d7 glsl: make the redeclared variable NULL if it is deleted
get_variable_being_redeclared() can delete the original variable
in a specific scenario. The code sets it to NULL after this so other
code in that same function doesn't try to access trashed memory after
the fact, however, the copy of that variable in the caller code
won't see any of this making it very easy to overlook.

Make the function a bit safer by taking a pointer to the original
variable so we can also make NULL the caller's pointer to the variable
if this function deletes it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-14 11:23:26 +02:00
Iago Toral Quiroga
4af156224e glsl: use 'declared_var' instead of 'var' after checking redeclarations
Since the original 'var' might have been deleted from this point forward.

Bugzila: https://bugs.freedesktop.org/show_bug.cgi?id=102685
Fixes: 51bf007d2c (glsl: Disallow unsized array of atomic_uint)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-14 11:23:26 +02:00
Eric Engestrom
412ab3f6fd dri/radeon: use ARRAY_SIZE macro
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-09-14 09:56:00 +01:00
Samuel Pitoiset
49c72d84c2 radv: dump the list of enabled options when a hang occured
Useful to know which debug/perftest options were enabled when
a hang report is generated.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Samuel Pitoiset
302e34d24b radv: dump last 60 lines of dmesg when a hang occured
Copied from dd_dump_dmesg().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Samuel Pitoiset
26bc664ca0 radv: dump descriptors when a hang occured
Might be useful for checking if all descriptors are sets by
the application.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Samuel Pitoiset
b3c8de1c55 radv: save all descriptor pointers into the trace BO
To dump them when a hang is detected.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Samuel Pitoiset
d7f2430703 radv: dump annotated shaders using UMR
This might be very useful in order to figure out where a shader
is stucked. This uses UMR to detect which instruction is executing
bad things.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Samuel Pitoiset
f0d09d9012 radeonsi: move si_get_wave_info() to AMD common code
This will allow us to use it from radv.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Samuel Pitoiset
8181427b14 radv: dump some status MMIO registers when a hang occured
Might report some useful information to help figuring out where
does the hang happened.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Samuel Pitoiset
140621f7c4 radv/winsys: add a read_registers() callback
To dump some status MMIO registers when a hang is detected.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Samuel Pitoiset
6d957a86ff radv: dump shader stats when a hang occured
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Samuel Pitoiset
80b8d9f7e7 radv: add radv_shader_dump_stats() helper
To dump the shader stats when a hang is detected.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Samuel Pitoiset
d28cbf6f9e radv: dump the active shaders when a hang occured
Only the disassembly is currently dumped.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Samuel Pitoiset
e2e72477c0 radv: add debug flags for syncing shaders after every draw call
To improve GPU hangs detection when shaders are stucked.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Samuel Pitoiset
061f5b7d73 radv: add radv_cmd_buffer_after_draw() helper function
To share common code after every draw/compute calls.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Samuel Pitoiset
bcf7698211 radv: save the bound pipeline pointers into the trace BO
When a GPU hang is detected in radv_gpu_hang_occured() we know
which command buffer is faulty but the bound pipelines might
have been updated during the execution.

The pointers to the radv_pipeline objects are emitted just
after the second trace ID, that way it would be easy to dump
the active shaders at the moment of the hang.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Samuel Pitoiset
3c61c99ed5 radv: add a comment that describes the trace BO layout
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-14 10:37:57 +02:00
Samuel Pitoiset
4224b31bf3 radv: initialize the trace BO to 0
To avoid random initial values.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-09-14 10:37:57 +02:00
Eric Engestrom
396d2dbce4 swr: use ARRAY_SIZE macro
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-09-14 09:36:01 +01:00
Jeremy Huddleston Sequoia
e7ef901650 mesa: Deal with size differences between GLuint and GLhandleARB in GetAttachedObjectsARB
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-09-13 19:48:58 -07:00
Denis Pauk
74d2456491 gallium/{r600, radeonsi}: Fix segfault with color format (v2)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102552

v2: Patch cleanup proposed by Nicolai Hähnle.
    * deleted changes in si_translate_texformat.

Cc: Nicolai Hähnle <nhaehnle@gmail.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-09-14 00:59:24 +02:00
Kenneth Graunke
edfd8d42a9 i965: Add an INTEL_DEBUG=submit option for printing batch statistics.
When a batch is submitted, INTEL_DEBUG=bat prints a message indicating
which part of the code triggered the flush, and some statistics about
the batch/state buffer utilization.

It also decodes the batchbuffer in debug builds...which is so much
output that it drowns out the utilization messages, if that's all you
care about.

INTEL_DEBUG=submit now just does the utilization messages.
INTEL_DEBUG=bat continues to do both (as the message is a good indicator
that we're starting decode of a new batch).

v2: Rename from "flush" to "submit" (suggested by Chris) because we
    might want "flush" for PIPE_CONTROL debugging someday.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-13 13:52:38 -07:00
Dave Airlie
64d9bd149a radv/nir: call opt_remove_phis after trivial continues.
With the shaders in the ssao demo, the nir_opt_if wasn't
working properly without this, after this the if gets optimised
so that loop unrolling gets called.

(loop unrolling fails due to instruction count, but at least
it gets to do that.)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-13 21:13:03 +01:00
Chad Versace
f9412a4e75 util/build_id: Include <dlfcn.h>
Fix the build for Android Nougat.

The dladdr(3) manpage says that <dlfcn.h> is required. On Linux, the
build succeeded without it because build_id.c includes <link.h> which
includes <dlfcn.h>. On Android, we must include <dlfcn.h> directly.

Fixes: 5c98d382 "util: Query build-id by symbol address, not library name"
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-09-13 12:43:42 -07:00
Chad Versace
5c98d3825c util: Query build-id by symbol address, not library name
This patch renames build_id_find_nhdr() to
build_id_find_nhdr_for_addr(), and changes it to never examine the
library name.

Tested on Fedora by confirming that build_id_get_data() returns the same
build-id as the file(1) tool. For BSD, I confirmed that the API used
(dladdr() and struct Dl_info) is documented in FreeBSD's manpages.

This solves two problems:

    - We can now the query the build-id without knowing the installed library's
      filename.

      This matters because Android requires specific filenames for HAL
      modules, such as "/vendor/lib/hw/vulkan.${board}.so". The HAL
      filenames do not follow the Unix convention of "libfoo.so".  In
      other words, the same query code will now work on Linux and Android.

    - Querying the build-id now works correctly when the process
      contains multiple shared objects with the same basename.
      (Admittedly, this is a highly unlikely scenario).

Cc: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-09-13 09:49:27 -07:00
Nicolai Hähnle
c8db134e4d st/glsl_to_tgsi: remove unused code in temprename
Reviewed-By: Gert Wollny <gw.fossdev@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-13 18:28:29 +02:00
Nicolai Hähnle
55ca12be9d st/glsl_to_tgsi: be precise about merging scopes
enclosing_scope already contains enclosing_scope_first_read.
What we really want to check here -- not for correctness, but
for speed -- is whether last_read_scope already contains
enclosing_scope.

Reviewed-By: Gert Wollny <gw.fossdev@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-13 18:28:11 +02:00
Nicolai Hähnle
cffc0ae0d9 ac/surface: match Z and stencil tile config
Fixes various piglit tests on Stoney, see the comment.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:27:01 +02:00
Nicolai Hähnle
481df8032b ac/surface: sanity-check that we got a TC-compatible HTILE if requested
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:26:59 +02:00
Nicolai Hähnle
b2b0702868 ac/addrlib: enable assertions in debug builds
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:26:56 +02:00
Nicolai Hähnle
113ecc2bfa ac/addrlib: relax an assertion
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:26:54 +02:00
Nicolai Hähnle
b0ee0e0860 ac/addrlib: relax an assertion
This assertion is triggered on Stoney in Piglit
./bin/framebuffer-blit-levels {draw,read} stencil -auto -fbo
and similar tests. It should be harmless -- just relax it until
we can get internal clarification.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:26:51 +02:00
Nicolai Hähnle
e4af4433fc radeonsi: hard-code pixel center for interpolateAtSample without multisample buffers
The GLSL rules for interpolateAtSample are unfortunate:

   "Returns the value of the input interpolant variable at
    the location of sample number sample. If
    multisample buffers are not available, the input
    variable will be evaluated at the center of the pixel.
    If sample sample does not exist, the position used to
    interpolate the input variable is undefined."

This fix will fallback to monolithic shader compilation when
interpolateAtSample is used without multisampling.

One alternative would be to always upload 16 sample positions,
filling the buffer up with repetition when the actual number of
samples is less, and then ANDing the sample ID with 0xf. However,
that punishes all well-behaving users of interpolateAtSample,
when in reality, only conformance tests should be affected by
the issue.

Fixes
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.*

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-13 18:25:45 +02:00