Commit graph

76442 commits

Author SHA1 Message Date
Marek Olšák
c8fe3b9dca st/mesa: completely rewrite state atoms
The goal is to do this in st_validate_state:
   while (dirty)
      atoms[u_bit_scan(&dirty)]->update(st);

That implies that atoms can't specify which flags they consume.
There is exactly one ST_NEW_* flag for each atom. (58 flags in total)

There are macros that combine multiple flags into one for easier use.

All _NEW_* flags are translated into ST_NEW_* flags in st_invalidate_state.
st/mesa doesn't keep the _NEW_* flags after that.
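
A minimal sketch of the dispatch loop described above, not the actual st/mesa code. All names here (st_context_sketch, st_atom_sketch, bit_scan64, the three example atoms) are hypothetical stand-ins, and a 64-bit mask is assumed because 58 flags do not fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the state tracker context. */
struct st_context_sketch {
   uint64_t dirty;        /* one bit per atom */
   uint64_t ran;          /* bits of the atoms whose update() has run */
   unsigned update_count; /* number of update() calls made */
};

/* Hypothetical atom: just an update callback, no flag list. */
struct st_atom_sketch {
   void (*update)(struct st_context_sketch *st);
};

static void update_atom0(struct st_context_sketch *st) { st->ran |= UINT64_C(1) << 0; st->update_count++; }
static void update_atom1(struct st_context_sketch *st) { st->ran |= UINT64_C(1) << 1; st->update_count++; }
static void update_atom2(struct st_context_sketch *st) { st->ran |= UINT64_C(1) << 2; st->update_count++; }

static const struct st_atom_sketch atom0 = { update_atom0 };
static const struct st_atom_sketch atom1 = { update_atom1 };
static const struct st_atom_sketch atom2 = { update_atom2 };

/* The atom at index i is the one that owns dirty bit i. */
static const struct st_atom_sketch *const sketch_atoms[] = { &atom0, &atom1, &atom2 };

/* Return the index of the lowest set bit and clear it in *mask. */
static unsigned bit_scan64(uint64_t *mask)
{
   unsigned i = 0;
   assert(*mask != 0);
   while (!((*mask >> i) & 1))
      i++;
   *mask &= ~(UINT64_C(1) << i);
   return i;
}

/* Run exactly the atoms whose bit is set, lowest index first. */
static void validate_state_sketch(struct st_context_sketch *st,
                                  const struct st_atom_sketch *const *atoms)
{
   uint64_t dirty = st->dirty;
   st->dirty = 0;
   while (dirty)
      atoms[bit_scan64(&dirty)]->update(st);
}
```

With dirty set to 0x5, only atoms 0 and 2 run; atom 1 is never visited, which is the point of scanning bits instead of walking the whole atom list.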

torcs is 2% faster between the previous patch and the end of this series.

v2: - add st_atom_list.h to Makefile.sources

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-30 15:02:14 +02:00
Marek Olšák
53bc28920a st/mesa: remove st_tracked_state::name
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-30 15:02:14 +02:00
Marek Olšák
f2adba4a4c st/mesa: remove atom debugging code
This won't be needed after the rewrite.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-30 15:02:14 +02:00
Kenneth Graunke
ebdc82d065 i965: Fix move_interpolation_to_top() pass.
The pass I introduced in commit a2dc11a781
was entirely broken.  A missing "break" made the load_interpolated_input
case always fall through to "default" and hit a "continue", making it
not actually move any load_interpolated_input intrinsics at all.
It would only move the simple load_barycentric_* intrinsics, which
don't emit any code anyway, making it basically useless.
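
The shape of the bug can be sketched in isolation as follows. This is not the i965 pass itself: the enum and both functions are hypothetical, and the original code hit a "continue" inside a loop rather than returning a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical intrinsic kinds standing in for the NIR intrinsics named above. */
enum intrinsic_kind {
   KIND_LOAD_BARYCENTRIC,
   KIND_LOAD_INTERPOLATED_INPUT,
   KIND_OTHER
};

/* Buggy shape: the second case has no "break". */
static bool should_move_buggy(enum intrinsic_kind kind)
{
   bool move = false;
   switch (kind) {
   case KIND_LOAD_BARYCENTRIC:
      move = true;
      break;
   case KIND_LOAD_INTERPOLATED_INPUT:
      move = true;
      /* BUG: no "break" here, so control continues into "default". */
      /* fall through */
   default:
      move = false;
      break;
   }
   return move;
}

/* Fixed shape: every case that wants to move the instruction breaks out. */
static bool should_move_fixed(enum intrinsic_kind kind)
{
   bool move = false;
   switch (kind) {
   case KIND_LOAD_BARYCENTRIC:
   case KIND_LOAD_INTERPOLATED_INPUT:
      move = true;
      break;
   default:
      move = false;
      break;
   }
   return move;
}
```

In the buggy version only KIND_LOAD_BARYCENTRIC is ever selected, mirroring the behavior described above.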

The initial version I sent of the pass worked, but I apparently
failed to verify that the simplified version in v2 actually worked.

With the obvious fix applied (so we actually tried to move
load_interpolated_input intrinsics), I discovered a second bug: we
weren't moving the offset SSA def to the top, breaking SSA validation.

The new version of the pass actually moves load_interpolated_input
intrinsics and all their dependencies, as intended.

Papers over GPU hangs on Ivybridge and Baytrail caused by the
recent NIR FS input rework by restoring the old behavior.
(I'm not honestly sure why they hang with PLN not at the top.)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97083
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-29 16:05:24 -07:00
Rob Clark
591eeb7d1c freedreno: limit non-user constant buffers to a4xx
Seems to mostly work on a3xx, except when it doesn't and kills the GPU
quite badly.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-29 14:58:39 -04:00
Jan Ziak
427771d1c7 glsl: fix uninitialized instance variable
Valgrind detected that variable ir_copy_propagation_visitor::killed_all
is uninitialized.

Signed-off-by: Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-29 14:57:51 -04:00
Rob Herring
a235765d27 virgl: add exported dmabuf to BO hash table
Exported dmabufs can get imported by the same process, but the handle was
not getting added to the hash table on export. Add the handle to the hash
table on export.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-07-29 09:09:56 +10:00
Anuj Phogat
6d958c7c16 anv: Enable per sample shading on gen8+
Vulkan CTS test results on gen9:
./deqp-vk --deqp-case=dEQP-VK.pipeline.multisample.min_sample_shading*
Test run totals:
  Passed:        60/90 (66.7%)
  Failed:        0/90 (0.0%)
  Not supported: 30/90 (33.3%)
  Warnings:      0/90 (0.0%)

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-28 13:11:12 -07:00
Anuj Phogat
0f94cdc976 anv/pipeline: Fix setting per sample shading in pixel shader
We should use the persample_dispatch variable in prog_data.

Fixes all (~60) of the dEQP sample shading tests. Many tests exited with
VK_ERROR_OUT_OF_DEVICE_MEMORY without this patch.

V2: Use the shader key bits set in brw_compile_fs (Jason)

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-07-28 13:11:12 -07:00
Nicolas Boichat
9ee683f877 egl/dri2: Add reference count for dri2_egl_display
android.opengl.cts.WrapperTest#testGetIntegerv1 CTS test calls
eglTerminate, followed by eglReleaseThread. A similar case is
observed in this bug: https://bugs.freedesktop.org/show_bug.cgi?id=69622,
where the test calls eglTerminate, then eglMakeCurrent(dpy, NULL, NULL, NULL).

With the current code, dri2_dpy structure is freed on eglTerminate
call, so the display is not initialized when eglReleaseThread calls
MakeCurrent with NULL parameters, to unbind the context, which
causes a segfault in drv->API.MakeCurrent (dri2_make_current),
either in glFlush or in a later call.

eglTerminate specifies that "If contexts or surfaces associated
with display is current to any thread, they are not released until
they are no longer current as a result of eglMakeCurrent."

However, to properly free the current context/surface (i.e., call
glFlush, unbindContext, driDestroyContext), we still need the
display vtbl (and possibly an active dri dpy connection). Therefore,
we add some reference counter to dri2_egl_display, to make sure
the structure is kept allocated as long as it is required.
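
A minimal sketch of the reference-counting idea, assuming a single-threaded caller. The names (display_sketch, display_ref, display_unref, live_displays) are hypothetical and do not reflect the real dri2_egl_display layout or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Number of display structures currently allocated (for illustration only). */
static int live_displays = 0;

/* Hypothetical stand-in for the per-display driver data. */
struct display_sketch {
   int ref_count;
};

/* Allocate the display with one reference, held by the initialized display. */
static struct display_sketch *display_create(void)
{
   struct display_sketch *dpy = calloc(1, sizeof(*dpy));
   if (!dpy)
      return NULL;
   dpy->ref_count = 1;
   live_displays++;
   return dpy;
}

/* Take an extra reference, e.g. while a context is current on some thread. */
static void display_ref(struct display_sketch *dpy)
{
   dpy->ref_count++;
}

/* Drop one reference; free the structure only when the last one goes away.
 * Returns true if the structure was freed. */
static bool display_unref(struct display_sketch *dpy)
{
   assert(dpy->ref_count > 0);
   if (--dpy->ref_count > 0)
      return false;
   free(dpy);
   live_displays--;
   return true;
}
```

In this model, terminating the display while a context is still current only drops the count from 2 to 1; the structure survives until the context is unbound and the second reference is released.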

One drawback of this is that eglInitialize may not completely reinitialize
the display (if eglTerminate was called with a current context). However,
this seems to meet the EGL spec quite well, and does not permanently
leak any context/display even for incorrectly written apps.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-28 14:08:25 +01:00
Emil Velikov
8431c0e9d4 vc4: automake: remove vc4_drm.h from the sources lists
The file was removed by an earlier commit, breaking 'make dist'.
Drop it from Makefile.sources since it's no longer around.

Fixes: 16985eb308 ("vc4: Switch to using the libdrm-provided
vc4_drm.h.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-07-28 14:08:24 +01:00
Nicolai Hähnle
bade0cd0fb ddebug: use pclose to close a popen()'d FILE
Found by Coverity.
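
For illustration, a small POSIX sketch of the pairing in question; read_command_line is a hypothetical helper, not code from ddebug.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Run a shell command and copy its first line of output into buf.
 * Returns the value of pclose(), or -1 if the pipe could not be opened. */
static int read_command_line(const char *cmd, char *buf, size_t size)
{
   FILE *p;

   assert(size > 0);
   buf[0] = '\0';

   p = popen(cmd, "r");
   if (!p)
      return -1;

   if (!fgets(buf, (int)size, p))
      buf[0] = '\0';

   /* A stream opened with popen() must be closed with pclose(), not
    * fclose(): pclose() also waits for the child process to finish. */
   return pclose(p);
}
```

A call such as read_command_line("echo hello", buf, sizeof(buf)) leaves the first output line in buf and returns the child's termination status.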

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-28 10:47:51 +01:00
Nicolai Hähnle
21556d86fc glsl: fix optimization of discard nested multiple levels
The order of optimizations can lead to the conditional discard optimization
being applied twice to the same discard statement. In this case, we must
ensure that both conditions are applied.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96762
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-07-28 10:47:04 +01:00
Nicolai Hähnle
185b0c15ab st_glsl_to_tgsi: only skip over slots of an input array that are present
When an application declares varying arrays but does not actually do any
indirect indexing, some array indices may end up unused in the consuming
shader, so the number of input slots that correspond to the array ends
up less than the array_size.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-28 10:46:02 +01:00
Dieter Nützel
041b330a32 clover: make GCC 4.8 happy
Without this, GCC 4.8.x throws the error below:

error: invalid initialization of non-const reference of type
'clover::llvm::compat::raw_ostream_to_emit_file {aka llvm::raw_svector_ostream&}'
from an rvalue of type '<brace-enclosed initializer list>'

v2: change commit title and add error message like Eric Engestrom requested

Signed-off-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97019
[ Francisco Jerez: Trivial formatting fix. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-07-27 20:41:05 -07:00
Timothy Arceri
a86aa87342 i965: remove unnecessary null check
We would have hit a segfault already if this could be null.

Fixes Coverity warning spotted by Matt.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-28 11:05:57 +10:00
Timothy Arceri
29d70cc964 glsl: free hash tables earlier
These are only used by get_matching_input(), which has already been
called at this point, so free the hash tables.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-07-28 08:05:04 +10:00
Samuel Pitoiset
af08cfc626 nvc0: enable ARB_tessellation_shader on GM107+
This exposes OpenGL 4.1 on Maxwell (tested on GM107 and GM206).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-27 23:19:07 +02:00
Samuel Pitoiset
3ac373df6e gm107/ir: add a legalize SSA pass for PFETCH
PFETCH (actually ISBERD in the GM107+ ISA) only accepts a GPR for src0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-27 23:18:58 +02:00
Samuel Pitoiset
653af07119 nvc0: fix up TCP header on GM107+
The number of outputs patch (limited to 255) has moved in the TCP
header, but the blob seems to also set the old position. Also, the high
8 bits are now located in between the min/max parallel output read
address at position 20.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-07-27 23:18:41 +02:00
Mathias Fröhlich
2060f19b4f vbo: Fix handling of POS/GENERIC0 attributes.
In case of split primitives we need to restore
the original setting of the vtx.attrsz array to make
immediate mode attribute array tracking work.

v2: Use bool instead of boolean.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96950
2016-07-27 06:43:03 +02:00
Marek Olšák
c98c732158 radeon/llvm: Use alloca instructions for larger arrays [revert a revert]
This reverts commit f84e9d749f.

Bioshock Infinite no longer hangs.
2016-07-26 23:31:56 +02:00
Marek Olšák
8636a718b5 r600g: add support for B5G6R5 PBO uploads via texture buffers (v2)
v2: set endian swap to 16

untested
2016-07-26 23:21:45 +02:00
Marek Olšák
1e5f00f9d5 radeonsi: pre-generate shader logs for ddebug
This cuts down the overhead of si_dump_shader when ddebug is capturing
shader logs, which is done for every draw call unconditionally (that's
quite a lot of work for a draw call).

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
18475aab6d radeonsi: add empty lines after shader stats
to separate individual shaders dumped consecutively.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
dd66f9d3e7 radeonsi: move the shader key dumping to si_shader_dump
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
b47727a83a ddebug: implement pipelined hang detection mode
For good performance while being able to generate decent hang reports.
The report doesn't contain the parsed IB and the buffer list, but it
isolates the draw call and dumps shaders while not having to flush
the context.

This is for GPU hangs that are harder to reproduce and require interactive
playing for minutes or even hours.

dd_pipe.h explains some implementation details. Initializing, copying
(recording) and clearing states is most of the code.

The performance should be at least 50% of the normal performance depending
on the circumstances. (i.e. 50% is expected to be the worst case scenario,
not the best case) The majority of time is spent in
dump_debug_state(PIPE_DUMP_CURRENT_SHADERS) and that's after all
the optimizations in later patches. There is no obvious way to optimize
that further.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
0795a3d54f ddebug: don't save pointers to call parameters
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
e4079677a7 ddebug: move dd_call into dd_pipe.h
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
d50f9e9b04 ddebug: separate draw call dumping logic
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
95c3025a41 ddebug: move all states into a separate structure
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
f7720948cc ddebug: write contents of dmesg into hang reports
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
1f85f17998 ddebug: implement create_batch_query
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
6b9924ccb6 ddebug: don't use abort()
We don't want a core dump.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
26ef8158ac ddebug: make dd_get_file_stream accept the screen only
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
27fa933a71 ddebug: clean up ddebug_screen_create
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Marek Olšák
6bf81de339 gallium: rework flags for pipe_context::dump_debug_state
The pipelined hang detection mode will not want to dump everything
(which is also time consuming). It will only dump shaders after a draw call
and then dump the status registers separately if a hang is detected.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-26 23:06:46 +02:00
Rob Herring
9ace2c1355 vc4: add hash table look-up for exported dmabufs
It is necessary to reuse existing BOs when dmabufs are imported. There
are 2 cases that need to be handled. dmabufs can be created/exported and
imported by the same process and can be imported multiple times.
Copying other drivers, add a hash table to track exported BOs so the
BOs get reused.
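
A rough sketch of the tracking scheme, assuming a tiny fixed-size table with linear probing and no removal. The real driver uses Mesa's hash table utilities; every name below (bo_sketch, bo_table_sketch, bo_table_track_export, bo_table_import) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64

/* Hypothetical buffer object: a kernel handle plus a reference count. */
struct bo_sketch {
   uint32_t handle;
   int ref_count;
};

/* Hypothetical handle -> BO table; a zero-initialized table is empty. */
struct bo_table_sketch {
   struct bo_sketch *slots[TABLE_SIZE];
};

/* Return the slot holding `handle`, or the first empty slot on its probe
 * path, or NULL if the table is full. */
static struct bo_sketch **table_slot(struct bo_table_sketch *t, uint32_t handle)
{
   for (unsigned i = 0; i < TABLE_SIZE; i++) {
      struct bo_sketch **slot = &t->slots[(handle + i) % TABLE_SIZE];
      if (*slot == NULL || (*slot)->handle == handle)
         return slot;
   }
   return NULL;
}

/* On export: remember the BO under its handle so later imports can find it. */
static void bo_table_track_export(struct bo_table_sketch *t, struct bo_sketch *bo)
{
   struct bo_sketch **slot = table_slot(t, bo->handle);
   assert(slot != NULL);
   *slot = bo;
}

/* On import: reuse the tracked BO if the handle is known, otherwise return
 * NULL so the caller can create a fresh BO. */
static struct bo_sketch *bo_table_import(struct bo_table_sketch *t, uint32_t handle)
{
   struct bo_sketch **slot = table_slot(t, handle);
   if (slot == NULL || *slot == NULL)
      return NULL;
   (*slot)->ref_count++;
   return *slot;
}
```

Both cases from the message are covered: a BO exported and re-imported by the same process is found in the table, and importing the same handle repeatedly just bumps the reference count on the one existing BO.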

v2: Whitespace fixup (by anholt)

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-07-26 13:47:50 -07:00
Eric Anholt
ce8504d196 vc4: Disable early Z with computed depth.
We don't tell the hardware whether we're computing depth, so we need
to manage early Z state manually.  Fixes piglit early-z.
2016-07-26 13:47:50 -07:00
Eric Anholt
4d0b2c7aaa ttn: Update shader->info as we generate code.
We could use the nir_shader_gather_info() pass to update it after the
fact, but this is what glsl_to_nir and prog_to_nir do.

Reviewed-by: Rob Clark <robclark@freedesktop.org>
2016-07-26 13:47:50 -07:00
Vedran Miletić
7b9a0f4e38 mesa: standardize naming Mesa3D, MESA -> Mesa
Signed-off-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-26 13:28:01 -07:00
Kenneth Graunke
95c48391ee mesa: Make MESA_SHADER_CAPTURE_PATH skip shaders with Name == -1.
Shaders with shProg->Name == ~0 (aka 4294967295) are internal meta
shaders that we don't really want to capture.
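
The equivalence of -1, ~0 and 4294967295 for a 32-bit unsigned name can be checked with a short sketch; is_internal_meta_name is a hypothetical helper, not the function Mesa uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* For a 32-bit unsigned value, -1 and ~0 both wrap to all ones. */
static_assert((uint32_t)-1 == UINT32_MAX, "-1 converts to all ones");
static_assert((uint32_t)~0u == 4294967295u, "~0 is 4294967295 in 32 bits");

/* Hypothetical check: true for the name reserved for internal shaders. */
static bool is_internal_meta_name(uint32_t name)
{
   return name == UINT32_MAX;
}
```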

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-07-26 13:27:09 -07:00
Matt Turner
20553e4a2d mesa: Use AC_HEADER_MAJOR to include correct header for major().
Gentoo has been smoke testing an upcoming change to glibc.

Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=580392
2016-07-26 12:12:41 -07:00
Matt Turner
815135166c glsl: Remove references to tail_pred.
2016-07-26 12:12:27 -07:00
Matt Turner
5ed3299822 glx: Avoid aliasing violations.
Compilers are perfectly capable of generating efficient code for calls
like these to memcpy().
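
As a generic illustration of the pattern (not the GLX code itself), reinterpreting a float's bits through memcpy() instead of a pointer cast might look like this. The helper names are hypothetical, and the values in the usage note assume IEEE 754 single precision.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Read the bit pattern of a float without casting float* to uint32_t*,
 * which would violate the strict aliasing rules. */
static uint32_t float_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof(u));
   return u;
}

/* Inverse direction: build a float from a 32-bit pattern. */
static float bits_to_float(uint32_t u)
{
   float f;
   memcpy(&f, &u, sizeof(f));
   return f;
}
```

On an IEEE 754 platform, float_bits(1.0f) is 0x3f800000 and bits_to_float(0x40000000) is 2.0f. Optimizing compilers typically turn such fixed-size memcpy() calls into a plain register move.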

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Matt Turner
2a1d2874f1 mesa: Avoid aliasing violation in uniform_query.cpp.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Matt Turner
f5ac1d366e mesa: Avoid aliasing violation in FXT1.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Matt Turner
a1e9b72102 swrast: Avoid aliasing violation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Matt Turner
149309a424 glsl: Avoid aliasing violations.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00
Matt Turner
d1f6f65697 glsl: Separate overlapping sentinel nodes in exec_list.
I do appreciate the cleverness, but unfortunately it prevents a lot more
cleverness in the form of additional compiler optimizations brought on
by -fstrict-aliasing.
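
A simplified sketch of a list with two separate sentinel nodes, in the spirit of the change. The type and function names are hypothetical, and the real exec_list has many more operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node_sketch {
   struct node_sketch *next;
   struct node_sketch *prev;
};

/* Each sentinel is a full node of its own; no fields are shared between
 * them, so no access has to treat part of one object as another. */
struct list_sketch {
   struct node_sketch head_sentinel;
   struct node_sketch tail_sentinel;
};

static void list_init(struct list_sketch *l)
{
   l->head_sentinel.next = &l->tail_sentinel;
   l->head_sentinel.prev = NULL;
   l->tail_sentinel.next = NULL;
   l->tail_sentinel.prev = &l->head_sentinel;
}

static bool list_is_empty(const struct list_sketch *l)
{
   return l->head_sentinel.next == &l->tail_sentinel;
}

/* Link n in just before the tail sentinel. */
static void list_push_tail(struct list_sketch *l, struct node_sketch *n)
{
   assert(n != NULL);
   n->next = &l->tail_sentinel;
   n->prev = l->tail_sentinel.prev;
   n->prev->next = n;
   l->tail_sentinel.prev = n;
}

/* Count real nodes; the tail sentinel is the only node whose next is NULL. */
static unsigned list_length(const struct list_sketch *l)
{
   unsigned count = 0;
   for (const struct node_sketch *n = l->head_sentinel.next; n->next != NULL; n = n->next)
      count++;
   return count;
}
```

The cost of this layout is one extra pointer per list compared with overlapping the sentinels, in exchange for code that stays valid under -fstrict-aliasing.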

No difference in OglBatch7 (n=20).

Co-authored-by: Davin McCall <davmac@davmac.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-07-26 12:12:27 -07:00