Commit graph

30386 commits

Author SHA1 Message Date
Ilia Mirkin
0e9232dbcc nv50,nvc0: enable TEX_LZ and TXF_LZ
There should be minimal gain, if any, for nvc0, but nv50 may end up
noticing more often that the lod argument is uniform. This, in turn,
will remove the need for some unnecessary transformations, which were
being hit due to the checks being done pre-ssa.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-18 20:37:52 -04:00
Karol Herbst
09f16de7e6 nvc0/ir: treat FMA like MAD for operand propagation
Helps mainly Feral-ported games, due to their use of fma()

shader-db changes:
total instructions in shared programs : 3901147 -> 3842505 (-1.50%)
total gprs used in shared programs    : 471258 -> 467359 (-0.83%)
total local used in shared programs   : 27405 -> 27361 (-0.16%)
total bytes used in shared programs   : 35749888 -> 35214176 (-1.50%)

                local        gpr       inst      bytes
    helped          17        1829        4091        4091
      hurt           4          44           3           3

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2017-03-18 20:15:45 -04:00
Timothy Arceri
9e42b93f33 st/dri: wait for thread to finish before unbinding context
Fixes a bunch of piglit crashes that hit an assert() when trying
to delete the framebuffer. The assert() was triggered because
WinSysDrawBuffer was set to NULL before glDeleteFramebuffers()
was called.

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-18 14:15:52 +11:00
Marek Olšák
4b064d16e5 gallium/radeon: formalize that create_batch_query doesn't need pipe_context
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-17 18:30:21 +01:00
Marek Olšák
be6173e7d6 gallium/radeon: formalize that create_query doesn't need pipe_context
for threaded gallium

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-17 18:30:21 +01:00
Marek Olšák
04e6977e5d gallium/radeon: reference pipe_resource in pipe_transfer
for threaded gallium

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-17 18:30:21 +01:00
Marek Olšák
03127bb6d5 radeonsi: compile all TGSI compute shaders asynchronously
required by threaded gallium

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-17 18:30:21 +01:00
Marek Olšák
e9c6953ddb radeonsi: require that compiler threads are enabled
threaded gallium can't use pipe_context's LLVM target machine, because
create_shader_selector can be called from a non-driver thread.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-17 18:30:21 +01:00
Marek Olšák
080f322f06 trace: remove leftover assertions after pipe_resource wrapping removal 2017-03-17 18:30:21 +01:00
Marek Olšák
6c0a28084d gallium/u_upload: make the first persistent mapping unsynchronized
This is simpler for drivers.
2017-03-17 18:30:21 +01:00
Marek Olšák
c83562ccaa gallium: implement the backend of threaded GL dispatch
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-16 14:14:19 +11:00
Roland Scheidegger
e1f9e9bafd gallivm: (trivial) remove duplicated line
pointed out by clang (stored value never read)
2017-03-16 04:03:29 +01:00
Roland Scheidegger
9d104dfd55 draw: (trivial) remove a unnecessary lp_build_alloca()
pointed out by clang (stored value never read)
2017-03-16 04:03:29 +01:00
Ilia Mirkin
e893b3a367 swr: support layer output in geometry shaders
This makes bin/gl-3.2-layered-rendering-gl-layer-render fail only with
2DMS_ARRAY, which is expected given the lackluster MSAA support. However
all the regular types pass.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-15 21:03:11 -04:00
Francisco Jerez
e6469ec43b gallium/tgsi: Treat UCMP sources as floats to match the GLSL-to-TGSI pass expectations.
Currently the GLSL-to-TGSI translation pass assumes it can use
floating point source modifiers on the UCMP instruction.  See the bug
report linked below for an example where an unrelated change in the
GLSL built-in lowering code for atan2 (e9ffd12827)
caused the generation of floating-point ir_unop_neg instructions
followed by ir_triop_csel, which is translated into UCMP with a negate
modifier on back-ends with native integer support.

Allowing floating-point source modifiers on an integer instruction
seems like rather dubious design for a transport IR, since the same
semantics could be represented as a sequence of MOV+UCMP instructions
instead, but supposedly this matches the expectations of TGSI
back-ends other than tgsi_exec, and the expectations of the DX10 API.
I take no responsibility for future headaches caused by this
inconsistency.

Fixes a regression of piglit glsl-fs-tan-1 on softpipe introduced by
the above-mentioned glsl front-end commit.  Even though the commit
that triggered the regression doesn't seem to have made it to any
stable branches yet, this might be worth back-porting since I don't
see any reason why the bug couldn't have been reproduced before that
point.

Suggested-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99817
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2017-03-15 15:47:14 -07:00
Tim Rowley
a7ce0490e4 swr: validate backend state numAttributes
General protection and prevents us from smashing the stack
on the first clear state validation (a7b8d50bcb).  Fixes crash
using icc.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-15 15:08:59 -05:00
Marek Olšák
0550f3d631 radeonsi: implement TGSI opcodes TEX_LZ and TXF_LZ
This massively decreases VGPR spilling for DiRT Showdown, because we
no longer have to use v4i32 for 2D fetches when level == 0.
We now use v2i32 for those cases.

DiRT Showdown - Spilled VGPRs: -26 (-81%)

This surprisingly doesn't have any useful effect on performance (+ 0.05%).
2017-03-15 18:17:41 +01:00
Marek Olšák
cca0389c72 gallium: add TGSI opcodes TEX_LZ and TXF_LZ
for better code generation in radeonsi
2017-03-15 18:17:41 +01:00
Marek Olšák
bf3cdf0fd3 gallium: add PIPE_CAP_TGSI_TEX_TXF_LZ 2017-03-15 18:17:41 +01:00
Samuel Pitoiset
7751ed39e4 radeonsi: disable sinking common instructions down to the end block
Initially this was a workaround for a bug introduced in LLVM 4.0
in the SimplifyCFG pass that caused image instrinsics to disappear
(because they were badly sunk). Finally, this is a win because it
decreases SGPR spilling and increases the number of waves a bit.

Although, shader-db results are good I think we might want to
remove it in the future once the issue is fixed. For now, enable
it for LLVM >= 4.0.

This also fixes a rendering issue with the speedometer in Dirt Rally.

More information can be found here https://reviews.llvm.org/D26348.

Thanks to Dave Airlie for the patch.

v2: - add a FIXME comment
    - use if (HAVE_LLVM >= 0x0400) instead

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99484
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97988
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-15 14:24:40 +01:00
Samuel Pitoiset
74265fd03c tgsi: add missing compute shader entry in tgsi_get_processor_name()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-15 14:16:29 +01:00
Samuel Pitoiset
38ee3246d2 radeonsi: clean up tex_fetch_ptrs()
Will also help when the src sampler register will be
TGSI_FILE_CONSTANT for bindless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-15 14:16:26 +01:00
Emil Velikov
858170e8a4 winsys/amdgpu: use drmGetDevice2 API
Analogous to previous commit

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98502
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
2017-03-15 11:37:58 +00:00
Dave Airlie
686d060458 r600: refactor binding code for attach buffer to CB.
This refactors out the code and fixes it up to be used
for images later. It uses the code in the current RAT binding
for compute.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-15 14:33:26 +10:00
Dave Airlie
222e42e45f r600: refactor out CB setup.
This moves the code to create CB info out into
a separate function so it can be reused in images
code to create RATs.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-15 14:33:23 +10:00
Dave Airlie
0cf717821e r600: refactor texture resource words setup code.
This refactors out the code to setup a texture resource
so we can reuse it later from the images code.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-15 14:33:06 +10:00
Dave Airlie
95a976b651 r600: factor out the code to initialise a buffer resource.
This takes the code required to initialise a buffer resource
out of the texture buffer code, into it's own function.

This is going to be used for the image support later.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-15 14:32:48 +10:00
Dave Airlie
cf2af021b9 r600g: make framebuffer atom rely on dual src blend state.
In order to make ARB_shader_image_load_store, we have to share
the CB space with RATs, so we should only steal the dual src
space if we have dual src enabled.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-15 14:32:44 +10:00
Jason Ekstrand
762a6333f2 nir: Rework conversion opcodes
The NIR story on conversion opcodes is a mess.  We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random.  This commit re-organizes things and makes them all
consistent:

 - All non-bool conversion opcodes now have the explicit size in the
   destination and are named <src_type>2<dst_type><size>.

 - Integer <-> integer conversion opcodes now only come in i2i and u2u
   forms (i2u and u2i have been removed) since the only difference
   between the different integer conversions is whether or not they
   sign-extend when up-converting.

 - Boolean conversion opcodes all have the explicit size on the bool and
   are named <src_type>2<dst_type>.

Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako.  This will make adding
int8, int16, and float16 versions much easier when the time comes.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-03-14 07:36:40 -07:00
Marek Olšák
cdbe4990cd gallium/radeon: disable the shader cache if dumping shaders
otherwise, cached shaders aren't dumped.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-13 23:34:52 +01:00
Marek Olšák
71a2e4e945 radeonsi: mark all bound shader buffer ranges as initialized
This should prevent cases when a buffer was incorrectly mapped without
synchronization just because this wasn't done.

Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-13 23:34:52 +01:00
Julien Isorce
9df3f28a8b gallium/hud: check NULL return from u_upload_alloc
Fixes the following segmentation fault:

signal SIGSEGV: invalid address (fault address: 0x0)
 frame #0: 0x00007fffe718e117 radeonsi_dri.so hud_draw_background_quad hud_context.c:170
   167
   168 	   assert(hud->bg.num_vertices + 4 <= hud->bg.max_num_vertices);
   169
-> 170 	   vertices[num++] = (float) x1;
   171 	   vertices[num++] = (float) y1;
   172
   173 	   vertices[num++] = (float) x1;
(lldb) bt
  * frame #0: 0x00007fffe718e117 radeonsi_dri.so`hud_draw_background_quad
    frame #1: 0x00007fffe718f458 radeonsi_dri.so`hud_draw
    frame #2: 0x00007fffe712967f radeonsi_dri.so`dri_flush

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-03-13 17:20:21 +01:00
Julien Isorce
d08c0930af winsys/radeon: check null return from radeon_cs_create_fence in cs_flush
Follow-up of patch:
"radeon_cs_create_fence: check null return from radeon_winsys_bo_create"

radeon_drm_cs_flush
  radeon_cs_create_fence
    radeon_winsys_bo_create

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-03-13 17:19:29 +01:00
Julien Isorce
d09edb0146 winsys/radeon: check null in radeon_cs_create_fence
Fixes the following segmentation fault:

radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
  -> if (!bo->handle)
(gdb) bt
0  radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
1  0x00007fffe73575de in radeon_cs_create_fence radeon_drm_cs.c
2  0x00007fffe7358c48 in radeon_drm_cs_flush radeon_drm_cs.c

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-03-13 17:17:30 +01:00
Rob Clark
f805593b12 freedreno/ir3: fragz cannot be half precision
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-03-13 10:33:07 -04:00
Rob Clark
b1df639db6 freedreno/ir3: optimize less in glsl
Rely on nir for optimization, to reduce compile times.  Very minimal impact
on shader-db:

  total instructions in shared programs:          104170 -> 104199 (0.03%)
  total dwords in shared programs:                209664 -> 209728 (0.03%)
  total full registers used in shared programs:   7156 -> 7161 (0.07%)
  total half registers used in shader programs:   109 -> 109 (0.00%)
  total const registers used in shared programs:  24222 -> 24224 (0.01%)

                   half       full      const      instr     dwords
      helped          12         107         103         112          98
        hurt          11         104         105         115         102

But shader db runtime dropped from ~29.3s user to ~20.4s user.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-03-13 10:33:07 -04:00
Christian König
8dee325752 svga: handle P016 format as well
Fixes: 62cff79378 ("gallium: add P016 format")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100180
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-13 12:49:41 +01:00
Christian König
5369b5a91d st/va: add config support for 10bit decoding v2
Advertise 10bpp support if the driver supports decoding to a P016 surface.

v2: Advertise 10bpp for the decoder as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:44 +01:00
Christian König
e9d3e29bb3 st/va: add support for allocating 10bpp surfaces
We support P010 and P016 as targets for 10bpp video decoding.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:41 +01:00
Christian König
e58a1e8f68 st/va: add support for P010 and P016 formats v3
No hardware I know off can actually support P010 natively. But we can easily
support P016 and as long as nobody decodes anything into the lower 6bits it
doesn't make any difference to P010.

v2: allow P0160 for post processing as well
v3: fix post processing once more

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:38 +01:00
Christian König
f1d1deb015 st/va: clear the video surface on allocation
This makes debugging of decoding problems quite a bit easier.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:35 +01:00
Christian König
1ce68af07b st/va: cleanup error handling in vlVaCreateSurfaces2
No need to have that twice.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:32 +01:00
Christian König
88f3451083 radeon/uvd: enable 10bit HEVC decode v2
Just use whatever the state tracker allocated.

v2: fix msb mode

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:29 +01:00
Christian König
3e1e441aa0 radeon/UVD: fix the decoding target pitch calculation
The firmware expects the value in pixel not bytes. Didn't made a difference
so far because we only used 8bpp surfaces.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:25 +01:00
Christian König
cee591a224 vl/video_buffer: add support for P016
Just simply the description of the planes.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:22 +01:00
Christian König
62cff79378 gallium: add P016 format
Same layout as NV12, but 16bit per channel instead of 8.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2017-03-13 08:51:07 +01:00
Timothy Arceri
ca76a2ba1b gallium/util: replace pipe_thread_setname() with u_thread_setname()
They do the same thing we just moved the function to be
accessible to all of Mesa.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:49:04 +11:00
Timothy Arceri
14e6b86952 gallium/util: replace pipe_thread_get_time_nano() with u_thread_get_time_nano()
They do the same thing we just moved the function to be
accessible to all of Mesa.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:49:04 +11:00
Timothy Arceri
f8cc4c25b8 gallium/util: replace pipe_thread_create() with u_thread_create()
They do the same thing we just moved the function to be
accessible to all of Mesa.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:49:04 +11:00
Timothy Arceri
b822d9dd67 gallium/util: move u_queue.{c,h} to src/util
This will allow us to use it outside of gallium for things like
compressing shader cache entries.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-12 17:49:03 +11:00