Commit graph

25705 commits

Author SHA1 Message Date
Marek Olšák
a72ed2f6bc radeonsi: move MRT color exporting into a separate function
This will be used by a fragment shader epilog.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
0ffe3d3772 radeonsi: use EXP_NULL for pixel shaders without outputs
This never happens currently.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
677c65968b radeonsi: only use LLVMBuildLoad once when updating color outputs at the end
without LLVMBuildStore.

So:
- do LLVMBuildLoad
- update the values as necessary
- export

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
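Here is a minimal sketch of the load/update/export order described in 677c65968b. It is written in plain C rather than LLVM C API calls. The struct, `export_color()`, and the use of clamping as the "update" step are hypothetical illustrations. The point is that the values are loaded once, modified in locals, and exported without being stored back.

```c
#include <stdbool.h>

/* Hypothetical model of one RGBA color output kept in "memory"
 * (as an alloca would be in LLVM IR). */
struct color_output { float v[4]; };

static float clampf01(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }

/* Load once, update the loaded copy, then export it.
 * The updated values are never stored back to memory. */
static void export_color(const struct color_output *mem, bool clamp,
                         float exported[4])
{
   float vals[4];
   for (int c = 0; c < 4; c++)
      vals[c] = mem->v[c];              /* the single load */
   if (clamp)
      for (int c = 0; c < 4; c++)
         vals[c] = clampf01(vals[c]);   /* update (clamping assumed here) */
   for (int c = 0; c < 4; c++)
      exported[c] = vals[c];            /* export, no store back */
}
```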
Marek Olšák
185267a6fd radeonsi: export "undef" values for undefined PS outputs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
1ce659f820 radeonsi: move MRTZ export into a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
5f3e6b5b0f radeonsi: simplify setting the DONE bit for PS exports
First find out what the last export is and simply set the DONE bit there.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
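The approach in 5f3e6b5b0f (collect the exports first, then mark only the last one) can be sketched as follows. `struct ps_export` and `set_done_on_last_export()` are hypothetical stand-ins, not radeonsi code. The zero-export case is where a driver would fall back to a null export instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the arguments of one PS export instruction. */
struct ps_export {
   unsigned target; /* e.g. an MRT or MRTZ target index */
   bool done;       /* DONE bit: marks the final export of the shader */
};

/* Clear DONE everywhere, then set it on the last export only.
 * Returns the index that got the DONE bit, or -1 if there are no exports. */
static int set_done_on_last_export(struct ps_export *exports, size_t count)
{
   for (size_t i = 0; i < count; i++)
      exports[i].done = false;
   if (count == 0)
      return -1;
   exports[count - 1].done = true;
   return (int)(count - 1);
}
```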
Marek Olšák
e00f3f23b1 radeonsi: set SPI color formats and CB_SHADER_MASK outside of compilation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
4e597c25c7 radeonsi: write all MRTs only if there is exactly one output
This doesn't fix a known bug, but better safe than sorry.

Also, simplify the expression in si_shader.c.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
746a7a7498 radeonsi: determine SPI_SHADER_Z_FORMAT outside of shader compilation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:05 +01:00
Marek Olšák
2cb8bf90cd radeonsi: determine DB_SHADER_CONTROL outside of shader compilation
because the API pixel shader binary will one day stop emulating alpha test,
so the KILL_ENABLE bit must be determined elsewhere.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:05 +01:00
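As a rough sketch of why KILL_ENABLE has to come from state rather than from the compiled binary, consider the following. The bit position, the struct, and `compute_db_shader_control()` are all hypothetical. The real register layout comes from the hardware headers and is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position, for illustration only. */
#define SKETCH_KILL_ENABLE (1u << 6)

/* Hypothetical inputs: what the shader scan reports, plus pipeline
 * state that is only known at draw time. */
struct ps_state_inputs {
   bool shader_uses_kill;   /* shader contains discard/kill */
   bool alpha_test_enabled; /* alpha test emulated by killing pixels */
};

static uint32_t compute_db_shader_control(const struct ps_state_inputs *in)
{
   uint32_t val = 0;
   /* Derived outside compilation, so it stays correct even if the
    * compiled API shader no longer contains the alpha-test kill. */
   if (in->shader_uses_kill || in->alpha_test_enabled)
      val |= SKETCH_KILL_ENABLE;
   return val;
}
```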
Marek Olšák
ff7e77724e tgsi/scan: set which color components are read by a fragment shader
This will be used by radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:05 +01:00
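One plausible encoding for "which color components are read" is a single byte holding 4 bits per color input. This sketch assumes bits 0-3 for COLOR0.xyzw and bits 4-7 for COLOR1.xyzw. The helper `mark_color_read()` is hypothetical, and the exact layout of the real tgsi_shader_info field should be checked against the source.

```c
#include <stdint.h>

/* Assumed layout: bits 0-3 = COLOR0.xyzw, bits 4-7 = COLOR1.xyzw.
 * component_mask uses bit 0 = x ... bit 3 = w. */
static uint8_t mark_color_read(uint8_t colors_read, unsigned color_index,
                               unsigned component_mask)
{
   return (uint8_t)(colors_read |
                    ((component_mask & 0xfu) << (4 * color_index)));
}
```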
Marek Olšák
18ec76730a tgsi/scan: fix tgsi_shader_info::reads_z
This has no users in Mesa.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:05 +01:00
Marek Olšák
f3658be108 tgsi/scan: set if a fragment shader writes sample mask
This will be used by radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:05 +01:00
Roland Scheidegger
8d4039ecdb softpipe: tell draw about the vertex layout we want
This makes it more similar to llvmpipe. It also lets the draw emit code
handle things like getting zeros for non-existing vs outputs automatically.
There probably isn't any overhead either way: there is no "simply copy
everything" code in the emit path, so it would copy each attrib individually
just the same. Likewise, we still do another mapping step in softpipe, as the
layout may still not match exactly (same as in llvmpipe; we should probably
nuke the pointless mapping in both drivers).

This fixes the piglit arb_fragment_layer_viewport no_gs/no_write tests.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 02:00:04 +01:00
Roland Scheidegger
8e3a76791f llvmpipe: use ints not unsigned for slots
They can't actually be 0 (as position is there), but this should avoid confusion.

This was supposed to have been done by af7ba989fb
but I accidentally pushed an older version of the patch in the end...
Also prettify slightly. And make some notes about the confusing and useless
fs input "map".

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 01:59:17 +01:00
Roland Scheidegger
2dbc20e456 draw: nuke the interp parameter from vertex_info
draw emit couldn't care less what the interpolation mode is...
This somehow looked like it would matter, and all drivers more or less
dutifully filled it in correctly. But this is only used for emit; if draw
needs to know the interpolation mode (for clipping, for instance), it gets
that information from the vs anyway.
softpipe used to depend on that interpolation parameter, as it abused that
structure quite a bit, but it no longer does.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 01:58:05 +01:00
Roland Scheidegger
892e2d1395 softpipe: don't abuse the draw vertex_info struct for something different
softpipe would calculate two "vertex layouts". The second one, however, was
used only for internal purposes: draw knew nothing about it, even though it
looked exactly like the one we tell draw about.
So, store that information separately, as this was just confusing.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 01:57:21 +01:00
Roland Scheidegger
b64d008052 softpipe: fix mapping of "special" vs outputs
Unlike llvmpipe, softpipe always tells draw to emit the vertices as-is.
The two vertex layouts it calculates are a bit confusing: one is just used
to tell draw to emit vertices as-is, and the other has draw written all over
it, yet draw is completely unaware of it; it is used only to look up the
correct interpolation info later in setup.
Thus, the slots used differ from what llvmpipe does (I'm going to clean
up the confusing two-layout stuff).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 01:56:43 +01:00
Roland Scheidegger
01761a38e8 llvmpipe: scratch some special handling of vp_index/layer
It was actually slightly buggy (missing initialization, and setup not
dependent on a new vs, although I didn't see issues), but the case of
non-existing attributes is now handled by the draw emit code, so we don't
need it anymore.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 01:55:45 +01:00
Roland Scheidegger
afa035031f draw: rework handling of non-existing outputs in emit code
Previously the code would just redirect requests for attributes which
don't exist to use output 0. Rework this to output all zeros instead which
seems more useful - in particular some extensions like
ARB_fragment_layer_viewport require 0 in the fs even if it wasn't output by
previous stages. That way, drivers don't have to special-case this depending
on whether the vs/gs outputs some attribute or not.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 01:52:39 +01:00
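The new behaviour in afa035031f can be sketched as an emit loop. Any attribute whose source slot does not exist receives zeros instead of a copy of output 0. The function name, the -1 sentinel for "not written by the previous stage", and the float4 layout are hypothetical. The real draw module may mark missing outputs differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: src_slot[i] names the vs/gs output feeding emitted
 * attribute i, or -1 if the previous stage does not write it. */
static void emit_vertex(const float (*outputs)[4], size_t num_outputs,
                        const int *src_slot, size_t num_attribs,
                        float (*dst)[4])
{
   static const float zero[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
   for (size_t i = 0; i < num_attribs; i++) {
      int s = src_slot[i];
      /* Old behaviour: fall back to output 0. New behaviour: all zeros. */
      const float *src =
         (s >= 0 && (size_t)s < num_outputs) ? outputs[s] : zero;
      memcpy(dst[i], src, sizeof(dst[i]));
   }
}
```

With this scheme, a fragment shader reading a layer or viewport index that no earlier stage wrote sees 0, as ARB_fragment_layer_viewport requires.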
Sinclair Yeh
0819287f56 svga: Rename SVGA_HINT_FLAG_DRAW_EMITTED
Rename SVGA_HINT_FLAG_DRAW_EMITTED to SVGA_HINT_FLAG_CAN_PRE_FLUSH
because preemptive flush can be unblocked by more commands than
draw.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 16:04:45 -07:00
Sinclair Yeh
9ccc716534 svga: allow preemptive flushing on DMA, update, and readback commands
The existing code effectively turns off preemptive flushing for all
but the regions used for draws.  This turns out to be overly
restrictive as some memory regions, e.g. GMR, may never get a draw
when used as a DMA upload staging area, causing problems for apps
that upload a large amount of textures, e.g. Unigine Heaven.

This patch fixes the Unigine Heaven memory allocation error and
has been verified to not cause a regression in the previous extended
retina display issue.

Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 16:03:33 -07:00
Charmaine Lee
b074a5b02d svga: skip vertex attribute instruction with zero usage_mask
In emit_input_declarations(), we are skipping declarations for those
registers that are not being used. But in emit_vertex_attrib_instructions(),
we are still emitting instructions to tweak the vertex attributes even if
they are not being used. This causes an assert in the backend because an
input register is not declared in the shader. This patch fixes the problem
by skipping the instruction if the vertex attribute is not being used.
The changes in this patch originate from a code snippet Jose suggested
in bug 1530161.

Tested with piglit, Heaven, Turbine, glretrace.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 16:01:38 -07:00
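A minimal sketch of the fix in b074a5b02d follows. The fix-up pass applies the same `usage_mask == 0` test that the declaration pass already uses, so no instruction references an undeclared input register. `struct vs_input` and `emit_vertex_attrib_fixups()` are hypothetical simplifications. Counting stands in for emitting real instructions.

```c
#include <stddef.h>

/* Hypothetical per-input info; usage_mask has one bit per used component. */
struct vs_input {
   unsigned usage_mask;
   int needs_adjust; /* a format fix-up instruction would be required */
};

/* Returns how many fix-up instructions would be emitted. */
static size_t emit_vertex_attrib_fixups(const struct vs_input *inputs, size_t n)
{
   size_t emitted = 0;
   for (size_t i = 0; i < n; i++) {
      if (inputs[i].usage_mask == 0)
         continue; /* never declared, so it must not be referenced */
      if (inputs[i].needs_adjust)
         emitted++; /* stand-in for emitting the real instruction */
   }
   return emitted;
}
```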
Krzysztof Sobiecki
0d7477a289 gallium/r600: Replace ALIGN_DIVUP with DIV_ROUND_UP
ALIGN_DIVUP is a driver-specific (r600g) macro that duplicates DIV_ROUND_UP functionality.
Replacing it with DIV_ROUND_UP eliminates this duplication.

Signed-off-by: Krzysztof A. Sobiecki <sobkas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-06 16:09:12 -05:00
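For reference, here is the conventional shape of a round-up division macro like the one both names refer to. This is a sketch; Mesa's own definition lives in its util headers and may be spelled slightly differently.

```c
/* Integer division rounding up: ceil(A / B) for non-negative A, positive B.
 * Caveats: B is evaluated twice, and A + B - 1 can overflow near the
 * maximum value of the type. */
#define DIV_ROUND_UP(A, B) (((A) + (B) - 1) / (B))
```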
Eric Anholt
bbd29f1375 vc4: Fix driver build from last minute rebase fix.
I had the driver all tested for the last series, and in my last build I
noticed that get_swizzled_channel was now unused and removed
it... apparently without testing, which would have shown that I removed
the wrong channel swizzle function.
2016-01-06 12:49:45 -08:00
Eric Anholt
25aa436e86 vc4: Optimize out a comparison for bcsel based on an ALU comparison
We routinely have code like:

	vec1 ssa_220 = fge ssa_104, ssa_61
	vec1 ssa_199 = bcsel ssa_220, ssa_106, ssa_105

and we would compare fge's args and choose between ~0 and 0 to generate
ssa_220, then compare ssa_220 to 0 and choose between bcsel's args.
Instead, try to notice the pattern and compare between fge's args to
select between bcsel's args.

total instructions in shared programs: 88019 -> 87574 (-0.51%)
instructions in affected programs:     9985 -> 9540 (-4.46%)
total estimated cycles in shared programs: 245752 -> 245237 (-0.21%)
estimated cycles in affected programs:     17232 -> 16717 (-2.99%)
2016-01-06 12:43:09 -08:00
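The rewrite in 25aa436e86 is valid because materializing fge's result as ~0/0 and then testing it against 0 selects the same operand as using fge's comparison directly. The following sketch shows that equivalence in plain C, not the vc4 compiler pass itself. The function names are hypothetical.

```c
#include <stdint.h>

/* Before: fge produces ~0 or 0, then bcsel compares that against 0. */
static float bcsel_before(float a, float b, float x, float y)
{
   uint32_t cond = (a >= b) ? ~0u : 0u; /* first comparison + select */
   return (cond != 0) ? x : y;          /* second comparison + select */
}

/* After: reuse fge's comparison to choose between bcsel's args. */
static float bcsel_after(float a, float b, float x, float y)
{
   return (a >= b) ? x : y;             /* one comparison + select */
}
```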
Eric Anholt
7a9eb76786 vc4: Add missing sRGB decode to texel fetches.
We only see txf on MSAA textures, currently, and apparently this didn't
impact any of our piglit tests.
2016-01-06 12:43:09 -08:00
Eric Anholt
f01ca9eeda vc4: Add support for GL_ARB_texture_swizzle.
We already had the code supporting it, since it's needed for the depth
mode when doing shadow comparisons.
2016-01-06 12:43:09 -08:00
Eric Anholt
12519a972f vc4: Use NIR texture lowering for texture swizzling.
We can't use its other features currently (mostly because we don't want
Newton-Raphson on rcps for texture coordinates), but it gets us started.

This eliminates some comparisons with constants in GLB2.7 and ETQW traces
at the QIR level by moving the comparisons into NIR, where they get
constant-folded out.

instructions in affected programs:     165 -> 156 (-5.45%)
total uniforms in shared programs: 32087 -> 32085 (-0.01%)
total estimated cycles in shared programs: 245762 -> 245752 (-0.00%)
estimated cycles in affected programs:     461 -> 451 (-2.17%)
2016-01-06 12:43:08 -08:00
Eric Anholt
71db7d3dc5 vc4: Replace the SSA-style SEL operators with conditional MOVs.
I'm moving away from QIR being SSA (since NIR is doing lots of SSA
optimization for us now) and instead having QIR just be QPU operations
with virtual registers.  By making our SELs be composed of two MOVs, we
could potentially coalesce the registers for the MOV's src and dst and
eliminate the MOV.

total instructions in shared programs: 88448 -> 88028 (-0.47%)
instructions in affected programs:     39845 -> 39425 (-1.05%)
total estimated cycles in shared programs: 246306 -> 245762 (-0.22%)
estimated cycles in affected programs:     162887 -> 162343 (-0.33%)
2016-01-06 12:39:51 -08:00
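The idea in 71db7d3dc5 can be sketched as follows: an SSA-style SEL yields the same value as an unconditional MOV followed by a conditional MOV into the same virtual register. This is a plain-C model, not QIR. Once written that way, a register allocator could coalesce `dst` with the register holding `b` and drop the first MOV.

```c
#include <stdbool.h>
#include <stdint.h>

/* SSA-style select: a single operation defining a fresh value. */
static uint32_t sel(bool cond, uint32_t a, uint32_t b)
{
   return cond ? a : b;
}

/* Equivalent pair of MOVs writing one virtual register. */
static uint32_t sel_as_cond_movs(bool cond, uint32_t a, uint32_t b)
{
   uint32_t dst = b;   /* MOV dst, b (unconditional) */
   if (cond)
      dst = a;         /* conditional MOV dst, a (condition from flags) */
   return dst;
}
```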
Eric Anholt
0a89f307f9 vc4: Don't try the SF coalescing unless it's on a def.
If you want the SF of the value of a register produced from a series of
packing MOVs or conditional MOVs, you can't just SF on the last MOV into
the register.
2016-01-06 12:39:27 -08:00
Edward O'Callaghan
1953cee6d7 gallium/drivers/svga: Use unsigned for loop index
Fix a 's/unsigned int/unsigned/' consistency case while here.

Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
8e2a8ec731 gallium/drivers/r600: Use unsigned for loop index
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
76a7d6f412 gallium/drivers/ilo: Use unsigned for loop index
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
5071c192cc gallium: Use unsigned for loop index
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
bfabd5e74a gallium/drivers: Remove unnecessary semicolons
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
67d4b4b28c gallium: Remove unnecessary semicolons
Fix a silly issue where MSVC's case fall-through handling needed
an extra 'break;'.

Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Oded Gabbay
9d59b9d00c llvmpipe: Optimize lp_rast_triangle_32_3_16 for POWER8
This patch converts the SSE-optimized lp_rast_triangle_32_3_16()
to VMX/VSX.

I measured the results on a POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.

                      FPS/Score
 Name            Before     After    Delta
------------------------------------------------
openarena        16.35      16.7     2.14%
xonotic          4.707      4.97     5.57%

glmark2 didn't show a significant (more than 1%) difference.

v2: Make sure code is built only on POWER8 LE machines

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-06 14:54:16 +02:00
Oded Gabbay
925c46cfc4 llvmpipe: Optimize BUILD_MASK(_LINEAR) for POWER8
This patch converts the SSE-optimized build_mask_32() and
build_mask_linear_32() to VMX/VSX.

I measured the results on a POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.

                      FPS/Score
  Name            Before     After    Delta
------------------------------------------------
glmark2 (score)   139.8      142.7    2.07%

openarena and xonotic didn't show a significant (more than 1%)
difference.

v2: Make sure code is built only on POWER8 LE machines

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-06 14:54:16 +02:00
Oded Gabbay
3bbe16ea79 llvmpipe: Optimize do_triangle_ccw for POWER8
This patch converts the SSE optimization done in do_triangle_ccw to
VMX/VSX.

I measured the results on a POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.

                      FPS/Score
  Name            Before     After    Delta
------------------------------------------------
glmark2 (score)   136.6      139.8    2.34%
openarena         16.14      16.35    1.30%
xonotic           4.655      4.707    1.11%

v2:

- Convert loads to use aligned loads
- Make sure code is built only on POWER8 LE machines

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-06 14:54:16 +02:00
Oded Gabbay
e99555ef0b llvmpipe: add POWER8 portability file - u_pwr8.h
This file provides a portability layer that will make it easier to convert
SSE-based functions to VMX/VSX-based functions.

All the functions implemented in this file are prefixed using "vec_".
Therefore, when converting an SSE-based function, one simply needs to
replace the "_mm_" prefix of each SSE function being called with "vec_".

Having said that, not all functions could be converted this way, due to
the differences between the architectures. When such a conversion hurt
performance, I preferred to implement a more ad-hoc solution. For example,
converting _mm_shuffle_epi32 needed to be done using ad-hoc masks instead
of a generic function.

All the functions in this file support both little-endian and big-endian,
but currently the file is built only on POWER8 LE machines.

All of the functions are implemented using the Altivec/VMX intrinsics,
except one where I needed to use inline assembly (due to a missing
intrinsic).

v2:
- Use vec_vgbbd instead of __builtin_vec_vgbbd
- Add an aligned load function
- Don't use typeof()
- Make the file build only on POWER8 LE machines

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-06 14:54:16 +02:00
Brian Paul
f4caa7d2fc draw: minor indentation fix
2016-01-05 13:03:05 -07:00
Brian Paul
95d412181d util: add debug_dump_ubyte_rgba_bmp()
Like debug_dump_float_rgba_bmp() but takes ubyte values.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
eec8d7e7e0 svga: fix test for SVGA_NEW_STIPPLE
We only want to set the SVGA_NEW_STIPPLE dirty flag when the polygon
stipple state changes.  Before, we only set the flag when enabling
stipple, not when disabling it.

We don't really have to add SVGA_NEW_STIPPLE to the dirty FS state
set since it's a subset of SVGA_NEW_RAST, but let's be explicit.

This doesn't fix any known bugs.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
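The corrected test in eec8d7e7e0 boils down to comparing the old and new stipple state and marking dirty on any change, in either direction. This is a sketch; the flag value and the function are hypothetical, not the svga code.

```c
#include <stdbool.h>

#define SKETCH_NEW_STIPPLE 0x1u /* hypothetical dirty bit */

/* The buggy version effectively did `if (new_enable)`, which missed the
 * enabled -> disabled transition. */
static unsigned update_stipple_dirty(bool old_enable, bool new_enable,
                                     unsigned dirty)
{
   if (old_enable != new_enable)
      dirty |= SKETCH_NEW_STIPPLE;
   return dirty;
}
```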
Brian Paul
993b04ee2c svga: add some comments in svga_state_vs.c
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
fc07658895 svga: change svga_hw_view_state::dirty to boolean
Since it's a true/false value.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
077aa3be93 svga: avoid emitting redundant SetVertexBuffers() commands
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
Brian Paul
b11bd20889 svga: check for no-ops in svga_bind_sampler_states()
and svga_set_sampler_views().  If there's no change, return early
and don't set a SVGA_NEW_x dirty state flag.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-01-05 13:03:04 -07:00
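The early return in b11bd20889 can be sketched as follows: compare the incoming bindings against the cached ones and bail out before touching the dirty flags if nothing changed. The struct, the flag, and `bind_samplers()` are hypothetical simplifications of the svga entry points.

```c
#include <stdbool.h>

#define MAX_SAMPLERS 16
#define SKETCH_NEW_SAMPLER 0x2u /* hypothetical dirty flag */

struct sampler_bindings {
   const void *samplers[MAX_SAMPLERS];
   unsigned dirty;
};

/* Caller guarantees start + num <= MAX_SAMPLERS.
 * Returns true if the bindings changed. */
static bool bind_samplers(struct sampler_bindings *s, unsigned start,
                          unsigned num, const void *const *samplers)
{
   bool any_change = false;
   for (unsigned i = 0; i < num; i++) {
      if (s->samplers[start + i] != samplers[i]) {
         any_change = true;
         break;
      }
   }
   if (!any_change)
      return false; /* no-op: leave the dirty flags untouched */

   for (unsigned i = 0; i < num; i++)
      s->samplers[start + i] = samplers[i];
   s->dirty |= SKETCH_NEW_SAMPLER;
   return true;
}
```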
Julien Isorce
777d1453f1 build: enable st/va with nouveau driver
vainfo fails in vaDriverInit because "dd_create_screen"
does not reach the strcmp(driver_name, "nouveau") code.
Indeed, when compiling the va target.c, the macro GALLIUM_NOUVEAU
is not defined.
This patch defines the macro the same way it is done for the dri and
vdpau targets.

Tested with:
./autogen.sh --enable-glx --enable-gles2 --enable-egl --enable-vdpau --enable-glx-tls=yes --enable-va
--with-gallium-drivers=swrast,nouveau --with-dri-drivers=swrast,nouveau --with-egl-platforms=x11

LIBVA_DRIVER_NAME=gallium vainfo
Output:
vainfo: Driver version: mesa gallium vaapi
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            :	VAEntrypointVLD
      VAProfileMPEG2Main              :	VAEntrypointVLD
      VAProfileMPEG4Simple            :	VAEntrypointVLD
      VAProfileMPEG4AdvancedSimple    :	VAEntrypointVLD
      VAProfileVC1Simple              :	VAEntrypointVLD
      VAProfileVC1Main                :	VAEntrypointVLD
      VAProfileVC1Advanced            :	VAEntrypointVLD
      VAProfileH264Baseline           :	VAEntrypointVLD
      VAProfileH264Main               :	VAEntrypointVLD
      VAProfileH264High               :	VAEntrypointVLD
      VAProfileNone                   :	VAEntrypointVideoProc

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-05 12:07:53 -05:00
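The failure in 777d1453f1 is a plain preprocessor effect: a driver branch guarded by a macro is compiled out when the target's build does not define that macro. The sketch below shows this with a hypothetical `pick_driver()`. The macro is defined inline only to keep the example self-contained; a real build passes it from the build system (e.g. `-DGALLIUM_NOUVEAU`).

```c
#include <string.h>

/* Normally supplied by the build system; defined here for the sketch. */
#define GALLIUM_NOUVEAU 1

/* Hypothetical simplification of a per-target screen-creation switch. */
static const char *pick_driver(const char *driver_name)
{
#if defined(GALLIUM_NOUVEAU)
   if (strcmp(driver_name, "nouveau") == 0)
      return "nouveau";
#endif
   (void)driver_name; /* unused when the branch is compiled out */
   return NULL;       /* no matching driver compiled in */
}
```

If the `#define` is removed, `pick_driver("nouveau")` returns NULL. That mirrors what vainfo hit before the fix.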
Julien Isorce
abb30b9c8b nvc0: add support for st/va
- split nvc0_decoder_bsp in begin/next/end
- preserve content buffer when calling nvc0_decoder_bsp_next
- implement pipe_video_codec::begin_frame/end_frame

https://bugs.freedesktop.org/show_bug.cgi?id=89969

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-05 12:07:53 -05:00
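The begin/next/end split in abb30b9c8b can be sketched as a buffer that is reset only at the start of a frame and appended to on every `next` call. This is how earlier content gets preserved. The names, the fixed capacity, and the overflow check are hypothetical. They are not the nvc0 implementation.

```c
#include <stddef.h>
#include <string.h>

#define BSP_CAPACITY 64 /* hypothetical fixed size for the sketch */

struct bsp_decoder {
   unsigned char buf[BSP_CAPACITY];
   size_t used;
};

static void bsp_begin(struct bsp_decoder *d)
{
   d->used = 0; /* reset only at the start of a frame */
}

/* Append a chunk, preserving what earlier calls wrote.
 * Returns 0 on success, -1 if the chunk does not fit. */
static int bsp_next(struct bsp_decoder *d, const void *data, size_t size)
{
   if (size > BSP_CAPACITY - d->used)
      return -1;
   memcpy(d->buf + d->used, data, size);
   d->used += size;
   return 0;
}

/* Finish the frame; a real decoder would submit buf[0..used) here. */
static size_t bsp_end(const struct bsp_decoder *d)
{
   return d->used;
}
```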