82384 commits

Author SHA1 Message Date
Ilia Mirkin
3e11656694 gallium: add sufficient draw interface to allow new indirect features
This makes it possible to support indirect multidraws as well as having
the number of such draws to come from a separate GPU resource.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-07 18:38:46 -05:00
Ilia Mirkin
60d0cfd429 vbo: create a new draw function interface for indirect draws
All indirect draws are passed to the new draw function. By default
there's a fallback implementation which pipes it right back to
draw_prims, but eventually both the fallback and draw_prim's support for
indirect drawing should be removed.

This should allow a backend to properly support ARB_multi_draw_indirect
and ARB_indirect_parameters.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-07 18:38:45 -05:00
Roland Scheidegger
2923c7a0ed llvmpipe: do 64bit plane calculations in the sse path
The sse path was pretty much disabled for practical purposes because the
largest allowed fb size was 128x128. So, adapt it for 64bit plane calculations.
This is actually not that difficult, though a problem is that we can't do
a signed 32x32->64bit mul, only unsigned, so need to fix that up. Overall,
the code still looks reasonable, though it's not like changes there in
setup really make much of a difference in the end...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-08 00:34:14 +01:00
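
The sign fix-up mentioned in the message above can be sketched in plain C. This is only an illustration of the arithmetic, not the llvmpipe SSE code: the helper names mul_u32_wide and mul_s32_wide are hypothetical, and the sketch assumes a two's-complement target. Reinterpreting a negative 32-bit operand as unsigned adds 2^32 to it, so the unsigned product carries an extra (other operand << 32) that has to be subtracted.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the only widening multiply assumed to be available:
 * unsigned 32x32 -> 64 bit. (Hypothetical helper name.) */
static uint64_t mul_u32_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * (uint64_t)b;
}

/* Signed 32x32 -> 64 bit product built from the unsigned multiply.
 * For each negative operand, subtract the other operand shifted left
 * by 32; all arithmetic wraps modulo 2^64. */
static int64_t mul_s32_wide(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint64_t p = mul_u32_wide(ua, ub);

    if (a < 0)
        p -= (uint64_t)ub << 32;
    if (b < 0)
        p -= (uint64_t)ua << 32;

    /* Cross-check against the native signed multiply (debug builds only). */
    assert((int64_t)p == (int64_t)a * (int64_t)b);
    return (int64_t)p;
}
```

A vectorized version would apply the same two conditional subtractions per lane, typically with masks instead of branches.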
Roland Scheidegger
fad283ba9e llvmpipe: don't store eo as 64bit int
eo, just like dcdx and dcdy, cannot overflow 32bit.
Store it as unsigned though just in case (it cannot be negative, but
in theory twice as big as dcdx or dcdy so this gives it one more bit).
This doesn't really change anything, albeit it might help minimally on
32bit archs.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-08 00:34:14 +01:00
Roland Scheidegger
b61b9a377e llvmpipe: use aligned data for the assembly program in setup
Back in the day (before 24678700ed) the values
were not actually in a struct but even then I can't see why we didn't simply
align the values. Especially since it's trivial to do so.
(Not that it actually matters since the code is pretty much unused for now.)

Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
2016-01-08 00:34:13 +01:00
Roland Scheidegger
9db7309595 draw: initialize prim header flags when clipping lines
Otherwise, clipped lines would have undefined stippling reset bit if line
stippling is enabled.
(Untested, and I just assume copying over the bits from the original line
is actually the right thing to do.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-08 00:34:13 +01:00
Roland Scheidegger
64da11f052 draw: fix line stippling with unfilled prims
The unfilled stage was not filling in the prim header, and the line stage
then decided to reset the stipple counter or not based on the uninitialized
data. This causes some failures in conform linestipple test (albeit quite
randomly happening depending on environment).
So fill in the prim header in the unfilled stage - I am not entirely sure
if anybody really needs the determinant after that stage, but there are at least
some later stages (wide line for instance) which copy over the determinant as well.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-08 00:34:13 +01:00
Timothy Arceri
5cf156c6b4 glsl: replace null check with assert
This was added in 54f583a20; since then, error handling has improved.

The test this was added to fix now fails earlier since 01822706ec.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-08 09:12:45 +11:00
Nicolai Hähnle
051603efd5 i965: use _mesa_delete_buffer_object
This is more future-proof, plugs the memory leak of Label and properly
destroys the buffer mutex.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-07 17:07:12 -05:00
Nicolai Hähnle
1b74c02e83 i915: use _mesa_delete_buffer_object
This is more future-proof, plugs the memory leak of Label and properly
destroys the buffer mutex.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-07 17:07:09 -05:00
Nicolai Hähnle
8882b46226 radeon: use _mesa_delete_buffer_object
This is more future-proof, plugs the memory leak of Label and properly
destroys the buffer mutex.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-07 17:07:03 -05:00
Nicolai Hähnle
1c2187b1c2 st/mesa: use _mesa_delete_buffer_object
This is more future-proof than the current code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-01-07 17:06:58 -05:00
Nicolai Hähnle
6aed083b93 mesa/bufferobj: make _mesa_delete_buffer_object externally accessible
gl_buffer_object has grown more complicated and requires cleanup. Using this
function from drivers will be more future-proof.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-01-07 17:05:54 -05:00
Chad Versace
4c7f4c25d0 anv/meta: Fix hardcoded format size in anv_CmdCopy*
When looping through VkBufferImageCopy regions, for each region we
incremented the offset into the VkBuffer assuming the format size was 4.

Fixes CTS tests dEQP-VK.pipeline.image.view_type.cube_array.3d.* on
Skylake.
2016-01-07 13:56:58 -08:00
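
The class of bug described above can be illustrated with a small sketch. It is not the anv code: slice_offset_bytes and its parameters are hypothetical, and the sketch assumes a tightly packed, uncompressed format. The point is only that the step through the buffer has to scale with the format's texel size rather than a hardcoded 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of slice number `slice` inside a
 * tightly packed buffer region. The per-slice step is derived from
 * bytes_per_texel instead of assuming 4 bytes per texel. */
static uint64_t slice_offset_bytes(uint64_t region_offset,
                                   uint32_t width, uint32_t height,
                                   uint32_t bytes_per_texel,
                                   uint32_t slice)
{
    assert(bytes_per_texel != 0);

    uint64_t slice_size = (uint64_t)width * height * bytes_per_texel;
    return region_offset + slice_size * slice;
}
```

With a 16x16 single-byte format, the second slice starts at byte 512; a hardcoded size of 4 would have placed it at byte 2048.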
Oded Gabbay
f41b6cfb07 llvmpipe: use sse2 conv code for altivec
In lp_build_conv() and lp_build_conv_auto(), there is a special case of
conversion when sse2 is present. That code path is suitable without any
changes to altivec, because all the functions that are called in that
code path already support altivec.

This patch increases the FPS on the POWER arch across the board
by 10%-25%.

I checked ipers, glxgears, glxspheres64, openarena, xonotic and glmark2.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-07 22:07:02 +02:00
Chad Versace
a50c78a5cf isl: Add missing break statement in array pitch calculation
Fixes regression in ed98c374bd3f1952fbab3031afaf5ff4d178ef41.
2016-01-07 11:08:12 -08:00
Chad Versace
d1e6c1b29b isl/gen9: Fix array pitch of 3d surfaces
For tiled 3D surfaces, the array pitch must be aligned to the tile height.

From the Skylake BSpec >> RENDER_SURFACE_STATE >> Surface QPitch:

   Tile Mode != Linear: This field must be set to an integer multiple of
   the tile height

Fixes CTS tests 'dEQP-VK.pipeline.image.view_type.3d.format.r8g8b8a8_unorm.*'.
Fixes Crucible tests 'func.miptree.r8g8b8a8-unorm.aspect-color.view-3d.*'.
2016-01-07 11:04:17 -08:00
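
The QPitch rule quoted above amounts to rounding the array pitch up to a multiple of the tile height whenever the surface is not linear. A minimal sketch, with hypothetical names (align_up_u32, array_pitch_rows) that are not taken from isl, and with the pitch and tile height both assumed to be expressed in rows:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment. A zero alignment
 * is a caller bug, so it is asserted rather than silently accepted. */
static uint32_t align_up_u32(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0);
    return ((value + alignment - 1) / alignment) * alignment;
}

/* Hypothetical helper: array pitch in rows after applying the rule
 * "Tile Mode != Linear: integer multiple of the tile height". */
static uint32_t array_pitch_rows(uint32_t pitch_rows, bool is_linear,
                                 uint32_t tile_height_rows)
{
    if (is_linear)
        return pitch_rows;
    return align_up_u32(pitch_rows, tile_height_rows);
}
```

The division-based rounding works for any non-zero alignment, so the sketch does not depend on the tile height being a power of two.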
Chad Versace
0af77fe5b6 isl: Refactor func isl_calc_array_pitch_sa_rows
Update the function to calculate the array pitch in *element rows*, and
rename it accordingly to isl_calc_array_pitch_el_rows.
2016-01-07 11:04:17 -08:00
Jordan Justen
2f0a10149c isl: Assert that alignments are not 0 for isl_align
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-07 10:37:35 -08:00
Jordan Justen
4d68c477ad anv: Assert that alignments are not 0 for align_*
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-07 10:37:35 -08:00
Jordan Justen
be91f23e3b isl: Fix image alignment calculation
The previous code was resulting in an alignment of 0.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-07 10:37:35 -08:00
Marek Olšák
bca18057a3 radeonsi: adjust the parameters of si_shader_dump
The function will be extended to dump all the binaries that a shader will
consist of, so si_shader* makes sense here.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
0a51b010e5 radeonsi: move si_shader_dump call out of si_compile_llvm
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
b0df5f4c19 radeonsi: inline si_shader_binary_read
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
c9c031f3d0 radeonsi: move si_shader_dump call out of si_shader_binary_read
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
f8b34fe093 radeonsi: separate shader dumping code to si_shader_dump and *_dump_stats
Eventually, I'd like to dump stats for several combined binaries, which is
why you don't see a binary parameter in si_shader_dump_stats

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
ccd7d7e13d radeonsi: add si_shader_destroy_binary
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
5c9f104567 radeonsi: don't pass si_shader to si_compile_llvm
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
54ed83669e radeonsi: move si_shader_binary_upload out of si_compile_llvm
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
f20a76a4fd radeonsi: always keep shader code, rodata, and relocs in memory
We won't compile shaders in draw calls, but we will concatenate shader
binaries according to states in draw calls, so keep the binaries.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
63345cfc3a radeonsi: don't pass si_shader to si_shader_binary_read
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
2d3a96448a radeonsi: don't pass si_shader to si_shader_binary_read_config
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
20b9b5d7f5 radeonsi: add struct si_shader_config
There will be 1 config per variant, which will be a union of configs
from {prolog, main, epilog}. For now, just add the structure.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
890873d106 radeonsi: move NULL exporting into a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
a72ed2f6bc radeonsi: move MRT color exporting into a separate function
This will be used by a fragment shader epilog.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
0ffe3d3772 radeonsi: use EXP_NULL for pixel shaders without outputs
This never happens currently.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
677c65968b radeonsi: only use LLVMBuildLoad once when updating color outputs at the end
without LLVMBuildStore.

So:
- do LLVMBuildLoad
- update the values as necessary
- export

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
185267a6fd radeonsi: export "undef" values for undefined PS outputs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
1ce659f820 radeonsi: move MRTZ export into a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
5f3e6b5b0f radeonsi: simplify setting the DONE bit for PS exports
First find out what the last export is and simply set the DONE bit there.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
e00f3f23b1 radeonsi: set SPI color formats and CB_SHADER_MASK outside of compilation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
4e597c25c7 radeonsi: write all MRTs only if there is exactly one output
This doesn't fix a known bug, but better safe than sorry.

Also, simplify the expression in si_shader.c.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:06 +01:00
Marek Olšák
746a7a7498 radeonsi: determine SPI_SHADER_Z_FORMAT outside of shader compilation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:05 +01:00
Marek Olšák
2cb8bf90cd radeonsi: determine DB_SHADER_CONTROL outside of shader compilation
because the API pixel shader binary will not emulate alpha test one day,
so the KILL_ENABLE bit must be determined elsewhere.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:05 +01:00
Marek Olšák
ff7e77724e tgsi/scan: set which color components are read by a fragment shader
This will be used by radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:05 +01:00
Marek Olšák
18ec76730a tgsi/scan: fix tgsi_shader_info::reads_z
This has no users in Mesa.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:05 +01:00
Marek Olšák
f3658be108 tgsi/scan: set if a fragment shader writes sample mask
This will be used by radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-07 18:26:05 +01:00
Kenneth Graunke
3e8f644ed3 glsl: Disallow vectorization of vector_insert/extract.
vector_insert takes a vector, a scalar location, and a scalar value,
and produces a new vector with that component updated.  As such, it
can't be vectorized properly.

vector_extract takes a vector and a scalar location, and returns
that scalar component of the vector.  Vectorization doesn't really
make any sense.

Treating both as horizontal operations makes sure the vectorizer
won't try to touch these.

Found by inspection.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-06 21:22:06 -08:00
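
The semantics described in the message above can be sketched in C to show why these operations are "horizontal". The vec4 type and the two functions are illustrative stand-ins, not the GLSL IR implementation. In both cases the component touched is chosen by a scalar index, so there is no independent per-component computation for a vectorizer to merge.

```c
#include <assert.h>

/* Illustrative four-component vector. */
typedef struct {
    float c[4];
} vec4;

/* vector_insert: takes a vector, a scalar location, and a scalar
 * value; returns a copy of the vector with that component replaced. */
static vec4 vector_insert(vec4 v, unsigned index, float value)
{
    assert(index < 4);
    v.c[index] = value;
    return v;
}

/* vector_extract: takes a vector and a scalar location; returns that
 * single scalar component. */
static float vector_extract(vec4 v, unsigned index)
{
    assert(index < 4);
    return v.c[index];
}
```

Compare this with a component-wise operation such as an add, where result component i depends only on component i of each operand.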
Jason Ekstrand
d8cd5e333e anv/state: Pull sampler vk-to-gen maps into genX_state_util.h
2016-01-06 19:53:45 -08:00
Jason Ekstrand
195c60deb4 nir/spirv: Wrap borrow/carry ops in b2i
NIR specifies them as booleans but SPIR-V wants ints.
2016-01-06 17:13:06 -08:00
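
The mismatch described above can be sketched as follows. The sketch assumes, purely for illustration, that a boolean is represented as 0 or ~0 in a 32-bit value; the function names are hypothetical and only mirror the idea of a carry op, a borrow op, and a bool-to-int conversion. The integer consumer expects exactly 0 or 1, which is what the wrapping conversion provides.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed boolean encoding for this sketch: all bits clear or all set. */
#define BOOL_FALSE 0u
#define BOOL_TRUE  (~0u)

/* Carry out of a 32-bit unsigned add, returned as a boolean. */
static uint32_t uadd_carry_bool(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b; /* wraps modulo 2^32 */
    return sum < a ? BOOL_TRUE : BOOL_FALSE;
}

/* Borrow out of a 32-bit unsigned subtract, returned as a boolean. */
static uint32_t usub_borrow_bool(uint32_t a, uint32_t b)
{
    return a < b ? BOOL_TRUE : BOOL_FALSE;
}

/* Bool-to-int conversion: maps the boolean encoding to exactly 0 or 1. */
static uint32_t b2i(uint32_t b)
{
    assert(b == BOOL_FALSE || b == BOOL_TRUE);
    return b != BOOL_FALSE ? 1u : 0u;
}
```

Without the conversion, a set carry would reach the integer consumer as 0xFFFFFFFF instead of 1 under the encoding assumed here.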