85652 commits

Author SHA1 Message Date
Jason Ekstrand
5c5a9b7bf6 brw/device_info: Add a helper for getting a device name
This is needed by the Vulkan driver

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-06 18:08:56 -07:00
Jason Ekstrand
a241ab43b5 i965/fs_surface_builder: Mask signed integers after conversion
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-04-06 18:08:56 -07:00
Jason Ekstrand
3921b64e63 i965/fs: Make the repclear shader support either a uniform or a flat input
In the Vulkan driver we use a single flat input instead of a uniform
because setting up push constants is more disruptive to the pipeline than
setting up another vertex input.  This uses the number of uniforms as a key
to keep it working for the GL driver.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-06 18:08:50 -07:00
Jason Ekstrand
061969f9dd i965: Move get_hw_prim_for_gl_prim to brw_util.c
It's used by brw_compile_gs in brw_vec4_gs_visitor.cpp so it needs to be in
a file that's linked into libi965_compiler.la.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-06 18:08:47 -07:00
Bas Nieuwenhuizen
3393358115 radeonsi: set shader calling conventions
Note that old mesa + new LLVM or new mesa + old LLVM breaks
with this change and the corresponding LLVM change (D18559).

For LLVM version <= 3.8 we use the old method, but we can't detect
people using a post 3.8 svn version that is still too old.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-06 21:54:35 +02:00
Marek Olšák
0293d72fa5 drirc: add a workaround for blackness in Warsow
Cc: 11.1 11.2 <mesa-stable@lists.freedesktop.org>
2016-04-06 12:53:40 +02:00
Ilia Mirkin
2e123e1a25 glsl: use has_shader_storage_buffer_objects helper
Replaces open-coded logic with existing helper.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-04-05 20:27:32 -04:00
Timothy Arceri
5d39f03806 glsl: remove remaining tabs in link_uniform_blocks.cpp
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-04-06 09:56:33 +10:00
Timothy Arceri
7ef57aa685 mesa: remove unused IsShaderStorage field
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-04-06 09:56:28 +10:00
Timothy Arceri
f1293b2f9b glsl: fully split apart buffer block arrays
With this change we create the UBO and SSBO arrays separately from the
beginning rather than putting them into a combined array and splitting
it apart later.

A bug with UBO and SSBO stage reference querying is also fixed, as
we now use the block index to look up the references in the separate arrays,
not the combined buffer block array.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-04-06 09:56:24 +10:00
Rob Clark
506b561ba7 freedreno/ir3: insert extra move into phi
We had an implicit assumption that the phi src was assigned in its
source (pred) block leading into the phi.  But this is not true with
NIR, so we can't just ignore the source block specified in the
nir_phi_src.  Insert an extra mov in the source block.  If it is not
required the CP pass will take it back out again.

Fixes:

  ./tests/spec/glsl-1.10/execution/vs-call-in-nested-loop.shader_test
  ./tests/spec/glsl-1.10/execution/vs-inner-loop-modifies-outer-loop-var.shader_test

and probably others.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-05 15:04:43 -04:00
Rob Clark
f9cdbf4405 freedreno/ir3: eliminate unnecessary absneg's
The frontend inserts (abs) and (neg)'s to convert between NIR boolean
(~0/0) and native boolean (1/0).  So we'd end up with things like:

   cmps.s.ge r1.x, ...
   absneg.s r1.x, (neg)r1.x
   absneg.s r1.x, (abs)r1.x
   sel.b32 r2.x, r0.x, r1.x, r0.y

The (neg) already gets collapsed due to the following (abs).  Now by
realizing that r1.x comes from a cmps.s instruction, we can drop the
(abs) as well.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-05 15:04:25 -04:00
Michel Dänzer
0daab9878d clover: Fix build against clang SVN >= r265359
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2016-04-05 17:00:58 +00:00
Bas Nieuwenhuizen
799789ba99 radeonsi: use bounded indexing for samplers
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-05 19:19:18 +02:00
Bas Nieuwenhuizen
713353db18 radeonsi: use bounded indexing for constant buffers
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-04-05 19:19:07 +02:00
Marek Olšák
a64dbdf612 gallium/radeon: allow multiple exports of the same texture with different usage
Instead of failing an assertion, disable DCC and CMASK on the first export
that needs it, and merge the external usage flags.

v2: clear the EXPLICIT_FLUSH flag if it's not set; whitespace fixes

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2016-04-05 15:32:40 +02:00
Marek Olšák
25f96d2b97 docs/relnotes: document EGL_KHR_reusable_sync
2016-04-05 15:32:40 +02:00
Dongwon Kim
70299474f5 egl: add EGL_KHR_reusable_sync to egl_dri
This patch enables an EGL extension, EGL_KHR_reusable_sync.
This new extension basically provides a way for multiple APIs or
threads to be executed synchronously via a "reusable sync"
primitive shared by those threads/API calls.

This was implemented based on the specification at

https://www.khronos.org/registry/egl/extensions/KHR/EGL_KHR_reusable_sync.txt

v2
- use thread functions defined in C11/threads.h instead of
  using direct pthread calls
- make the timeout set with reference to CLOCK_MONOTONIC
- cleaned up the way expiration time is calculated
- (bug fix) in dri2_client_wait_sync, case EGL_SYNC_CL_EVENT_KHR
  has been added.
- (bug fix) in dri2_destroy_sync, return from cond_broadcast
  call is now stored in 'err' instead of 'ret' to prevent 'ret'
  from being reset to 'EGL_FALSE' even in successful case
- corrected minor syntax problems

v3
- dri2_egl_unref_sync now became 'void' type. No more error check
  is needed for this function call as a result.
- (bug fix) resolved issue with duplicated unlocking of display in
  eglClientWaitSync when type of sync is "EGL_KHR_REUSABLE_SYNC"

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-04-05 15:24:57 +02:00
Rob Clark
3e13572826 freedreno/ir3: deal with duplicate phi sources
Otherwise we end up with funny things like:

  mov.f32f32 r0.x, r1.y
  mov.f32f32 r0.x, r1.y

(It doesn't happen as much after fixing the problem w/ CP into phi src,
but it can still happen since we aren't too clever about generating phi
sources in the first place.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
f8feb97ba5 freedreno/ir3: fix silly brain-fart in RA
We want to consider all the vars, not 1/32nd of them, when extending
live-ranges.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
8e451c2d06 freedreno/ir3: don't cp into phi's
The block defining a phi source might not have been executed.  If we
allow copy propagation, we could end up pointing to a src instruction in
the wrong block.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
383b6e87f9 freedreno/ir3: we can't store immediate values
Fixes some transform-feedback piglits, like:

bin/ext_transform_feedback-nonflat-integral

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
d47fb856af freedreno/ir3: add dumping for use/def/live-in/live-out
Turned out to be useful to debug an issue in RA.  Let's keep it.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
38ae05a340 freedreno/ir3: drop unused instr category arg
No longer used, so drop the extra arg to ir3_instr_create()

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
19739e4fb9 freedreno/ir3: remove ir3_instruction::category
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Rob Clark
70735643f4 freedreno/ir3: encode instruction category in opc_t
Been on my TODO list for a while.  If nothing else this will make gdb
properly grok the opc_t enum.

This first step preserves ir3_instruction::category (with an added
assert that category matches what is encoded in opc_t).  Next step is
to drop the category field (and arg to ir3_instr_create()), but that
is split into next commit for bisectability and so that we can run
piglit in the intermediate state to flush out any problems.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-04-04 20:18:18 -04:00
Jason Ekstrand
5ea3647f89 i965/fs: Move the code for load/store_shared to emit_cs_intrinsic
They are compute-shader only and that's where the code for doing atomics on
shared variables lives so it seems to make sense.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-04-04 15:56:50 -07:00
Jason Ekstrand
80c72a8ea7 i965/nir: Provide a default LOD for buffer textures
Our hardware requires an LOD for all texelFetch commands even if they are
on buffer textures.  GLSL IR gives us an LOD of 0 in that case, but the LOD
is really rather meaningless.  This commit allows other NIR producers to be
more lazy and not provide one at all.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-04-04 15:56:39 -07:00
Jason Ekstrand
e5c833db5a i965/compiler: Remove a redundant declaration of brw_compiler_create
2016-04-04 14:51:35 -07:00
Kenneth Graunke
3babb7b0a4 nir: Use PRIi64 and PRIu64 instead of %ld and %lu.
%ld and %lu aren't the right format specifiers for int64_t and uint64_t
on 32-bit (x86) systems, where int64_t is typically long long and needs
%lld and %llu (or %I64d and %I64u with older Microsoft C runtimes).

Use the standard C99 macros in hopes that they work everywhere.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2016-04-04 14:38:48 -07:00
Kenneth Graunke
da5d08707b i965: Fix invalid pointer read in dead_control_flow_eliminate().
There may not be a previous block.  In this case, there's no real work
to do, so just continue on to the next one.

v2: Update for bblock->prev() API change.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-04 14:34:40 -07:00
Kenneth Graunke
9486614938 i965: Make bblock_t::next and friends return NULL at sentinels.
The bblock_t::prev/prev_const/next/next_const API returns bblock_t
pointers, rather than exec_nodes.  So it's a bit surprising.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-04-04 14:34:16 -07:00
Kenneth Graunke
5509d43a11 glsl: Lower variable indexing of system value arrays unconditionally.
lower_variable_index_to_cond_assign() did not handle system values.
gl_SampleMaskIn[] is a system value, and also an array.  Accessing it
with a variable index would trigger an unreachable() assert.

Rather than adding a new EmitNoIndirectSystemValues flag, we simply
lower unconditionally.  There is exactly one case where this occurs,
and for all current drivers, lowering produces optimal code.  Even
for future drivers with 32x MSAA, it produces reasonable code.

Fixes Piglit's new samplemaskin-indirect test.  Also fixes many ES31-CTS
tests when OES_sample_variables is enabled.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-04 14:29:21 -07:00
Jason Ekstrand
db35a851ad i965/defines: Unconditionally define primitives
2016-04-04 14:25:36 -07:00
Jason Ekstrand
6a04968784 Merge remote-tracking branch 'public/master' into vulkan
2016-04-04 13:58:05 -07:00
Jason Ekstrand
88ef2476dc i965/peephole_ffma: Only match a mul+add if none of the ops are exact
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-04 13:48:10 -07:00
Jason Ekstrand
eb93d6dec8 nir/search: Don't match inexact expressions with exact subexpressions
In the first pass of implementing exact handling, I made a mistake with
search-and-replace.  In particular, we only really handled exact/inexact
on the root of the tree.  Instead, we need to check every node in the tree
for an exact/inexact match.  As an example of this, consider the following
GLSL code

precise float a = b + c;
if (a < 0) {
   do_stuff();
}

In that case, only the add will be declared "exact" and an expression that
looks for "b + c < 0" will still match and replace it with "b < -c" which
may yield different results.  The solution is to simply bail if any of the
values are exact when matching an inexact expression.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-04 13:48:10 -07:00
Jason Ekstrand
fe247bbe92 nir: Stop double-printing function arguments
2016-04-04 12:10:20 -07:00
Jason Ekstrand
cb317b8d07 glsl: Stop force-enabling compute shaders
This isn't needed since we no longer use the GLSL compiler in Vulkan.
2016-04-04 12:09:12 -07:00
Jason Ekstrand
4d040a4ad3 glsl/standalone: Get rid of the unneeded _mesa_error_no_memory stub
This hasn't been needed since we stopped using the GLSL compiler in the
Vulkan driver and it was tripping up scons.  Removing it fixes the scons
build.
2016-04-04 12:07:51 -07:00
Kenneth Graunke
65fbc43d54 i965: Add an INTEL_PRECISE_TRIG=1 option to fix SIN/COS output range.
The SIN and COS instructions on Intel hardware can produce values
slightly outside of the [-1.0, 1.0] range for a small set of values.
Obviously, this can break everyone's expectations about trig functions.

According to an internal presentation, the COS instruction can produce
a value up to 1.000027 for inputs in the range (0.08296, 0.09888).  One
suggested workaround is to multiply by 0.99997, scaling down the
amplitude slightly.  Apparently this also minimizes the error function,
reducing the maximum error from 0.00006 to about 0.00003.

When enabled, fixes 16 dEQP precision tests

   dEQP-GLES31.functional.shaders.builtin_functions.precision.
   {cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec3,vec4}.

at the cost of making every sin and cos call more expensive (about
twice the number of cycles on recent hardware).  Enabling this
option has been shown to reduce GPUTest Volplosion performance by
about 10%.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-04-04 11:35:16 -07:00
Jason Ekstrand
8c8157bf6f Remove more spirv2nir remnants
2016-04-04 11:24:48 -07:00
Kenneth Graunke
3aa51e02d6 i965: Allow 8x MSAA on >= 64bpp formats on Gen8+.
See commit 3b0279a69 - this restriction is documented in the "Surface
Format" field of RENDER_SURFACE_STATE.

Looking at newer documentation, this restriction appears to exist on
Haswell, but no longer applies on Gen8+.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2016-04-04 10:41:29 -07:00
Brian Paul
1eeec7ec41 docs: remove stray 'TBD' in 11.2.0 relnotes file
2016-04-04 10:33:11 -06:00
Emil Velikov
35132c413c docs: add news item and link release notes for 11.2.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-04-04 12:57:56 +01:00
Emil Velikov
dc4923d41f docs: add sha256 checksums for 11.2.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit e7fb889dcc)
2016-04-04 12:55:55 +01:00
Emil Velikov
7dc11ed0b2 docs: Update 11.2.0 release notes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit ff9ddb9eb1)
2016-04-04 12:55:54 +01:00
Dave Airlie
f9b8b48bed mesa/get: fix MAX_GEOMETRY_SHADER_STORAGE_BLOCKS
this was returning the fragment shader value.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-04 10:52:25 +01:00
Ilia Mirkin
4bc3b1ca48 nvc0: add hardware ETC2 and ASTC support on GK20A and GM107+
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-04 00:32:48 -04:00
Ilia Mirkin
dab40d8083 docs: add note about GL_EXT_base_instance, sort entries
Trivial.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-04-03 21:18:17 -04:00