Commit graph

75410 commits

Author SHA1 Message Date
Kenneth Graunke
4a1c8a3037 i965: Push most TES inputs in SIMD8 mode.
Using the push model for inputs is much more efficient than pulling
inputs - the hardware can simply copy a large chunk into URB registers
at thread creation time, rather than having the thread send messages to
request data from the L3 cache.  Unfortunately, it's possible to have
more TES inputs than fit in registers, so we have to fall back to the
pull model in some cases.

However, it turns out that most tessellation evaluation shaders are
fairly simple, and don't use many inputs.  An arbitrary cut-off of
32 vec4 slots (16 registers) is more than sufficient to ensure that
100% of TES inputs are pushed for Shadow of Mordor, Unigine Heaven,
GPUTest/TessMark, and SynMark.

Note that unlike most SIMD8 stages, this actually reads packed vec4
data, since that is what our vec4 TCS programs write.

Improves performance in GPUTest's tessmark_x64 microbenchmark
by 93.4426% +/- 5.35541% (n = 25) on my Lenovo X250 at 1024x768.

Improves performance in Synmark's Gl40TerrainFlyTess microbenchmark
by 22.74% +/- 0.309394% (n = 5).

Improves performance in Shadow of Mordor at low settings with
tessellation enabled at 1280x720 by 2.12197% +/- 0.478553% (n = 4).

shader-db statistics for files containing tessellation shaders:

total instructions in shared programs: 184358 -> 181181 (-1.72%)
instructions in affected programs: 27971 -> 24794 (-11.36%)
helped: 226

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-02 18:46:16 -08:00
Kenneth Graunke
b022150d70 i965: Use LOAD_PAYLOAD for SIMD8 TES input loads, not MOV.
We need a MOV to replicate g0.0<0,1,0> to all 8 channels.  Since the
message payload is a single register, MOV seemed more sensible than
LOAD_PAYLOAD.  However, MOV cannot be CSE'd, while LOAD_PAYLOAD can.

All input loads can use the same header - we don't need to re-expand
g0 every time.  CSE accomplishes this, saving instructions.

shader-db statistics for files containing tessellation shaders:

total instructions in shared programs: 186923 -> 184358 (-1.37%)
instructions in affected programs: 30536 -> 27971 (-8.40%)
helped: 226
HURT: 0

total cycles in shared programs: 1009850 -> 1005356 (-0.45%)
cycles in affected programs: 168206 -> 163712 (-2.67%)
helped: 226
HURT: 0

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-02 18:46:16 -08:00
Kenneth Graunke
53a9b6223f i965: Move 3-src subnr swizzle handling into the vec4 backend.
While most align16 instructions only support a SubRegNum of 0 or 4
(using swizzling to control the other channels), 3-src instructions
actually support arbitrary SubRegNums.  When the RepCtrl bit is set,
we believe it ignores the swizzle and uses the equivalent of a <0,1,0>
region from the subnr.

In the past, we adopted a vec4-centric approach of specifying subnr of
0 or 4 and a swizzle, then having brw_eu_emit.c convert that to a proper
SubRegNum.  This isn't a great fit for the scalar backend, where we
don't set swizzles at all, and happily set subnrs in the range [0, 7].

This patch changes brw_eu_emit.c to use subnr and swizzle directly,
relying on the higher levels to set them sensibly.

This should fix problems where scalar sources get copy propagated into
3-src instructions in the FS backend.  I've only observed this with
TES push model inputs, but I suppose it could happen in other cases.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-02 18:46:16 -08:00
Eric Anholt
64253fdb2e vc4: Fix build from upload changes. 2016-01-02 17:33:19 -08:00
Nicolai Hähnle
8f384d07a8 gallium/radeon: send LLVM diagnostics as debug messages
Diagnostics sent during code generation and the every error message reported
by LLVMTargetMachineEmitToMemoryBuffer are disjoint reporting mechanisms. We
take care of both and also send an explicit message indicating failure at the
end, so that log parsers can more easily tell the boundary between shader
compiles.

Removed an fprintf that could never be triggered.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-02 16:47:24 -05:00
Nicolai Hähnle
255ccd1e99 gallium/radeon: pass pipe_debug_callback into radeon_llvm_compile (v2)
This will allow us to send shader debug info via the context's debug callback.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-02 16:47:24 -05:00
Nicolai Hähnle
f8cd11403a radeonsi: send shader info as debug messages in addition to stderr output
The output via stderr is very helpful for ad-hoc debugging tasks, so that remains
unchanged, but having the information available via debug messages as well
will allow the use of parallel shader-db runs.

Shader stats are always provided (if the context is a debug context, that is),
but you still have to enable the appropriate R600_DEBUG flags to get
disassembly (since it is rather spammy and is only generated by LLVM when we
explicitly ask for it).

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-02 16:47:24 -05:00
Nicolai Hähnle
4bb1c8dfec radeonsi: pass pipe_debug_callback down into si_shader_binary_read (v2)
This will allow us to send shader debug info.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-02 16:47:23 -05:00
Nicolai Hähnle
b6847062dd gallium/radeon: implement set_debug_callback
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-02 16:47:23 -05:00
Marek Olšák
ecb2da1559 u_upload_mgr: allow specifying PIPE_USAGE_* for the upload buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:45 +01:00
Marek Olšák
37d0aea772 u_upload_mgr: remove alignment parameter from u_upload_create
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:45 +01:00
Marek Olšák
1bb79c3a7b u_upload_mgr: pass alignment to u_upload_buffer manually
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:44 +01:00
Marek Olšák
e0f932846c u_upload_mgr: pass alignment to u_upload_data manually
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:44 +01:00
Marek Olšák
020009f7cc u_upload_mgr: pass alignment to u_upload_alloc manually
The fixed alignment of u_upload_mgr will go away.
This is the first step.

The motivation is that one u_upload_mgr can have multiple users,
each allocating from the same buffer, but requiring a different alignment.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:44 +01:00
Marek Olšák
ffc4716e97 u_upload_mgr: rework the application of alignment
The function only aligned the size, but not the offset.
The offset was aligned only when the previous suballocation was aligned.
That yielded the correct offset alignment if the alignment was constant
for all suballocations.

Instead, directly align the offset, but allow an unaligned size.
There is no change in behavior, because the alignment is constant
at the moment.

This a prerequisite for allowing a variable alignment for suballocations.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-02 15:15:44 +01:00
Marek Olšák
36c93a6fae st/mesa: fix GLSL uniform updates for glBitmap & glDrawPixels (v2)
Spotted by luck. The GLSL uniform storage is only associated once
in LinkShader and can't be reallocated afterwards, because that would
break the association.

v2: don't remove st_upload_constants calls, clarify why they're needed

Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
2016-01-02 15:15:44 +01:00
Marek Olšák
294ed5cd13 program: add _mesa_reserve_parameter_storage
The next commit will use this.

Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
2016-01-02 15:15:44 +01:00
Jordan Justen
a2942d8f26 mesa: Fix warning with MESA_VERBOSE=api for BindBufferRange
Reported-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-01 17:27:14 -08:00
Ilia Mirkin
c1d14c6817 nv50,nvc0: make sure there's pushbuf space and that we ref the bo early
First off, we can't flush in the middle of a command. Secondly
requesting the extra push space might cause a flush to happen. If that
flush happens, we'd have to do the PUSH_REFN again. So instead do
PUSH_REFN after the push space request. This helps avoid rare crashes
with supertuxkart in libdrm due to assertion failures.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-01-01 19:52:41 -05:00
Ilia Mirkin
33a415310b st/mesa: sort extensions enablement array
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-01 19:50:02 -05:00
Rob Clark
816ddee6b8 nir/lower_clip: add missing writemask on store
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-01-01 15:32:46 -05:00
Jordan Justen
3dce7bf268 mesa: Add MESA_VERBOSE=api for GL_ARB_program_interface_query
v2:
 * Add braces '{}' when the _mesa_debug call spans multiple lines (Ken)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-01 12:00:51 -08:00
Jordan Justen
36db91c4c4 mesa: Add MESA_VERBOSE=api for several indexed BindBuffer variants
v2:
 * Add braces '{}' when the _mesa_debug call spans multiple lines (Ken)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-01 12:00:51 -08:00
Dave Airlie
b835255992 st/glsl_to_tgsi: fix block movs for doubles
While playing with fp64, I disable varying packing to debug
something else, and noticed we never emitted half the output
movs for double matrix arrays.

We should be moving the left index two slots for dual
source doubles, and the right index two slots for non-vs
input doubles.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:54 +10:00
Dave Airlie
d214ce86cf st/glsl_to_tgsi: handle different attrib size
vertex inputs are counted differently in some cases, with
vertex inputs we need to make sure we don't double count them.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:54 +10:00
Dave Airlie
dc7b33c1f3 st/glsl_to_tgsi: readd the double_reg2 for input index mapping
Otherwise we end up emitting the wrong index for the second
double.

This fixes dmat-vs-gs-tcs-tes.shader_test and dvec3-vs-gs-tcs-tes.shader_test

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:54 +10:00
Dave Airlie
84dbf3c4ff st/glsl_to_tgsi: when doing reladdr get vec4 of correct type
This fixes fp64 relative addressing, in the upcoming
dmat-vs-gs-tcs-tes.shader_test.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:53 +10:00
Dave Airlie
d87894b98f st/glsl_to_tgsi: handle double immediates in matrices properly.
This handles matrix initialisation properly.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:53 +10:00
Dave Airlie
7351c7684f st/glsl_to_tgsi: setup writemask for double arrays and matricies.
It's important for the double instruction emission code that
the writemasks are correct going in for double so it know
which channels to replicate.

This fixes it for the array and matrix cases.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:53 +10:00
Dave Airlie
14506dcae2 st/glsl_to_tgsi: handle doubles in array shrinking code.
This code takes into account double inputs in the array
shrinking code. This fixes some issues with doubles
and geom/tess inputs.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:53 +10:00
Dave Airlie
aab0c6c9c4 st/glsl_to_tgsi: handle doubles outputs in arrays.
This handles the case where a double output is stored
in an array, and tracks it for use in the double
instruction emit code.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:53 +10:00
Dave Airlie
fc890d703e st/glsl_to_tgsi: store if dst is double in array
This is just a precursor patch to a fix for doubles with
tessellation that I've written.

We need to descend into output arrays in that case and
mark dst's as double.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-01-01 09:43:53 +10:00
Kenneth Graunke
65d3f85eb3 nvc0: Set winding order regardless of domain.
Quads need to respect winding order, too - not just triangles.

Fixes rendering in GFXBench 4.0's tessellation benchmark.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-30 16:04:12 -08:00
Kenneth Graunke
7cdc2b9ca0 glsl: Fix varying struct locations when varying packing is disabled.
varying_matches::record tries to compute the number of components in
each varying, which varying_matches::assign_locations uses to assign
locations.  With varying packing, it uses glsl_type::component_slots()
to come up with a reasonable value.

Without varying packing, it fell back to an open-coded computation
that didn't bother to handle structs at all.  I believe we can simply
use 4 * glsl_type::count_attribute_slots(false), which already handles
these cases correctly.

Partially fixes rendering in GFXBench 4.0's tessellation benchmark.
(NVE0 is almost right after this, but i965 is still mostly garbage.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-30 16:04:12 -08:00
Kenneth Graunke
4acf71c89b drirc: Disable ARB_blend_func_extended for Heaven 4.0/Valley 1.0.
Unigine Heaven 4.0 and Valley 1.0 use dual color blending but don't
specify which fragment shader output is which, so there's at best a
50/50 chance of us guessing it correctly.  This is invalid.

Unigine fixed this in 4.1 and 1.1 versions over a year and a half ago,
but hasn't actually released them for whatever reason.  So, add the
workaround back so that it works for most people.

Fixes Heaven 4.0/Valley 1.0 rendering on Ivybridge.  For whatever
reason, Broadwell worked.  4.1 and 1.1 have always worked.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92233
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
2015-12-30 16:04:12 -08:00
Ilia Mirkin
5ac15f788b glsl: add GL_ARB_shader_draw_parameters define
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2015-12-30 18:59:18 -05:00
Ilia Mirkin
517a93b346 nvc0: add ARB_shader_draw_parameters support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-30 16:55:57 -05:00
Ilia Mirkin
89bda9772d st/mesa: add GL_ARB_shader_draw_parameters support
Hooks up the new system values, passes the drawid in.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-12-30 16:55:56 -05:00
Ilia Mirkin
daaf0bdf46 gallium: add a drawid to pipe_draw_info
This will allow the state tracker to inform the driver where in a
broken-up multidraw we currently are. This can then be passed into the
vertex shader.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-12-30 16:55:56 -05:00
Ilia Mirkin
87b4e4e29f gallium: add PIPE_CAP_DRAW_PARAMETERS
This allows the state tracker to know that the various draw parameters
are available in vertex shaders.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-12-30 16:55:56 -05:00
Ilia Mirkin
bb52ea45cc gallium: add baseinstance/drawid semantics
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-12-30 16:55:56 -05:00
Ilia Mirkin
d50e6128b8 nv50/ir: attempt to do more constant folding on mad -> add conversion
The add might actually have a 0 as an argument, which would convert it
into a mov. Make sure to detect that. Also avoid the hack of putting the
immediate directly into the instruction, instead use a mov to put it
into place and let the later LoadPropagation pass place it if possible.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-30 12:29:07 -05:00
Marta Lofstedt
97685ff10e i965/gen8: Always use BRW_REGISTER_TYPE_UW for MUL on GEN8+
The imulExtended tests of the shader bitfield tests of the
OpenGL ES 3.1 CTS, fail on gen8+, when BRW_REGISTER_TYPE_W
is used for SHADER_OPECODE_MULH.

Also, remove unused helper function:
static inline bool type_is_signed(unsigned type)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-12-30 09:29:14 +01:00
Timothy Arceri
0d4cd045c8 glsl: tidy up struct with a single member
There used to be more members but they now share other fields
in order to keep memory use low.

Also making the naming more generic will allow us to reuse the
field for explicit byte offsets within blocks for
ARB_enhanced_layouts.

Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-12-30 11:52:05 +11:00
Emil Velikov
2c1a215409 glsl/linker: annotate static functions as such
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-12-30 11:51:58 +11:00
Emil Velikov
c704b89fe4 glsl: annotate ast_process_struct_or_iface_block_members() as static
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2015-12-30 11:51:51 +11:00
Jason Ekstrand
0119773ffc nir/builder: Add an init function that creates a simple shader for you
A hugely common case when using nir_builder is to have a shader with a
single function called main.  This adds a helper that gives you just that.
This commit also makes us use it in the NIR control-flow unit tests as well
as tgsi_to_nir and prog_to_nir.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2015-12-29 13:44:05 -08:00
Kristian Høgsberg Kristensen
55ca5b0e74 mesa/st: Pad out _mesa_sysval_to_semantic for new SYSTEM_VALUE_* enums
GL_ARB_shader_draw_parameters added two new system values.  This gets us
back to mapping mesa system values to the right TGSI semantics.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-12-29 12:15:01 -08:00
Ilia Mirkin
724134f683 nv50/ir: float(s32 & 0xff) = float(u8), not s8
Make sure to make conversion unsigned when we're ANDing the high bits
away. Fixes corruption in dolphin.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2015-12-29 15:08:20 -05:00
Kristian Høgsberg Kristensen
581f81860e i965: Reemit vertex state between indirect multi draws
If we're doing an indirect draw, prims[i].basevertex is always 0 and the
real base vertex value is in the indirect parameter buffer. We try to
avoid flagging BRW_NEW_VERTICES if prims[i].basevertex doesn't change,
which then breaks down for indirect draws. Thus, if a program uses base
vertex or base instance, and the draw call is indirect, always flag
BRW_NEW_VERTICES.  A new piglit test,
spec/ARB_shader_draw_parameters/drawid-indirect-vertexid tests this.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-12-29 10:39:25 -08:00