Commit graph

81989 commits

Author SHA1 Message Date
Francisco Jerez
bb89beb26b i965/fs: Remove manual splitting of DDY ops in the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:02 -07:00
Francisco Jerez
982c48dc34 i965/fs: Remove manual unrolling of BFI instructions from the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:23 -07:00
Francisco Jerez
95272f5c7e i965/fs: Drop Gen7 CMP SIMD unrolling workaround from the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:23 -07:00
Francisco Jerez
f14b9ea6e6 i965/fs: Drop lowering code for a few three-source instructions from the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:23 -07:00
Francisco Jerez
117a9a0a64 i965/fs: Set default access mode to Align1 for all instructions in the generator.
Currently the generator code for most opcodes honours the default
access mode (which should typically be Align1 in the scalar back-end),
but generate_code() doesn't set it explicitly which means that the
access mode from a previous instruction could leak into the following
ones if you did something special and weren't careful enough to save
and restore the previous access mode.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
3a541d0c0b i965/fs: Remove handcrafted math SIMD lowering from the generator.
Most of this wouldn't have worked for SIMD32 and had various
dispatch_width and compression control bugs.  It's mostly dead now
with SIMD lowering of math instructions turned on in the compiler.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
cf5443f984 i965/fs: Limit SIMD width of various virtual opcodes to the maximum supported value.
Which is 16 or 8 in most cases.  This will make sure that 32-wide
virtual instructions get chopped up into chunks of their maximum
execution size.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
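The chopping described in cf5443f984 can be pictured with a small sketch. This is not the Mesa lowering pass; `num_simd_chunks` and `simd_chunk_start` are hypothetical helpers that only show the arithmetic of splitting a wide instruction into chunks of a maximum supported width, assuming both widths are powers of two.

```c
#include <assert.h>

/* Hypothetical helper, not Mesa code: number of chunks needed to cover
 * exec_size channels when each chunk handles at most max_width channels.
 * Both values are assumed to be powers of two, as SIMD widths are. */
static unsigned
num_simd_chunks(unsigned exec_size, unsigned max_width)
{
   assert(exec_size > 0 && max_width > 0);
   return exec_size <= max_width ? 1 : exec_size / max_width;
}

/* Hypothetical helper: first channel covered by chunk number i. */
static unsigned
simd_chunk_start(unsigned max_width, unsigned i)
{
   return i * max_width;
}
```

Under these assumptions a 32-wide virtual instruction with a maximum width of 16 becomes two chunks starting at channels 0 and 16, and with a maximum width of 8 it becomes four chunks.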
Francisco Jerez
197833caa3 i965/fs: Lower LOAD_PAYLOAD instructions of unsupported width.
Only per-channel LOAD_PAYLOAD instructions can be lowered, which
should cover everything that comes in from the front-end.

LOAD_PAYLOAD instructions used to construct actual message payloads
cannot be easily lowered because they contain headers and vectors of
variable type that aren't necessarily channel-aligned -- We shouldn't
find any of them in the program at SIMD lowering time though because
they're introduced during logical send lowering.

An alternative that may be worth considering would be to re-run the
SIMD lowering pass after LOAD_PAYLOAD lowering instead of this patch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
9eea3df29f i965/fs: Lower DDY instructions to SIMD8 during SIMD lowering time
...on hardware lacking compressed Align16 support.  Will allow
simplifying the generator code and fixing it for SIMD32 codegen.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
12ae87abb1 i965/fs: Apply usual FPU-like execution size restrictions to MULH.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
dea9c1df89 i965/fs: Calculate maximum execution size of MOV_INDIRECT correctly.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
122e031548 i965/fs: Assert that IF instruction with embedded compare has legal exec_size.
We shouldn't encounter these right now but if we did it wouldn't be
possible for the SIMD lowering pass to split it into multiple
instructions because of its side effects on control flow, so just
assert in order to kill the program.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
98c8bef01c i965/fs: Implement HSW BFI exec size workarounds in the SIMD lowering pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
88d9cc1563 i965/fs: Implement workaround for IVB CMP dependency race in the SIMD lowering pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00
Francisco Jerez
a6bf5f88c7 i965/fs: Enforce common regioning restrictions by SIMD splitting.
This change addresses a number of hardware restrictions on the source
and destination regions and other execution controls of regular
FPU-like instructions that in some cases can be avoided by reducing
the execution size of the instruction.  Some of these restrictions
(e.g. the one about 3src instructions not supporting compression on
some hardware) are currently being worked around case by case in the
generator with ad-hoc splitting code that is buggy in several ways
(e.g. doesn't handle non-trivial execution controls which would break
SIMD32 code), but it seems cleaner to implement as many restrictions
as we can in a single lowering pass since that will allow us to
simplify some of the surrounding code considerably and also make sure
that we don't forget applying them in the future.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
2b5adb942b i965/fs: Enforce extended math exec size limits during SIMD lowering.
This teaches the SIMD lowering pass about the hardware limits on the
execution size of math instructions, which will allow simplifying the
generator code and at the same time get rid of a number of bugs in the
manual SIMD unrolling done currently that prevent SIMD32 codegen from
working.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
a8e7b4f1d9 i965/fs: Handle SAMPLEINFO consistently like other texturing instructions.
Seems like this texturing opcode was missing its logical counterpart
which would prevent it from taking advantage of the SIMD lowering
infrastructure.  Define it and plumb it through the back-end.  At some
point we'll likely want to emit a single SAMPLEINFO message shared
among all channels irrespective of this change, but for the moment
this should be enough to get the intrinsic working in SIMD32 mode.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
99b5476d33 i965/fs: Lower math into Gen4-5 send-like instructions in lower_logical_sends.
The benefit is we will be able to use the SIMD lowering pass to unroll
math instructions of unsupported width and then remove some cruft from
the generator.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
e531b7907a i965/fs: Add missing get_latency_gen7() cases for the Gen7 pull constant opcodes.
This was causing the scheduler to be rather optimistic about the
latency of pull constant opcodes on Gen7+.  This might seem to
increase the cycle count estimate calculated by the scheduler itself
for some shaders, even though the actual cycle count should be
decreased.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
ed4d0e41ac i965/fs: Rename Gen4 physical varying pull constant load opcode.
For consistency with the Gen7 variant.  I'm not doing the same to the
uniform pull constant message at this point because the non-GEN7 one
is still overloaded to be either an expression-like logical
instruction or a Gen4-specific physical send message.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
64a6cb87f1 i965/fs: Implement promotion of varying pull loads on Gen4 during SIMD lowering.
Varying pull constant loads inherit the same limitation of pre-ILK
hardware that requires expanding SIMD8 texel fetch instructions to
SIMD16, so we can deal with pull constant loads in the same way it's done
for texturing during SIMD lowering.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
d8a3294ac2 i965/fs: Hide varying pull constant load message setup behind logical opcode.
This will allow the SIMD lowering pass to split 32-wide varying pull
constant loads (not natively supported by the hardware) into 16-wide
instructions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:21 -07:00
Francisco Jerez
0bc5ad8d19 i965/fs: Avoid constant propagation when the type sizes don't match.
The case where the source type of the instruction is smaller than the
immediate type could be handled by calculating the portion of the
immediate read by the instruction (assuming that the source channels
are aligned with the destination channels of the copy) and then
representing the same value as an immediate of the source type
(assuming such an immediate type exists), but the code below doesn't
do that, so just bail for the moment.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:20 -07:00
Francisco Jerez
52cc80d859 i965/fs: Fix CSE temporary copy for some LOAD_PAYLOAD corner cases.
If the LOAD_PAYLOAD instruction only has header sources it's possible
for the number of registers written to be less than or equal to the
SIMD component size, in which case it would take the single-MOV path
at the bottom which would cause the channel enable masks to be applied
incorrectly to the header contents and/or cause it to write past the
end of the allocated temporary.  If the instruction is either
LOAD_PAYLOAD or doesn't write exactly one component the MOV path is
going to mess up the program so just don't use it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:20 -07:00
Francisco Jerez
c5f224145a i965/fs: Handle instruction predication in SIMD lowering pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:20 -07:00
Francisco Jerez
1760c24b4b i965/fs: No need to unzip SIMD-periodic sources during SIMD lowering.
If the source value is going to be the same for all SIMD-lowered chunks
of the instruction there should be no need to unzip the value into
multiple temporary registers one for each lowered chunk.  As a side
effect this fixes SIMD lowering of instructions with a vector
immediate source.  In the long term it *might* still be worth fixing
offset() to handle vector immediates correctly though, this should be
good enough for the moment.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:20 -07:00
Francisco Jerez
168163f5f0 i965/fs: Generalize is_uniform() to is_periodic().
This will be useful in the SIMD lowering pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:20 -07:00
Francisco Jerez
b736e78ddb i965/fs: Fix byte_offset() for MRF/ARF/FIXED_GRF regs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:20 -07:00
Francisco Jerez
2db9dd5aeb i965/fs: Fix off-by-one region overlap comparison in copy propagation.
This was introduced in cf375a3333 but
the blame is mine because the pseudocode I sent in my review comment
for the original patch suggesting to do things this way already had
the off-by-one error.  This may have caused copy propagation to be
unnecessarily strict while checking whether VGRF writes interfere with
any ACP entries and possibly miss valid optimization opportunities in
cases where multiple copy instructions write sequential locations of
the same VGRF.

Cc: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-05-27 23:19:20 -07:00
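For readers unfamiliar with this class of bug, the following is a generic sketch of a region overlap test and of how an off-by-one can make it too strict. It is not the copy propagation code from 2db9dd5aeb; `regions_overlap` and its parameters are hypothetical, and regions are assumed to be half-open byte ranges.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative only: each region is the half-open byte range
 * [start, start + size). */
static bool
regions_overlap(unsigned start_a, unsigned size_a,
                unsigned start_b, unsigned size_b)
{
   assert(size_a > 0 && size_b > 0);

   /* Strict comparisons: ranges that merely touch do not overlap.
    * Writing <= instead would be an off-by-one that reports adjacent
    * ranges as overlapping, which is needlessly conservative. */
   return start_a < start_b + size_b && start_b < start_a + size_a;
}
```

With this definition two writes to sequential locations, such as bytes 0..31 and 32..63, are treated as independent, which is the kind of case the commit message says could previously be rejected.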
Ronie Salgado
8f538d9ae0 anv/cmd_buffer: Don't delete command buffers in ResetCommandPool()
v2 (Jason Ekstrand): Destroy command buffers in DestroyCommandPool().

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95034
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 18:56:33 -07:00
Brian Paul
747754f027 gallium/util: another s/unsigned/enum pipe_prim_type/ for clang
Trivial.
2016-05-27 18:42:21 -06:00
Jason Ekstrand
b93b5935a7 anv: Try the first 8 render nodes instead of just renderD128
This way, if you have other cards installed, the Vulkan driver will still
work.  No guarantees about WSI working correctly but offscreen should at
least work.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95537
2016-05-27 17:18:33 -07:00
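A minimal sketch of the probing idea in b93b5935a7, assuming the usual Linux `/dev/dri/renderD<N>` naming. The function names, the callback type, and the two example probes are hypothetical and only stand in for whatever the driver does to open and check a device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define FIRST_RENDER_NODE 128   /* renderD128, as named in the commit title */
#define NUM_RENDER_NODES  8     /* "the first 8 render nodes" */

/* Hypothetical callback: returns true if the node at path is usable. */
typedef bool (*probe_fn)(const char *path);

/* Try each candidate node in order and return the number of the first
 * one the probe accepts, or -1 if none is usable. */
static int
first_usable_render_node(probe_fn probe)
{
   assert(probe != NULL);
   for (unsigned i = 0; i < NUM_RENDER_NODES; i++) {
      char path[32];
      snprintf(path, sizeof(path), "/dev/dri/renderD%u",
               FIRST_RENDER_NODE + i);
      if (probe(path))
         return (int)(FIRST_RENDER_NODE + i);
   }
   return -1;
}

/* Example probes for illustration only. */
static bool
probe_never(const char *path)
{
   (void)path;
   return false;
}

static bool
probe_only_130(const char *path)
{
   return strcmp(path, "/dev/dri/renderD130") == 0;
}
```

In this sketch a system where only renderD130 belongs to a supported device still yields a usable node, whereas checking renderD128 alone would fail.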
Jason Ekstrand
e023c104f7 anv: strdup the device path into the physical device
This way we don't have to assume that the string coming in is a piece of
constant data that exists forever.
2016-05-27 17:18:33 -07:00
Jason Ekstrand
9048dee328 anv/formats: Exit early for unsupported formats
2016-05-27 17:17:09 -07:00
Jason Ekstrand
10bc9f7024 anv/formats: Map VK_FORMAT_UNDEFINED to ISL_FORMAT_UNSUPPORTED
At one point in time, we may have used the mapping to ISL_FORMAT_RAW for
certain buffer surfaces but that time has long since passed.  This fixes a
bug where doing format queries on VK_FORMAT_UNDEFINED would assert-fail.
2016-05-27 17:17:09 -07:00
Jason Ekstrand
b16326c740 anv/clear: Remove an unused variable
2016-05-27 17:17:09 -07:00
Brian Paul
8beb6f3c9c gallium/util: another unsigned -> enum pipe_prim_type change
gcc didn't warn about the unsigned / enum pipe_prim_type mismatch
between the .c and .h file.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-27 17:55:05 -06:00
Jordan Justen
47e2a57fe9 i965/compute: Fix uniform init issue when SIMD8 is skipped
In d8347f12ea, we added support for
skipping SIMD8 generation when the program local size is too large for
SIMD8 to be usable. This change was missed in that commit.

This bug would impact gen7 platforms when the compute shader local
size is greater than 512, and gen8 platforms when the local size is
greater than 448.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-05-27 16:44:00 -07:00
Bas Nieuwenhuizen
65d4ba6f20 docs: Mention GL4.3 and ES3.1 support for nvc0 and radeonsi
v2: also update the introductory text.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-28 01:04:03 +02:00
Jason Ekstrand
fb2a5ceb32 anv: Emit DRAWING_RECTANGLE once at driver initialization
Also, we don't actually need it for clipping because meta always colors
inside the lines and, for all other operations, the user is required to set
a scissor.  Since DRAWING_RECTANGLE stalls the GPU, we want to emit it as
little as possible.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-27 15:18:11 -07:00
Jason Ekstrand
3a83c176ea anv/cmd_buffer: Only emit PIPE_CONTROL on-demand
This is in contrast to emitting it directly in vkCmdPipelineBarrier.  This
has a couple of advantages.  First, it means that no matter how many
vkCmdPipelineBarrier calls the application strings together it gets one or
two PIPE_CONTROLs.  Second, it allows us to better track when we need to do
stalls because we can flag when a flush has happened and we need a stall.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-27 15:18:09 -07:00
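The on-demand approach in 3a83c176ea can be sketched as accumulating pending bits and emitting lazily. This is a simplified model, not the anv implementation: the flag names, `struct cmd_state`, and all functions below are hypothetical, and the model emits a single packet where the commit message says the real driver may need one or two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real driver's names and values differ. */
#define PENDING_FLUSH (1u << 0)
#define PENDING_STALL (1u << 1)

struct cmd_state {
   uint32_t pending;   /* bits requested by barriers, not yet emitted */
   unsigned emitted;   /* count of PIPE_CONTROL-like packets emitted */
};

/* Called for each barrier: only records what will be needed. */
static void
record_barrier(struct cmd_state *s, uint32_t bits)
{
   assert(s != NULL);
   s->pending |= bits;
}

/* Called right before work that depends on earlier barriers.
 * Emits at most one packet covering everything recorded so far. */
static void
apply_pending(struct cmd_state *s)
{
   assert(s != NULL);
   if (s->pending == 0)
      return;
   s->emitted++;
   s->pending = 0;
}

/* Three back-to-back barriers followed by two draws. */
static unsigned
demo_three_barriers(void)
{
   struct cmd_state s = { 0, 0 };
   record_barrier(&s, PENDING_FLUSH);
   record_barrier(&s, PENDING_STALL);
   record_barrier(&s, PENDING_FLUSH);
   apply_pending(&s);   /* first draw: one packet */
   apply_pending(&s);   /* second draw: nothing pending */
   return s.emitted;
}
```

In this model three consecutive barriers collapse into one emitted packet, and a draw with nothing pending emits none.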
Jason Ekstrand
7120c75ec3 genxml: Make PIPE_CONTROL::CommandStreamerStallEnable a boolean
This has been declared as a uint since SNB but it's only one bit.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-27 15:18:07 -07:00
Jason Ekstrand
b26bd6790d anv/clear: Only clear the render area when doing subpass clears
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-27 15:18:04 -07:00
Jason Ekstrand
5432487792 anv: Move push constant allocation to the command buffer
Instead of blasting it out as part of the pipeline, we put it in the
command buffer and only blast it out when it's really needed.  Since the
PUSH_CONSTANT_ALLOC commands aren't pipelined, they immediately cause a
stall which we would like to avoid.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-05-27 15:17:43 -07:00
Bas Nieuwenhuizen
2cee0d0f9c radeonsi: enable OpenGL 4.3
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-27 22:28:11 +02:00
Dave Airlie
0438bc76e2 nouveau: enable GL 4.3 on kepler/fermi
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-28 05:52:13 +10:00
Marek Olšák
43550f25ed radeonsi: always reserve output space for tess factors
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Dave Airlie <airlied@redhat.com>
2016-05-27 21:40:43 +02:00
Dave Airlie
c44513a1f3 glsl/linker: call link_uniform blocks on linked shader.
The old code called this on the prelinked shader list,
but at this point we have the linked shader, so we should
call the interface on that alone.

This fixes a regression in:
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.13
introduced in
5b2675093e
glsl: handle implicit sized arrays in ssbo

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96228
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reported-by: Mark James
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-28 05:35:53 +10:00
Dave Airlie
f0254fdd07 mesa/get: drop unused extension checks.
These all show up as unused warnings here, so drop them for now.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-28 05:29:23 +10:00
Bas Nieuwenhuizen
4717d5a2d3 gallium/ddebug: Add passthrough for query_memory_info.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-27 20:00:07 +02:00