Commit graph

82384 commits

Author SHA1 Message Date
Francisco Jerez
37fd13ee2d i965/fs: Extend back-end interface for limiting the shader dispatch width.
This replaces the current fs_visitor::no16() interface with
fs_visitor::limit_dispatch_width(), which takes an additional
parameter allowing the caller to specify the maximum dispatch width a
shader can be compiled with.
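
As a rough illustration of the interface change described above, the sketch below models a width limit that can only be tightened. The `DispatchLimit` type and its members are hypothetical stand-ins, not the actual fs_visitor code, and the old no16() call is assumed to correspond to limiting the width to 8.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the visitor state; not the real fs_visitor.
struct DispatchLimit {
   unsigned max_dispatch_width = 32;

   // Record that the shader cannot be compiled wider than n channels.
   // Repeated calls can only tighten the limit, never relax it.
   void limit_dispatch_width(unsigned n) {
      assert(n == 8 || n == 16 || n == 32);
      max_dispatch_width = std::min(max_dispatch_width, n);
   }

   // Assumed equivalent of the old no16(): rule out anything above SIMD8.
   void no16() { limit_dispatch_width(8); }
};
```

A caller that only needs to rule out SIMD32 would pass 16 here, which the boolean-style no16() interface could not express.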

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:06 -07:00
Francisco Jerez
2d288cb9ea i965/fs: Implement SIMD32 register allocation support.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:06 -07:00
Francisco Jerez
7f10d3983b i965/fs: Remove pre-Gen7 register allocation class micro-optimization.
This was trying to save some one-time init on pre-Gen7 hardware under
the assumption that one would only ever need 1, 2, 4 and 8-wide
registers on those platforms.  However nothing guarantees that those
will be the only VGRF sizes used after lowering and optimization.  In
some cases we may end up with a temporary of different size being
allocated (e.g. by SIMD lowering to zip or unzip a multi-component
register region of a logical send instruction), and there is no
guarantee that they will be optimized away before register allocation
(especially since the compute_to_mrf coalescing pass is
rather... lacking...).  Instead just allocate classes for all possible
VGRF sizes up to MAX_VGRF_SIZE to avoid a crash in pq_test() when we
encounter a variable of any other size.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:06 -07:00
Francisco Jerez
1d5bf46ad1 i965/fs: Don't mutate multi-component arguments in sampler payload set-up.
The Gen5+ sampler message payload construction code steps through the
coordinate and derivative components by induction, like 'coordinate =
offset(coordinate, bld, 1)'.  The problem is that while doing so it
may step one past the end of the coordinate vector, causing an
assertion failure in offset() if it happens to be a (single component)
immediate.  Right now coordinates and derivatives are typically passed
as actual registers, but that will no longer be the case when we start
propagating constants into logical messages.

Instead express coordinate components in closed form like
'offset(coordinate, bld, i)' -- The end result seems slightly more
readable that way and it allows passing the coordinate and derivative
registers by const reference instead of by value, so it seems like a
clean-up in its own right.
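
The difference between the two forms can be sketched with a toy model. `Reg`, `offset()` and `gather()` below are hypothetical simplifications (the real offset() also takes a builder argument), meant only to show why addressing component i directly never steps past the end.

```cpp
#include <cassert>
#include <vector>

// Hypothetical register reference: 'components' consecutive values
// starting at 'base'.  An immediate would have components == 1.
struct Reg { int base; int components; };

// Toy offset(): stepping beyond the last component is an error.
Reg offset(const Reg &r, int delta) {
   assert(delta >= 0 && delta < r.components);
   return Reg{r.base + delta, r.components - delta};
}

// Closed form: component i is addressed directly from the original
// register, so 'coordinate' is never modified and can be a const reference.
std::vector<int> gather(const Reg &coordinate, int n) {
   std::vector<int> out;
   for (int i = 0; i < n; i++)
      out.push_back(offset(coordinate, i).base);
   return out;
}
```

An inductive loop would additionally evaluate offset(coordinate, 1) after reading the last component, which trips the assertion in this model when the register has a single component.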

v2: Fold a few post-increment operators into the last MOV
    statement. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:06 -07:00
Francisco Jerez
ad8f66ed33 i965/fs: Fix multiple ACP interference during copy propagation.
This is more fallout from cf375a3333.
It's possible for multiple ACP entries to interfere with a given VGRF
write, so we need to continue iterating even if an overlapping entry
has already been found.
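
A minimal sketch of the iteration pattern described above, using a hypothetical `AcpEntry` type rather than the actual copy propagation data structures: every overlapping entry is dropped instead of stopping at the first match.

```cpp
#include <cassert>
#include <vector>

// Hypothetical ACP entry: a copy whose validity depends on registers
// [start, start + size).
struct AcpEntry { unsigned start; unsigned size; };

static bool overlaps(const AcpEntry &e, unsigned start, unsigned size) {
   return e.start < start + size && start < e.start + e.size;
}

// Drop every entry that interferes with a write to [start, start + size).
// Returns the number of entries removed.
unsigned invalidate(std::vector<AcpEntry> &acp, unsigned start, unsigned size) {
   unsigned removed = 0;
   for (auto it = acp.begin(); it != acp.end();) {
      if (overlaps(*it, start, size)) {
         it = acp.erase(it);
         ++removed;          // keep going: later entries may overlap too
      } else {
         ++it;
      }
   }
   return removed;
}
```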

Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:06 -07:00
Francisco Jerez
c88b52745c i965/fs: Fix cmod propagation not to propagate non-identity cmod into CMP(N).
The conditional mod of these instructions determines the semantics of
the comparison itself (rather than being evaluated based on the result
of the instruction as is usually the case for most other instructions
that allow conditional mods), so it's in general not legal to
propagate a conditional mod into a CMP instruction.  This prevents
cmod propagation from (mis)optimizing:

 cmp.z.f0 tmp, ...
 mov.z.f0 null, tmp

into:

 cmp.z.f0 tmp, ...

which gives the negation of the flag result of the original sequence.
I could reproduce this easily with SIMD32, but I don't see any reason
why the problem would be SIMD32-specific; it was most likely working
by luck.
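
The negation can be checked with a small scalar model. This is only a sketch: it assumes CMP writes all-ones to its destination when the comparison holds and zero otherwise, and the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: CMP writes all-ones to its destination when the comparison
// holds and zero otherwise, and sets the flag to the comparison result.
struct CmpResult { uint32_t dst; bool flag; };

CmpResult cmp_z(int32_t a, int32_t b) {
   const bool equal = (a == b);
   return CmpResult{equal ? 0xffffffffu : 0u, equal};
}

// MOV with a .z conditional mod: the flag reports whether the moved
// value is zero.
bool mov_z_flag(uint32_t src) { return src == 0; }

// Flag after "cmp.z.f0 tmp, a, b; mov.z.f0 null, tmp".
bool flag_original(int32_t a, int32_t b) {
   return mov_z_flag(cmp_z(a, b).dst);
}

// Flag after the (mis)optimized single "cmp.z.f0 tmp, a, b".
bool flag_propagated(int32_t a, int32_t b) {
   return cmp_z(a, b).flag;
}
```

For any pair of inputs the two functions return opposite values, which matches the behaviour described in the message.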

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:06 -07:00
Francisco Jerez
8476233ae2 i965/fs: Estimate number of registers written correctly in opt_register_renaming.
The current estimate is incorrect for non-32b types.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
437e65f9d9 i965/fs: Add (sub)reg_offset asserts to brw_reg_from_fs_reg.
These are completely ignored by the conversion to brw_reg, so they
had better be zero.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
51dd6a60f5 i965/fs: Reset reg_offset of the original destination to zero in compute_to_mrf().
Prevents an assertion failure in the following commit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
b9eab911ba i965/fs: Skip remove_duplicate_mrf_writes() during SIMD32 runs.
The pass is disabled in SIMD16 dispatch mode for the same reason: it
cannot handle instructions that write multiple MRF registers at once.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
796238d9e6 i965/fs: Use SIMD8 SSBO GET_BUFFER_SIZE message regardless of the dispatch width.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
29e4717251 i965/fs: Don't emit duplicated SSBO GET_BUFFER_SIZE instruction unnecessarily.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
a55452530f i965/fs: Emit fixed width memory fence opcode regardless of the dispatch width.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
ae730049c6 i965/fs: Return 32 bit mask from fs_builder::sample_mask().
This doesn't actually handle the FS case; it just adds an assertion for
the moment so I don't forget to update it later on for SIMD32 fragment
shader dispatch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
8b6edee679 i965/fs: Emit fixed-width null register regardless of the dispatch width.
brw_null_vec() cannot handle widths over 16 but it doesn't really
matter what width we specify for null registers because destination
regions have no width field at the hardware level.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
298320280f i965/fs: Fix half() to handle more exotic register files.
horiz_offset() is able to deal with a superset of the register files
currently special-cased in half().  Just call horiz_offset() in all
cases.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
8c9601ef7b i965/fs: Fix horiz_offset() to handle ARF and HW GRF register files.
We'll hit these in some cases during SIMD lowering in 32-wide
programs.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
7d430fc05e i965/fs: Clean up remaining uses of fs_inst::reads_flag and ::writes_flag.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:05 -07:00
Francisco Jerez
ecd7a7255a i965/fs: Keep track of flag dependencies with byte granularity during scheduling.
This prevents false dependencies from being created between
instructions that write disjoint 8-bit portions of the flag register
and OTOH should make sure that the scheduler considers dependencies
between instructions that write or read multiple flag subregisters
at once (e.g. 32-wide predication or conditional mods).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:04 -07:00
Francisco Jerez
0fec265373 i965/fs: Track flag register liveness with byte granularity.
This is required for correctness in presence of multiple 8-wide flag
writes (e.g. 8-wide instructions with a conditional mod set) which
update a different portion of the same 16-bit flag subregister.  Right
now we keep track of flag dataflow with 16-bit granularity and
consider flag writes to have killed any previous definition of the
same subregister even if the write was less than 16 channels wide,
which can cause live flag register updates to be dead code-eliminated
incorrectly.

Additionally this makes sure that we handle 32-wide flag writes and
reads, which may span multiple flag subregisters; the current
approach of just setting/testing a single bit from the live set
wouldn't have worked for them.
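
Byte-granular tracking can be sketched as a bitmask with one bit per 8 channels. The helpers below are hypothetical and assume channels [group, group + exec_size) map linearly onto the bits of a single 32-bit flag register.

```cpp
#include <cassert>
#include <cstdint>

// One bit per byte of flag state, i.e. per group of 8 channels.
// Assumption: channels [group, group + exec_size) map linearly onto
// flag bits of a single 32-bit flag register.
uint32_t flag_bytes(unsigned group, unsigned exec_size) {
   const unsigned first = group / 8;
   const unsigned last = (group + exec_size + 7) / 8;   // exclusive
   return ((1u << last) - 1) & ~((1u << first) - 1);
}

// A write only kills the bytes it actually covers.
uint32_t live_after_write(uint32_t live, unsigned group, unsigned exec_size) {
   return live & ~flag_bytes(group, exec_size);
}
```

With this representation an 8-wide write to channels 8..15 leaves the byte for channels 0..7 live, and a 32-wide access covers all four bytes at once.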

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:04 -07:00
Francisco Jerez
df1aec763e i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.
v2: Codestyle fixes (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:04 -07:00
Francisco Jerez
ece41df247 i965/fs: Expose arbitrary channel execution groups to the IR.
This generalizes the current fs_inst::force_sechalf flag to allow
specifying channel enable groups other than 0 or 8.  At some point it
will likely make sense to fix the vec4 generator to support arbitrary
execution groups and then move the definition of fs_inst::group into
backend_instruction (e.g. so we can do FP64 in the VEC4 back-end).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:04 -07:00
Francisco Jerez
81bc6de8c0 i965/ir: Make BROADCAST emit an unmasked single-channel move.
Alternatively we could have extended the current semantics to 32-wide
mode by changing brw_broadcast() to emit multiple indexed MOV
instructions in the generator copying the selected value to all
destination registers, but it seemed rather silly to waste EU cycles
unnecessarily copying the exact same value 32 times in the GRF.

The vstride change in the Align16 path is required to avoid assertions
in validate_reg() since the change causes the execution size of the
MOV and SEL instructions to be equal to the source region width.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:04 -07:00
Francisco Jerez
41562eb8f3 i965/fs: Allow specifying arbitrary quarter control to FIND_LIVE_CHANNEL.
This makes FIND_LIVE_CHANNEL behave like a normal instruction for
non-zero quarter control.  On Gen8+ we just leave the quarter control
field of the emitted FBL instruction set to the default value so the
hardware applies the expected shift to the execution mask signals.  On
Gen7 we apply the offset manually by specifying a non-zero subregister
offset in the source region of the FBL instruction.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:04 -07:00
Francisco Jerez
a5a0810960 i965/fs: Allow specifying arbitrary execution sizes up to 32 to FIND_LIVE_CHANNEL.
Due to a Gen7-specific hardware bug native 32-wide instructions get
the lower 16 bits of the execution mask applied incorrectly to both
halves of the instruction, so the MOV trick we currently use wouldn't
work.  Instead emit multiple 16-wide MOV instructions in 32-wide mode
in order to cover the whole execution mask.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:04 -07:00
Francisco Jerez
1e3c58ffaf i965/fs: Lower 32-wide scratch writes in the generator.
The hardware has messages that can write 32 32bit components at once
but the channel enable mask gets messed up.  We need to split them
into several 16-wide scratch writes for the channel enables to be
applied correctly.  The SIMD lowering pass cannot be used for this
because scratch writes are emitted rather late during register
allocation long after SIMD lowering has been done.
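
The splitting itself amounts to the following sketch. `Chunk` and `split()` are hypothetical names, not generator code; each piece records the first channel it applies to so that the matching channel enables can be selected.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Chunk { unsigned group; unsigned exec_size; };  // hypothetical

// Split an operation covering 'exec_size' channels into pieces of at most
// 'max_width' channels, each tagged with the first channel it applies to.
std::vector<Chunk> split(unsigned exec_size, unsigned max_width) {
   assert(max_width > 0);
   std::vector<Chunk> chunks;
   for (unsigned g = 0; g < exec_size; g += max_width)
      chunks.push_back(Chunk{g, std::min(max_width, exec_size - g)});
   return chunks;
}
```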

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:29:02 -07:00
Francisco Jerez
a7d319c00b i965/fs: Implement scratch reads and writes of 4 GRFs at a time.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:28:59 -07:00
Francisco Jerez
fe5cdde2f9 i965/eu: Fix Gen7+ DP scratch message size calculation on Gen7.
Gen7 hardware expects the block size field in the message descriptor
to be the number of registers minus one instead of the log2 of the
number of registers.
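
The two encodings can be compared with a small sketch. `block_size_field()` is a hypothetical helper, and the log2 encoding is assumed to apply to the later generations.

```cpp
#include <cassert>

// Integer log2 for power-of-two inputs.
static unsigned log2u(unsigned n) {
   unsigned r = 0;
   while (n > 1) { n >>= 1; ++r; }
   return r;
}

// Hypothetical helper: encode the block size field for a scratch message
// covering 'num_regs' registers.  'is_gen7' selects the Gen7 encoding
// described above; the log2 encoding is assumed for later generations.
unsigned block_size_field(bool is_gen7, unsigned num_regs) {
   assert(num_regs == 1 || num_regs == 2 || num_regs == 4);
   return is_gen7 ? num_regs - 1 : log2u(num_regs);
}
```

The two encodings agree for one and two registers (0 and 1) and first differ at four registers, where Gen7 expects 3 rather than 2.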

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:28:59 -07:00
Francisco Jerez
fc7107de1d i965/eu: Set execution size explicitly for memory fence send message.
We don't want to emit a 32-wide send message in 32-wide programs.  The
memory fence message should have the same effect regardless of the
execution size (as long as it's valid) so just set it to one.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:28:59 -07:00
Francisco Jerez
5c887326c5 i965/eu: Consider QtrCtrl 3Q-4Q in typed surface message descriptor setup.
In SIMD32 programs the compiler is responsible for providing the
appropriate half of the sample mask in the message header, so the
first and third quarters both map to the first slot group of the
provided 16-bit half, while the second and fourth quarters map to the
second slot group -- IOW they should be equivalent to 1Q and 2Q modulo
two.
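
In other words, the mapping reduces to a modulo operation. The helper below is a hypothetical sketch that numbers the quarters 0 (1Q) through 3 (4Q).

```cpp
#include <cassert>

// Hypothetical helper: map a quarter control value (0 = 1Q ... 3 = 4Q) to
// the slot group within the 16-bit half of the sample mask supplied in
// the message header.
unsigned slot_group(unsigned quarter) {
   assert(quarter < 4);
   return quarter % 2;
}
```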

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:28:59 -07:00
Francisco Jerez
448340d31f i965/fs: Clean up remaining uses of dispatch_width in the generator.
Most of these are bugs because the intended execution size of an
instruction and the dispatch width of the shader aren't necessarily
the same (especially in SIMD32 programs).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:28:59 -07:00
Francisco Jerez
7f28ad8c4d i965/eu: Remove brw_codegen::compressed and ::compressed_stack.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:28:59 -07:00
Francisco Jerez
646213168e i965/eu: Use current exec size instead of p->compressed in surface message generation.
This was kind of an abuse of p->compressed: dataport send message
instructions are always uncompressed.  Use the current execution size
instead since p->compressed is on its way out.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:28:46 -07:00
Francisco Jerez
492286e90b i965/fs: No need to reset predicate control after emitting some instructions.
Trivial clean-up.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
8ef5637729 i965/fs: Pass current execution size to brw_IF() and brw_DO().
This gets IF and DO instructions working in SIMD32 programs.  brw_IF()
and brw_DO() should probably behave in the same way as other generator
functions that emit control flow instructions and just figure out the
right execution size by themselves from the current execution controls
specified through the brw_codegen argument.  Changing that will
require updating lots of Gen4-5 clipper code though, so for the moment
just pass the current value redundantly from the FS generator.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
fdae8b9f91 i965/eu: Stop using p->compressed to specify the exec size of control flow instructions.
p->compressed won't work for SIMD32, we should just be using the
execution size value specified via p->current instead.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
0b4cd91071 i965/fs: Extend region width calculation to allow arbitrary execution sizes.
Instead of just halving the execution size when the instruction is
compressed hoping that it will give a legal source region width, we
can calculate the maximum legal width value in closed form from the
component size and stride.  This makes sure that brw_reg_from_fs_reg()
always returns a valid hardware region even for virtual 32-wide
instructions (e.g. send-like instructions) that would seem to exceed
the hardware region width limit after halving.
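
One plausible reading of such a closed-form calculation is sketched below. It is not the formula from the commit: the 32-byte register size, the width limit of 16 and the function name are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Assumptions: 32-byte registers and a hardware region width limit of 16.
constexpr unsigned REG_BYTES = 32;
constexpr unsigned MAX_HW_WIDTH = 16;

// Largest width such that one row of the region stays within a register
// and exceeds neither the execution size nor the hardware limit.
unsigned region_width(unsigned exec_size, unsigned type_bytes, unsigned stride) {
   assert(type_bytes > 0 && stride > 0);
   const unsigned per_reg = REG_BYTES / (type_bytes * stride);
   return std::max(1u, std::min({per_reg, exec_size, MAX_HW_WIDTH}));
}
```

Under these assumptions a 32-wide instruction on 4-byte components gets a width of 8 directly, whereas halving the execution size would have yielded 16, which does not fit in one register row.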

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Kenneth Graunke
dabaf4fb96 i965/fs: Pass the compression mode to brw_reg_from_fs_reg().
Curro is planning to eliminate p->compressed, so let's avoid using it
here and just pass in the value directly.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
[ Francisco Jerez: Pass boolean flag instead of brw_compression enum. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
3340a66fce i965/fs: Simplify per-instruction compression control setup in generator.
By using the new compression/group control interface.  This will allow
easier extension to support arbitrary channel enable groups at the IR
level.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
c78edcea8b i965/fs: No need to set compression control at the top of generate_code().
The right value is dependent on the specific IR instruction being
generated so it has to be reset in every iteration of the loop anyway.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
c19c3d3a52 i965/eu: Fix a bunch of compression control bugs in the generator.
Most of these were resetting quarter control to zero incorrectly even
though everything they needed to do was disable instruction
compression -- The brw_SAMPLE() case was doing the right thing but it
can be simplified slightly by using the new compression control
interface.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
3dffd81583 i965/eu: Define alternative interface for setting compression and group controls.
This implements some simple helper functions that can be used to
specify the group of channel enable signals and compression enable
that apply to a brw_inst instruction.

It's intended to replace brw_set_default_compression_control
eventually because the current interface has a number of shortcomings
inherited from the Gen-4-5-centric representation of compression and
group controls as a single non-orthogonal enum: On the one hand it
doesn't work for specifying arbitrary group controls other than 1Q and
2Q, which are frequently useful in SIMD32 and FP64 programs.  On the
other hand the current interface forces you to update the compression
*and* group controls simultaneously, which has been the source of a
number of generator bugs (a bunch of them fixed in this series),
because in many cases we would end up resetting the group controls to
zero inadvertently even though everything we wanted to do was disable
instruction compression -- The latter seems especially unfortunate on
Gen6+ hardware, which has no explicit compression control, so we would
end up bashing the quarter control field of the instruction for no
benefit.

Instead of a single function that updates both at the same time
introduce separate interfaces to update one or the other independently
preserving the current value of the other (which typically comes from
the back-end IR so it has to be respected).
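
The shortcoming and the fix can be sketched with a toy state structure. `ExecControls` and the three setters are hypothetical names and do not reproduce the real brw_codegen interface.

```cpp
#include <cassert>

// Hypothetical default instruction state; not the real brw_codegen.
struct ExecControls {
   bool compressed = false;
   unsigned group = 0;   // first channel enable signal the instruction uses
};

// Old style: one call updates both fields, so disabling compression
// also overwrites the group.
void set_compression_and_group(ExecControls &c, bool compressed,
                               unsigned group) {
   c.compressed = compressed;
   c.group = group;
}

// New style: each setter leaves the other field untouched.
void set_compression(ExecControls &c, bool compressed) {
   c.compressed = compressed;
}

void set_group(ExecControls &c, unsigned group) {
   c.group = group;
}
```

With the separate setters, a generator function can turn compression off around a special-cased instruction while the group inherited from the IR is preserved.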

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
5db4d62395 i965/fs: Remove FS_OPCODE_PACK_STENCIL_REF virtual instruction.
It's just a byte MOV with strided source.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:10 -07:00
Francisco Jerez
29ce110be6 i965/fs: Remove extract virtual opcodes.
These can be easily represented in the IR as a MOV instruction with
strided source so they seem rather redundant.
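
The equivalence can be illustrated on a plain byte buffer. `strided_byte_mov()` is a hypothetical helper, not IR code: reading one byte per channel with a stride of 4 bytes picks the same byte out of every 32-bit element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read 'channels' bytes starting at byte 'index', advancing 'stride' bytes
// per channel.  With stride 4 this picks byte 'index' of each 32-bit element.
std::vector<uint8_t> strided_byte_mov(const std::vector<uint8_t> &src,
                                      std::size_t index, std::size_t stride,
                                      std::size_t channels) {
   std::vector<uint8_t> dst;
   for (std::size_t ch = 0; ch < channels; ch++)
      dst.push_back(src.at(index + ch * stride));
   return dst;
}
```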

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:09 -07:00
Francisco Jerez
9dcb8ff6a1 i965: Define brw_int_type() helper.
Intended as a (partial) inverse of type_sz().  Will be useful in the
next commit and some other SIMD32 generator changes I have queued up.
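
What "partial inverse" means here can be sketched with a reduced, hypothetical type enumeration. The `RegType` enum and both functions below are illustrative stand-ins, not the driver's definitions.

```cpp
#include <cassert>

// Hypothetical, reduced type enumeration; not the real register type enum.
enum class RegType { B, UB, W, UW, D, UD, F };

// Size in bytes of a value of the given type.
unsigned type_sz(RegType t) {
   switch (t) {
   case RegType::B: case RegType::UB: return 1;
   case RegType::W: case RegType::UW: return 2;
   default: return 4;   // D, UD, F
   }
}

// Integer type of the given size and signedness.
RegType int_type(unsigned sz, bool is_signed) {
   switch (sz) {
   case 1: return is_signed ? RegType::B : RegType::UB;
   case 2: return is_signed ? RegType::W : RegType::UW;
   default:
      assert(sz == 4);
      return is_signed ? RegType::D : RegType::UD;
   }
}
```

Going from a size to a type and back always returns the size, but going from a type to a size and back only recovers integer types: a 4-byte float maps to a 4-byte integer.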

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:09 -07:00
Francisco Jerez
bb89beb26b i965/fs: Remove manual splitting of DDY ops in the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:22:02 -07:00
Francisco Jerez
982c48dc34 i965/fs: Remove manual unrolling of BFI instructions from the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:23 -07:00
Francisco Jerez
95272f5c7e i965/fs: Drop Gen7 CMP SIMD unrolling workaround from the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:23 -07:00
Francisco Jerez
f14b9ea6e6 i965/fs: Drop lowering code for a few three-source instructions from the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:23 -07:00
Francisco Jerez
117a9a0a64 i965/fs: Set default access mode to Align1 for all instructions in the generator.
Currently the generator code for most opcodes honours the default
access mode (which should typically be Align1 in the scalar back-end),
but generate_code() doesn't set it explicitly which means that the
access mode from a previous instruction could leak into the following
ones if you did something special and weren't careful enough to save
and restore the previous access mode.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-05-27 23:19:22 -07:00