Commit graph

117 commits

Author SHA1 Message Date
Anuj Phogat
abe9a71a09 intel: Rename gen field in gen_device_info struct to ver
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "info\)*(.|->)gen" -rIl $SEARCH_PATH | xargs sed -ie "s/info\()*\)\(\.\|->\)gen/info\1\2ver/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Ian Romanick
6c8e2e9317 intel/compiler: Enable the ability to emit CMPN instructions
v2: Move checks to the EU validator.  Suggested by Jason.

Fixes: 2f2c00c727 ("i965: Lower min/max after optimization on Gen4/5.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9027>
2021-02-17 19:52:24 +00:00
Jason Ekstrand
f3a43e36e0 intel/fs: Add an ex_desc field to fs_inst for SHADER_OPCODE_SEND
I meant to do this years ago when I first added SHADER_OPCODE_SEND.  At
the time, the only use for the extended descriptor was bindless handles
which were always one thing and never non-constant.  However, it doesn't
actually require any extra instructions because we have to OR in ex_mlen
anyway.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8748>
2021-01-28 17:57:48 +00:00
Ian Romanick
92f08860c9 intel/compiler: Silence unused parameter warning in brw_surface_payload_size
src/intel/compiler/brw_eu_emit.c: In function ‘brw_surface_payload_size’:
src/intel/compiler/brw_eu_emit.c:3070:46: warning: unused parameter ‘p’ [-Wunused-parameter]
 3070 | brw_surface_payload_size(struct brw_codegen *p,
      |                          ~~~~~~~~~~~~~~~~~~~~^

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6826>
2020-09-28 11:43:04 -07:00
Jason Ekstrand
8d8a3815ef intel/eu: Add a mechanism for emitting relocatable constant MOVs
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>
2020-09-02 19:48:44 +00:00
Jason Ekstrand
91348d125d intel/eu: Add some new helpers
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>
2020-09-02 19:48:44 +00:00
Jason Ekstrand
cccb497d3c intel/fs: Fix MOV_INDIRECT and BROADCAST of Q types on Gen11+
The immediate case is pretty uncommon to see but it can happen, in
theory.  BROADCAST is typically used to uniformize values and those are
usually 32-bit.  However, it does come up in some subgroup ops.

Fixes: 49c21802cb "intel/compiler: Split has_64bit_types into float/int"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6211>
2020-09-01 13:25:20 -05:00
Matt Turner
c883c482be intel/compiler: Relax SENDS regioning assertions
The next commit fixes a mistake in the assembler and ends up running
afoul of this assertion.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5956>
2020-07-31 12:59:24 -07:00
Jason Ekstrand
561aaeeb48 intel/eu: Add the RNDU opcode
We don't want to use it on gen5 and earlier because only RNDD can be
done with a single instruction and we can implement RNDU(x) as -RNDD(-x)
so it's better to just do that when we have the instruction.  On gen6
and above, we may as well just use the right instruction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5596>
2020-06-23 17:43:54 +00:00
Jason Ekstrand
e0ab48e3ea intel/eu: Set the right subnr for ALIGN16 destinations
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5596>
2020-06-23 17:43:54 +00:00
Jason Ekstrand
c48f42e178 intel/fs: Emit HALT for discard on Gen4-5
Using HALT to immediately jump to the end of the shader is required to
implement GL_EXT_gpu_shader4 and OpenGL 3.0.  However, vanilla OpenGL
1.2 doesn't forbid it and it likely makes something somewhere faster.
We should be consistent and implement the same discard behavior on all
hardware if we can.

The rules for HALT on Gen4-5 are a bit different from Gen6+.  On the
older hardware, there is no stack for HALT; instead it's up to software
to save and restore mask registers.  However, there's no real saving
needed since we only use HALT to jump to the end of the program where
we're about about to do our FB writes.  All we need to do is reset AMask
to DMask, the value it was initialized to at the start of the thread.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5244>
2020-05-30 06:21:15 +00:00
Caio Marcelo de Oliveira Filho
f858fa26b4 intel/fs,vec4: Pull stall logic for memory fences up into the IR
Instead of emitting the stall MOV "inside" the
SHADER_OPCODE_MEMORY_FENCE generation, use the scheduling fences when
creating the IR.

For IvyBridge, every (data cache) fence is accompained by a render
cache fence, that now is explicit in the IR, two
SHADER_OPCODE_MEMORY_FENCEs are emitted (with different SFIDs).

Because Begin and End interlock intrinsics are effectively memory
barriers, move its handling alongside the other memory barrier
intrinsics.  The SHADER_OPCODE_INTERLOCK is still used to distinguish
if we are going to use a SENDC (for Begin) or regular SEND (for End).

This change is a preparation to allow emitting both SENDs in Gen11+
before we can stall on them.

Shader-db results for IVB (i965):

    total instructions in shared programs: 11971190 -> 11971200 (<.01%)
    instructions in affected programs: 11482 -> 11492 (0.09%)
    helped: 0
    HURT: 8
    HURT stats (abs)   min: 1 max: 3 x̄: 1.25 x̃: 1
    HURT stats (rel)   min: 0.03% max: 0.50% x̄: 0.14% x̃: 0.10%
    95% mean confidence interval for instructions value: 0.66 1.84
    95% mean confidence interval for instructions %-change: 0.01% 0.27%
    Instructions are HURT.

  Unlike the previous code, that used the `mov g1 g2` trick to force
  both `g1` and `g2` to stall, the scheduling fence will generate `mov
  null g1` and `mov null g2`.  During review it was decided it was not
  worth keeping the special codepath for the small effect will have.

Shader-db results for HSW (i965), BDW and SKL don't have a change
on instruction count, but do report changes in cycles count, showing
SKL results below

    total cycles in shared programs: 341738444 -> 341710570 (<.01%)
    cycles in affected programs: 7240002 -> 7212128 (-0.38%)
    helped: 46
    HURT: 5
    helped stats (abs) min: 14 max: 1940 x̄: 676.22 x̃: 154
    helped stats (rel) min: <.01% max: 2.62% x̄: 1.28% x̃: 0.95%
    HURT stats (abs)   min: 2 max: 1768 x̄: 646.40 x̃: 362
    HURT stats (rel)   min: <.01% max: 0.83% x̄: 0.28% x̃: 0.08%
    95% mean confidence interval for cycles value: -777.71 -315.38
    95% mean confidence interval for cycles %-change: -1.42% -0.83%
    Cycles are helped.

  This seems to be the effect of allocating two registers separatedly
  instead of a single one with size 2, which causes different register
  allocation, affecting the cycle estimates.

while ICL also has not change on instruction count but report changes
negative changes in cycles

    total cycles in shared programs: 352665369 -> 352707484 (0.01%)
    cycles in affected programs: 9608288 -> 9650403 (0.44%)
    helped: 4
    HURT: 104
    helped stats (abs) min: 24 max: 128 x̄: 88.50 x̃: 101
    helped stats (rel) min: <.01% max: 0.85% x̄: 0.46% x̃: 0.49%
    HURT stats (abs)   min: 2 max: 2016 x̄: 408.36 x̃: 48
    HURT stats (rel)   min: <.01% max: 3.31% x̄: 0.88% x̃: 0.45%
    95% mean confidence interval for cycles value: 256.67 523.24
    95% mean confidence interval for cycles %-change: 0.63% 1.03%
    Cycles are HURT.

  AFAICT this is the result of the case above.

Shader-db results for TGL have similar cycles result as ICL, but also
affect instructions

    total instructions in shared programs: 17690586 -> 17690597 (<.01%)
    instructions in affected programs: 64617 -> 64628 (0.02%)
    helped: 55
    HURT: 32
    helped stats (abs) min: 1 max: 16 x̄: 4.13 x̃: 3
    helped stats (rel) min: 0.05% max: 2.78% x̄: 0.86% x̃: 0.74%
    HURT stats (abs)   min: 1 max: 65 x̄: 7.44 x̃: 2
    HURT stats (rel)   min: 0.05% max: 4.58% x̄: 1.13% x̃: 0.69%
    95% mean confidence interval for instructions value: -2.03 2.28
    95% mean confidence interval for instructions %-change: -0.41% 0.15%
    Inconclusive result (value mean confidence interval includes 0).

  Now that more is done in the IR, more dependencies are visible and
  more SWSB annotations are emitted.  Mixed with different register
  allocation decisions like above, some shaders will see more `sync
  nops` while others able to avoid them.

  Most of the new `sync nops` are also redundant and could be dropped,
  which will be fixed in a separate change.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>
2020-04-29 07:17:27 +00:00
Dylan Baker
f8e4542bad replace _mesa_logbase2 with util_logbase2
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>
2020-04-21 11:09:03 -07:00
Caio Marcelo de Oliveira Filho
c76f2292b5 intel/fs,vec4: Properly account SENDs in IVB memory fence
Change brw_memory_fence to return the number of messages emitted, and
use that to update the send_count statistic in code generation.

This will fix the book-keeping for IVB since the memory fences will
result in two SEND messages.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4646>
2020-04-20 09:29:09 -07:00
Jason Ekstrand
3b2eafbea9 intel/fs: Don't unnecessarily fall back to indirect sends on Gen12
The instruction encoding for SENDS changed on Gen12 and it now supports
embedding the entire extended message descriptor in the instruction if
it's an immediate.  Stop falling back to doing an indirect SEND just
because we had something in [15:12] of ex_desc.ud.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3547>
2020-01-24 19:18:27 +00:00
Matt Turner
88a0523bd2 intel/compiler: Move Gen4/5 rounding to visitor
Gen4/5's rounding instructions operate differently than later Gens'.
They all return the floor of the input and the "Round-increment"
conditional modifier answers whether the result should be incremented by
1.0 to get the appropriate result for the operation (and thus its
behavior is determined by the round opcode; e.g., RNDZ vs RNDE).

Since this requires a second instruciton (a predicated ADD) that
consumes the result of the round instruction, the round instruction
cannot write its result directly to the (write-only) message registers.
By emitting the ADD in the generator, the backend thinks it's safe to
store the round's result directly to the message register file.

To avoid this, we move the emission of the ADD instruction to the NIR
translator so that the backend has the information it needs.

I suspect this also fixes code generated for RNDZ.SAT but since
Gen4/5 don't support GLSL 1.30 which adds the trunc() function, I
couldn't write a piglit test to confirm. My thinking is that if x=-0.5:

      sat(trunc(-0.5)) = 0.0

But on Gen4/5 where sat(trunc(x)) is implemented as

      rndz.r.f0  result, x             // result = floor(x)
                                       // set f0 if increment needed
      (+f0) add  result, result, 1.0   // fixup so result = trunc(x)

then putting saturate on both instructions will give the wrong result.

      floor(-0.5) = -1.0
      sat(floor(-0.5)) = 0.0
      // +1 increment would be needed since floor(-0.5) != trunc(-0.5)
      sat(sat(floor(-0.5)) + 1.0) = 1.0

Fixes: 6f394343b1 ("nir/algebraic: i2f(f2i()) -> trunc()")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2355
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3459>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3459>
2020-01-22 23:47:02 +00:00
Iván Briano
ca94717035 intel/compiler: Don't change hstride if not needed
Alignment requirements may have changed the horizontal stride already,
so don't set it if not required to avoid breaking said requirements.

Fixes several tests such as
dEQP-VK.subgroups.vote.graphics.subgroupallequal_int8_t

Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-18 14:19:41 -08:00
Sagar Ghuge
bf943bdf24 intel/compiler: Set bits according to source file
On Gen >= 12, if src0 or src2 holds immediate value, we need set
src[0/2]_is_imm bits instead of register file.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-10-21 20:32:43 -07:00
Sagar Ghuge
c018c5a339 intel/compiler: Add Immediate support for 3 source instruction
On Gen >= 10, Either src0 or src2 can use 16-bit immediate value, but
not both.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-10-21 20:32:43 -07:00
Francisco Jerez
6b52f81395 intel/eu: Don't set notify descriptor field of gateway barrier message.
Apparently this field was removed on SKL, and according to the
hardware docs for previous platforms "This field is only valid for a
ForwardMsg message. It is ignored for other messages. The BarrierMsg
message always increments the N0 notification counter".

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
15e3a0d9d2 intel/eu/gen12: Set SWSB annotations in hand-crafted assembly.
Reviewers are encouraged to audit the code generation pass
independently for the case I missed some potential data hazard or new
code has been added in the meantime.

v2: Add SYNC instruction to cr0 workaround in brw_float_controls_mode().

v3: Drop likely redundant (and potentially harmful) RegDist SWSB
    annotation from ce0 read in brw_find_live_channel() (Caio).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-11 12:24:16 -07:00
Francisco Jerez
d3f3bdcd18 intel/eu/gen12: Add tracking of default SWSB state to the current brw_codegen instruction.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-11 12:24:16 -07:00
Francisco Jerez
c22db5e188 intel/fs/gen12: Add codegen support for the SYNC instruction.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
7499e10383 intel/eu/gen12: Don't set thread control, it's gone.
An effect similar to the one formerly provided by setting thread
control to "switch" can be achieved now by setting a RegDist of 1 on
the SWSB field.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
a66ea33991 intel/eu/gen12: Don't set DD control, it's gone.
A future lowering pass will simulate the same behavior originally
provided by NoDDChk/NoDDClr at the IR level by using appropriate SWSB
annotations.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
8a5fad0d92 intel/eu/gen12: Use SEND instruction for split sends.
The new SEND instruction behaves like the former SENDS instruction.
The original single-payload SEND instruction is gone.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
6634ede7aa intel/eu/gen12: Codegen SEND descriptor regions correctly.
The SEND instruction is now four-source.  The descriptor is no longer
part of source 1, so avoid touching it to avoid corruption while
initializing the descriptor.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
2c4c9aba30 intel/eu/gen12: Codegen pathological SEND source and destination regions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
bafc9515db intel/eu/gen12: Codegen control flow instructions correctly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
6e1daba3b4 intel/eu/gen12: Codegen three-source instruction source and destination regions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
9fdb67aa09 intel/eu/gen12: Fix codegen of immediate source regions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
057902dcf8 intel/eu: Encode and decode native instruction opcodes from/to IR opcodes.
Change brw_inst_set_opcode() and brw_inst_opcode() to call
brw_opcode_encode/decode() transparently in order to translate between
hardware and IR opcodes, and update the EU compaction code in order to
do the same as needed, so we can eventually drop the one-to-one
correspondence between hardware and IR opcodes.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
35bcd08d61 intel/eu: Split brw_inst ex_desc accessors for SEND(C) vs. SENDS(C).
The brw_inst opcode accessors are going away in one of the following
commits.  We could potentially replace them with the new helpers that
do opcode remapping, but that would lead to a circular dependency
between brw_inst.h and brw_eu.h.  This way we also avoid ordering
issues that can cause the semantics of the ex_desc accessors to change
depending on whether the ex_desc field is set after or before the
opcode instruction field.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Samuel Iglesias Gonsálvez
28da9558f5 i965/fs/generator: refactor rounding mode helper in preparation for float controls
v2:
- Fix bug in defining BRW_CR0_FP_MODE_MASK.

v3:
- Update comment (Caio).

v4:
- Split the patch into the helper (this one) and the new
  opcode (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-17 23:39:19 +03:00
Kenneth Graunke
b6fe25c7f5 intel/compiler: Handle bits 15:12 in brw_send_indirect_split_message()
Annoyingly, these bits exist in some extended message descriptors
(in particular render target writes), but they don't have any
corresponding bits in the ISA encoding.  So we can't use an immediate
and have to fall back to an indirect extended descriptor.

Thanks to Jason Ekstrand for reminding me that you can still set these
bits via an indirect descriptor, even if they don't exist in the ISA.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-27 14:20:07 -07:00
Kenneth Graunke
c8c9c48684 intel/compiler: Fix src0/desc setter ordering
src0 vstride and type overlap with bits of the extended descriptor.
brw_set_desc() also sets the extended descriptor to 0.  So by setting
the descriptor, then setting src0, we were accidentally setting a bunch
of extended descriptor bits unintentionally.

When using this infrastructure for framebuffer writes (in a future
patch), this ended up setting the extended descriptor bit 20, which is
"Null Render Target" on Icelake, causing nothing to be written to the
framebuffer.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-27 14:20:07 -07:00
Caio Marcelo de Oliveira Filho
b390ff3517 intel/fs: Add support for SLM fence in Gen11
Gen11 SLM is not on L3 anymore, so now the hardware has two separate
fences.  Add a way to control which fence types to use.

At this time, we don't have enough information in NIR to control the
visibility of the memory being fenced, so for now be conservative and
assume that fences will need a stall.  With more information later
we'll be able to reduce those.

Fixes Vulkan CTS tests in ICL:

    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.workgroup.guard_local.buffer.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.buffer.guard_nonlocal.workgroup.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.image.guard_nonlocal.workgroup.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.buffer.guard_nonlocal.workgroup.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.image.guard_nonlocal.workgroup.comp

The whole set of supported tests in dEQP-VK.memory_model.* group
should be passing in ICL now.

v2: Pass BTI around instead of having an enum.  (Jason)
    Emit two SHADER_OPCODE_MEMORY_FENCE instead of one that gets
    transformed into two.  (Jason)
    List tests fixed.  (Lionel)

v3: For clarity, split the decision of which fences to emit from the
    emission code.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-11 08:29:32 -07:00
Sagar Ghuge
83fdec0f0d intel/compiler: Enable the emission of ROR/ROL instructions
v2: 1) Drop changes for vec4 backend as on Gen11+ we don't support
       align16 mode (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Sagar Ghuge
3016756398 intel/compiler: Fix assertions in brw_alu3
v2: Fix assertion for src1 (Ian Romanick)

Fixes: 3b967e17 (intel/compiler: Avoid false positive assertions)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-03 23:14:34 -07:00
Jason Ekstrand
9e403dc56e intel/fs: Do a stalling MFENCE in endInvocationInterlock()
Fixes: 939312702e "i965: Add ARB_fragment_shader_interlock support"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-30 14:00:26 +00:00
Jason Ekstrand
859de4a748 intel/fs,vec4: Use g0 as the header for MFENCE
We set header_present but then pass it some random garbage.  Give it g0
instead.  I'm not actually sure this does anything but g0 is the usual
header data and this is what the windows driver does so it seems like a
good idea.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-30 14:00:26 +00:00
Iago Toral Quiroga
fb990bd76e intel/eu: force stride of 2 on NULL register for Byte instructions
The hardware only allows a stride of 1 on a Byte destination for raw
byte MOV instructions. This is required even when the destination
is the NULL register.

Rather than making sure that we emit a proper NULL:B destination
every time we need one, just fix it at emission time.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
8f40d392b9 intel/compiler: set correct precision fields for 3-source float instructions
Source0 and Destination extract the floating-point precision automatically
from the SrcType and DstType instruction fields respectively when they are
set to types :F or :HF. For Source1 and Source2 operands, we use the new
1-bit fields Src1Type and Src2Type, where 0 means normal precision and 1
means half-precision. Since we always use the type of the destination for
all operands when we emit 3-source instructions, we only need set Src1Type
and Src2Type to 1 when we are emitting a half-precision instruction.

v2:
 - Set the bit separately for each source based on its type so we can
   do mixed floating-point mode in the future (Topi).

v3:
 - Use regular citation style for the comment referencing the PRM (Matt).
 - Decided not to add asserts in the emission code to check that only
   mixed HF/F types are used since such checks would break negative tests
   for brw_eu_validate.c (Matt)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
e6b7410187 intel/compiler: allow half-float on 3-source instructions since gen8
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
4588f4a604 intel/compiler: handle extended math restrictions for half-float
Extended math with half-float operands is only supported since gen9,
but it is limited to SIMD8. In gen8 we lower it to 32-bit.

v2: quashed together the following patches (Jason):
  - intel/compiler: allow extended math functions with HF operands
  - intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
  - intel/compiler: extended Math is limited to SIMD8 on half-float

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
  (allow extended math functions with HF operands,
   extended Math is limited to SIMD8 on half-float)
2019-04-18 11:05:18 +02:00
Jason Ekstrand
10b7d14c31 intel/vec4: Drop dead code for handling typed surface messages
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Jason Ekstrand
c4fb6b0c81 intel/eu: Add an EOT parameter to send_indirect_[split]_message
For split indirect sends we have to put the EOT parameter in the
extended descriptor as well as the instruction itself so just calling
brw_inst_set_eot is insufficient.  Moving the EOT handling handling into
the send_indirect_[split]_message helper lets us handle it properly.
2019-02-25 11:35:12 -06:00
Jason Ekstrand
8babaa84e8 intel/eu: Add support for the SENDS[C] messages
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand
d6a6e10390 intel/inst: Indent some code
We're about to add some more if cases so let's have the giant re-indent
in it's own patch to make review easier.

Acked-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand
d2d3e04501 intel/fs: Use SHADER_OPCODE_SEND for surface messages
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00