Commit graph

697 commits

Author SHA1 Message Date
Jason Ekstrand
75209d5bd1 intel/fs: Add and implement intel-specific ray-tracing intrinsics
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
2020-11-25 05:37:10 +00:00
Jason Ekstrand
1f6e70c85a intel/fs: Add and implement a load_global_const_block intrinsic
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
2020-11-25 05:37:09 +00:00
Jason Ekstrand
7280b0911d intel/compiler: Add support for bindless shaders
The Intel bindless thread dispatch model is very simple.  When a compute
shader is to be used for bindless dispatch, it can request a set of
stack IDs.  These are allocated per-dual-subslice by the hardware and
recycled automatically when the stack ID is returned.  Passed to the
bindless dispatch are a global argument address, a stack ID, and an
address of the BINDLESS_SHADER_RECORD to invoke.  When the bindless
shader is dispatched, it is passed its stack ID as well as the global
and local argument pointers.  The local argument pointer is the address
of the BINDLESS_SHADER_RECORD plus some offset which is specified as
part of the BINDLESS_SHADER_RECORD.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
2020-11-25 05:37:09 +00:00
Kenneth Graunke
97ebb896af intel/compiler: Do interpolateAtOffset coordinate scaling in NIR
In our source languages, interpolateAtOffset() takes a floating point
offset in the range [-0.5, +0.5].  However, the hardware takes integer
valued offsets in the range [-8, 7], in units of 1/16th of a pixel.

So, we need to multiply and clamp the coordinates.  We were doing this
in the FS backend, but with the advent of IBC, I'd like to avoid doing
it twice.  This patch instead moves the lowering to NIR so we can reuse
it across both backends.

v2: Use nir_shader_instructions_pass (suggested by Eric Anholt).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6193>
2020-11-18 23:26:53 +00:00
Jason Ekstrand
68092df8d8 intel/nir: Lower 8-bit ops to 16-bit in NIR on Gen11+
Intel hardware supports 8-bit arithmetic but it's tricky and annoying:

  - Byte operations don't actually execute with a byte type.  The
    execution type for byte operations is actually word.  (I don't know
    if this has implications for the HW implementation.  Probably?)

  - Destinations are required to be strided out to at least the
    execution type size.  This means that B-type operations always have
    a stride of at least 2.  This means wreaks havoc on the back-end in
    multiple ways.

  - Thanks to the strided destination, we don't actually save register
    space by storing things in bytes.  We could, in theory, interleave
    two byte values into a single 2B-strided register but that's both a
    pain for RA and would lead to piles of false dependencies pre-Gen12
    and on Gen12+, we'd need some significant improvements to the SWSB
    pass.

  - Also thanks to the strided destination, all byte writes are treated
    as partial writes by the back-end and we don't know how to copy-prop
    them.

  - On Gen11, they added a new hardware restriction that byte types
    aren't allowed in the 2nd and 3rd sources of instructions.  This
    means that we have to emit B->W conversions all over to resolve
    things.  If we emit said conversions in NIR, instead, there's a
    chance NIR can get rid of some of them for us.

We can get rid of a lot of this pain by just asking NIR to get rid of
8-bit arithmetic for us.  It may lead to a few more conversions in some
cases but having back-end copy-prop actually work is probably a bigger
bonus.  There is still a bit we have to handle in the back-end.  In
particular, basic MOVs and conversions because 8-bit load/store ops
still require 8-bit types.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7482>
2020-11-09 18:58:51 +00:00
Jason Ekstrand
b98f0d3d7c intel/nir: Lower 8-bit scan/reduce ops to 16-bit
We can't really support these directly on any platform.  May as well let
NIR lower them.  The NIR lowering is potentially one more instruction
for scan/reduce ops thanks to not being able to do the B->W conversion
as part of SEL_EXEC.  For imax/imin exclusive scan, it's yet another
instruction thanks to the extra imax/imin NIR has to insert to deal with
the fact that the first live channel will contain the identity value
which, when signed, will cast wrong.  However, it does let us drop some
complexity from our back-end so it's probably worth it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7482>
2020-11-09 18:58:51 +00:00
Caio Marcelo de Oliveira Filho
5d5f3e3a47 intel/fs: Implement nir_intrinsic_{load,store}_shared_block_intel
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7448>
2020-11-04 20:24:48 +00:00
Caio Marcelo de Oliveira Filho
9fe158e1d1 intel/fs: Implement nir_intrinsic_{load,store}_ssbo_block_intel
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7448>
2020-11-04 20:24:48 +00:00
Caio Marcelo de Oliveira Filho
296137df53 intel/fs: Implement nir_intrinsic_{load,store}_global_block_intel
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7448>
2020-11-04 20:24:48 +00:00
Caio Marcelo de Oliveira Filho
ce0b72a13a intel/fs: Don't emit_uniformize when getting a constant SSBO index
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7340>
2020-10-29 21:54:01 +00:00
Caio Marcelo de Oliveira Filho
e7e24d5039 intel/fs: Handle nir_intrinsic_terminate
For terminate operation, jump the invocation without predicating on
the rest of the quad being disabled -- which is what is done for
demote and discard.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7150>
2020-10-15 21:40:09 +00:00
Timur Kristóf
2be99012e9 nir: Add ability to count emitted GS primitives.
Add an option to nir_lower_gs_intrinsics which tells it to track
the number of emitted primitives, not just vertices. Additionally,
also make it per-stream.

Also rename the set_vertex_count intrinsic to
set_vertex_and_primitive_count.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
2020-10-09 15:26:14 +02:00
Jason Ekstrand
0d462dbee5 intel/fs: Add an alignment to VARYING_PULL_CONSTANT_LOAD_LOGICAL
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3932>
2020-10-08 01:14:46 -05:00
Danylo Piliaiev
77486db867 intel/fs: Disable sample mask predication for scratch stores
Scratch stores are being lowered to the instructions with side-effects,
however they should be enabled in fs helper invocations, since they
are produced from operations which don't imply side-effects.

To fix this - we move the decision of whether the sample mask predication
is enable to the point where logical brw instructions are created.

GLSL example of the issue:

 int tmp[1024];
 ...
 do {
   // changes to tmp
 } while (some_condition(tmp))

If `tmp` is lowered to scrach memory, `some_condition` would be
undefined if scratch write is predicated on sample mask, making
possible for the while loop to become infinite and hang the GPU.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3256
Fixes: 53bfcdeecf
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6056>
2020-09-25 09:48:06 +00:00
Jason Ekstrand
9750164c09 nir: Rename get_buffer_size to get_ssbo_size
This makes it explicit that this intrinsic is only for SSBOs.  For the
v3dv driver, we'll be adding a get_ubo_size intrinsic and we want to be
able to distinguish between the two.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6812>
2020-09-22 13:34:12 +00:00
Marcin Ślusarz
18eb853ac8 intel/compiler: quiet Coverity warnings
Coverity complains about possible out-of-bounds write & read, because
it thinks that "loc + i" can be bigger than sizes of the 2 used arrays.

It's not obvious from the code it cannot happen, so add asserts here.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6667>
2020-09-10 12:16:58 +00:00
Jason Ekstrand
fe18a0fd45 intel/nir: Lower load_num_work_groups to 32-bit if needed
For OpenCL-style kernels, this builtin is 64-bit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6570>
2020-09-02 20:38:22 +00:00
Jason Ekstrand
5799da47c7 intel/fs: Use a single untyped surface read for load_num_work_groups
There's no good reason to split this into three.  Sure, CS indirects are
only guaranteed by the spec to be DWORD aligned, but that's all untyped
surface reads require anyway.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6570>
2020-09-02 20:38:22 +00:00
Jason Ekstrand
91becd84ae intel/fs: Add support for a new load_reloc_const intrinsic
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>
2020-09-02 19:48:44 +00:00
Marcin Ślusarz
ed9ac3d60c intel/fs,vec4: remove unused assignments
Reported by Coverity.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6126>
2020-09-02 15:08:01 +00:00
Jason Ekstrand
ff2f44d865 intel/fs: Implement nir_intrinsic_load_global_constant
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6379>
2020-09-01 20:50:04 +00:00
Jason Ekstrand
55ae704513 intel/fs: Add support for vec8 and vec16 ops
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6502>
2020-08-31 17:04:40 +00:00
Jason Ekstrand
febe762246 intel/fs: Fix an assert in load_scratch
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6405>
2020-08-21 22:49:54 +00:00
Karol Herbst
e5899c1e88 nir: rename nir_op_fne to nir_op_fneu
It was always fneu but naming it fne causes confusion from time to time. So
lets rename it. Later we also want to add other unordered and fne, this is
a smaller preparation for that.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6377>
2020-08-21 17:26:21 +00:00
Jason Ekstrand
1ccd681109 nir: Add an LOD parameter to image_*_size
The OpenCL image_width/height/depth functions have variants which can
take an LOD parameter.  More importantly, LLVM-SPIRV-Translator always
generates OpImageQuerySizeLod even if the LOD is guaranteed to be zero.
Given that over half the hardware out there has an LOD field for image
size queries (based on a rudimentary scan through their NIR -> whatever
code), we may as well just add the source to the NIR intrinsic.  If this
is ever a problem for anyone, the lowering is pretty trivial.

I've also added asserts to everyone's drivers that should alert them if
they ever see an LOD other than zero.  This will never happen with GL or
Vulkan so there's no need for panic.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6396>
2020-08-20 20:48:10 +00:00
Jason Ekstrand
003b04e266 intel/compiler: Allow MESA_SHADER_KERNEL
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6280>
2020-08-12 10:11:06 +00:00
Jason Ekstrand
2956d53400 nir: Add nir_foreach_shader_in/out_variable helpers
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
2020-07-29 17:38:57 +00:00
Jason Ekstrand
675d7b19a9 intel/fs: Use the correct logical op for global float atomics
Fixes: e644ed468f "intel/fs: Implement nir_intrinsic_global_atomic_*"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5992>
2020-07-21 05:01:34 +00:00
Caio Marcelo de Oliveira Filho
fe214d60bc intel/fs: Add Fall-through comment
Just to clarify the missing break is intentional.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5365>
2020-06-08 15:49:24 +00:00
Boris Brezillon
345b5847b4 nir: Replace the scoped_memory barrier by a scoped_barrier
SPIRV OpControlBarrier can have both a memory and a control barrier
which some hardware can handle with a single instruction. Let's
turn the scoped_memory_barrier into a scoped barrier which can embed
both barrier types. Note that control-only or memory-only barriers can
be supported through this new intrinsic by passing NIR_SCOPE_NONE to the
unused barrier type.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4900>
2020-06-03 07:39:52 +00:00
Jason Ekstrand
c48f42e178 intel/fs: Emit HALT for discard on Gen4-5
Using HALT to immediately jump to the end of the shader is required to
implement GL_EXT_gpu_shader4 and OpenGL 3.0.  However, vanilla OpenGL
1.2 doesn't forbid it and it likely makes something somewhere faster.
We should be consistent and implement the same discard behavior on all
hardware if we can.

The rules for HALT on Gen4-5 are a bit different from Gen6+.  On the
older hardware, there is no stack for HALT; instead it's up to software
to save and restore mask registers.  However, there's no real saving
needed since we only use HALT to jump to the end of the program where
we're about about to do our FB writes.  All we need to do is reset AMask
to DMask, the value it was initialized to at the start of the thread.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5244>
2020-05-30 06:21:15 +00:00
Caio Marcelo de Oliveira Filho
5e0525e145 intel/fs: Remove unused emission of load_simd_with_intel
The nir_intrinsic_load_simd_width_intel is always lowered by the
brw_nir_lower_simd() pass before the emission happens.  This is likely
a "leftover" from patch rewriting/squashing that happened when this
intrinsic was added.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5213>
2020-05-26 20:35:03 +00:00
Caio Marcelo de Oliveira Filho
6a6c36e977 intel/fs: Use writes_memory from shader_info
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4815>
2020-05-18 21:09:17 +00:00
Arcady Goldmints-Orlov
95fd950d35 intel/compiler: fix alignment assert in nir_emit_intrinsic
Fixes: c643979228 (intel/fs: Choose memory message type based on bit size)
Fixes: dEQP-VK.subgroups.ballot_broadcast.compute.subgroupbroadcast_i8vec2

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5000>
2020-05-12 22:14:31 +00:00
Caio Marcelo de Oliveira Filho
2663759af0 intel/fs: Add and use a new load_simd_width_intel intrinsic
Intrinsic to get the SIMD width, which not always the same as subgroup
size.  Starting with a small scope (Intel), but we can rename it later
to generalize if this turns out useful for other drivers.

Change brw_nir_lower_cs_intrinsics() to use this intrinsic instead of
a width will be passed as argument.  The pass also used to optimized
load_subgroup_id for the case that the workgroup fitted into a single
thread (it will be constant zero).  This optimization moved together
with lowering of the SIMD.

This is a preparation for letting the drivers call it before the
brw_compile_cs() step.

No shader-db changes in BDW, SKL, ICL and TGL.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
2020-05-01 12:50:37 -07:00
Caio Marcelo de Oliveira Filho
4b000b491a intel/fs: Add an option to lower variable group size in backend
Adding this since Iris will handle variable group size parameters by
itself.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
2020-05-01 12:50:28 -07:00
Caio Marcelo de Oliveira Filho
0edb58a84e intel/fs: Clean up variable group size handling in backend
Just use the information from NIR shader_info.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
2020-05-01 12:50:28 -07:00
D Scott Phillips
7bd15135a6 intel/fs: Update location of Render Target Array Index for gen12
Render Target Array Index has moved from R0.0[26:16] to
R1.1[26:16] on gen12.

Fixes dEQP-VK.multiview.input_attachments.*

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4836>
2020-05-01 08:48:22 -07:00
Caio Marcelo de Oliveira Filho
a3cba3c771 intel/fs: Only stall after sending all memory fence messages
In Gen11+, when emitting a fence for both L3 and SLM, the generated
code would look like

    SEND, MOV (for stall), SEND, MOV (for stall)

This commit change that so two SENDs are emitted before the MOVs for
stall.  This is similar to the approach used in Ivy Bridge for the
render fence.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>
2020-04-29 07:17:27 +00:00
Caio Marcelo de Oliveira Filho
f858fa26b4 intel/fs,vec4: Pull stall logic for memory fences up into the IR
Instead of emitting the stall MOV "inside" the
SHADER_OPCODE_MEMORY_FENCE generation, use the scheduling fences when
creating the IR.

For IvyBridge, every (data cache) fence is accompained by a render
cache fence, that now is explicit in the IR, two
SHADER_OPCODE_MEMORY_FENCEs are emitted (with different SFIDs).

Because Begin and End interlock intrinsics are effectively memory
barriers, move its handling alongside the other memory barrier
intrinsics.  The SHADER_OPCODE_INTERLOCK is still used to distinguish
if we are going to use a SENDC (for Begin) or regular SEND (for End).

This change is a preparation to allow emitting both SENDs in Gen11+
before we can stall on them.

Shader-db results for IVB (i965):

    total instructions in shared programs: 11971190 -> 11971200 (<.01%)
    instructions in affected programs: 11482 -> 11492 (0.09%)
    helped: 0
    HURT: 8
    HURT stats (abs)   min: 1 max: 3 x̄: 1.25 x̃: 1
    HURT stats (rel)   min: 0.03% max: 0.50% x̄: 0.14% x̃: 0.10%
    95% mean confidence interval for instructions value: 0.66 1.84
    95% mean confidence interval for instructions %-change: 0.01% 0.27%
    Instructions are HURT.

  Unlike the previous code, that used the `mov g1 g2` trick to force
  both `g1` and `g2` to stall, the scheduling fence will generate `mov
  null g1` and `mov null g2`.  During review it was decided it was not
  worth keeping the special codepath for the small effect will have.

Shader-db results for HSW (i965), BDW and SKL don't have a change
on instruction count, but do report changes in cycles count, showing
SKL results below

    total cycles in shared programs: 341738444 -> 341710570 (<.01%)
    cycles in affected programs: 7240002 -> 7212128 (-0.38%)
    helped: 46
    HURT: 5
    helped stats (abs) min: 14 max: 1940 x̄: 676.22 x̃: 154
    helped stats (rel) min: <.01% max: 2.62% x̄: 1.28% x̃: 0.95%
    HURT stats (abs)   min: 2 max: 1768 x̄: 646.40 x̃: 362
    HURT stats (rel)   min: <.01% max: 0.83% x̄: 0.28% x̃: 0.08%
    95% mean confidence interval for cycles value: -777.71 -315.38
    95% mean confidence interval for cycles %-change: -1.42% -0.83%
    Cycles are helped.

  This seems to be the effect of allocating two registers separatedly
  instead of a single one with size 2, which causes different register
  allocation, affecting the cycle estimates.

while ICL also has not change on instruction count but report changes
negative changes in cycles

    total cycles in shared programs: 352665369 -> 352707484 (0.01%)
    cycles in affected programs: 9608288 -> 9650403 (0.44%)
    helped: 4
    HURT: 104
    helped stats (abs) min: 24 max: 128 x̄: 88.50 x̃: 101
    helped stats (rel) min: <.01% max: 0.85% x̄: 0.46% x̃: 0.49%
    HURT stats (abs)   min: 2 max: 2016 x̄: 408.36 x̃: 48
    HURT stats (rel)   min: <.01% max: 3.31% x̄: 0.88% x̃: 0.45%
    95% mean confidence interval for cycles value: 256.67 523.24
    95% mean confidence interval for cycles %-change: 0.63% 1.03%
    Cycles are HURT.

  AFAICT this is the result of the case above.

Shader-db results for TGL have similar cycles result as ICL, but also
affect instructions

    total instructions in shared programs: 17690586 -> 17690597 (<.01%)
    instructions in affected programs: 64617 -> 64628 (0.02%)
    helped: 55
    HURT: 32
    helped stats (abs) min: 1 max: 16 x̄: 4.13 x̃: 3
    helped stats (rel) min: 0.05% max: 2.78% x̄: 0.86% x̃: 0.74%
    HURT stats (abs)   min: 1 max: 65 x̄: 7.44 x̃: 2
    HURT stats (rel)   min: 0.05% max: 4.58% x̄: 1.13% x̃: 0.69%
    95% mean confidence interval for instructions value: -2.03 2.28
    95% mean confidence interval for instructions %-change: -0.41% 0.15%
    Inconclusive result (value mean confidence interval includes 0).

  Now that more is done in the IR, more dependencies are visible and
  more SWSB annotations are emitted.  Mixed with different register
  allocation decisions like above, some shaders will see more `sync
  nops` while others able to avoid them.

  Most of the new `sync nops` are also redundant and could be dropped,
  which will be fixed in a separate change.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>
2020-04-29 07:17:27 +00:00
Kenneth Graunke
4459a70a6e intel/compiler: Delete abs/neg handling in fsign code
This should have gone away when removing source modifiers.  They won't
be set any longer, so this is simply dead code.

Fixes: b7c47c4f7c ("intel/compiler: Drop nir_lower_to_source_mods() and related handling.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4691>
2020-04-22 17:04:37 -07:00
Kenneth Graunke
902c8731f4 intel/compiler: Put back saturate on [iu]add_sat opcodes
I deleted one too many inst->saturate = ... lines.  This one must stay.

Fixes: b7c47c4f7c ("intel/compiler: Drop nir_lower_to_source_mods() and related handling.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4669>
2020-04-22 00:47:40 +00:00
Kenneth Graunke
b7c47c4f7c intel/compiler: Drop nir_lower_to_source_mods() and related handling.
I think we're unanimous in wanting to drop nir_lower_to_source_mods.
It's a bit of complexity to handle in the backend, but perhaps more
importantly, would be even more complexity to handle in nir_search.

And, it turns out that since we made other compiler improvements in the
last few years, they no longer appear to buy us anything of value.
Summarizing the results from shader-db from this patch:

 - Icelake (scalar mode)

   Instruction counts:

   - 411 helped, 598 hurt (out of 139,470 shaders)
   - 99.2% of shaders remain unaffected.  The average increase in
     instruction count in hurt programs is 1.78 instructions.
   - total instructions in shared programs: 17214951 -> 17215206 (<.01%)
   - instructions in affected programs: 1143879 -> 1144134 (0.02%)

   Cycles:

   - 1042 helped, 1357 hurt
   - total cycles in shared programs: 365613294 -> 365882263 (0.07%)
   - cycles in affected programs: 138155497 -> 138424466 (0.19%)

 - Haswell (both scalar and vector modes)

   Instruction counts:

   - 73 helped, 1680 hurt (out of 139,470 shaders)
   - 98.7% of shaders remain unaffected.  The average increase in
     instruction count in hurt programs is 1.9 instructions.
   - total instructions in shared programs: 14199527 -> 14202262 (0.02%)
   - instructions in affected programs: 446499 -> 449234 (0.61%)

   Cycles:

   - 5253 helped, 5559 hurt
   - total cycles in shared programs: 359996545 -> 360038731 (0.01%)
   - cycles in affected programs: 155897127 -> 155939313 (0.03%)

Given that ~99% of shader-db remains unaffected, and the affected
programs are hurt by about 1-2 instructions - which are all cheap
ALU instructions - this is unlikely to be measurable in terms of
any real performance impact that would affect users.

So, drop them and simplify the backend, and hopefully enable other
future simplifications in NIR.

Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4616>
2020-04-21 21:42:21 +00:00
Jason Ekstrand
7c43b8ce1b nir: Delete the fnoise opcodes
As of the previous commit, they are never used.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4624>
2020-04-21 06:16:13 +00:00
Plamena Manolova
c77dc51203 intel/compiler: Add support for variable workgroup size
Add new builtin parameters that are used to keep track of the group
size.  This will be used to implement ARB_compute_variable_group_size.

The compiler will use the maximum group size supported to pick a
suitable SIMD variant.  A later improvement will be to keep all SIMD
variants (like FS) so the driver can select the best one at dispatch
time.

When variable workgroup size is used, the small workgroup optimization
is disabled as it we can't prove at compile time that the barriers
won't be needed.

Extracted from original i965 patch with additional changes by
Caio Marcelo de Oliveira Filho.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
2020-04-09 19:23:12 -07:00
Jason Ekstrand
c643979228 intel/fs: Choose memory message type based on bit size
Thanks to the NIR vectorizing pass, we're about to see alignments that
are higher than the bit size.  Previously, we could use either and we
just happened to choose alignment (probably the wrong choice) so it's
harmless to switch to detecting based on bit size.  This commit changes
things to take both into account which is more accurate to what the
messages we're using do.  We also beef up the asserts and make them more
consistent, more accurate, and more complete.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
2020-04-03 20:26:54 +00:00
Ian Romanick
c144875f62 intel/fs: Allow NOT instructions in conditional discard optimization
I don't know why I explicitly disallowed NOT in the first place. :(

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14549846 -> 14549770 (<.01%)
instructions in affected programs: 12934 -> 12858 (-0.59%)
helped: 76
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.13% max: 5.56% x̄: 1.04% x̃: 0.90%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.25% -0.84%
Instructions are helped.

total cycles in shared programs: 203793967 -> 203792696 (<.01%)
cycles in affected programs: 77920 -> 76649 (-1.63%)
helped: 67
HURT: 1
helped stats (abs) min: 2 max: 36 x̄: 19.00 x̃: 16
helped stats (rel) min: 0.04% max: 4.68% x̄: 2.35% x̃: 2.28%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -20.75 -16.63
95% mean confidence interval for cycles %-change: -2.57% -2.05%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3965>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3965>
2020-03-09 16:46:28 -07:00
Eric Anholt
3e16434acd nir: Move intel's intrinsic_image_coordinate_components() to core nir.
This is a query that both Intel and freedreno need to do.  We can simplify
it a lot with the new glsl_get_sampler_dim_coordinate_components()

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
2020-02-24 18:25:02 +00:00
Ian Romanick
273b8cd1ca intel/fs: Correctly handle multiply of fsign with a source modifier
The other source of the multiply will be interpreted as a uint32_t in an
XOR instruction.  Any source modifiers with either not be interpreted at
all or will be misinterpreted due to the differing types.

If the other operand of the multiplication has a source modifier, just
emit an extra move to resolve the source modifiers.

The negation source modifier problem is difficult to reproduce due to an
algebraic optimization that changes (-a*b) to -(a*b).  However, changes
in MR !1359 push the negations back down.

On Gen7+ it might be possible to do slightly better for an abs() source
modifier by using BFI2 as a glorified copysign().

On Gen8+ it might be possible to do slightly better for a neg() source
modifier by emitting (~a ^ b).

There were no shader-db changes on any Intel platform, so I think we can
deal with that problem when it arises.

See also piglit!224.

Fixes: 06d2c11641 ("intel/fs: Add a scale factor to emit_fsign")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3780>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3780>
2020-02-19 23:51:42 +00:00
Francisco Jerez
8d3b86e34a intel/fs/gen7+: Implement discard/demote for SIMD32 programs.
At this point this simply involves fixing the initialization of the
sample mask flag register to take the right dispatch mask from the
thread payload, and fixing sample_mask_reg() to return f1.1 for the
second half of a SIMD32 thread.  This improves Manhattan 3.1
performance by 2.4%±0.31% (N>40) on my ICL with SIMD32 enabled
relative to falling back to SIMD16 for the shaders that use discard.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-02-14 14:31:49 -08:00