Commit graph

452 commits

Author SHA1 Message Date
Jason Ekstrand
a1de102479 intel/fs: Use compare_func for wm_prog_key::alpha_test_func
Because 0 is no longer a recognizable value (it's NEVER, which isn't a
good default), we add an emit_alpha_test bool to tell the back-end when
to bother alpha testing.  This lets us only touch crocus with the
change.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14157>
2022-01-14 15:08:09 +00:00
Jordan Justen
d57b10ab98 intel/compiler: Adjust TCS instance-id for dg2+
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14385>
2022-01-05 16:13:28 -08:00
Marcin Ślusarz
a48f1d51e2 intel/compiler: disable workaround not applicable to gfx >= 11
There's nothing in bspec that would suggest this is still needed.
It only affected gfx 9 and 10.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14280>
2021-12-22 10:13:25 +00:00
Jason Ekstrand
eebb2dedb2 intel/fs: Add a NONE scheduling mode
While our LIFO scheduling mode attempts to optimize for register
pressure, it's often hard for a scheduling algorithm to do better than
the instruction order provided by the shader author.  Shader authors
often do perfectly reasonable things like using texture results
immediately after fetching them or constructing texture coordinates
immediately before the texture op.  When we throw all the instruction
ordering information away, we loose any help the author may have given
us.  By attempting NONE before we fall back to the worst case LIFO mode.

And, yes, I tried this with NONE both before and after LIFO and doing
NONE before LIFO is substantially better, according to shader-db.

    total instructions in shared programs: 19673152 -> 19665202 (-0.04%)
    instructions in affected programs: 33669 -> 25719 (-23.61%)
    helped: 20
    HURT: 0
    helped stats (abs) min: 15 max: 4609 x̄: 397.50 x̃: 107
    helped stats (rel) min: 2.33% max: 67.50% x̄: 14.60% x̃: 9.12%
    95% mean confidence interval for instructions value: -867.61 72.61
    95% mean confidence interval for instructions %-change: -21.74% -7.46%
    Inconclusive result (value mean confidence interval includes 0).

    total cycles in shared programs: 935562500 -> 935020920 (-0.06%)
    cycles in affected programs: 18620349 -> 18078769 (-2.91%)
    helped: 104
    HURT: 48
    helped stats (abs) min: 88 max: 60986 x̄: 8031.48 x̃: 3680
    helped stats (rel) min: 0.61% max: 51.44% x̄: 14.95% x̃: 8.87%
    HURT stats (abs)   min: 10 max: 54724 x̄: 6118.62 x̃: 1530
    HURT stats (rel)   min: 0.13% max: 46.45% x̄: 10.28% x̃: 6.46%
    95% mean confidence interval for cycles value: -5724.34 -1401.71
    95% mean confidence interval for cycles %-change: -9.86% -4.10%
    Cycles are helped.

    total spills in shared programs: 12158 -> 10327 (-15.06%)
    spills in affected programs: 1831 -> 0
    helped: 20
    HURT: 0

    total fills in shared programs: 14749 -> 12635 (-14.33%)
    fills in affected programs: 2114 -> 0
    helped: 20
    HURT: 0

    LOST:   8
    GAINED: 649

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
2021-12-18 01:46:19 +00:00
Jason Ekstrand
e6ddee764e intel/fs: Reset instruction order before re-scheduling
The way the current scheduler loop is implemented, each scheduling pass
starts with what the previous pass had.  This means that, if PRE screwed
everything up majorly, PRE_NON_LIFO would have to try to fix it.  It
also meant that tiny changes to one pass would affect every later pass.
Instead, reset the order of the instructions before each scheduling
pass.  This makes the passes entirely independent of each other.

Shader-db results on Ice Lake:

    total instructions in shared programs: 19670486 -> 19670648 (<.01%)
    instructions in affected programs: 25317 -> 25479 (0.64%)
    helped: 2
    HURT: 7
    helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
    helped stats (rel) min: 0.07% max: 0.07% x̄: 0.07% x̃: 0.07%
    HURT stats (abs)   min: 8 max: 70 x̄: 24.29 x̃: 12
    HURT stats (rel)   min: 0.41% max: 4.95% x̄: 1.47% x̃: 0.87%
    95% mean confidence interval for instructions value: -1.28 37.28
    95% mean confidence interval for instructions %-change: -0.04% 2.30%
    Inconclusive result (value mean confidence interval includes 0).

    total cycles in shared programs: 935535948 -> 935490243 (<.01%)
    cycles in affected programs: 421994824 -> 421949119 (-0.01%)
    helped: 1269
    HURT: 879
    helped stats (abs) min: 1 max: 12008 x̄: 259.38 x̃: 52
    helped stats (rel) min: <.01% max: 28.02% x̄: 1.12% x̃: 0.14%
    HURT stats (abs)   min: 1 max: 29931 x̄: 322.46 x̃: 20
    HURT stats (rel)   min: <.01% max: 32.17% x̄: 1.74% x̃: 0.22%
    95% mean confidence interval for cycles value: -71.37 28.81
    95% mean confidence interval for cycles %-change: -0.11% 0.21%
    Inconclusive result (value mean confidence interval includes 0).

    total spills in shared programs: 12403 -> 12430 (0.22%)
    spills in affected programs: 1355 -> 1382 (1.99%)
    helped: 2
    HURT: 7

    total fills in shared programs: 15128 -> 15182 (0.36%)
    fills in affected programs: 3294 -> 3348 (1.64%)
    helped: 2
    HURT: 7

    LOST:   21
    GAINED: 28

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
2021-12-18 01:46:19 +00:00
Jason Ekstrand
d49d092259 Revert "intel/fs: Do cmod prop again after scheduling"
This reverts commit ba2fa1ceaf.  Doing
optimizations after scheduling but before RA means doing them in the
middle of the scheduling loop which introduces additional dependencies
between one scheduling iteration and the next.  That won't work if we
want to make the scheduling modes independent, at least not unless we
have some way of fully cloning the IR.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
2021-12-18 01:46:19 +00:00
Jason Ekstrand
cf98a3cc19 intel/fs: Use OPT() for split_virtual_grfs
Now that we're being conservative in the pass, it's easy to tell when it
makes progress and we can put it in the OPT() macro.  This way, we get
nice INTEL_DEBUG=optimizer dumps for it.  While we're here, fix the
header comment which is massively out-of-date.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
2021-12-18 01:46:19 +00:00
Jason Ekstrand
38fa18a7a3 intel/fs: Be more conservative in split_virtual_grfs
Instead of modifying every single instruction, keep track of which VGRFs
are actually split in a bit-set, and only modify the instructions that
actually touch split regs.

This cuts the runtime of the following Vulkan CTS on my SKL box by 45%
from 3:21 to 1:51:  dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
2021-12-18 01:46:19 +00:00
Jason Ekstrand
3c89dbdbfe intel/fs: Implement the sample_pos_or_center system value
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>
2021-12-17 16:02:16 +00:00
Jason Ekstrand
a580fd55e1 intel/fs: Rework emit_samplepos_setup()
This rolls compute_sample_position into emit_samplepos_setup, its only
caller, by using a loop instead of calling it twice.  We also
early-return for the !persample_dispatch case instead of doing it as
part of the sample calculation.  This means that we don't call
fetch_payload_reg() to get sample_pos_reg unless we're actually going to
use it so the function is safe to call even if we haven't set up
sample_pos_reg.  This will be important for the next commit.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>
2021-12-17 16:02:16 +00:00
Jason Ekstrand
ac7255ed1e intel/fs: Return fs_reg directly from builtin setup helpers
There's no good reason why we're allocating them on the heap and
returning a pointer.  Return the fs_reg directly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>
2021-12-17 16:02:16 +00:00
Caio Oliveira
2ad11b39bd intel/compiler: Use a struct for brw_compile_bs parameters
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14139>
2021-12-13 01:08:16 +00:00
Jason Ekstrand
278d12f991 intel/fs,vec4: Drop prog_data binding tables
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14056>
2021-12-10 21:20:47 +00:00
Jason Ekstrand
4fa58d27a5 intel/fs,vec4: Drop support for shader time
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14056>
2021-12-10 21:20:47 +00:00
Jason Ekstrand
8f3c100d61 intel/fs,vec4: Drop uniform compaction and pull constant support
The only driver using these was i965 and it's gone now.  This is all
dead code.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14056>
2021-12-10 21:20:47 +00:00
Caio Oliveira
1f438eb033 intel/compiler: Implement Mesh Output
Use the same URB access helpers that were added for Task Output.  The
Arrayed I/O (per-primitive and per-vertex) is handled by applying the
pitch from the MUE layout into the NIR intrinsics and including the
non-arrayed offset on top of it.  After that, the index src can be
used directly for lowering.

Because we keep around the non-arrayed offset AND the pitch is
aligned, we can identify cases where the access is indirect but
guaranteed to be aligned, and dispatch a single message.  Added a TODO
to explore that later.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
2021-12-04 00:41:46 +00:00
Caio Oliveira
db23c41537 intel/compiler: Add backend compiler basics for Task/Mesh
Task/Mesh stages are CS-like stages, and include many
builtins (e.g. workgroup ID/index) and intrinsics (e.g. workgroup
memory primitives) originally present only in CS.

This commit add two new stages (task and mesh) that 'inherit' from CS
by embedding a brw_cs_prog_data in their own prog_data structure, so
that CS functionality can be easily reused.  They also currently use
the same helpers to select the SIMD variant to use -- that was
recently added for CS.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
2021-12-04 00:41:46 +00:00
Caio Oliveira
827cf65a26 intel/compiler: Export brw_nir_lower_simd
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
2021-12-04 00:41:46 +00:00
Caio Oliveira
09dd05a219 intel/compiler: Make MUE available when setting up FS URB access
Allows to assert its existence for per-primitive variables and will
later be useful to implement the "more than 16 attributes" case for
Mesh.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
2021-12-04 00:41:46 +00:00
Caio Oliveira
be89ea3231 intel/compiler: Handle per-primitive inputs in FS
In Fragment Shader, regular inputs are laid out in the thread payload
in a one dword per each half-GRF, that gives room for having the two
delta dwords needed for interpolation.

Per-primitive inputs are laid out before the regular inputs, and since
there's no need to have delta information, they are packed.  So
half-GRF will be fully filled with 4 dwords of input.

When num_per_primitive_inputs is zero (the default case), behavior
should be the same as before.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
2021-12-04 00:41:46 +00:00
Caio Oliveira
7938c38778 intel/compiler: Properly lower WorkgroupId for Task/Mesh
Task/Mesh currently only support a single dimension (both in NV API
and HW), so make Y and Z be zero.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
2021-12-04 00:41:46 +00:00
Topi Pohjolainen
261dd6c8f8 intel/compiler: Add new variant for TXF_CMS_W
This allows, for example, fs_inst::components_read() without passing
devinfo as extra argument.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
2021-11-22 21:27:30 -08:00
Topi Pohjolainen
24831bbd40 intel/compiler: Prepare ld2dms_w for 4 mcs components
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
2021-11-22 21:27:30 -08:00
Topi Pohjolainen
dfe0ba9080 intel/compiler: Demote sampler params to 16-bit for CMS/UMS/MCS
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
2021-11-22 21:27:30 -08:00
Topi Pohjolainen
0374b56faa intel/compiler/fs: Add support for 16-bit sampler msg payload
For SIMD8 half float payload, each component takes a full register, so
we can use existing LOAD_PAYLOAD infrastruture for required padding by
alternating plain 8-wide half float vector and null vector.

Also this patch removes an unwanted assertion from
opt_copy_propagation_local for LOAD_PAYLOAD.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
2021-11-22 21:27:30 -08:00
Sagar Ghuge
936412af27 intel/compiler: Add helper to support half float payload with padding
To support SIMD8 half float payloads, each component takes one full
32bit wide register in both SIMD8H and SIMD16H mode. So we can make use
of existing LOAD_PAYLOAD infrastructure alternating a half float vector
and a null vector, in order to handle required padding.

v2: (Francisco)
- Skip header sources
- Fix comparision units
- Don't allocate VGRF for padded source

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
2021-11-22 21:27:30 -08:00
Sagar Ghuge
be2bfe5fe8 intel/compiler: Don't hardcode padding source type to 32bit
We can use LOAD_PAYLOAD infrastructure in order to handle 16bit float
payload. Let's rely on source type for padding sources, if not set
previously then default one would be 32-bit.

This patch will be used later in the series to handle 16-bit float
payloads.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
2021-11-22 21:27:30 -08:00
Lionel Landwerlin
361b3fee3c intel: move away from booleans to identify platforms
v2: Drop changes around GFX_VERx10 == 75 (Luis)

v3: Replace
   (GFX_VERx10 < 75 && devinfo->platform != INTEL_PLATFORM_BYT)
   by
   (devinfo->platform == INTEL_PLATFORM_IVB)
   Replace
   (devinfo->ver >= 5 || devinfo->platform == INTEL_PLATFORM_G4X)
   by
   (devinfo->verx10 >= 45)
   Replace
   (devinfo->platform != INTEL_PLATFORM_G4X)
   by
   (devinfo->verx10 != 45)

v4: Fix crocus typo

v5: Rebase

v6: Add GFX3, ILK & I965 platforms (Jordan)
    Move ifdef to code expressions (Jordan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12981>
2021-11-08 16:48:06 +00:00
Caio Oliveira
858424bd2e intel/compiler: Use gl_shader_stage_uses_workgroup() helpers
Instead of checking for MESA_SHADER_COMPUTE (and KERNEL).  Where
appropriate, also use gl_shader_stage_is_compute().

This allows most of the workgroup-related lowering to be applied to
Task and Mesh shaders.  These will be added later and "inherit" from
cs_prog_data structure.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13629>
2021-11-03 11:09:48 -07:00
Caio Oliveira
c4355d3f24 intel/compiler: Make brw_nir_populate_wm_prog_data() static
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13629>
2021-11-03 11:09:48 -07:00
Vinson Lee
d863e62b32 intel/compiler: Change selected_simd return type to int.
brw_simd_select return type is int.

Fix defect reported by Coverity Scan.

Unsigned compared against 0 (NO_EFFECT)
unsigned_compare: This less-than-zero comparison of an unsigned value is never true. selected_simd < 0U.

Fixes: 7dda0cf2b8 ("intel/compiler: Use SIMD selection helpers for CS")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13606>
2021-11-03 01:48:46 +00:00
Caio Marcelo de Oliveira Filho
4e7b71e00c intel/compiler: Use SIMD selection helpers for variable workgroup size
Variable workgroup size works by compiling as much SIMD variants as
possible and then selecting the right one during dispatch (when the
actual workgroup size is passed to us).

Instead of replicating the logic in a separate function, reuse the
same logic for regular SIMD selection.  And move function for that
together with the remaining simd selection functions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13249>
2021-10-26 17:49:09 +00:00
Caio Marcelo de Oliveira Filho
7dda0cf2b8 intel/compiler: Use SIMD selection helpers for CS
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13249>
2021-10-26 17:49:09 +00:00
Sagar Ghuge
b83c9b21a6 intel/compiler: Set correct cache policy for A64 byte scattered read
This doesn't impact any performance since the previous typo value
matches the current cache control value.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13458>
2021-10-21 17:32:23 +00:00
Marcin Ślusarz
d05f7b4a2c intel: fix INTEL_DEBUG environment variable on 32-bit systems
INTEL_DEBUG is defined (since 4015e1876a) as:

 #define INTEL_DEBUG __builtin_expect(intel_debug, 0)

which unfortunately chops off upper 32 bits from intel_debug
on platforms where sizeof(long) != sizeof(uint64_t) because
__builtin_expect is defined only for the long type.

Fix this by changing the definition of INTEL_DEBUG to be function-like
macro with "flags" argument. New definition returns 0 or 1 when
any of the flags match.

Most of the changes in this commit were generated using:
for c in `git grep INTEL_DEBUG | grep "&" | grep -v i915 | awk -F: '{print $1}' | sort | uniq`; do
    perl -pi -e "s/INTEL_DEBUG & ([A-Z0-9a-z_]+)/INTEL_DBG(\1)/" $c
    perl -pi -e "s/INTEL_DEBUG & (\([A-Z0-9_ |]+\))/INTEL_DBG\1/" $c
done
but it didn't handle all cases and required minor cleanups (like removal
of round brackets which were not needed anymore).

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13334>
2021-10-15 19:55:14 +00:00
Marcin Ślusarz
9e22e0838a intel/compiler: use nir_shader_instructions_pass in brw_nir_demote_sample_qualifiers
Changes:
- nir_metadata_preserve(..., nir_metadata_block_index | nir_metadata_dominance)
  is called only when pass makes progress
- nir_metadata_preserve(..., nir_metadata_all) is called when pass doesn't
  make progress
- pass returns true ONLY when it makes progress ("progress" was initialized incorrectly)

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13189>
2021-10-05 10:02:54 +00:00
Lionel Landwerlin
4e4560ab6f intel/compiler: add missing line returns to logs
In the upcoming intel_clc tool, we're allowing to print these messages
out and some of them just don't look right.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13165>
2021-10-05 07:31:52 +00:00
Marcin Ślusarz
6229d40fbf intel/compiler: drop redundant likely's around INTEL_DEBUG
They are not needed since 4015e1876a.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13115>
2021-10-01 07:57:20 +00:00
Jordan Justen
51528aeb60 intel/compiler: Use INTEL_DEBUG=blorp to dump blorp compute shaders
Make INTEL_DEBUG=blorp dump the blorp compute shaders instead using
the general INTEL_DEBUG=cs which is now reserved for actual compute
programs.

Ref: 05933fb0f7 ("intel/compiler: Use INTEL_DEBUG=blorp to dump blorp shaders")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
2021-09-30 17:41:33 +00:00
Jason Ekstrand
35ac184de0 intel/fs: Handle required subgroup sizes specified in the SPIR-V
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12959>
2021-09-21 18:34:59 +00:00
Sagar Ghuge
75e28b8777 intel/compiler: Add support to handle 64-bit atomics with A32 messages
v1: (Jason)
- Fix parentheses

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12566>
2021-09-09 23:34:33 +00:00
Sagar Ghuge
527468f56f intel/compiler: Add 64-bit A64 float logical opcode support
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12566>
2021-09-09 23:34:33 +00:00
Jason Ekstrand
7b21def9c2 intel/fs: Add support for atomic_fadd
Rework:
- Enable float32 atomic add with LSC (Sagar)
- disassemble new opcode (Caio)

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12566>
2021-09-09 23:34:33 +00:00
Lionel Landwerlin
838c0e5eef intel/fs: fix framebuffer reads
We're missing some restrictions on those messages.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5292
Cc: mesa-stable
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12615>
2021-08-31 11:52:39 +00:00
Ian Romanick
a9120eccff intel/compiler: Move type_is_unsigned_int to brw_reg_type.h
...and rename it to brw_reg_type_is_unsigned_integer.  It is now next to
brw_reg_type_is_floating_point and brw_reg_type_is_integer.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12045>
2021-08-30 14:00:14 -07:00
Ian Romanick
38807ceeae intel/fs: sel.cond writes the flags on Gfx4 and Gfx5
On Gfx4 and Gfx5, sel.l (for min) and sel.ge (for max) are implemented
using a separte cmpn and sel instruction.  This lowering occurs in
fs_vistor::lower_minmax which is called very, very late... a long, long
time after the first calls to opt_cmod_propagation.  As a result,
conditional modifiers can be incorrectly propagated across sel.cond on
those platforms.

No tests were affected by this change, and I find that quite shocking.
After just changing flags_written(), all of the atan tests started
failing on ILK.  That required the change in cmod_propagatin (and the
addition of the prop_across_into_sel_gfx5 unit test).

Shader-db results for ILK and GM45 are below.  I looked at a couple
before and after shaders... and every case that I looked at had
experienced incorrect cmod propagation.  This affected a LOT of apps!
Euro Truck Simulator 2, The Talos Principle, Serious Sam 3, Sanctum 2,
Gang Beasts, and on and on... :(

I discovered this bug while working on a couple new optimization
passes.  One of the passes attempts to remove condition modifiers that
are never used.  The pass made no progress except on ILK and GM45.
After investigating a couple of the affected shaders, I noticed that
the code in those shaders looked wrong... investigation led to this
cause.

v2: Trivial changes in the unit tests.

v3: Fix type in comment in unit tests.  Noticed by Jason and Priit.

v4: Tweak handling of BRW_OPCODE_SEL special case.  Suggested by Jason.

Fixes: df1aec763e ("i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Dave Airlie <airlied@redhat.com>

Iron Lake
total instructions in shared programs: 8180493 -> 8181781 (0.02%)
instructions in affected programs: 541796 -> 543084 (0.24%)
helped: 28
HURT: 1158
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.86% x̄: 0.53% x̃: 0.50%
HURT stats (abs)   min: 1 max: 3 x̄: 1.14 x̃: 1
HURT stats (rel)   min: 0.12% max: 4.00% x̄: 0.37% x̃: 0.23%
95% mean confidence interval for instructions value: 1.06 1.11
95% mean confidence interval for instructions %-change: 0.31% 0.38%
Instructions are HURT.

total cycles in shared programs: 239420470 -> 239421690 (<.01%)
cycles in affected programs: 2925992 -> 2927212 (0.04%)
helped: 49
HURT: 157
helped stats (abs) min: 2 max: 284 x̄: 62.69 x̃: 70
helped stats (rel) min: 0.04% max: 6.20% x̄: 1.68% x̃: 1.96%
HURT stats (abs)   min: 2 max: 48 x̄: 27.34 x̃: 24
HURT stats (rel)   min: 0.02% max: 2.91% x̄: 0.31% x̃: 0.20%
95% mean confidence interval for cycles value: -0.80 12.64
95% mean confidence interval for cycles %-change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).

GM45
total instructions in shared programs: 4985517 -> 4986207 (0.01%)
instructions in affected programs: 306935 -> 307625 (0.22%)
helped: 14
HURT: 625
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.82% x̄: 0.52% x̃: 0.49%
HURT stats (abs)   min: 1 max: 3 x̄: 1.13 x̃: 1
HURT stats (rel)   min: 0.12% max: 3.90% x̄: 0.34% x̃: 0.22%
95% mean confidence interval for instructions value: 1.04 1.12
95% mean confidence interval for instructions %-change: 0.29% 0.36%
Instructions are HURT.

total cycles in shared programs: 153827268 -> 153828052 (<.01%)
cycles in affected programs: 1669290 -> 1670074 (0.05%)
helped: 24
HURT: 84
helped stats (abs) min: 2 max: 232 x̄: 64.33 x̃: 67
helped stats (rel) min: 0.04% max: 4.62% x̄: 1.60% x̃: 1.94%
HURT stats (abs)   min: 2 max: 48 x̄: 27.71 x̃: 24
HURT stats (rel)   min: 0.02% max: 2.66% x̄: 0.34% x̃: 0.14%
95% mean confidence interval for cycles value: -1.94 16.46
95% mean confidence interval for cycles %-change: -0.29% 0.11%
Inconclusive result (value mean confidence interval includes 0).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191>
2021-08-11 13:09:20 -07:00
Ian Romanick
5ffbee84a4 intel/compiler: Add id parameter to shader_perf_log callback
There are two problems with the current architecture.

In OpenGL, the id is supposed to be a unique identifier for a particular
log source.  This is done so that applications can (theoretically)
filter particular log messages.  The debug callback infrastructure in
Mesa assigns a uniqe value when a value of 0 is passed in.  This causes
the id to get set once to a unique value for each message.

By passing a stack variable that is initialized to 0 on every call,
every time the same message is logged, it will have a different id.
This isn't great, but it's also not catastrophic.

When threaded shader compiles are used, the id *pointer* is saved and
dereferenced at a possibly much later time on a possibly different
thread.  This causes one thread to access the stack from a different
thread... and that stack frame might not be valid any more. :(

I have not observed any crashes related to this particular issue.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12136>
2021-08-01 23:58:08 +00:00
Dave Airlie
c8783001c7 intel/fs: restrict max push length on older GPUs to a smaller amount
Fixes crash in dEQP-GLES2.functional.uniform_api.random.79

Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12093>
2021-07-30 15:17:21 +10:00
Sagar Ghuge
ef29bb6bc5 intel/compiler: Handle ternary add in lower_simd_width
We need to lower the add3 instruction simd width otherwise in simd32
mode, we endup writing 4 register wide data which is not allowed.

Reported-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11985>
2021-07-22 23:38:04 +00:00
Jason Ekstrand
6642749458 intel/dev: Add a max_cs_workgroup_threads field
This is distinct form max_cs_threads because it also encodes
restrictions about the way we use GPGPU/COMPUTE_WALKER.  This gets rid
of the MIN2(64, devinfo->max_cs_threads) we have scattered all over the
driver and puts it in a central place.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11861>
2021-07-14 23:02:34 +00:00