Commit graph

1857 commits

Author SHA1 Message Date
Caio Marcelo de Oliveira Filho
29177c7cee intel/compiler: Build all tests in a single binary
With gtest is possible to filter execution and run only a specific
test suite or individual test, so there's no particular reason here to
generate multiple binaries for the tests of a single module.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13303>
2021-10-15 10:06:51 -07:00
Caio Marcelo de Oliveira Filho
35b6990706 intel/compiler: Rename vec4 test fixtures
Include vec4 in their names to avoid same names as the fs
counterparts.  This will allow compiling all the tests together in the
future.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13303>
2021-10-15 10:06:51 -07:00
Jason Ekstrand
e6cce80976 intel/fs: Stop emitting TGM fences for nir_var_mem_ssbo
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
2021-10-15 14:58:56 +00:00
Caio Marcelo de Oliveira Filho
2d7065ef04 intel/fs: Consider nir_var_mem_image for TGM fences
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
2021-10-15 14:58:55 +00:00
Dave Airlie
c4323dc846 brw/nir: remove unused function prototypes.
These got moved into common code a good while ago.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13328>
2021-10-13 22:52:59 +00:00
Caio Marcelo de Oliveira Filho
94e07058ee intel/compiler: Remove unused ret declaration
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13340>
2021-10-13 17:24:29 +00:00
Caio Marcelo de Oliveira Filho
bd2cc4b916 intel/compiler: Convert test_eu_compact to use gtest
Be consistent with the other test suites in intel/compiler.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13340>
2021-10-13 17:24:29 +00:00
Lionel Landwerlin
fa251cf111 intel/nir: allow unknown format in lowering of storage images
We're about to allow unknown format for specific formats in Anv.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13198>
2021-10-11 10:29:09 -05:00
Marcin Ślusarz
1faebd0936 intel/compiler: use nir_metadata_none instead of its value
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13189>
2021-10-05 10:02:54 +00:00
Marcin Ślusarz
71bec85db0 intel/compiler: use nir_shader_instructions_pass in brw_nir_opt_peephole_ffma
No functional changes.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13189>
2021-10-05 10:02:54 +00:00
Marcin Ślusarz
9e6acd801d intel/compiler: use nir_shader_instructions_pass in brw_nir_lower_storage_image
No functional changes.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13189>
2021-10-05 10:02:54 +00:00
Marcin Ślusarz
10a33b046e intel/compiler: use nir_shader_instructions_pass in brw_nir_lower_scoped_barriers
No functional changes.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13189>
2021-10-05 10:02:54 +00:00
Marcin Ślusarz
3d0332eb12 intel/compiler: use nir_shader_instructions_pass in brw_nir_lower_mem_access_bit_sizes
No functional changes.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13189>
2021-10-05 10:02:54 +00:00
Marcin Ślusarz
183987d438 intel/compiler: use nir_shader_instructions_pass in brw_nir_lower_conversions
No functional changes.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13189>
2021-10-05 10:02:54 +00:00
Marcin Ślusarz
5b8c993d50 intel/compiler: use nir_shader_instructions_pass in brw_nir_clamp_image_1d_2d_array_sizes
No functional changes.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13189>
2021-10-05 10:02:54 +00:00
Marcin Ślusarz
9e22e0838a intel/compiler: use nir_shader_instructions_pass in brw_nir_demote_sample_qualifiers
Changes:
- nir_metadata_preserve(..., nir_metadata_block_index | nir_metadata_dominance)
  is called only when pass makes progress
- nir_metadata_preserve(..., nir_metadata_all) is called when pass doesn't
  make progress
- pass returns true ONLY when it makes progress ("progress" was initialized incorrectly)

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13189>
2021-10-05 10:02:54 +00:00
Lionel Landwerlin
4e4560ab6f intel/compiler: add missing line returns to logs
In the upcoming intel_clc tool, we're allowing to print these messages
out and some of them just don't look right.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13165>
2021-10-05 07:31:52 +00:00
Marcin Ślusarz
6229d40fbf intel/compiler: drop redundant likely's around INTEL_DEBUG
They are not needed since 4015e1876a.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13115>
2021-10-01 07:57:20 +00:00
Jordan Justen
51528aeb60 intel/compiler: Use INTEL_DEBUG=blorp to dump blorp compute shaders
Make INTEL_DEBUG=blorp dump the blorp compute shaders instead using
the general INTEL_DEBUG=cs which is now reserved for actual compute
programs.

Ref: 05933fb0f7 ("intel/compiler: Use INTEL_DEBUG=blorp to dump blorp shaders")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11564>
2021-09-30 17:41:33 +00:00
Jason Ekstrand
0737b37dcd intel/fs: Emit URB fences when we have LSC
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13092>
2021-09-29 20:52:54 +00:00
Jason Ekstrand
e6a9501aa2 intel/fs: Add the URB fence message
When they re-arranged all the dataport stuff and added the LSC, doing
URB fencing through the dataport no longer makes sense.  Instead, there
is now a fence message on the URB shared function.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13092>
2021-09-29 20:52:54 +00:00
Jason Ekstrand
eb53d82d2d intel/fs: Ignore SLM fences if shared is unused
Found this nugget while looking at the ACO driver.  It seems sensible to
avoid SLM fences if there is no SLM.  This also makes the check depend
on SLM usage rather than just shader stage which will be useful if we
ever implement task/mesh because task shaders also have SLM.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13092>
2021-09-29 20:52:54 +00:00
Jason Ekstrand
f726246297 intel/fs: Rework fence handling in brw_fs_nir.cpp
Start off making everything look like LSC where we have three types of
fences: TGM, UGM, and SLM.  Then, emit the actual code in a generation-
aware way.  There are three HW generation cases we care about:
XeHP+ (LSC), ICL-TGL, and IVB-SKL.  Even though it looks like there's a
lot to deduplicate, it only increases the number of ubld.emit() calls
from 5 to 7 and entirely gets rid of the SFID juggling and other
weirdness we've introduced along the way to make those cases "general".
While we're here, also clean up the code for stalling after fences and
clearly document every case where we insert a stall.

There are only three known functional changes from this commit:

 1. We now avoid the render cache fence on IVB if we don't need image
    barriers.

 2. On ICL+, we no longer unconditionally stall on barriers.  We still
    stall if we have more than one to help tie them together but
    independent barriers are independent.  Barrier instructions will
    still operate in write-commit mode and still be scheduling barriers
    but won't necessarily stall.

 3. We now assert-fail for URB fences on LSC platforms.  We'll be adding
    in the new URB fence message for those platforms in a follow-on
    commit.

It is a big enough refactor, however, that other minor changes may be
present.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13092>
2021-09-29 20:52:54 +00:00
Christian Gmeiner
3d65cea6ee util/bitset: s/BITSET_SET_RANGE/BITSET_SET_RANGE_INSIDE_WORD
Prep work for the next commit.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11321>
2021-09-21 20:25:31 +00:00
Jason Ekstrand
35ac184de0 intel/fs: Handle required subgroup sizes specified in the SPIR-V
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12959>
2021-09-21 18:34:59 +00:00
Ian Romanick
3281ccf4b1 iris: Calculate uses_atomic_load_store after all lowering
The lowering passes will soon be moved to another function, so there
won't be any choice.

As a side benefit, this allows eliminating the uses_atomic_load_store
**pointer** parameter from brw_nir_lower_storage_image.  For some reason
crocus was passing false instead of NULL.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12858>
2021-09-17 16:36:08 -07:00
Emma Anholt
aed4c0b5a9 nir: Drop the unused instr arg for src/dest copy functions.
Now that we don't use ralloc, we don't need this arg to get at the right
ralloc ctx.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
2021-09-14 17:53:06 +00:00
Sagar Ghuge
75e28b8777 intel/compiler: Add support to handle 64-bit atomics with A32 messages
v1: (Jason)
- Fix parentheses

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12566>
2021-09-09 23:34:33 +00:00
Sagar Ghuge
527468f56f intel/compiler: Add 64-bit A64 float logical opcode support
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12566>
2021-09-09 23:34:33 +00:00
Jason Ekstrand
7b21def9c2 intel/fs: Add support for atomic_fadd
Rework:
- Enable float32 atomic add with LSC (Sagar)
- disassemble new opcode (Caio)

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12566>
2021-09-09 23:34:33 +00:00
Lionel Landwerlin
838c0e5eef intel/fs: fix framebuffer reads
We're missing some restrictions on those messages.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5292
Cc: mesa-stable
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12615>
2021-08-31 11:52:39 +00:00
Ian Romanick
a8d0c0af86 intel/fs: Remove type-based restriction for cmod propagation to saturated operations
Previously, we misunderstood how conditional modifiers and saturate
interacted.  We thought the condition was evaulated before the saturate
was applied.  For the floating point cases, we went to some heroics to
modify the condition to maintain the same results.  For integer cases,
it was not clear that this could even work.  We had no use-cases and no
tests-cases, so we just disallowed everything.

Now we understand that the condition is evaluated after the saturate.
Earlier commits in this series removed the various floating point
heroics.  It is easier to just delete the code that prevents some cases
that just work.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12045>
2021-08-30 14:00:14 -07:00
Ian Romanick
5ad88fd499 intel/fs: Remove after parameter from test_saturate_prop
Originally this was part of "intel/fs: Remove condition-based
restriction for cmod propagation to saturated operations".  With some
additional changes to that commit, it caused a lot of extra churn in the
unit tests.  I felt that made it harder to see the actual changes in the
unit tests, so I split it out.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12045>
2021-08-30 14:00:14 -07:00
Ian Romanick
e6373923a7 intel/fs: Remove condition-based restriction for cmod propagation to saturated operations
I don't know why the float_saturate_l_mov test was #if'ed out, but it
passes... so this commit enables it.

No shader-db or fossil-db changes.

In a previous iteration of this MR, this commit helped ~200 shaders in
shader-db.  Now all of those same shaders are helped by "intel/fs: cmod
propagate from MOV with any condition".  All of these shaders come from
Mad Max.  After initial translation from NIR to assembly, these shader
contain patterns like:

    mul(8)            g90<1>F       g88<8,8,1>F     0x40400000F  /* 3F */
    ...
    mov.sat(8)        g90<1>F       g90<8,8,1>F
    ...
    cmp.nz.f0(8)      null<1>F      g90<8,8,1>F     0 /* 0F */

An initial pass of cmod propagation converts this to

    mul(8)            g90<1>F       g88<8,8,1>F     0x40400000F  /* 3F */
    ...
    mov.sat.XX.f0(8)  g90<1>F       g90<8,8,1>F

Without this commit, XX is G.  With this commit, XX is NZ.  Saturate
propagation moves the saturate:

    mul.sat(8)        g90<1>F       g88<8,8,1>F     0x40400000F  /* 3F */
    ...
    mov.XX.f0(8)      g90<1>F       g90<8,8,1>F

Without this commit (but with "intel/fs: cmod propagate from MOV with
any condition"), the G gets propagated:

    mul.sat.g.f0(8)   g90<1>F       g88<8,8,1>F     0x40400000F  /* 3F */

With this commit (with or without "intel/fs: cmod propagate from MOV
with any condition"), the NZ gets propagated:

    mul.sat.nz.f0(8)  g90<1>F       g88<8,8,1>F     0x40400000F  /* 3F */

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12045>
2021-08-30 14:00:14 -07:00
Ian Romanick
47f0cdc449 intel/fs: cmod propagate from MOV with any condition
There were tests related to propagating conditional modifiers from a MOV
to an instruction with a .SAT modifier for a very long time, but they
were #if'ed out.

There are restrictions later in the function that limit the kinds of MOV
instructions that can propagate.  This avoids the dangers of
type-converting MOVs that may generate flags in different ways.

v2: Update the added comment to look more like the existing comment.
That makes the small differences between the two cases more obvious.
Noticed by Marcin.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19827127 -> 19826924 (<.01%)
instructions in affected programs: 62024 -> 61821 (-0.33%)
helped: 201
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.01 x̃: 1
helped stats (rel) min: 0.13% max: 0.60% x̄: 0.35% x̃: 0.36%
95% mean confidence interval for instructions value: -1.02 -1.00
95% mean confidence interval for instructions %-change: -0.36% -0.34%
Instructions are helped.

total cycles in shared programs: 954655879 -> 954655356 (<.01%)
cycles in affected programs: 1212877 -> 1212354 (-0.04%)
helped: 155
HURT: 6
helped stats (abs) min: 1 max: 6 x̄: 3.65 x̃: 4
helped stats (rel) min: <.01% max: 0.17% x̄: 0.07% x̃: 0.07%
HURT stats (abs)   min: 2 max: 12 x̄: 7.00 x̃: 8
HURT stats (rel)   min: 0.04% max: 0.23% x̄: 0.14% x̃: 0.15%
95% mean confidence interval for cycles value: -3.60 -2.90
95% mean confidence interval for cycles %-change: -0.07% -0.05%
Cycles are helped.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12045>
2021-08-30 14:00:14 -07:00
Ian Romanick
a9120eccff intel/compiler: Move type_is_unsigned_int to brw_reg_type.h
...and rename it to brw_reg_type_is_unsigned_integer.  It is now next to
brw_reg_type_is_floating_point and brw_reg_type_is_integer.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12045>
2021-08-30 14:00:14 -07:00
Ian Romanick
b23432c540 intel/fs: Fix a cmod prop bug when the source type of a mov doesn't match the dest type of scan_inst
We were previously operating with the mindset "a MOV is just a compare
with zero."  As a result, we were trying to share as much code between
the MOV path and the CMP path as possible.  However, MOV instructions
can perform type conversions that affect the result of the comparison.
There was some code added to better handle this for cases like

    and(16)         g31<1>UD       g20<8,8,1>UD   g22<8,8,1>UD
    mov.nz.f0(16)   null<1>F       g31<8,8,1>D

The flaw in these changed special cases is that it allowed things like

    or(8)           dest:D  src0:D  src1:D
    mov.nz(8)       null:D  dest:F

Because both destinations were integer types, the propagation was
allowed.  The source type of the MOV and the destination type of the OR
do not match, so type conversion rules have to be accounted for.

My solution was to just split the MOV and non-MOV paths with completely
separate checks.  The "else" path in this commit is basically the old
code with the BRW_OPCODE_MOV special case removed.

The new MOV code further splits into "destination of scan_inst is float"
and "destination of scan_inst is integer" paths.  For each case I
enumerate the rules that I belive apply.  For the integer path, only the
"Z or NZ" rules are listed as only NZ is currently allowed (hence the
conditional_mod assertion in that path).  A later commit relaxes this
and adds the rule.

The new rules slightly relax one of the previous rules.  Previously the
sizes of the MOV destination and the MOV source had to be the same.  In
some cases now the sizes can be different by the following conditions:

  - Floating point to integer conversion are not allowed.

  - If the conversion is integer to floating point, the size of the
    floating point value does not matter as it will not affect the
    comparison result.

  - If the conversion is float to float, the size of the destination
    must be greater than or equal to the size of the source.

  - If the conversion is integer to integer, the size of the destination
    must be greater than or equal to the size of the source.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12045>
2021-08-30 14:00:14 -07:00
Ian Romanick
0797388dc2 intel/fs: Add many cmod propagation tests involving MOV instructions
Of particular interest are the tests where the MOV performs a type
conversion.  If the restriction on conditional modifier for a MOV is
ever relaxed, some of these cases must still be disallowed.

v2: s/NZ/Z/ in one of the comments.  Notice by Marcin.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12045>
2021-08-30 14:00:14 -07:00
Ian Romanick
4f6c5da025 intel/fs: Remove redundant inst->opcode checks in cmod prop
This foreach_inst_in_block_reverse_starting_from loop only applies
CMP, MOV, and AND.  AND instructions break out of the loop before this
point.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12045>
2021-08-30 14:00:14 -07:00
Ian Romanick
3afefb0818 intel/fs: Refactor some cmod propagation tests
This will simplify some later changes to these tests.

v2: Combine test_positive_saturate_prop and test_negative_saturate_prop
into a single function.  Suggested by Marcin.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12045>
2021-08-30 14:00:14 -07:00
Marcin Ślusarz
e0533ebf16 intel/compiler: INT DIV function does not support source modifiers
BSpec says that for all generations.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5281
CC: mesa-stable

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12518>
2021-08-26 07:51:44 +00:00
Ian Romanick
0f809dbf40 intel/compiler: Basic support for DP4A instruction
v2: Very significant rebase on changes to previous commits.
Specifically, brw_fs_nir.cpp changes were pretty much rewritten from
scratch after changing the NIR opcode names and types.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
2021-08-24 19:58:57 +00:00
Jason Ekstrand
31fdd26d01 intel/compiler: Add unified barrier support for CS
Program CS barrier message fields for producers/consumers.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11963>
2021-08-24 01:31:48 +00:00
Jordan Justen
6a950bab0c intel/compiler: Add unified barrier support for TCS
Program the producers/consumer fields for TCS Barrier messages.
Producer and consumer fields are set to number of TCS threads.

Ref: Bspec 54006 for Barrier Data Payload
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11963>
2021-08-24 01:31:48 +00:00
Jordan Justen
b4055a020f intel/compiler: Regroup TCS barrier code paths
Rearrange if/else fragments to unify case for Gen11 or later
platforms. This will help the code look cleaner for adding
unified barrier support to TCS.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11963>
2021-08-24 01:31:48 +00:00
Ian Romanick
5ce3bfcdf3 intel/compiler: Lower 8-bit ops to 16-bit in NIR on all platforms
This fixes the Crucible func.shader.shift.int8_t test on Gen8 and Gen9.
See https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/76.

With the previous optimizations in place, this change seems to improve
the quality of the generated code.  Comparing a couple Vulkan CTS tests
on Skylake had the following results.

dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag:
SIMD8 shader: 36 instructions. 1 loops. 3822 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 27 instructions. 1 loops. 2742 cycles. 0:0 spills:fills, 5 sends

dEQP-VK.spirv_assembly.type.vec3.i8.max_frag:
SIMD8 shader: 39 instructions. 1 loops. 3922 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 37 instructions. 1 loops. 3682 cycles. 0:0 spills:fills, 5 sends

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
2021-08-18 22:03:37 +00:00
Ian Romanick
f0a8a9816a nir: intel/compiler: Add and use nir_op_pack_32_4x8_split
A lot of CTS tests write a u8vec4 or an i8vec4 to an SSBO.  This results
in a lot of shifts and MOVs.  When that pattern can be recognized, the
individual 8-bit components can be packed much more efficiently.

v2: Rebase on b4369de27f ("nir/lower_packing: use
shader_instructions_pass")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
2021-08-18 22:03:37 +00:00
Ian Romanick
7c83aa0518 intel/fs: Emit better code for u2u of extract
Emitting the instructions one by one results in two MOV instructions
that won't be propagated.  By handling both instructions at once, a
single MOV is emitted.  For example, on Ice Lake this helps
dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag:

SIMD8 shader: 49 instructions. 1 loops. 4044 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 41 instructions. 1 loops. 3804 cycles. 0:0 spills:fills, 5 sends

Without "intel/fs: Allow copy propagation between MOVs of mixed sizes,"
the improvement is still 8 instructions, but there are more instructions
to begin with:

SIMD8 shader: 52 instructions. 1 loops. 4164 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 44 instructions. 1 loops. 3944 cycles. 0:0 spills:fills, 5 sends

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
2021-08-18 22:03:37 +00:00
Ian Romanick
e3f502e007 intel/fs: Allow copy propagation between MOVs of mixed sizes
This eliminates some spurious, size-converting moves.  For example, on
Ice Lake this helps dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag:

SIMD8 shader: 52 instructions. 1 loops. 4164 cycles. 0:0 spills:fills, 5 sends
SIMD8 shader: 49 instructions. 1 loops. 4044 cycles. 0:0 spills:fills, 5 sends

Unfortunately, this doesn't clean everything up.  Here's a subset of the
"before" assembly:

send(8)         g11<1>UW        g2<0,1,0>UD     0x02106e02
                            dp data 1 MsgDesc: ( untyped surface read, Surface = 2, SIMD8, Mask = 0xe) mlen 1 rlen 1 { align1 1Q };
mov(8)          g7<4>UB         g11<8,8,1>UD                    { align1 1Q };
mov(8)          g12<1>UB        g7<32,8,4>UB                    { align1 1Q };
send(8)         g13<1>UW        g2<0,1,0>UD     0x02106e03
                            dp data 1 MsgDesc: ( untyped surface read, Surface = 3, SIMD8, Mask = 0xe) mlen 1 rlen 1 { align1 1Q };
mov(8)          g15<1>UW        g12<8,8,1>UB                    { align1 1Q };
mov(8)          g8<4>UB         g13<8,8,1>UD                    { align1 1Q };
mov(8)          g14<1>UB        g8<32,8,4>UB                    { align1 1Q };
mov(8)          g16<1>UW        g14<8,8,1>UB                    { align1 1Q };
xor(8)          g17<1>UW        g15<8,8,1>UW    g16<8,8,1>UW    { align1 1Q };

And here's the same subset of the "after" assembly:

send(8)         g11<1>UW        g2<0,1,0>UD     0x02106e02
                            dp data 1 MsgDesc: ( untyped surface read, Surface = 2, SIMD8, Mask = 0xe) mlen 1 rlen 1 { align1 1Q };
mov(8)          g7<4>UB         g11<8,8,1>UD                    { align1 1Q };
send(8)         g13<1>UW        g2<0,1,0>UD     0x02106e03
                            dp data 1 MsgDesc: ( untyped surface read, Surface = 3, SIMD8, Mask = 0xe) mlen 1 rlen 1 { align1 1Q };
mov(8)          g15<1>UW        g7<32,8,4>UB                    { align1 1Q };
mov(8)          g8<4>UB         g13<8,8,1>UD                    { align1 1Q };
mov(8)          g16<1>UW        g8<32,8,4>UB                    { align1 1Q };
xor(8)          g17<1>UW        g15<8,8,1>UW    g16<8,8,1>UW    { align1 1Q };

There are a lot of regioning and type restrictions in
fs_visitor::try_copy_propagate, and I'm a little nervious about messing
with them too much.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
2021-08-18 22:03:37 +00:00
Ian Romanick
f9665040f1 intel/compiler: Document and assert some aspects of 8-bit integer lowering
In the vec4 compiler, 8-bit types should never exist.

In the scalar compiler, 8-bit types should only ever be able to exist on
Gfx ver 8 and 9.

Some instructions are handled in non-obvious ways.

Hopefully this will save the next person some time.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
2021-08-18 22:03:37 +00:00