Commit graph

648 commits

Author SHA1 Message Date
Jason Ekstrand
d0b154319d intel/fs: Simplify persample_dispatch
Thanks to the previous commit, we no longer need this check.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
2022-07-13 20:28:42 +00:00
Jason Ekstrand
fd17aaf430 intel/fs: Use nir_lower_single_sampled
This lets us drop demote_sample_qualifiers as well as a back-end check
for key->multisample_fbo.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
2022-07-13 20:28:42 +00:00
Jason Ekstrand
ca9f0f72db intel/fs: Use shader_info::fs::uses_sample_shading
NIR constructs this information for us as part of nir_gather_info these
days so we can simplify our logic a bit.  This will also let us be more
correct once we move uses_sample_shading scraping earlier.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
2022-07-13 20:28:42 +00:00
Jason Ekstrand
530de844ef intel,anv,iris,crocus: Drop subgroup size from the shader key
Use nir->info.subgroup_size instead.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17337>
2022-07-08 22:47:22 +00:00
Ian Romanick
bbcb881f46 intel/fs: Remove non-_LOGICAL URB messages
The _LOGICAL versions are lowered direct to SEND, so nothing can ever
generate these messages.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Ian Romanick
a477587b4a intel/fs: Add _LOGICAL versions of URB messages
The lowering is currently fake.  It just changes the opcode from the
_LOGICAL version to the non-_LOGICAL version.

v2: Remove some rebase cruft.  's/gfx8_//;s/simd8_/' in
brw_instruction_name.  Both suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Ian Romanick
07b9bfacc7 intel/compiler: Move logical-send lowering to a separate file
brw_fs.cpp was 10kloc.  Now it's only 7.5kloc.  Ugh.

v2: Rebase on 9680e0e4a2.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Lionel Landwerlin
9680e0e4a2 intel/fs: ray query fix for global address
With stages dispatching with a mask, we can run into situations where
we don't have the global address in all lanes. The existing code
always assumed we had the addres in at least lane0.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: bb40e999d1 ("intel/nir: use a single intel intrinsic to deal with ray traversal")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17330>
2022-07-08 00:36:04 +00:00
Kenneth Graunke
589b03d02f intel/fs: Opportunistically split SEND message payloads
While we've taken advantage of split-sends in select situations, there
are many other cases (such as sampler messages, framebuffer writes, and
URB writes) that have never received that treatment, and continued to
use monolithic send payloads.

This commit introduces a new optimization pass which detects SEND
messages with a single payload, finds an adjacent LOAD_PAYLOAD that
produces that payload, splits it two, and updates the SEND to use both
of the new smaller payloads.

In places where we manually used split SENDS, we rely on underlying
knowledge of the message to determine a natural split point.  For
example, header and data, or address and value.

In this pass, we instead infer a natural split point by looking at the
source registers.  Often times, consecutive LOAD_PAYLOAD sources may
already be grouped together in a contiguous block, such as a texture
coordinate.  Then, there is another bit of data, such as a LOD, that
may come from elsewhere.  We look for the point where the source list
switches VGRFs, and split it there.  (If there is a message header, we
choose to split there, as it will naturally come from elsewhere.)

This not only reduces the payload sizes, alleviating register pressure,
but it means that we may be able to eliminate some payload construction
altogether, if we have a contiguous block already and some extra data
being tacked on to one side or the other.

shader-db results for Icelake are:

   total instructions in shared programs: 19602513 -> 19369255 (-1.19%)
   instructions in affected programs: 6085404 -> 5852146 (-3.83%)
   helped: 23650 / HURT: 15
   helped stats (abs) min: 1 max: 1344 x̄: 9.87 x̃: 3
   helped stats (rel) min: 0.03% max: 35.71% x̄: 3.78% x̃: 2.15%
   HURT stats (abs)   min: 1 max: 44 x̄: 7.20 x̃: 2
   HURT stats (rel)   min: 1.04% max: 20.00% x̄: 4.13% x̃: 2.00%
   95% mean confidence interval for instructions value: -10.16 -9.55
   95% mean confidence interval for instructions %-change: -3.84% -3.72%
   Instructions are helped.

   total cycles in shared programs: 848180368 -> 842208063 (-0.70%)
   cycles in affected programs: 599931746 -> 593959441 (-1.00%)
   helped: 22114 / HURT: 13053
   helped stats (abs) min: 1 max: 482486 x̄: 580.94 x̃: 22
   helped stats (rel) min: <.01% max: 78.92% x̄: 4.76% x̃: 0.75%
   HURT stats (abs)   min: 1 max: 94022 x̄: 526.67 x̃: 22
   HURT stats (rel)   min: <.01% max: 188.99% x̄: 4.52% x̃: 0.61%
   95% mean confidence interval for cycles value: -222.87 -116.79
   95% mean confidence interval for cycles %-change: -1.44% -1.20%
   Cycles are helped.

   total spills in shared programs: 8387 -> 6569 (-21.68%)
   spills in affected programs: 5110 -> 3292 (-35.58%)
   helped: 359 / HURT: 3

   total fills in shared programs: 11833 -> 8218 (-30.55%)
   fills in affected programs: 8635 -> 5020 (-41.86%)
   helped: 358 / HURT: 3

   LOST:   1 SIMD16 shader, 659 SIMD32 shaders
   GAINED: 65 SIMD16 shaders, 959 SIMD32 shaders

   Total CPU time (seconds): 1505.48 -> 1474.08 (-2.09%)

Examining these results: the few shaders where spills/fills increased
were already spilling significantly, and were only slightly hurt.  The
applications affected were also helped in countless other shaders, and
other shaders stopped spilling altogether or had 50% reductions.  Many
SIMD16 shaders were gained, and overall we gain more SIMD32, though many
close to the register pressure line go back and forth.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17018>
2022-07-01 02:05:45 +00:00
Kenneth Graunke
72e9843991 intel/compiler: Introduce a new brw_isa_info structure
This structure will contain the opcode mapping tables in the next
commit.  For now, this is the mechanical change to plumb it into all
the necessary places, and it continues simply holding devinfo.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
2022-06-30 23:46:35 +00:00
Marcin Ślusarz
f4386b81e6 intel: fix typos found by codespell
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17191>
2022-06-27 10:20:55 +00:00
Kenneth Graunke
a8e718c7e5 intel/compiler: Fix A64 header construction with a uniform address
fs_visitor::assign_curb_setup() maps UNIFORM registers to HW regs,
and contains the following assert:

            assert(inst->src[i].stride == 0);

emit_a64_oword_block_header's striding tricks run afoul of this
restriction, by producing stride 1 values on a 64-bit UNIFORM source.

Work around this by copying the UNIFORM value to a VGRF first.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16938>
2022-06-10 02:14:57 +00:00
Paulo Zanoni
2256314b08 intel/compiler: split handling of 64 bit floats and ints
In opt_algebraic(), handle TYPE_DF in a different check than TYPE_Q. We have a
separate flag for each type, use separate checks so platforms where one is true
and the other is not can work properly.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15835>
2022-06-02 23:04:39 +00:00
Jason Ekstrand
dfedeccc13 intel: Only set VectorMaskEnable when needed
For cases with lots of very small primitives, this may improve
performance because we're not executing those dead channels all the
time.

Shader-db reports no instruction or cycle-count changes.  However, by
hacking up the driver to report when this optimization triggers, it
appears to affect about 10% of shader-db.

v2 (Kenneth Graunke): Always enable VMask prior to XeHP for now,
because using VMask on those platforms allows us to perform the
eliminate_find_live_channel() optimization.  However, XeHP doesn't
seem to have packed fragment shader dispatch, so we lose that
optimization regardless, and there's no reason not to avoid vmask.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1054>
2022-05-27 21:52:48 +00:00
Jason Ekstrand
1b9248e761 intel/fs: Copy color_outputs_valid into wm_prog_data
Fixes: 36ee2fd61c ("anv: Implement the basic form of VK_EXT_transform_feedback")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>
2022-05-27 14:33:53 +00:00
Kenneth Graunke
59bfc9c6cb intel: Fix analysis invalidation in eliminate_find_live_channel
If we saw a HALT instruction, we would forget to invalidate our analysis
pass information before returning progress.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16677>
2022-05-24 22:36:39 +00:00
Marcin Ślusarz
9acb30c8c4 intel/compiler: implement primitive shading rate for mesh
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16030>
2022-05-13 13:05:51 +00:00
Marcin Ślusarz
65ff6932dc intel/compiler: handle gl_Viewport and gl_Layer in FS URB setup
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16493>
2022-05-13 09:43:02 +00:00
Ian Romanick
c08302670b intel/compiler: Fix sample_d messages on DG2
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces.  The
maximum number of gradient components and coordinate components should
be 2.  In spite of this limitation, the Bspec lists a mysterious R
component before the min_lod, so the maximum coordinate components is 3.

Fixes the following Vulkan CTS failures on DG2:

    dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1d_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2d_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_fixed_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_float_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_fixed_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_float_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1d_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2d_fragment

The Fixes: tag below is a bit misleading. This commit fixes some test
cases similar to ones fixed by the Fixes: commit.  I just want to make
sure this commit gets applied everywhere that commit was also applied.

Fixes: 635ed58e52 ("intel/compiler: Lower txd for 3D samplers on XeHP.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15781>
2022-04-07 17:09:28 +00:00
Kenneth Graunke
1967fd3b10 intel/compiler: Call inst->resize_sources before setting the sources
You should probably resize the sources array before accessing entries
that might be out of bounds.  inst->resize_sources() always allocates
enough space for at least 3 sources, so this is really only an issue
when there are 4+ sources.

Fixes: a920979d4f ("intel/fs: Use split sends for surface writes on gen9+")
Fixes: 4f86a70599 ("intel/fs: Lower DW untyped r/w messages to LSC when available")
Fixes: d372abe397 ("intel/fs: Add surface OWORD BLOCK opcodes")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15632>
2022-03-29 13:06:17 -07:00
Kenneth Graunke
6fa66ac228 intel/compiler: Implement nir_intrinsic_last_invocation
We haven't exposed this intrinsic as it doesn't directly correspond to
anything in SPIR-V.  However, it's used internally by some NIR passes,
namely nir_opt_uniform_atomics().

We reuse most of the infrastructure in brw_find_live_channel, but with
LZD/ADD instead of FBL.  A new SHADER_OPCODE_FIND_LAST_LIVE_CHANNEL is
like SHADER_OPCODE_FIND_LIVE_CHANNEL but from the other side.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
2022-03-26 00:28:19 +00:00
Caio Oliveira
f82731d0d7 intel/fs: Fix IsHelperInvocation for the case no discard/demote are used
Use emit_predicate_on_sample_mask() helper that does check where to
get the correct mask depending on whether discard/demote was used or
not.

Fixes: 45f5db5a84 ("intel/fs: Implement "demote to helper invocation"")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15400>
2022-03-25 08:20:27 +00:00
Caio Oliveira
bb311c22df intel/fs: Initialize the sample mask in flags register when using demote
Without this change, a check for "is helper invocation" could read
uninitialized values.

Fixes: 45f5db5a84 ("intel/fs: Implement "demote to helper invocation"")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15400>
2022-03-25 08:20:27 +00:00
Mark Janes
85e314db5d Revert "intel/fs: handle interpolation modes for at_sample and at_offset too"
This reverts commit 5afbb0e730.

Closes: #6198
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15534>
2022-03-23 11:03:47 -07:00
Iván Briano
5afbb0e730 intel/fs: handle interpolation modes for at_sample and at_offset too
Cc: mesa-stable

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15424>
2022-03-22 19:05:05 +00:00
Lionel Landwerlin
ec6e247a40 intel/fs: handle inline data on OpenCL style kernels
This is for Gfx12.5 with the COMPUTE_WALKER::Inline Data payload. We
do this in a similar way to the compute kernels.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13171>
2022-03-21 11:26:44 +00:00
Lionel Landwerlin
4ec5da7270 intel/nir/fs: replace COMPUTE || KERNEL by gl_shader_stage_is_compute()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13171>
2022-03-21 11:26:44 +00:00
Sagar Ghuge
6031ad4bf6 intel/fs: Add Wa_22013689345
v2: Use a simpler framework (Lionel)

v3: Rebase, add task/mesh (Lionel)

v4: Fixup fence exec size (SIMDX -> SIMD1)

v5: Fix invalidate_analysis, add finishme comment (Curro)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: 22.0 <mesa-stable>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14947>
2022-03-17 14:18:02 +00:00
Lionel Landwerlin
96c8880900 intel/fs: fix total_scratch computation
We only have a single prog_data::total_scratch for all shader variants
(SIMD 8, 16, 32). Therefore we should always max the total_scratch on
top of existing variant.

We probably haven't run into that issue before because we compile by
increasing SIMD size and higher SIMD size is more likely to spill. But
for bindless shaders with return shaders, if the last return part
doesn't spill, we completely ignore the previous parts' scratch
computation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15193>
2022-03-02 13:13:03 +00:00
Marcin Ślusarz
e5c39bc427 intel/compiler: optimize flat inputs mask calculation
Don't bother looking at urb if variable is not flat.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15169>
2022-02-25 22:34:22 +00:00
Marcin Ślusarz
e2cb562dd1 intel/compiler: ignore per-primitive attrs when calculating flat input mask
If we say that per-primitive attributes are flat (which is communicated by
3DSTATE_SBE.ConstantInterpolationEnable), GPU freaks out and applies it
to other (non-flat) attributes.

Fixes: be89ea3231 ("intel/compiler: Handle per-primitive inputs in FS")

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15169>
2022-02-25 22:34:22 +00:00
Marcin Ślusarz
f91bfc80ba intel/compiler: remove redundant code from fs_visitor::run_*
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15079>
2022-02-22 09:09:05 +00:00
Lionel Landwerlin
2763a8af5a anv/genxml/intel/fs: fix binding shader record entry
Bit is flipped compared to all the other packets.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 705395344d ("intel/fs: Add support for compiling bindless shaders with resume shaders")
Fixes: c3ac9afca3 ("anv: Create and return ray-tracing pipeline SBT handles")
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15078>
2022-02-19 13:50:56 +00:00
Iván Briano
db48dcb4f3 intel/compiler: remove what looks like a bad rebase
This bit in the compiler looks like it was added by accident on one of
the latest versions of the original commit, but it clearly doesn't
belong there.

Fixes: 03e1e19246 ("anv: Refactor descriptor copy")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15016>
2022-02-15 01:04:47 +00:00
Ian Romanick
38a94c82e6 intel/fs: Don't optimize out 1.0*x and -1.0*x
This (sort of) matches the behavior of nir_opt_algebraic.  This ensures
that subnormal values are properly flushed to zero.

With the aid of "nir/search: Float sources of texture instructions are
float users" and "nir/search: Transitively apply is_only_used_as_float",
there would have been no shader-db regressions on Intel platforms.
However, those caused a significant increase in compile time.  Since the
instruction regressions were so small, I just dropped those commits
rather than improve them.

All Haswell and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20125042 -> 20125094 (<.01%)
instructions in affected programs: 7184 -> 7236 (0.72%)
helped: 0
HURT: 32
HURT stats (abs)   min: 1 max: 4 x̄: 1.62 x̃: 2
HURT stats (rel)   min: 0.11% max: 1.49% x̄: 0.85% x̃: 0.78%
95% mean confidence interval for instructions value: 1.39 1.86
95% mean confidence interval for instructions %-change: 0.74% 0.96%
Instructions are HURT.

total cycles in shared programs: 862745586 -> 862746551 (<.01%)
cycles in affected programs: 109872 -> 110837 (0.88%)
helped: 12
HURT: 23
helped stats (abs) min: 2 max: 774 x̄: 90.83 x̃: 19
helped stats (rel) min: 0.07% max: 25.23% x̄: 3.06% x̃: 0.40%
HURT stats (abs)   min: 2 max: 1106 x̄: 89.35 x̃: 12
HURT stats (rel)   min: 0.08% max: 45.40% x̄: 3.01% x̃: 0.47%
95% mean confidence interval for cycles value: -60.09 115.23
95% mean confidence interval for cycles %-change: -2.21% 4.07%
Inconclusive result (value mean confidence interval includes 0).

All of the shaders hurt are in either UE4 shooter-game or shooter_demo.

Tiger Lake
Instructions in all programs: 159893213 -> 159893290 (+0.0%)
SENDs in all programs: 6936431 -> 6936431 (+0.0%)
Loops in all programs: 38385 -> 38385 (+0.0%)
Cycles in all programs: 7019259514 -> 7019260087 (+0.0%)
Spills in all programs: 101389 -> 101389 (+0.0%)
Fills in all programs: 131532 -> 131532 (+0.0%)

Ice Lake
Instructions in all programs: 143624164 -> 143624235 (+0.0%)
SENDs in all programs: 6980289 -> 6980289 (+0.0%)
Loops in all programs: 38383 -> 38383 (+0.0%)
Cycles in all programs: 8440082767 -> 8440083238 (+0.0%)
Spills in all programs: 102246 -> 102246 (+0.0%)
Fills in all programs: 131908 -> 131908 (+0.0%)

Skylake
Instructions in all programs: 134185424 -> 134185495 (+0.0%)
SENDs in all programs: 6938790 -> 6938790 (+0.0%)
Loops in all programs: 38356 -> 38356 (+0.0%)
Cycles in all programs: 8222366529 -> 8222366923 (+0.0%)
Spills in all programs: 98821 -> 98821 (+0.0%)
Fills in all programs: 125218 -> 125218 (+0.0%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Fixes: f5dd6dfe01 ("anv: enable VK_KHR_shader_float_controls and SPV_KHR_float_controls")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13999>
2022-02-10 18:15:39 +00:00
Rohan Garg
03e1e19246 anv: Refactor descriptor copy
Refactor descriptor copies to use the existing helper functions instead
of rolling our own. In order to facilitate this, we need to store the
appropriate buffer views for the relevant descriptors internally and
reuse them in the helpers.

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14909>
2022-02-09 09:24:37 +00:00
Lionel Landwerlin
bb40e999d1 intel/nir: use a single intel intrinsic to deal with ray traversal
In the future we'll want to reuse this intrinsic to deal with ray
queries. Ray queries will use a different global pointer and
programmatically change the control/level arguments of the trace send
instruction.

v2: Comment on barrier after sync trace instruction (Caio)
    Generalize lsc helper (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
2022-02-08 12:55:25 +00:00
Lionel Landwerlin
2665595244 intel/fs: limit FS dispatch to SIMD16 when using ray queries
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
2022-02-08 12:55:25 +00:00
Lionel Landwerlin
57eed6698b intel/compiler: tracker number of ray queries in prog_data
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
2022-02-08 12:55:25 +00:00
Lionel Landwerlin
9d22f8ed23 intel/fs: add support for ACCESS_ENABLE_HELPER
v2: Factor out fragment shader masking on send messages (Caio)
    Update comments (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
2022-02-08 12:55:24 +00:00
Lionel Landwerlin
c199f44d17 intel/fs: name sources for A64 opcodes
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>
2022-02-08 12:55:24 +00:00
Marcin Ślusarz
1d9f47325b intel/compiler: handle gl_[Clip|Cull]Distance from mesh in fragment shaders
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14788>
2022-01-29 06:32:19 +00:00
Caio Oliveira
856a0cacb1 intel/compiler: Merge Per-Primitive attribute handling in Mesh case
Just a refactor, no behavior change.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14788>
2022-01-29 06:32:19 +00:00
Caio Oliveira
2b8b884bcd intel/compiler: Have specific mesh handling in calculate_urb_setup()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14788>
2022-01-29 06:32:19 +00:00
Kenneth Graunke
d475e839da intel/fs: Reuse the same FS input slot for VUE header fields.
VARYING_SLOT_{VIEWPORT,LAYER,PSIZ} all live in the same VUE header slot,
and the FS is already set up to read the x/y/z/w component of that vec4.

However, we were setting up the SBE to pass each of those items as a
separate FS input, so hypothetically if a shader read all three, we
would burn 3 FS inputs with redundant data.  Not only was this passing
extra data to the FS, but it would count as extra input slots for the
"Do we have 16 or fewer attributes?" check for using SBE swizzling to
rearrange them in a convenient manner.

Now we make them share a single FS attribute and only count them once.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14210>
2022-01-19 01:31:47 +00:00
Lionel Landwerlin
30a8b8d2df intel/fs: disable VRS when omask is written
As indicated by
VkPhysicalDeviceFragmentShadingRatePropertiesKHR::fragmentShadingRateWithShaderSampleMask
our implementation will clamp to 1x1 when reading samplemask or
writing to samplemask.

This fixes vkd3d-proton tests test_sample_mask_dxbc & test_sample_mask_dxil

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b6332fc4a8 ("intel/compiler: handle coarse pixel in render target writes descriptors")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14553>
2022-01-14 19:14:06 +00:00
Jason Ekstrand
a1de102479 intel/fs: Use compare_func for wm_prog_key::alpha_test_func
Because 0 is no longer a recognizable value (it's NEVER, which isn't a
good default), we add an emit_alpha_test bool to tell the back-end when
to bother alpha testing.  This lets us only touch crocus with the
change.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14157>
2022-01-14 15:08:09 +00:00
Jordan Justen
d57b10ab98 intel/compiler: Adjust TCS instance-id for dg2+
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14385>
2022-01-05 16:13:28 -08:00
Marcin Ślusarz
a48f1d51e2 intel/compiler: disable workaround not applicable to gfx >= 11
There's nothing in bspec that would suggest this is still needed.
It only affected gfx 9 and 10.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14280>
2021-12-22 10:13:25 +00:00
Jason Ekstrand
eebb2dedb2 intel/fs: Add a NONE scheduling mode
While our LIFO scheduling mode attempts to optimize for register
pressure, it's often hard for a scheduling algorithm to do better than
the instruction order provided by the shader author.  Shader authors
often do perfectly reasonable things like using texture results
immediately after fetching them or constructing texture coordinates
immediately before the texture op.  When we throw all the instruction
ordering information away, we loose any help the author may have given
us.  By attempting NONE before we fall back to the worst case LIFO mode.

And, yes, I tried this with NONE both before and after LIFO and doing
NONE before LIFO is substantially better, according to shader-db.

    total instructions in shared programs: 19673152 -> 19665202 (-0.04%)
    instructions in affected programs: 33669 -> 25719 (-23.61%)
    helped: 20
    HURT: 0
    helped stats (abs) min: 15 max: 4609 x̄: 397.50 x̃: 107
    helped stats (rel) min: 2.33% max: 67.50% x̄: 14.60% x̃: 9.12%
    95% mean confidence interval for instructions value: -867.61 72.61
    95% mean confidence interval for instructions %-change: -21.74% -7.46%
    Inconclusive result (value mean confidence interval includes 0).

    total cycles in shared programs: 935562500 -> 935020920 (-0.06%)
    cycles in affected programs: 18620349 -> 18078769 (-2.91%)
    helped: 104
    HURT: 48
    helped stats (abs) min: 88 max: 60986 x̄: 8031.48 x̃: 3680
    helped stats (rel) min: 0.61% max: 51.44% x̄: 14.95% x̃: 8.87%
    HURT stats (abs)   min: 10 max: 54724 x̄: 6118.62 x̃: 1530
    HURT stats (rel)   min: 0.13% max: 46.45% x̄: 10.28% x̃: 6.46%
    95% mean confidence interval for cycles value: -5724.34 -1401.71
    95% mean confidence interval for cycles %-change: -9.86% -4.10%
    Cycles are helped.

    total spills in shared programs: 12158 -> 10327 (-15.06%)
    spills in affected programs: 1831 -> 0
    helped: 20
    HURT: 0

    total fills in shared programs: 14749 -> 12635 (-14.33%)
    fills in affected programs: 2114 -> 0
    helped: 20
    HURT: 0

    LOST:   8
    GAINED: 649

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
2021-12-18 01:46:19 +00:00