Commit graph

1957 commits

Author SHA1 Message Date
Francisco Jerez
44e48751d2 intel/fs: Teach the lower_regioning pass how to split instructions of unsuported exec type.
This adds some generic infrastructure that allows splitting any
instruction into a number of instructions of a smaller legal execution
type.  This is meant to replace several instances of handcrafted 64bit
type lowering done manually in the code generator, which is rather
error-prone, prevents scheduling of the lowered instructions, and
makes them invisible to the SWSB pass on Gfx12+ platforms, which will
become especially problematic on Gfx12.5+ since the EUs introduce
multiple asynchronous execution pipelines which the SWSB pass needs to
be able to synchronize to one another, so it's critical for the real
execution type of the instruction to be visible to the SWSB pass.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
2022-01-25 22:40:44 +00:00
Francisco Jerez
539c879a6b intel/fs: Move legal exec type calculation into helper function in lower_regioning pass.
Right now the execution type lowering functionality of this pass
assumes that an integer type of the original bit size is always
acceptable, however we'll want more complex behavior than that in
order to leverage this pass to automate the lowering of unsupported
64-bit operations into multiple 32-bit operations.

In order to do that calculate the closest legal execution type from a
new helper function, and take advantage of that function from the
has_invalid_exec_type() helper, along the lines of other
lower_regioning() helpers structured as a pair of has_invalid_foo() +
required_foo() functions.

This shouldn't have any functional changes.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
2022-01-25 22:40:44 +00:00
Francisco Jerez
3886e63033 intel/fs/xehp: Merge repeated in-order read dependencies instead of replacement.
Previously the software scoreboard structure would drop previous
dependencies for a given register and replace them with the most
recent one for the same register when a new instruction (or set of
instructions) is processed.  This worked correctly on the Gfx12LP
platforms this code was originally designed for, because a repeated
dependency on the same register would either require the second
instruction to synchronize against the first (so the first dependency
could be disregarded from that point on) *or* require the dependency
to be RaR and in-order, which allows the synchronization to be
optimized out (the first dependency could still be disregarded as
well, since the pipeline is in-order).  However the latter assumption
will break on upcoming Gfx12HP platforms, because they have multiple
asynchronous FPU pipelines, so whenever we hit a RaR dependency we
need to propagate forward both dependencies, since the order in which
both reads will complete is not guaranteed by the hardware in cases
where they occur from different asynchronous pipelines.

Note that this dependency propagation change requires us to change the
definition of dependency::done as well, since that constant is defined
to discard any previous dependency information when used as argument
for shadow().

This has been reported to fix the following conformance failures on DG2:

   KHR-GL46.shaders.uniform_block.random.all_per_block_buffers.19
   dEQP-GLES3.functional.shaders.derivate.fwidth.*

Reported-by: Tapani Pälli <tapani.palli@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5670
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
2022-01-25 22:40:44 +00:00
Ian Romanick
945fb51fb5 intel/fs: Fix gl_FrontFacing optimization on Gfx12+
It's not obvious why the (gl_FrontFacing ? -1.0 : 1.0) case was handled
different for Gfx12+ than for previous generations, and it's not
correct.  It tries to negate the result as an integer, and it does this
before the mask operation that clears the other bits in the value.

When we eventually support dual-SIMD8 dispatch, the other front-facing
bit is in g1.6 at bit 15, so similar code should be possible there.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: c92fb60007 ("intel/fs/gen12: Implement gl_FrontFacing on gen12+.")
Closes: #5876
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14625>
2022-01-20 22:37:18 +00:00
Dave Airlie
f83f72be8e intel/brw: drop gl header from the brw backend.
This shouldn't be used anywhere now once we drop the GLbitfield64 types.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
2022-01-19 21:54:58 +00:00
Dave Airlie
1352e0ba0c mesa/*: add a shader primitive type to get away from GL types.
This creates an internal shader_prim enum, I've fixed up most
users to use it instead of GL types.

don't store the enum in shader_info as it changes size, and confuses
other things.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
2022-01-19 21:54:58 +00:00
Dave Airlie
d54c07b4c4 mesa/*: use an internal enum for tessellation primitive types.
To avoid dragging gl.h into places it has no business being,
defined tessellation primitive mode to an enum.

This has a lot of fallout all over the place.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
2022-01-19 21:54:58 +00:00
Kenneth Graunke
d475e839da intel/fs: Reuse the same FS input slot for VUE header fields.
VARYING_SLOT_{VIEWPORT,LAYER,PSIZ} all live in the same VUE header slot,
and the FS is already set up to read the x/y/z/w component of that vec4.

However, we were setting up the SBE to pass each of those items as a
separate FS input, so hypothetically if a shader read all three, we
would burn 3 FS inputs with redundant data.  Not only was this passing
extra data to the FS, but it would count as extra input slots for the
"Do we have 16 or fewer attributes?" check for using SBE swizzling to
rearrange them in a convenient manner.

Now we make them share a single FS attribute and only count them once.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14210>
2022-01-19 01:31:47 +00:00
Dave Airlie
f9f7f326fa intel/compiler: add clamp_pointside to vs/tcs/tes keys.
This will be used by crocus and iris to clamp pointsizes only
on the last stage of the shader compile.

Fixes: 3077d96856 ("crocus: Clamp VS point sizes to the HW limits as required.")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14359>
2022-01-18 22:53:45 +00:00
Lionel Landwerlin
30a8b8d2df intel/fs: disable VRS when omask is written
As indicated by
VkPhysicalDeviceFragmentShadingRatePropertiesKHR::fragmentShadingRateWithShaderSampleMask
our implementation will clamp to 1x1 when reading samplemask or
writing to samplemask.

This fixes vkd3d-proton tests test_sample_mask_dxbc & test_sample_mask_dxil

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b6332fc4a8 ("intel/compiler: handle coarse pixel in render target writes descriptors")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14553>
2022-01-14 19:14:06 +00:00
Jason Ekstrand
a1de102479 intel/fs: Use compare_func for wm_prog_key::alpha_test_func
Because 0 is no longer a recognizable value (it's NEVER, which isn't a
good default), we add an emit_alpha_test bool to tell the back-end when
to bother alpha testing.  This lets us only touch crocus with the
change.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14157>
2022-01-14 15:08:09 +00:00
Jason Ekstrand
460a953df5 intel/compiler: Stop using GLuint in brw_compiler.h
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14157>
2022-01-14 15:08:09 +00:00
Francisco Jerez
c6455cfec9 intel/fs: Don't assume packed dispatch for fragment shaders on XeHP.
The current packed dispatch assumptions for fragment shaders seem to
be the reason that the fs-readFirstInvocation-uint-loop Piglit
test-case for the ARB_shader_ballot extension fails on DG2 in
combination with the patches in this series that enable pixel pipe
hashing (thanks Jordan for reporting the regression).  I've confirmed
that the brw_fs_test_dispatch_packing() test fails on DG2 hardware for
fragment shaders, while it succeeds for other shader stages,
indicating that the PSD hardware no longer guarantees packed dispatch.
Disable it.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
2022-01-10 18:27:41 -08:00
Danylo Piliaiev
b8d486f298 nir/algebraic: Separate has_dot_4x8 into has_sdot_4x8 and has_udot_4x8
Adreno GPUs has native instruction for unsigned and mixed dot_4x8 but
not signed dot product.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>
2022-01-10 13:20:39 +02:00
Rohan Garg
af13119993 intel/fs: OpImageQueryLod does not support arrayed images as an operand
When we lower SPIR-V to NIR for textures in vtn_handle_texture, we only
bump the number of coordinate components when the op is not a lod query.
Update the assert to take this into account.

This fixes:
  - dEQP-VK.robustness.robustness2.bind.template.r32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.r32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.r32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.r32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.r32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.r32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.rg32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.rg32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.rg32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.rg32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.rg32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.rg32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.rgba32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.rgba32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.rgba32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.rgba32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.rgba32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.bind.template.rgba32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.r32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.r32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.r32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.r32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.r32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.r32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.rg32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.rg32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.rg32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.rg32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.rg32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.rg32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.rgba32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.rgba32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.rgba32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.rgba32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.rgba32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
  - dEQP-VK.robustness.robustness2.push.notemplate.rgba32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag

Fixes: 231337a1 ("intel/fs/xehp: Assert that the compiler is sending all 3 coords for cubemaps.")
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13925>
2022-01-07 10:53:35 +00:00
Jordan Justen
d57b10ab98 intel/compiler: Adjust TCS instance-id for dg2+
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14385>
2022-01-05 16:13:28 -08:00
Henry Goffin
fe617bcca0 intel/compiler/test: Fix build with GCC 7
Without this change, test_fs_scoreboard.cpp does not compile on GCC 7
due to the use of C99 initializers in a C++ source file.

Fixes: c847bfaaf5 ("intel/fs/gen12: Add tests for scoreboard pass")

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14349>
2021-12-30 19:59:52 +00:00
Dave Airlie
4392c24844 intel/compiler: drop unused decleration
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14202>
2021-12-22 21:37:55 +00:00
Dave Airlie
2692a5f8db intel/compiler: don't lower swizzles in backend.
These are lowered by crocus in the frontend, the key
entries are still used.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14202>
2021-12-22 21:37:55 +00:00
Dave Airlie
e12b0d0d60 intel/compiler: remove gfx6 gather wa from backend.
Crocus lowers this in the frontend, they key member is still used
but reset prior to backend.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14202>
2021-12-22 21:37:55 +00:00
Marcin Ślusarz
a48f1d51e2 intel/compiler: disable workaround not applicable to gfx >= 11
There's nothing in bspec that would suggest this is still needed.
It only affected gfx 9 and 10.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14280>
2021-12-22 10:13:25 +00:00
Francisco Jerez
e7470a40c5 intel/fs: Add physical fall-through CFG edge for unconditional BREAK instruction.
This adds a missing CFG edge that represents a possible physical
control flow path the EU might take under some conditions which isn't
part of the logical CFG of the program.  This possibility shouldn't
have led to problems on platforms prior to Gfx12, since the missing
control flow edge cannot possibly influence liveness intervals.
However on Gfx12+ it becomes the compiler's responsibility to resolve
data dependencies across instructions, and the missing physical
control flow paths may lead to a WaR data hazard currently not visible
to the software scoreboard pass, which could lead to data corruption.

Worse, the possibility for this path to be taken by the EU increases
on Gfx12+ due to a hardware bug affecting EU fusion -- However the
same physical path can be potentially taken on earlier platforms as
well, so this patch extends the CFG on all platforms for consistency,
even though the lack of this edge shouldn't lead to any functional
issues on platforms earlier than Gfx12.  There are no shader-db
changes on earlier platforms, so there seems to be no disadvantage
from using the same CFG representation as on later platforms.

This issue has ben reported on TGL with the following conformance
test, thanks to Ian for bringing the FULSIM dependency check warning
to my attention:

   dEQP-VK.graphicsfuzz.spv-stable-pillars-volatile-nontemporal-store

Fixes: 4d1959e693 ("intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4940
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Reported-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14248>
2021-12-21 00:43:29 +00:00
Jason Ekstrand
eebb2dedb2 intel/fs: Add a NONE scheduling mode
While our LIFO scheduling mode attempts to optimize for register
pressure, it's often hard for a scheduling algorithm to do better than
the instruction order provided by the shader author.  Shader authors
often do perfectly reasonable things like using texture results
immediately after fetching them or constructing texture coordinates
immediately before the texture op.  When we throw all the instruction
ordering information away, we loose any help the author may have given
us.  By attempting NONE before we fall back to the worst case LIFO mode.

And, yes, I tried this with NONE both before and after LIFO and doing
NONE before LIFO is substantially better, according to shader-db.

    total instructions in shared programs: 19673152 -> 19665202 (-0.04%)
    instructions in affected programs: 33669 -> 25719 (-23.61%)
    helped: 20
    HURT: 0
    helped stats (abs) min: 15 max: 4609 x̄: 397.50 x̃: 107
    helped stats (rel) min: 2.33% max: 67.50% x̄: 14.60% x̃: 9.12%
    95% mean confidence interval for instructions value: -867.61 72.61
    95% mean confidence interval for instructions %-change: -21.74% -7.46%
    Inconclusive result (value mean confidence interval includes 0).

    total cycles in shared programs: 935562500 -> 935020920 (-0.06%)
    cycles in affected programs: 18620349 -> 18078769 (-2.91%)
    helped: 104
    HURT: 48
    helped stats (abs) min: 88 max: 60986 x̄: 8031.48 x̃: 3680
    helped stats (rel) min: 0.61% max: 51.44% x̄: 14.95% x̃: 8.87%
    HURT stats (abs)   min: 10 max: 54724 x̄: 6118.62 x̃: 1530
    HURT stats (rel)   min: 0.13% max: 46.45% x̄: 10.28% x̃: 6.46%
    95% mean confidence interval for cycles value: -5724.34 -1401.71
    95% mean confidence interval for cycles %-change: -9.86% -4.10%
    Cycles are helped.

    total spills in shared programs: 12158 -> 10327 (-15.06%)
    spills in affected programs: 1831 -> 0
    helped: 20
    HURT: 0

    total fills in shared programs: 14749 -> 12635 (-14.33%)
    fills in affected programs: 2114 -> 0
    helped: 20
    HURT: 0

    LOST:   8
    GAINED: 649

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
2021-12-18 01:46:19 +00:00
Jason Ekstrand
e6ddee764e intel/fs: Reset instruction order before re-scheduling
The way the current scheduler loop is implemented, each scheduling pass
starts with what the previous pass had.  This means that, if PRE screwed
everything up majorly, PRE_NON_LIFO would have to try to fix it.  It
also meant that tiny changes to one pass would affect every later pass.
Instead, reset the order of the instructions before each scheduling
pass.  This makes the passes entirely independent of each other.

Shader-db results on Ice Lake:

    total instructions in shared programs: 19670486 -> 19670648 (<.01%)
    instructions in affected programs: 25317 -> 25479 (0.64%)
    helped: 2
    HURT: 7
    helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
    helped stats (rel) min: 0.07% max: 0.07% x̄: 0.07% x̃: 0.07%
    HURT stats (abs)   min: 8 max: 70 x̄: 24.29 x̃: 12
    HURT stats (rel)   min: 0.41% max: 4.95% x̄: 1.47% x̃: 0.87%
    95% mean confidence interval for instructions value: -1.28 37.28
    95% mean confidence interval for instructions %-change: -0.04% 2.30%
    Inconclusive result (value mean confidence interval includes 0).

    total cycles in shared programs: 935535948 -> 935490243 (<.01%)
    cycles in affected programs: 421994824 -> 421949119 (-0.01%)
    helped: 1269
    HURT: 879
    helped stats (abs) min: 1 max: 12008 x̄: 259.38 x̃: 52
    helped stats (rel) min: <.01% max: 28.02% x̄: 1.12% x̃: 0.14%
    HURT stats (abs)   min: 1 max: 29931 x̄: 322.46 x̃: 20
    HURT stats (rel)   min: <.01% max: 32.17% x̄: 1.74% x̃: 0.22%
    95% mean confidence interval for cycles value: -71.37 28.81
    95% mean confidence interval for cycles %-change: -0.11% 0.21%
    Inconclusive result (value mean confidence interval includes 0).

    total spills in shared programs: 12403 -> 12430 (0.22%)
    spills in affected programs: 1355 -> 1382 (1.99%)
    helped: 2
    HURT: 7

    total fills in shared programs: 15128 -> 15182 (0.36%)
    fills in affected programs: 3294 -> 3348 (1.64%)
    helped: 2
    HURT: 7

    LOST:   21
    GAINED: 28

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
2021-12-18 01:46:19 +00:00
Jason Ekstrand
d49d092259 Revert "intel/fs: Do cmod prop again after scheduling"
This reverts commit ba2fa1ceaf.  Doing
optimizations after scheduling but before RA means doing them in the
middle of the scheduling loop which introduces additional dependencies
between one scheduling iteration and the next.  That won't work if we
want to make the scheduling modes independent, at least not unless we
have some way of fully cloning the IR.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
2021-12-18 01:46:19 +00:00
Jason Ekstrand
e6f0def97d intel/eu: Don't double-loop as often in brw_set_uip_jip
brw_find_next_block_end() scans through the instructions to find the end
of the block.  We were calling it for every instruction in the program
which is, if you have a single basic block, makes the whole mess a nice
clean O(n^2) when it really doesn't need to be.  Instead, only call
brw_find_next_block_end() as-needed.  This brings it back to O(n) like
it should have been.

This cuts the runtime of the following Vulkan CTS on my SKL box by 5%
from 1:51 to 1:45:  dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
2021-12-18 01:46:19 +00:00
Jason Ekstrand
cf98a3cc19 intel/fs: Use OPT() for split_virtual_grfs
Now that we're being conservative in the pass, it's easy to tell when it
makes progress and we can put it in the OPT() macro.  This way, we get
nice INTEL_DEBUG=optimizer dumps for it.  While we're here, fix the
header comment which is massively out-of-date.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
2021-12-18 01:46:19 +00:00
Jason Ekstrand
38fa18a7a3 intel/fs: Be more conservative in split_virtual_grfs
Instead of modifying every single instruction, keep track of which VGRFs
are actually split in a bit-set, and only modify the instructions that
actually touch split regs.

This cuts the runtime of the following Vulkan CTS on my SKL box by 45%
from 3:21 to 1:51:  dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
2021-12-18 01:46:19 +00:00
Jason Ekstrand
3c89dbdbfe intel/fs: Implement the sample_pos_or_center system value
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>
2021-12-17 16:02:16 +00:00
Jason Ekstrand
a580fd55e1 intel/fs: Rework emit_samplepos_setup()
This rolls compute_sample_position into emit_samplepos_setup, its only
caller, by using a loop instead of calling it twice.  We also
early-return for the !persample_dispatch case instead of doing it as
part of the sample calculation.  This means that we don't call
fetch_payload_reg() to get sample_pos_reg unless we're actually going to
use it so the function is safe to call even if we haven't set up
sample_pos_reg.  This will be important for the next commit.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>
2021-12-17 16:02:16 +00:00
Jason Ekstrand
ac7255ed1e intel/fs: Return fs_reg directly from builtin setup helpers
There's no good reason why we're allocating them on the heap and
returning a pointer.  Return the fs_reg directly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>
2021-12-17 16:02:16 +00:00
Kenneth Graunke
7325179bcb intel/compiler: Use uppercase enum values in brw_ir_performance.cpp
This is by far the more common style in Mesa.  It also gives a cue that
e.g. num_dependency_ids is a fixed definition rather than some kind of
local variable maintaining a count.

While hre, we also rename the enums to have full prefixes to prepare for
a future where we use them in multiple files for future backend work.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14182>
2021-12-16 09:00:57 +00:00
Kenneth Graunke
d3f4f23ca3 intel/vec4: Inline emit_texture and move helpers to brw_vec4_nir.cpp
emit_texture() only has one caller, nir_emit_texture().  We may as well
inline that.  Move the associated helper functions for emitting sampler
messages there as well, to keep associated code nearby.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5183
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14191>
2021-12-16 00:09:45 -08:00
Kenneth Graunke
92d194427d intel/vec4: Use nir_texop in emit_texture instead of translating
We eliminated the GLSL IR -> vec4 backend ages ago, so the only caller
uses a nir_texop enum.  Drop a layer of translating.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14191>
2021-12-16 00:09:44 -08:00
Kenneth Graunke
2729a741fc intel/vec4: Use ir_texture_opcode less in emit_texture()
This replaces a bunch of uses of the GLSL IR ir_texture_opcode enum with
the backend opcode, in preparation for removing it altogether.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14191>
2021-12-16 00:09:36 -08:00
Ian Romanick
2ca13abcce intel/fs: Use HF as destination type for F32TOF16 in fquantize2f16
Having an integer destination type instead of a float destination type
confuses the SWSB code.  This causes problems on some Intel GPUs.  Fix
this by using the correct type in the destination of the F32TOF16
opcode.

Gfx7 doesn't have the HF type, so continue to emit W on that platform.
The assertions in brw_F32TO16 (brw_eu_emit.c) are updated to reflect
this.  In scalar mode, UD is never emitted as a destination type for
this opcode, so remove it from the allowed types in the assertion.

I also condidered doing something like de55fd358f ("intel/fs/xehp:
Teach SWSB pass about the exec pipeline of
FS_OPCODE_PACK_HALF_2x16_SPLIT."), but Curro recommended that just using
the correct types is a better fix.  I agree.

v2: Add missing changes to fs_generator::generate_pack_half_2x16_split.
I'm not sure how I (and the Intel CI) missed that the first time. :(

v3: Fix copy-and-paste issue in the v2 fix. Noticed by Tapani.

Reviewed-by: Francisco Jerez <currojerez@riseup.net> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14181>
2021-12-15 20:03:51 +00:00
Rafael Antognolli
a026d2d11c intel/compiler: Assert that unsupported tg4 offsets were lowered for XeHP
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14142>
2021-12-13 16:59:44 -08:00
Jordan Justen
52a55f097f intel/compiler: Use nir_lower_tex_options::lower_offset_filter for tg4 on XeHP
Based on Rafael's:
 * "nir/lower_tex: Add option to lower offset for tg4 too."
 * "intel/compiler: Lower offsets for tg4 on gen9+."
 * "WIP: Do not lower basic offsets."
 * "WIP: intel/compiler: Enable lowering offsets restriction."

But, with these changes:
 * Fixed range checking to be signed 4 bits
 * Converted to filter
 * Apply only to gfx12.5+
 * Use nir_src_is_const / nir_src_comp_as_int (s-b Jason)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14142>
2021-12-13 16:59:37 -08:00
Caio Oliveira
2ad11b39bd intel/compiler: Use a struct for brw_compile_bs parameters
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14139>
2021-12-13 01:08:16 +00:00
Caio Oliveira
58c4a95320 intel/compiler: Use a struct for brw_compile_gs parameters
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14139>
2021-12-13 01:08:16 +00:00
Caio Oliveira
acf2d3c78b intel/compiler: Use a struct for brw_compile_tes parameters
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14139>
2021-12-13 01:08:16 +00:00
Caio Oliveira
7372a48a4a intel/compiler: Use a struct for brw_compile_tcs parameters
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14139>
2021-12-13 01:08:16 +00:00
Jason Ekstrand
b8d04863e2 intel/fs: Drop high_quality_derivatives
We've never bothered to hook it up in crocus or iris.  If we do in the
future, it should probably be a NIR pasa anyway.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14056>
2021-12-10 21:20:47 +00:00
Jason Ekstrand
6dc9958bf3 intel/compiler: Get rid of wm_prog_key::frag_coord_adds_sample_pos
This hasn't actually done anything for a while.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14056>
2021-12-10 21:20:47 +00:00
Jason Ekstrand
278d12f991 intel/fs,vec4: Drop prog_data binding tables
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14056>
2021-12-10 21:20:47 +00:00
Jason Ekstrand
4fa58d27a5 intel/fs,vec4: Drop support for shader time
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14056>
2021-12-10 21:20:47 +00:00
Jason Ekstrand
8f3c100d61 intel/fs,vec4: Drop uniform compaction and pull constant support
The only driver using these was i965 and it's gone now.  This is all
dead code.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14056>
2021-12-10 21:20:47 +00:00
Ian Romanick
ff74d5dd1b intel/compiler: Don't store "scalar stage" bits on Gfx8 or Gfx9
Since 1d71b1a311, only Gfx7 and earlier have any vec4 stages ever.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14128>
2021-12-08 14:59:32 -08:00
Ian Romanick
4563261ad1 intel/compiler: Don't predicate a WHILE if there is a CONT
Previously a predicated BREAK that appeared immediately before the WHILE
would get merged into the WHILE.  This doesn't work if other flow
control (e.g., a CONT) can transfer directly to the WHILE.

On Intel platforms, this fixes the CTS test
dEQP-VK.graphicsfuzz.stable-binarysearch-tree-nested-if-and-conditional.

No shader-db changes on any Intel platform.

When this commit was first created (over a month before it is going to
land), there were some regressions that were prevented by other commits
in MR !13095.  That does not appear to be the case now, so I don't know
what changed.  Basically, the treatment of discard as a combination of
demote and terminate causes additional continues in some loops, and
those continues trigger this bug.  The other commits from that MR
prevent those continues from being generated in the first place.

All Intel platforms had simlar fossil-db results. (Ice Lake shown)
Instructions in all programs: 144419989 -> 144419995 (+0.0%)
SENDs in all programs: 6947332 -> 6947332 (+0.0%)
Loops in all programs: 38277 -> 38277 (+0.0%)
Spills in all programs: 204075 -> 204075 (+0.0%)
Fills in all programs: 319480 -> 319480 (+0.0%)

A few shaders in Doom 2016 were hurt by one instruction each.  It seems
likely that these shaders would have experienced at least some
mis-rendering.

Closes: #4213
Fixes: d13bcdb3a9 ("i965/fs: Extend predicated break pass to predicate WHILE.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14128>
2021-12-08 14:56:32 -08:00
Francisco Jerez
de55fd358f intel/fs/xehp: Teach SWSB pass about the exec pipeline of FS_OPCODE_PACK_HALF_2x16_SPLIT.
This virtual instruction is translated into multiple half float
physical instructions, even though its destination is typically of
integer type, which prevents the software scoreboard pass from
inferring the correct execution pipeline for the virtual instruction
on XeHP+ platforms.  Teach the SWSB lowering pass about this
inconsistency between the IR and physical instruction types.

Fixes among other tests:

dEQP-GLES31.functional.shaders.builtin_functions.pack_unpack.packhalf2x16_compute

Fixes: d4537770bb ("intel/fs: Add helper functions inferring sync and exec pipeline of an instruction.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5685
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14002>
2021-12-08 02:47:11 +00:00