Commit graph

2524 commits

Author SHA1 Message Date
Ian Romanick
7873edee6e intel/fs: Use specialized version of regions_overlap in opt_copy_propagation
Since one of the register must always be either VGRF or FIXED_GRF, much
of regions_overlap and reg_offset can be elided.

On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a debugoptimized build, improves
performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.*spir*'" by
-0.29% ± 0.097% (n = 5, pooled s = 0.361697).

Using a release build, improves performance of compiling shaders from
batman_arkham_city_goty.foz by -3.3% ± 0.04% (n = 5, pooled s =
0.178312).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
2023-04-06 19:07:50 +00:00
Ian Romanick
43cb42df7c intel/compiler: Micro optimize inst_is_in_block
This function only exists in builds with assertions, so it only matters
there.

On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a debugoptimized build, improves
performance of Vulkan CTS "deqp-vk --deqp-case='dEQP-VK.*spir*'" by
-5.2% ± 0.16% (n = 5, pooled s = 0.657887).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
2023-04-06 19:07:50 +00:00
Ian Romanick
d47f521ee4 intel/compiler: Use NIR_PASS instead of NIR_PASS_V
Reduce debug log spam by only logging the shader if a pass made some
changes. This can also elide some nir_validate calls in debug builds.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
2023-04-06 19:07:50 +00:00
Ian Romanick
fb950a9edf intel/compiler: Remove one overload of backend_instruction::insert_before
The version that takes a list of instructions is not used. I did not do
any archaeology to find out when the last user was removed.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
2023-04-06 19:07:50 +00:00
Emma Anholt
f1ea6c1b40 intel: Always call nir_lower_frexp.
We have NIR lowering for Vulkan, and rely on GLSL's lowering in the
frontend, but this will let us drop the GLSL lowering.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22083>
2023-04-06 02:32:01 +00:00
Jordan Justen
eef7a117a1 intel/compiler: Support fmul_fsign opt for fp64 when int64 isn't supported
MTL support fp64, but not int64. The fsign(double(x))*FOO optimization
would try to use a 64-bit int xor operation to conditionally toggle
the sign bit off the result.

Since this only affects high bit of the result, we can do a 32-bit
move of the low dword, and a 32-bit xor on the high dword.

Fixes dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp64.input_args.modf_denorm_flush_to_zero
on MTL.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22259>
2023-04-05 18:48:21 +00:00
Lionel Landwerlin
e25aee8e34 intel/fs: also allow vec8+ vectorization of load_global_const_block_intel
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
2023-04-05 12:32:56 +00:00
Lionel Landwerlin
a358b97c58 intel/fs: optimize uniform SSBO & shared loads
Using divergence analysis, figure out when SSBO & shared memory loads
are uniform and carry the data only once in register space.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
2023-04-05 12:32:56 +00:00
Lionel Landwerlin
275ad509c1 intel/fs: factor out lsc surface descriptor settings
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
2023-04-05 12:32:56 +00:00
Lionel Landwerlin
76698f3abd intel/fs: copy instruction sources in logical send lowering
Having references to inst->src[X] when you're also modifying
inst->src[X] is a receipe for disaster. Making changes to the lowering
code I've been bitten quite a few times by this take copies of all
sources to do the lowering.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
2023-04-05 12:32:56 +00:00
Lionel Landwerlin
adb8c30436 intel/fs: UNDEF fixup_nomask_control_flow temp register
Ensure that the register's liveness is not expanded to loops.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
2023-04-05 12:32:56 +00:00
Lionel Landwerlin
362a07db3a intel/fs: don't consider fixup_nomask_control_flow SENDs predicate
Those SENDs are still doing a full register write. We just inserted
some predication for a workaround.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
2023-04-05 12:32:56 +00:00
Lionel Landwerlin
34d8bfe65f intel/fs: run VGRF compaction just before max live register accounting
There are a number of instances of the dead code elimination pass that
could reduce the count. For some reason this also seems to affect
register allocation itself.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
2023-04-05 12:32:56 +00:00
Ian Romanick
2016d9f46c intel/fs: Rework the loop of opt_combine_constants that collects constants
This is a bit more wordy, but it will greatly simplify some future
changes.

v2: Rebase on ADD3 changes.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>
2023-04-03 21:50:06 +00:00
Ian Romanick
9e4bb4bfcf intel/fs: Refactor part of opt_combine_constants to a separate function
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>
2023-04-03 21:50:06 +00:00
Ian Romanick
593cde0432 intel/fs: Output opt_combine_constants debug to stderr
It's a lot more useful to have it in the same stream with the
INTEL_DEBUG=fs output.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22274>
2023-04-03 21:50:06 +00:00
Patrick Lerda
5d85966805 intel: fix memory leak related to brw_nir_create_passthrough_tcs()
Indeed, the parameter "mem_ctx" was not processed.

For instance, this issue is triggered with the crocus driver and
"piglit/bin/shader_runner tests/spec/arb_tessellation_shader/execution/compatibility/tes-clip-vertex-different-from-position.shader_test -auto -fbo":
SUMMARY: AddressSanitizer: 235216 byte(s) leaked in 48 allocation(s).

Fixes: 96ba0344db ("intel: Use common helpers for TCS passthrough shaders")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22173>
2023-03-30 10:52:07 +00:00
Ian Romanick
782de1932c intel/fs: Don't copy propagate from saturate to sel
There are already NIR algebraic optimizations (see also ac6646129f
("nir: Move fsat outside of fmin/fmax if second arg is 0 to 1.") that
will try to remove the saturate from things like

    fmax(0.5, fsat(x))

This basically reverts 40aeb558ce ("i965/fs: Allow propagation of
instructions with saturate flag to sel"). That commit message had no
shader-db information, so it's unclear whether this actually helped
anything ever.

No shader-db changes on any Intel platform.

One shader in Far Cry New Dawn was affected.

Cycles in all programs: 10933090738 -> 10933090736 (-0.0%)
Cycles helped: 1

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22169>
2023-03-29 23:48:19 +00:00
Marcin Ślusarz
32107d8b5a intel/compiler: compactify locations of mesh outputs
Needed in support of anv code for Wa_14015590813.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17622>
2023-03-29 18:35:55 +00:00
Faith Ekstrand
789992b7c9 intel: Drop some author comments and update Faith's name
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22120>
2023-03-26 00:16:25 +00:00
Sagar Ghuge
cece2aa2c1 intel/compiler: Add Wa_14014063774 for slm_fence
Before SLM fence compiler needs to insert SYNC.ALLWR in order to avoid
the SLM data race.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22050>
2023-03-25 00:45:04 +00:00
Mark Janes
33d03e57ad intel/fs: use generated helpers for Wa_14013363432 / Wa_14012688258
Wa_14013363432 is a clone of Wa_14012688258.  It does not apply to all
gfx 12.5 platforms.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21745>
2023-03-23 19:13:09 +00:00
Tapani Pälli
6538c5bcd4 intel/fs: restore message layout changes for cube array
This reverts commit bc04e2daca that handled the change as a WA while
this is about a new feature, change done in message layout. Patch also
changes the original comment to not refer to Wa but bspec page.

Fixes: bc04e2daca ("intel/fs: use generated helpers for Wa_1209978020 / Wa_18012201914")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22068>
2023-03-22 20:18:11 +00:00
Lionel Landwerlin
2acc2f18ea intel/compiler: report max dispatch width statistic
Most tools looking at shader stats assume that there is only a single
resulting binary shader out of a single input. On Intel HW this is not
always the case. So having a statistic on each variant that reports
the maximum dispatch width helps showing improvement on a single
shader in terms of how large we manage to compile it.

For shaders that can be compiled in multiple SIMD width (like fragment
shaders), this will report the maximum dispatch width in the
statistics of each variants.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22014>
2023-03-21 11:53:04 +00:00
Iván Briano
4dd81b4e2f intel/fs: handle interpolation modes for at_sample and at_offset too
Fixes dEQP-VK.draw.*.linear_interpolation.*

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19647>
2023-03-18 10:18:15 +00:00
Francisco Jerez
76b4255cd8 intel/fs: Fix register coalesce in presence of force_writemask_all copy source writes.
This fixes the behavior of register coalesce in cases where the source
of a copy is written elsewhere in the program by a force_writemask_all
instruction, which could cause the overwrite to be executed for an
inactive channel under non-uniform control flow, causing
can_coalesce_vars() to give incorrect results.  This has been reported
in cases like:

> while (true) {
>    x = imageSize(img);
>    if (non_uniform_condition()) {
>       y = x;
>       break;
>    }
> }
> use(y);

Currently the register coalesce pass would coalesce x and y in the
example above, which is invalid since in the example above imageSize()
is implemented as a force_writemask_all SEND message, whose result is
broadcast to all channels, so when a given channel executes 'y = x'
and breaks out of the loop, another divergent channel can execute a
subsequent iteration of the loop overwriting 'x' with a different
value, hence coalescing y and x into the same register changes the
behavior of the program.

Note that this is a regression introduced by commit a4b36cd3dd.  In
order to avoid the problem without reverting that patch, we prevent
register coalesce if there is an overwrite of the source with
force_writemask_all behavior inconsistent with the copy and this
occurs anywhere in the intersection of the live ranges of source and
destination, even if it occurs lexically before the copy, since it
might be physically executed after the copy under divergent loop
control flow.

Fixes: a4b36cd3dd ("intel/fs: Coalesce when the src live range is contained in the dst")
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>
2023-03-17 03:05:24 -07:00
Francisco Jerez
d4015bcb38 intel/fs: Fix copy propagation dataflow analysis in presence of force_writemask_all ACP overwrites.
This fixes the behavior of copy propagation in cases where either the
source or destination of an ACP is overwritten elsewhere in the
program by a force_writemask_all instruction, which could cause the
overwrite to be executed for an inactive channel under non-uniform
control flow, causing the current per-channel dataflow propagation to
give incorrect results.  This has been reported in cases like:

> while (true) {
>    x = imageSize(img);
>    if (non_uniform_condition()) {
>       y = x;
>       break;
>    }
> }
> use(y);

Currently the copy propagation pass would propagate copy 'y = x' into
'use(y)', which is invalid since in the example above imageSize() is
implemented as a force_writemask_all SEND message, whose result is
broadcast to all channels, so when a given channel executes 'y = x'
and breaks out of the loop, another divergent channel can execute a
subsequent iteration of the loop overwriting 'x' with a different
value, hence replacing 'y' with 'x' at 'use(y)' changes the behavior
of the program.

This patch extends the global dataflow analysis algorithm to determine
whether there is any control flow path from a given copy to an
overwrite of its source or destination which has force_writemask_all
behavior inconsistent with the copy, and in such case prevents copy
propagation for that ACP entry at any point of the program which can
be reached from the overwrite, even if the copy is statically
re-executed along all such control flow paths (as in the example
above), since the execution of the overwrite for a given channel i may
corrupt other channels j!=i inactive for the subsequently re-executed
copy.

Note that a simpler solution has been attempted which fully shuts down
copy propagation if such a force_writemask_all ACP overwrite is
present /anywhere/ in the program regardless of its location in the
control flow graph, however that led to large shader-db regressions in
some programs from shader-db (like a CS from Car Chase which would
emit 53% more instructions).  With this solution the only handful of
shaders that suffer instruction count regressions seem to be getting
misoptimized right now (e.g. some compute shaders from Deus Ex
Mankind).  This solution doesn't seem to affect the run-time of
shader-db significantly, it's less than 1% higher with the fix
applied.

Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>
2023-03-17 03:05:20 -07:00
Francisco Jerez
1c1be23497 intel/fs: Track force_writemask_all behavior of copy propagation ACP entries.
force_writemask_all determines whether all channels of the copy are
actually valid, and may be required to be set for it to be propagated
safely in cases where the destination of the copy is used by another
force_writemask_all instruction, or when the copy occurs in a
divergent control flow block different from its use.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>
2023-03-17 03:05:18 -07:00
Kenneth Graunke
14f9f98dcb i965/vec4: Implement uclz in the vec4 backend
Commit 28311f9d02 moved ufind_msb lowering to NIR and started emitting
uclz.  Unfortunately, the vec4 backend never actually implemented uclz.

It's trivial to do.  Now it does.

Fixes: 28311f9d02 ("nir: intel/compiler: Move ufind_msb lowering to NIR")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21974>
2023-03-17 09:01:18 +00:00
Kenneth Graunke
e7ea2aa46c intel/fs: Make bld.F16TO32 actually emit F16TO32 not F32TO16
Ahem, "add builder helpers that work on Gfx7"...now might actually work.
Too much copy and paste...

Fixes: 966995d911 ("intel/fs: Add builder helpers for F32TO16/F16TO32 that work on Gfx7.x")
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21974>
2023-03-17 09:01:18 +00:00
Kenneth Graunke
84197bc0a4 intel/vec4: Retype texture/sampler indexes to UD
generate_tex() asserts that sampler_index.type == UD, but commit
83fd7a5ed1 removed the uint temporary, which caused us to see D at
some points.  Really, either should be fine, but let's just put the
UD retype back.  This fixes a ton of things in crocus.

Fixes: 83fd7a5ed1 ("intel: Use nir_lower_tex_options::lower_index_to_offset")
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21974>
2023-03-17 09:01:18 +00:00
Lionel Landwerlin
56474fae93 intel/fs: fix subgroup invocation read bounds checking
nir->info.subgroup_size can be set to an enum :
  SUBGROUP_SIZE_VARYING = 0
  SUBGROUP_SIZE_UNIFORM = 1
  SUBGROUP_SIZE_API_CONSTANT = 2
  SUBGROUP_SIZE_FULL_SUBGROUPS = 3

So compute the API subgroup size value and compare it to the dispatch
size to determine whether we need some bound checking.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9ac192d79d ("intel/fs: bound subgroup invocation read to dispatch size")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21856>
2023-03-14 12:15:48 +00:00
Lionel Landwerlin
bf59cfcee1 intel/fs: prevent large vector ops generated by peephole_ffma
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21782>
2023-03-14 10:38:50 +00:00
Lionel Landwerlin
bc08f43991 intel/fs: add MOV source count validation
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21782>
2023-03-14 10:38:50 +00:00
Lionel Landwerlin
ed3c2f73db intel/fs: fixup sources number from opt_algebraic
Fixes issues with register_coalesce :

fossilize-replay: brw_fs_register_coalesce.cpp:297: bool fs_visitor::register_coalesce(): Assertion `mov[i]->sources == 1' failed.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21782>
2023-03-14 10:38:50 +00:00
Lionel Landwerlin
18bdc71459 intel/fs: fix nir_opt_peephole_ffma max vec assumption
There can be larger vec than vec4.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21782>
2023-03-14 10:38:50 +00:00
Lionel Landwerlin
efde1917c9 intel/fs: don't SEND messages as partial writes
For instance, to load uniform data with the LSC we usually rely on
tranpose messages which have to execute in SIMD1. Those end up being
considered as partial writes so within loops their life span spread to
the whole loop, increasing register pressure.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21867>
2023-03-14 10:10:32 +00:00
Ian Romanick
28311f9d02 nir: intel/compiler: Move ufind_msb lowering to NIR
Fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Cycles in all programs: 9098346105 -> 9098333765 (-0.0%)
Cycles helped: 6

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
2023-03-10 15:27:17 +00:00
Ian Romanick
08ca862ef8 intel/compiler: Tighter src and dest size bounds checking for some opcodes
Enforce the sizes listed in the Skylake PRM:

BFREV:
    source types: *D
    destination types: *D

CBIT:
    source types: UB, UW, UD
    destination types: UD

FBH:
    source types: D, UD
    destination types: UD

FBL:
    source types: UD
    destination types: UD

LZD:
    source types: D, UD
    destination types: UD

v2: Update BFREV commit message documentation. Suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
2023-03-10 15:27:17 +00:00
Ian Romanick
0cc7bf63b7 nir: intel/compiler: Move ifind_msb lowering to NIR
Unlike ufind_msb, ifind_msb is only defined in NIR for 32-bit values, so
no @32 annotation is required.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
2023-03-10 15:27:17 +00:00
Ian Romanick
15c6c859cf intel/compiler: Lower find_lsb in NIR
No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>
2023-03-10 15:27:17 +00:00
Eric Engestrom
f5d3d1e7ed meson: inline gtest_test_protocol now that it's always 'gtest'
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21485>
2023-03-10 07:20:29 +00:00
Sagar Ghuge
9a34b2ab0e intel/compiler: Add swsb_stall debug option
When enabled, on gfx12 plus, we will add the sync nop instruction after
each instruction to make sure that current instruction depends on the
previous instruction explicitly.

This option will help us to get a hint if something is missing or broken
in software scoreboard pass.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21797>
2023-03-10 06:55:39 +00:00
Kenneth Graunke
dfe652fb03 intel/eu: Simplify brw_F32TO16 and brw_F16TO32
Now that we aren't using them on Gfx8+ we can drop a lot of cruft.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
2023-03-09 23:26:17 +00:00
Kenneth Graunke
c590a3eadf intel/fs: Move packHalf2x16 handling to lower_pack()
This mainly lets the software scoreboarding pass correctly mark the
instructions, without needing to resort to fragile manual handling in
the generator.

We can also make small improvements.  On Gfx 8LP-12.0, we no longer have
the restrictions about DWord alignment, so we can simply write each half
into its intended location, rather than writing it to the low DWord and
then shifting it in place.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
2023-03-09 23:26:17 +00:00
Kenneth Graunke
f5e5705c91 intel/fs: Use F32TO16/F16TO32 helpers in fquantize16 handling
I originally thought that we were intentionally emitting the legacy
opcodes here to make them opaque to the optimizer, so that it wouldn't
eliminate the explicit type conversions, as they're actually required
to do the quantization.  But...we don't actually optimize those away
currently anyway.  So...go ahead and use the helpers for consistency.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
2023-03-09 23:26:17 +00:00
Kenneth Graunke
44c6ccb197 Revert "intel/fs: Fix inferred_sync_pipe for F16TO32 opcodes"
With the previous patch, we no longer need to special case this, as we
emit a MOV with an HF source, rather than F16TO32 with an UW source,
on all platforms that need scoreboarding.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
2023-03-09 23:26:17 +00:00
Kenneth Graunke
309ec3725a intel/fs: Use new F16TO32 helpers for unpack_half_split_* opcodes
This gets us a MOV at the IR level on Gfx8+ which should be more
optimizable than F16TO32.  It also removes confusion about which
pipe which the instruction will run on.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
2023-03-09 23:26:17 +00:00
Kenneth Graunke
78bf53904e intel/fs: Delete a TODO about using brw_F32TO16.
We can just use the new builder helpers to get the optimization
advantages of a MOV on Gfx8+ while also getting the necessary F32TO16
on Gfx7.x and yet not worry too hard about it.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
2023-03-09 23:26:17 +00:00
Kenneth Graunke
966995d911 intel/fs: Add builder helpers for F32TO16/F16TO32 that work on Gfx7.x
These take care of emitting the F32TO16/F16TO32 instructions on Gfx7.x
but otherwise just emit a type converting MOV on Gfx8+.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>
2023-03-09 23:26:17 +00:00