v2 (idr): So much rebasing. Deleted a bunch of code that we're not
going to need yet.
v3 (Ken): bfn inst encoding fix
v4 (idr): Add BFN to brw_get_lowered_simd_width.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
Intel platforms will soon implement both bfi and bitfield_select. bfi is
more efficient for bitfield_insert.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
The eliminated SENDs are from a single app that has a bunch of
fragment shaders with a sequence like:
con 32 %495 = fmul! %203.i, %1 (0.000000)
con 32 %496 = ffma! %203.j, %1 (0.000000), %495
con 32 %497 = ffma! %203.k, %1 (0.000000), %496
con 32 %498 = ffma! %203.l, %1 (0.000000), %497
con 32 %499 = @load_reloc_const_intel (param_idx=1, base=0)
con 32 %500 = @load_reloc_const_intel (param_idx=0, base=0)
con 32 %501 = f2u32 %498
con 32 %502 = umin %501, %172 (0x4)
con 32 %503 = ishl %502, %172 (0x4)
con 32 %504 = load_const (0x00000040 = 64)
con 32 %505 = umin %503, %504 (0x40)
con 32 %506 = iadd %500, %505
The `f2u` is replaced with 0, and that makes the `ffma` dot-product
sequence be unused. Since it is unused, most of the preceeding block
gets eliminated. A lot of instructions after the `f2u` are also
eliminated by other algebraic optimizations. Most importantly, %203 is
the result of a `load_ubo_uniform_block_intel` that is eliminated.
No shader-db changes on any Intel platform.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 919895603 -> 919804051 (-0.01%); split: -0.01%, +0.00%
Send messages: 40892036 -> 40887569 (-0.01%)
Cycle count: 99176770712 -> 99174971806 (-0.00%); split: -0.00%, +0.00%
Max live registers: 190030365 -> 190030367 (+0.00%)
Max dispatch width: 47415040 -> 47415024 (-0.00%)
Non SSA regs after NIR: 228872538 -> 228863608 (-0.00%); split: -0.00%, +0.00%
Totals from 2234 (0.11% of 1955134) affected shaders:
Instrs: 1989743 -> 1898191 (-4.60%); split: -4.60%, +0.00%
Send messages: 44179 -> 39712 (-10.11%)
Cycle count: 25416114 -> 23617208 (-7.08%); split: -7.08%, +0.00%
Max live registers: 367357 -> 367359 (+0.00%)
Max dispatch width: 39184 -> 39168 (-0.04%)
Non SSA regs after NIR: 471173 -> 462243 (-1.90%); split: -1.90%, +0.00%
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
If the source -1.0 < x < 0.0, simply removing the ftrun will introduce
undefined behavior. By chance of how at least Intel and NVIDIA GPUs
implement f2u, this has Just Worked.
No shader-db changes on any Intel platform.
fossil-db:
Lunar Lake
Totals:
Instrs: 913264354 -> 913264366 (+0.00%)
Cycle count: 104953995530 -> 104953996854 (+0.00%)
Max live registers: 189266026 -> 189266058 (+0.00%)
Non SSA regs after NIR: 227779417 -> 227779369 (-0.00%)
Totals from 24 (0.00% of 1984794) affected shaders:
Instrs: 4669 -> 4681 (+0.26%)
Cycle count: 50610 -> 51934 (+2.62%)
Max live registers: 1222 -> 1254 (+2.62%)
Non SSA regs after NIR: 1174 -> 1126 (-4.09%)
Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 1001288026 -> 1001288038 (+0.00%)
Cycle count: 92813392671 -> 92813392791 (+0.00%)
Max live registers: 121935383 -> 121935399 (+0.00%)
Max dispatch width: 19949928 -> 19949912 (-0.00%)
Totals from 2 (0.00% of 2284670) affected shaders:
Instrs: 1380 -> 1392 (+0.87%)
Cycle count: 18940 -> 19060 (+0.63%)
Max live registers: 136 -> 152 (+11.76%)
Max dispatch width: 32 -> 16 (-50.00%)
No fossil-db changes on Skylake.
Suggested-by: Georg Lehmann
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
There are no shader-db changes on ELK platforms because those platforms
don't support 8- or 16-bit integer types.
v2: Restrict patterns generated such that the integer limits are exactly
representable in the specified floating point format. With the exception
of the value 0, this requires that float_sz > int_sz. This had no impact
on shader-db or fossil-db on any Intel platform. Noticed by Georg.
v3: Add a missing is_a_number.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total cycles in shared programs: 889936056 -> 889934082 (<.01%)
cycles in affected programs: 65806 -> 63832 (-3.00%)
helped: 2 / HURT: 0
fossil-db:
Lunar Lake
Totals:
Instrs: 233284796 -> 233282917 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32756399804 -> 32754972188 (-0.00%); split: -0.01%, +0.00%
Spill count: 519861 -> 519813 (-0.01%)
Fill count: 663650 -> 663626 (-0.00%); split: -0.01%, +0.01%
Max live registers: 71738626 -> 71738696 (+0.00%)
Non SSA regs after NIR: 67837902 -> 67837648 (-0.00%)
Totals from 1236 (0.16% of 790723) affected shaders:
Instrs: 2134504 -> 2132625 (-0.09%); split: -0.09%, +0.01%
Cycle count: 604922278 -> 603494662 (-0.24%); split: -0.48%, +0.25%
Spill count: 16509 -> 16461 (-0.29%)
Fill count: 32760 -> 32736 (-0.07%); split: -0.22%, +0.15%
Max live registers: 250112 -> 250182 (+0.03%)
Non SSA regs after NIR: 302368 -> 302114 (-0.08%)
Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 264095370 -> 264094056 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26554146277 -> 26553027268 (-0.00%); split: -0.01%, +0.01%
Spill count: 530603 -> 530615 (+0.00%)
Fill count: 613231 -> 613273 (+0.01%)
Max live registers: 46559041 -> 46559087 (+0.00%)
Totals from 1237 (0.14% of 905547) affected shaders:
Instrs: 2262517 -> 2261203 (-0.06%); split: -0.07%, +0.01%
Cycle count: 518219799 -> 517100790 (-0.22%); split: -0.59%, +0.37%
Spill count: 17518 -> 17530 (+0.07%)
Fill count: 32273 -> 32315 (+0.13%)
Max live registers: 128360 -> 128406 (+0.04%)
Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 269849640 -> 269848198 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26718329643 -> 26718289020 (-0.00%); split: -0.00%, +0.00%
Max live registers: 46878430 -> 46878462 (+0.00%)
Totals from 1233 (0.14% of 905427) affected shaders:
Instrs: 2324225 -> 2322783 (-0.06%); split: -0.06%, +0.00%
Cycle count: 531467501 -> 531426878 (-0.01%); split: -0.11%, +0.10%
Max live registers: 130782 -> 130814 (+0.02%)
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
v2: Add a comment around has_f2[ui]_sat explaining which opcodes it
enables. Suggested by Georg. Cast u_uintN_max and friends to double in
nir_opcodes.py. This ensures that an exact conversion is made.
Eliminate duplicate conversions from half float to double. Both noticed
by Georg.
v3: Apply "NaN should be zero" fix suggested by Georg.
Co-authored-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
The imagination-experimental flag has been replaced with the imagination flag,
as the driver is now Vulkan conformant.
Signed-off-by: Ashish Chauhan <Ashish.Chauhan@imgtec.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37761>
The Imagination PowerVR Vulkan driver is now conformant on BXS-4-64, so drop the
PVR_I_WANT_A_BROKEN_VULKAN_DRIVER runtime check for this GPU. This eliminates
the need for the user to explicitly opt in via an environment variable.
Signed-off-by: Ashish Chauhan <Ashish.Chauhan@imgtec.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37761>
Although the PowerVR driver isn't passing Vulkan 1.2 conformance yet, all the
required support has been implemented and it's very close to passing all the
tests now.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37761>
ES version was missed from extension table
Fixes: 2ce20170 ("mesa: Add support for GL_EXT_shader_clock")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37794>
ES version was missed from extension table
Fixes: c5500cd1 ("mesa: Add support for GL_EXT_shader_realtime_clock")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37794>
With descriptor heap the driver will also have to emit indirect
descriptor heaps in some cases.
Rename couple of things to make them more generic.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37786>
Emitting compute dispatches on SDMA would just hang.
This fixes pending depth/stencil copy tests on transfer queue with
RADV_PERFTEST=transfer_queue.
Fixes: e6c485afb0 ("radv: initialize HiZ metadata during image layout transitions")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37795>
The Intel EU fusion feature needs to be disabled on SEND messages
where either the texture handle, sampler handle, sampler header is not
identical on fused threads.
This is the case in particular with accesses on non-uniform
texture/sampler handles but could also strike with dynamic
programmable offsets (currently disabled).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
The Intel Gfx12.x generation of GPU has an architecture feature called
EU fusion in which 2 subgroups run lock step. A typical case where
this happens is a compute shader with 1x1x1 local workgroup size and a
dispatch command of 2x1x1. In that case 2 threads will be run in lock
step for each of the workgroup.
This has been the sources of some troubles in the backend because one
subgroup can run with all lanes disabled, requiring care for SEND
messages using the NoMask flag (execution regardless of the lane mask).
We found out that other things are happening when 2 subgroups run
together :
- the HW will use the surface/sampler handle from only one subgroup
- the HW will use the sampler header from only one subgroup
So one of the fused subgroup can access the wrong surface/sampler if
the value is different between the 2 subgroups and that can happen
even with subgroup uniform values.
Fortunately we can flag SEND instructions to disable the fusion
behavior (most likely at a performance cost).
This change introduce a new divergence mode that tries to compute
things divergent between subgroups so that we can flag instructions
accordingly.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
nir_recompute_io_bases will modify i/o intrinsics, which is not the
expected behaviour when the keep_intrinsics flag is set.
Fixes: 83aecc8f3f ("mesa/st, nir: commonize unlower_io_to_vars pass")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37725>
mesh doesn't use brw_vue_prog_data. Also, I had been catching TCS
shaders here, and shouldn't.
Fixes: bf76e86bc8 ("brw: Refactor clip/cull distance mask setting into a helper")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37809>
When comparing an vec3 and a vec4 array, scalar type is the same for
both (float). Instead use the array element type to compare (that is,
vec3 vs vec4).
Fixes
spec@glsl-1.20@compiler@invalid-vec4-array-to-vec3-array-conversion.vert
piglit test.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37783>
Sometimes we need to select between ishr/ushr based some condition; this
builder makes this less verbose.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37793>
Coercing the argument to a bool when we have __builtin_expect but
leaving it unmodified otherwise is a recipe for really subtle bugs. I
don't know if any bugs like that exist currently, but I almost
introduced one in panfrost.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37801>
We need to return true if we need the companion batch.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e60416b4e4 ("anv: use companion batch for operations with HIZ/STC_CCS destination")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37797>
This is nearly identical, except for bindless sampler/texture/image
handling. But we only use it for inputs/outputs, not uniforms, where
there are no bindless handles to worry about.
Deletes a lot of mostly-duplicated code.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37784>
There are no 64-bit renderable formats so we can't have FS outputs that
are dvecs. This dates back to 2016 and a ton of the backend has been
rewritten, so I think whatever this was trying to solve is no longer a
problem.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37784>