Previously we parsed a src non-terminal but did nothing with it. Since
the WAIT instruction is kind of weird, in that you have to give it the
same notification subregister for both destination and source, and it
always has an exec size of 1, let's parse a destination instead of a
source. This way, we can parse a writemask rather than a swizzle in
align16 mode, and easily convert the writemask to a swizzle to create
the source register.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5956>
brw_reg::subnr is in bytes, like the subnr field in the instruction
word, but we disassemble the subregister number in units of the type.
For example g0.3<1>F would have a subnr=12.
These non-terminals produce a brw_reg and feed into other non-terminals
that call brw_reg(), where they are passed the subnr that we set here.
brw_reg()'s subnr parameter is expected to be in terms of the register
type, and it is multiplied by the type size to calculate the subnr in
bytes.
In these non-terminals, we don't know the register type yet, so we
must store the subregister number as it was given to us in the .subnr
field and let the brw_reg() constructor handle the conversion to the
canonical byte-based subnr form when it knows the type.
Before this patch, subregister numbers applied to these registers would
be multiplied with the type size twice.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5956>
That's more future proof than setting each bit manually. Looks like we
already miss nir_lower_ufind_msb64 because of that.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5588>
This information is exposed through shader->options->lower_int64_options.
Removing the extra arg forces drivers to initialize this field correctly.
This also allows us to check the int64 lowering options from each int64
lowering helper and decide if we should lower the instructions we
introduce.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5588>
Header is defined at vkGetPipelineCacheData spec, in any vulkan
version, and anv, tu and radv were using the same struct, and v3dv was
about to do the same.
Defining the same struct four times seemed odd, so let's define on a
common place.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6058>
We've hand-rolled this loop 10 places and those are just the ones I
found easily.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
Instead of having separate lists of variables, roughly sorted by mode,
use a single list for all shader-level NIR variables. This makes a few
list walks a bit longer here and there but list walks aren't a very
common thing in NIR at all. On the other hand, it makes a lot of things
like validation, printing, etc. way simpler. Also, there are a number
of cases where we move variables from inputs/outputs to globals and this
makes it way easier because we no longer have to move them between
lists. We only have to deal with that if moving them from the shader to
a nir_function_impl.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
This one's a bit more complex because it filters off only those
variables with mode == nir_var_uniform. As such, it's not exactly a
drop-in replacement for nir_foreach_variable(var, &nir->uniforms).
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
Once we start going through the free list of the descriptor set pool,
we might use a free entry larger than the descriptor set we want to
allocate. When we free that descriptor set, we use the size of the set
rather than the size of the entry that was picked. This leads to leaks
of some amount of descriptor set pool.
This fix saves the size of the entry in the descriptor set so we know
what amount of the pool needs to freed.
v2: Don't bother adding a new size field
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Cc: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3324
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6084>
We're about to make the SPIR-V -> NIR path generate a bit more complex
SSA chains for certain derefs. This will ensure we don't regress anyone
when we start making vec2's of derefs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
This avoids some performance regressions on Gen12 platforms caused by
SIMD32 fragment shaders reported in titles like Dota2, TF2, Xonotic,
and GFXBench5 Car Chase and Aztec Ruins.
The most obvious pattern in the regressing shaders I identified among
these workloads is that they all had non-uniform discard statements,
which are handled rather optimistically by the current IR analysis
pass: No penalty is currently applied to the SIMD32 variant of the
shader in the form of differing branching weights like we do for other
control flow instructions in order to account for the greater
likelihood of divergence of a SIMD32 shader.
Simply changing that by giving the same treatment to discard
statements as we give to other branching instructions seemed to hurt
more than it helped on platforms earlier than Gen12, since it reversed
most of the improvement obtained from SIMD32 fragment shaders in
Manhattan for no measurable benefit in other workloads (Manhattan has
a handful of shaders with statically non-uniform discard statements
which actually perform better in SIMD32 mode due to their approximate
dynamic uniformity). For that reason this change is applied to Gen12+
platforms only.
I've been running a number of tests trying to understand the
difference in behavior between Gen12 and earlier platforms, and most
of the evidence I've gathered seems to point at EU fusion being the
culprit: Unlike previous generations, on Gen12 EUs are arranged in
pairs which execute instructions in lockstep, giving an effective warp
size of 64 threads in SIMD32 mode, which seems to increase the
likelihood for control flow divergence in some of the affected shaders
significantly.
Fixes: 188a3659ae "intel/ir: Import shader performance analysis pass."
Reported-by: Caleb Callaway <caleb.callaway@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5910>
Renaming makes it easier to relate a pciid with device configuration.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
From the Kaby Lake PRM Vol. 7 "Assigning Conditional Flags":
* Note that the [post condition signal] bits generated at
the output of a compute are before the .sat.
Paragraph about post_zero does not mention saturation, but
testing it on actual GPUs shows that conditional modifiers
are applied after saturation.
* post_zero bit: This bit reflects whether the final
result is zero after all the clamping, normalizing,
or format conversion logic.
For signed types we don't care about saturation: it won't
change the result of conditional modifier.
For floating and unsigned types there two special cases,
when we can remove inst even if scan_inst is saturated: G
and LE. Since conditional modifiers are just comparations
against zero, saturating positive values to the upper
limit never changes the result of comparation.
For negative values:
(sat(x) > 0) == (x > 0) --- false
(sat(x) <= 0) == (x <= 0) --- true
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2610
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4167>
If INTEL_DEBUG=no16 is used, then simd16 will not be attempted. This,
in turn prevents simd32 from running, because we attempt to skip
simd32 when simd16 fails to compile.
This change more accurately recognizes when we attempted simd16, but
simd16 failed.
One easy way to cause an issue is to set both no8 and no16. Before
this change, we would be left with no FS program, even though simd32
could still be generated in some cases.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5269>
If no16 was specified, and the shader can't run in simd8 due to the
local_size, then we need to generate a simd32 program.
If both no8 and no16 are specified, then we need to generate a simd32
program.
Rework:
* Drop update of `if` that would have changed `do32` to try simd32
even if simd16 spilled registers. (Caio)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5269>
perf_cfg is enough - it already contains almost all necessary
information and is constructed in a more optimal way (O(n) vs O(n^2)
- it uses hash table to build the unique counter list).
"Almost all", because it doesn't contain OA raw counters, but
we should have not exposed them anyway. Quoting Mark Janes:
"I see no reason to include the OA raw counters in the list that
are provided to the user. They are unusable.
The MDAPI library can be used to configure raw counters in a way
that provides esoteric metrics, but that library is written against
INTEL_performance_query."
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5399>