While the idea of being aggressive sounds like a good one, in practise it
creates vectorized load/stores that are not optimal.
This makes it so that we only ever create aligned and supported vector
sizes that prevents those issues.
Totals:
CodeSize: 8662362848 -> 8662362240 (-0.00%); split: -0.00%, +0.00%
Number of GPRs: 47508046 -> 47508014 (-0.00%)
Static cycle count: 4713321839 -> 4713285952 (-0.00%); split: -0.00%, +0.00%
Spills to memory: 45073 -> 45061 (-0.03%)
Fills from memory: 45073 -> 45061 (-0.03%)
Max warps/SM: 50564816 -> 50564832 (+0.00%)
Totals from 689 (0.06% of 1163204) affected shaders:
CodeSize: 26314320 -> 26313712 (-0.00%); split: -0.02%, +0.02%
Number of GPRs: 60914 -> 60882 (-0.05%)
Static cycle count: 156504342 -> 156468455 (-0.02%); split: -0.05%, +0.02%
Spills to memory: 15453 -> 15441 (-0.08%)
Fills from memory: 15453 -> 15441 (-0.08%)
Max warps/SM: 18640 -> 18656 (+0.09%)
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40293>
On Turing, the hardware rely on the viewport index for FSR.
If not all viewports are defined, we will end up not rendering
anything when selecting the primitive shading rate.
This patch makes it that we now broadcast the viewport and scissor 0
likes the proprietary driver.
This fixes "dEQP-VK.mesh_shader.ext.builtin.primitive_shading_rate_*" on
Turing.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 2fb4aed9 ("nvk: Advertise VK_KHR_fragment_shading_rate")
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40314>
We are going to need to reuse those functions to fix FSR support on
Turing.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40314>
This is intended for backends that do SIMD within a register, like we do.
Helps with register pressure. This will also prevent f2f from being
scalarized, which will help on Ampere+ as a later patch will use F2FP for
those.
Totals:
CodeSize: 8662362848 -> 8662332112 (-0.00%); split: -0.00%, +0.00%
Static cycle count: 4713321839 -> 4713330409 (+0.00%); split: -0.00%, +0.00%
Spills to reg: 149117 -> 149128 (+0.01%)
Fills from reg: 170680 -> 170693 (+0.01%)
Totals from 19 (0.00% of 1163204) affected shaders:
CodeSize: 732208 -> 701472 (-4.20%); split: -4.22%, +0.02%
Static cycle count: 1670226 -> 1678796 (+0.51%); split: -0.10%, +0.61%
Spills to reg: 517 -> 528 (+2.13%)
Fills from reg: 486 -> 499 (+2.67%)
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40298>
We requires this to be aligned to a 8 byte granuality.
This is something that came up with mesh shader enablement so let's
avoid this footgun with some assertion.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40407>
Test Kepler as it was commented out but everything is
running fine.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40407>
Not sure why it was missing but it should be part of it.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40407>
Needed for some FSR macro changes I want to test.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Fixes: 7d6cc15ab8 ("nvk/mme: Add a unit test framework for driver macros")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40407>
On SM86+, we can use a 16-bit unsigned offset along side the register
for it.
This adds a new base indice that will be used for it, integration with
nir_opt_offsets and a lowering pass to get ride of the base on
unsupported generations.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39716>
ISBERD/ISBEWR allow raw manipulation of the various ISBE spaces
where attributes are stored.
This extends the implementation of ISBERD to support the additional
elements added in its intrinsic and implement ISBEWR intrinsic while
extending the ISBE space sharing detection pass.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39716>
Adds a new intrinsic allowing to do raw write in the various ISBE spaces
where attributes are stored.
This also adapt isberd_nv to map to what we have since SM70+.
This will be used to support mesh shaders.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39716>
This is new in SM75 (Turing). Let's use it because it allows us to get rid
of the if/else around bound checked global loads.
There are some changes in fossils, but it seems that's mostly due to CFG
optimizations doing things a bit differently?
Totals:
CodeSize: 9442152688 -> 9442133184 (-0.00%); split: -0.00%, +0.00%
Static cycle count: 6120910991 -> 6120907718 (-0.00%); split: -0.00%, +0.00%
Spills to reg: 184789 -> 184810 (+0.01%)
Fills from reg: 223831 -> 223860 (+0.01%); split: -0.00%, +0.01%
Totals from 334 (0.03% of 1163204) affected shaders:
CodeSize: 22020752 -> 22001248 (-0.09%); split: -0.10%, +0.01%
Static cycle count: 26582978 -> 26579705 (-0.01%); split: -0.01%, +0.00%
Spills to reg: 3110 -> 3131 (+0.68%)
Fills from reg: 3401 -> 3430 (+0.85%); split: -0.03%, +0.88%
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40272>
legalize_ext_instr wasn't doing anything besides lowering uniform sources
and panicing on a bunch of Source types.
Having a common helper looping over all sources doesn't make much sense,
because all the instructions are widly different in regards to UGPRs. The
panics will be hit while emitting the sources as well, so this helper
provided little help and wasn't flexible enough for what we need.
Furthermore some instructions like LDG also take an additional input
predicate that legalize_ext_instr can't handle.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40272>
We'll start to lower load_global_bounded there and that will invalidate
loop analysis, because the amount of instructions will change within a
block.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40272>
We can avoid the stalls from subc switches by avoiding using the copy engine
during vkCmdBeginConditionalRenderingEXT. Implement this by loading the
cond render value using the MME, since the hardware doesn't have a
suitable 32-bit comparison itself.
This brings the Sascha Willems conditionalrender demo from
from 1661 to 8334 fps on my blackwell system with all meshes disabled.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40277>
This adopts the device internal app workaround layer from radv
The layer allows to fix up game input in the layer instead of
adding workarounds within the driver.
Initially this only includes the workaround for Metro exodus as
I have verified that it fixes a crash on NVK. Follow up commits
can add the other relevant workarounds when the fixes are verified
to be needed for NVK.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39870>
P_IMMD() is annoying because it always uses two words in unoptimized
builds but sometimes uses one word with optimizations on. This can make
it difficult to allocate the correct pushbuf size.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39869>
These ULL literals are all meant to refer to 64-bit types. Use the
UINT64_C macro so they work on platforms with 128-bit long long.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39869>
This helps debugging, a LOT.
v2 (mhenning): Use nak_constant_offset_info,
enable based on NDEBUG, and
fix use after free in nir pass
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39869>
Move some of this constant data out of fs_key and into a constant
struct. This reduces the size of the fs_key and gives us a spot to add
additional non-fs offsets like printf buffers.
Passing this through a const global may be a little odd but it has the
benefit that we don't need to hash the offsets as additinal state
relevant to a compilation, since they can never change without
a modification to the binary.
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39869>
Follows what other drivers do, if we do not need to emit a draw, just
bail out early.
Signed-off-by: Mary Guillemard <mary@mary.zone>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39783>