This is supposed to be set when that stage needs the PrimID sysval
preloaded, except for the VS which doesn't have this bit and instead
infers it from the HS or GS bit (depending on whether tess/GS is
enabled). Therefore for HS, GS, and DS we should set it whenever the
corresponding sysval is there. This includes adding a missing
PC_HS_OUT_CNTL, which I confirmed is set when the HS reads PrimID from
the VS. Note that the DS sysval is currently always enabled whenever
there's a GS, if we were to fix that then we should also change the
logic here.
This doesn't fix anything that I know of, but aligns us more with what
the blob does.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12166>
The previous handling conflated RelPatchID and PrimID, which would
result in incorrect gl_PrimitiveID when doing draw splitting and didn't
work with PrimID passthrough which fills the VPC slot with the "correct"
PrimID value from the tess factor BO which we left 0. Replace PrimID in
the tess lowering pass with a new RelPatchID sysval, and relace PrimID
with RelPatchID in the VS input code in turnip/freedreno at the same
time so that there is no net change in the tess lowering code. However,
now we have to add new mechanisms for getting the user-level PrimID:
- In the TCS it comes from the VS, just like gl_PrimitiveIDIn in the GS.
This means we have to add another register to our VS->TCS ABI. I
decided to put PrimID in r0.z, after the TCS header and RelPatchID,
because it might not be read in the TCS.
- If any stage after the TCS uses PrimID, the TCS stores it in the first
dword of the tess factor BO, and it is read by the fixed-function
tessellator and accessed in the TES via the newly-uncovered DSPRIMID
field. If we have tess and GS, the TES passes this value through to
the GS in the same way as the VS does. PrimID passthrough for reading
it in the FS when there's tess but no GS also "just works" once we
start storing it in the TCS. In particular this fixes
dEQP-VK.pipeline.misc.primitive_id_from_tess which tests exactly that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12166>
DSPATCHID and HSPATCHID, which we mapped gl_PrimitiveID to, are actually
relative to the current subdraw. Subdraws aren't supported yet by turnip
but they are by freedreno for indirect draws.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12166>
If secondary command buffer is emitted within a subpass it may have
barriers which forces us to disable gmem for current renderpass.
Fixes: 20547a110e "tu: delay decision of forcing sysmem due to subpass self-dependencies"
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12219>
Use Piglit's replay profile to measure and store the time that frames
take to render in the GPU.
This job won't run automatically in regular pipelines, but will be
triggered automatically by a script for every successful pre-merge
pipeline.
This is because we want to generate performance data for every relevant
commit merged in main, but we don't want to keep a device busy during
the pre-merge run.
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12185>
The previous one had all rendering and setup in a single frame, so
repeatedly replaying it for performance tracking was reaching OOM due to
the repeated creation of resources that weren't being released.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12195>
This is where it should be rather than having to pass it into the
optimisation pass every time.
It also allows us to call the loop analysis pass without having to
duplicate these options which we will do later in this series.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>
We would determine that it was unsupported, then ask for the size and
triggered the assertion checking that we never ask for the size of a
combined sampler.
Fixes: ee3495e465 ("turnip: Add support for VK_VALVE_mutable_descriptor_type")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12148>
Blob advertises { 1024, 1024, 64 }, but from tests they all
could be 1024.
Fixes tests:
dEQP-VK.compute.basic.max_local_size_x
dEQP-VK.compute.basic.max_local_size_y
dEQP-VK.compute.basic.max_local_size_z
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9409>
We were trying to calculate how much space they need, That was already
difficult and one of the most opaque and hard-to-verify uses of sub_cs,
but it will become even more difficult with the 3D path. What's worse is
that sometimes we have to touch that path when we start touching
registers that would affect rasterization, and there's no indication
that you have to then recalculate the size etc. Just rip this out and
start keeping a separate CS for it instead. Note that this adds a small
amount of memory wastage and extra buffers (at worst one buffer per
command buffer).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12102>
v2:
- Read mustpass files from vk-default.txt (Matt)
- Remove freedreno atomic geom tests from fail list (Emma)
- Move freedreno flake to separated line (Emma)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12069>
GitLab doesn't merge the rules array from a job that is extended, so we
were missing the changes rules.
To avoid this, create a .freedreno-rules-restricted job that includes
the changes rules and the restricted user checks.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Fixes: 92f9141f00 ("ci/freedreno: Test with non-redistributable traces")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5139
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12122>
This is much more maintainable, extensible, and easy to read than
hand-rolled structs approximating assembly. This also removes the last
use of the old hand-written packing structs. There are a few minor
differences:
- The shaders are larger because ir3 currently doesn't support (rpt),
which means that some shaders are larger than one instrlen and the
current logic has to be extended to allow for that. This seems a small
price to pay, ir3 will gain support for (rpt) eventually, and we
shouldn't have limitations like this baked in anyway. For example some
GL blob r8g8 <-> r16 copy shaders are apparently quite large.
- Due to the inability to switch inputs/outputs on the fly, we need to
split the VS into two variants. I made the layer-writing variant also
used for other clears, because the old method of overloading c0.z/c1.z
to mean both "src x coordinate" and "z clear value" in the same shader
seemed too clever and I didn't want to add yet another variant. This
means that non-layered clears will also write the layer (to 0), but
that shouldn't be a big deal performance-wise.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12079>
All vulkan drivers have been copying anv's code to convert
VkSpecializationInfo into nir_spirv_specialization.
Recently there was a Vulkan spec change on allowed values for
VkSpecializationInfo, and all drivers got affected.
This commits creates a new helper, and uses it on all Vulkan Mesa
drivers.
v2: use (uint8_t*) castings, instead of void*, to avoid C2036 with
MSVC (detected by the CI, inspired on what radv was doing)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12047>
This is required (at least for me on x86) to get the tool to pass it's
own test, otherwise it fails the build_id assertion.
Fixes: 1462b00391
("freedreno/ir3: Add a unit test for our disassembler.")
Acked-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12084>
DXVK always inserts vertex stage subpass self-dependency for every
subpass regardless of whether there actually would be a barrier.
This effectively disabled gmem rendering with DXVK.
Thus we delay the decision to disable gmem rendering until we
see a barrier with vertex stages.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12038>
Add a job to test with traces that we cannot redistribute, listed in a
separate file. Since those traces might not be accessible by everyone,
this job is created only when the pipeline is triggered by `marge-bot`.
This job is optional because otherwise it could be blocking a merge
request of someone who cannot really debug the issue due to lack of
access to these traces.
The documentation available under `docs/ci` goes into more details
explaining the rationale behind optional traces.
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6441>
v1. Hyunjun Ko <zzoon@igalia.com>
- Add to hanlde VK_DESCRIPTOR_POOL_CREATE_HOST_ONLY_BIT_VALVE
- Don't support VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT and
VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER
v2. Hyunjun Ko <zzoon@igalia.com>
- Fix some indentations and nitpicks.
- Add the extension to features.txt
v3. Hyunjun Ko <zzoon@igalia.com>
- Remove unnecessary asserts.
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9322>
If you didn't have a freed+ready instruction, you'd redo the live_effect
and check_instr() logic multiple times per instr. Replace the multiple
loops in each function with a ranking that I think is more readable,
reducing the overhead in the process.
debugoptimized dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20
runtime goes from ~3.5s -> ~3.0s on my lazor. No shader-db change.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11855>
Newer a6xx devices drop this packet from the sqe firmware, and use
direct (pkt4) register writes instead for the few cases that previously
used CP_REG_WRITE.
The turnip part was adapted from Jonathans patch on !10892
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
A step towards getting rid of checks for gpu_id sprinkled around.
Checking major generation is ok, but checking for == or >= a specific
gpu_id is going to start getting messy as we add more a6xx.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
This way we can make the tables const. At the same time, for a6xx, this
introduces a "sub-generation template" to reduce the copy/paste for
parameters which are keyed to the sub-generation. It also explicitly
lists every supported GPU, to get rid of duplicate lists of supported
gpus between the device-info and drivers.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
Replace it with a calculation which works for all current GPUs.
Duplicated the calculation in both drivers because freedreno_dev_info isn't
meant for derived parameters (and drivers might want to just calculate on
the fly instead).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
- It hurts users with newer firmware who don't need the workaround
- Kernel now rejects older firmware due to security issues, so this will
prevent users from using older firmware anyway.
- Only whitelisting 650 enables the workaround by default for any new GPUs
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
No need to include the same BO multiple times in the long-lived ringbuffer
object's list of relocs to be added to the submit.
Improves non-TC drawoverhead -test 9 (8 tex updates) throughput by 1.4901%
+/- 0.8705% (n=20)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11697>
On drawoverhead -test 9 (8 texture changes), this saves us 172kb of
memory. That's only ~1% of the GEM memory while the test is running, but
more importantly it saves us 29% of the gem BO allocations.
non-TC drawoverhead -test 9 (8 texture change) throughput 0.449019% +/-
0.336296% (n=100), but this gets better as we get better suballocation
density.
Note that this means that all fd_ringbuffer_new_object calls can now
return data aligned to 64 bytes, instead of 4k. We may find that we need
to increase it if some of our objects (tex consts, sampler consts, etc.)
require more alignment than that. But, this may help non-drawoverhead
perf if any of our RB objects have a cache in front of them (indirect
consts?) and we don't have most of our data in the same cache set any
more.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11697>