Commit graph

157955 commits

Author SHA1 Message Date
Marek Olšák
a60181e8f2 radeonsi: use do..while loops and other cosmetic changes in display list path
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
2022-08-08 19:12:12 +00:00
Marek Olšák
e9a0cae1a1 radeonsi: use si_cp_dma_prefetch_inline for prefetching VBO descriptors
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
2022-08-08 19:12:12 +00:00
Marek Olšák
0e574c801c radeonsi: remove temporary si_context::vb_descriptor_user_sgprs
We were writing descriptors into si_context and then copying them into
the command buffer. Just write them into the command buffer directly.
Also set the pointer to VBO descriptors right after them.

When we start a new command buffer or we finish blitting, we no longer
restore precomputed VBO descriptors. Instead, we just reupload them again.
It's a compromise to have the common path simpler and faster (maybe).

This removes a lot of stuff. Now the VBO descriptor upload path looks
very similar to the display list path.

There was an accidental hidden optimization that is now documented as
"last_const_upload_buffer".

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
2022-08-08 19:12:12 +00:00
Marek Olšák
a5d37e161d radeonsi: remove vb_descriptors_gpu_list only used for debugging
While this is nice to have, it doesn't include VBO descriptors in user
SGPRs, and we need to remove it, so that we can simplify the VBO code.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
2022-08-08 19:12:12 +00:00
Marek Olšák
b4cef2487b radeonsi: add vertex buffers into the BO list in set_vertex_buffers
This is more straightforward. Also, radeon_add_to_buffer_list makes
writing VBO descriptors into the command buffer slower after that code
is reordered in following commits. This seems to be the only way that
isn't slower.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
2022-08-08 19:12:12 +00:00
Marek Olšák
c4ffac8a17 radeonsi: merge both fail paths in si_set_vb_descriptor
I removed the assertion because apps are allowed to set an offset greater
than the size.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17933>
2022-08-08 19:12:12 +00:00
Connor Abbott
a7e64ab63c tu: Fix sysmem depth attachment clear flushing
We can't invalidate CCU if there is any dirty data that hasn't been
flushed yet. In the case where we clear depth, we know that the depth
attachment itself isn't dirty but there may be dirty data from other
renderpasses. Therefore we need to flush before invalidating depth.

Fixes: 487aa80 ("tu: Rewrite flushing to use barriers")
Closes: #6987
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17940>
2022-08-08 17:30:00 +00:00
Pierre-Eric Pelloux-Prayer
de55058cbc docs: document DRI_PRIME
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17298>
2022-08-08 16:52:44 +00:00
Pierre-Eric Pelloux-Prayer
903e99150f vulkan/device_select: allow DRI_PRIME=vendor_id:device_id
To match the GL side.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17298>
2022-08-08 16:52:44 +00:00
Pierre-Eric Pelloux-Prayer
a71b92fff8 vulkan/device_select: print the dri_prime warning only if needed
The next commit will allow a different DRI_PRIME syntax, so move
this printf in the right if block.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17298>
2022-08-08 16:52:44 +00:00
Pierre-Eric Pelloux-Prayer
4005ba3ed4 loader: allow DRI_PRIME=vendor_id:device_id syntax
This syntax allows to select a specific GPU without depending on
the pci bus information.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17298>
2022-08-08 16:52:44 +00:00
Pierre-Eric Pelloux-Prayer
6d50e4cdc1 loader: don't return empty string in loader_get_dri_config_device_id
The caller expects a NULL return value if the option isn't set.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17298>
2022-08-08 16:52:44 +00:00
Jesse Natalie
6daf99fcb2 ci/windows: Re-enable Windows runners
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17942>
2022-08-08 16:10:21 +00:00
Samuel Pitoiset
d4b8abe511 radv: simplify radv_bind_dynamic_state() slightly
This adds RADV_CMP_COPY to compact copies. Based on ANV.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17649>
2022-08-08 14:59:47 +00:00
Samuel Pitoiset
18e9ba3e3b radv: remove unused states parameter from some radv_emit_XXX() helpers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17649>
2022-08-08 14:59:47 +00:00
Samuel Pitoiset
1f6e32ff7c radv: simplify saving/restoring all dynamic states
Instead of copying every field individually, just use a whole memcpy.
This could be optimized but that's not the point here.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17649>
2022-08-08 14:59:47 +00:00
Samuel Pitoiset
1d82ec1b3f radv: remove RADV_META_SAVE_SAMPLE_LOCATIONS
We already save/restore all other dynamic states unconditionally, it's
not really useful to make an exception for sample locations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17649>
2022-08-08 14:59:47 +00:00
Rhys Perry
bf0af80045 aco: improve VcmpxPermlaneHazard workaround
According to LLVM, we only need to care about VOPC which writes exec.

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17697>
2022-08-08 13:59:17 +00:00
Rhys Perry
5912c7d3fa aco: only add vscnt wait when visiting VMEM/DS
This prevents issues where we insert a s_waitcnt_vscnt(0) at the start of
a block or very end of the shader because we're joining two blocks (for
example, one with has_VMEM=true and the other with
has_branch_after_DS=true).

fossil-db (navi10):
Totals from 2441 (1.51% of 161220) affected shaders:
Instrs: 1383964 -> 1384094 (+0.01%); split: -0.07%, +0.08%
CodeSize: 7438212 -> 7438760 (+0.01%); split: -0.05%, +0.06%
Latency: 13780665 -> 13679664 (-0.73%); split: -1.53%, +0.80%
InvThroughput: 2950835 -> 2921511 (-0.99%); split: -1.06%, +0.07%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17697>
2022-08-08 13:59:17 +00:00
Rhys Perry
52156d6b26 aco: set has_VMEM,has_DS=false after a branch
fossil-db (navi10):
Totals from 161 (0.10% of 161220) affected shaders:
Instrs: 206726 -> 207179 (+0.22%); split: -0.02%, +0.24%
CodeSize: 1114152 -> 1116032 (+0.17%); split: -0.01%, +0.18%
Latency: 2119380 -> 2147403 (+1.32%); split: -0.16%, +1.48%
InvThroughput: 462960 -> 461922 (-0.22%); split: -0.42%, +0.19%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17697>
2022-08-08 13:59:17 +00:00
Rhys Perry
b17e59a03b aco: fix LdsBranchVmemWARHazard with 2+ branch chains
For example, "DS -> branch -> VMEM -> branch -> DS".

fossil-db (navi10):
Totals from 639 (0.40% of 161220) affected shaders:
Instrs: 629090 -> 628254 (-0.13%); split: -0.19%, +0.06%
CodeSize: 3410164 -> 3406748 (-0.10%); split: -0.14%, +0.04%
Latency: 7834755 -> 7821011 (-0.18%); split: -0.70%, +0.52%
InvThroughput: 1369698 -> 1374495 (+0.35%); split: -0.12%, +0.47%

A lot of the fossil-db changes are noise.
threekingdoms.8db138826c386a62.1.foz/0b222ed175eebad0 is an example of a
shader that actually has this issue.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: c037ba1bb7 ("aco/gfx10: Mitigate LdsBranchVmemWARHazard.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17697>
2022-08-08 13:59:17 +00:00
Jonathan
c7f52551a7 gallium/u_threaded: buffer subdata merging (v2)
In a scenario where a sequence of calls happens like:
* subdata(buffer_a, offset=0, size=64)
* subdata(buffer_a, offset=64, size=64)
* subdata(buffer_a, offset=128, size=64)
* subdata(buffer_a, offset=192, size=64)

and the buffer can't be directly mapped (e.g., because it has bindings), the
subdata calls will now be merged together into one larger subdata call.

This achieves a 3x perf gain in
KHR-GL46.CommonBugs.CommonBug_SparseBuffersWithCopyOps on radeonsi

Before:
real    0m1,923s
user    0m1,017s
sys     0m0,051s

After:
real    0m0,686s
user    0m0,502s
sys     0m0,071s

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17741>
2022-08-08 13:27:36 +00:00
Danylo Piliaiev
293298de65 tu: Flush depth on depth img transition from undef layout
Same logic as in tu_subpass_barrier.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17911>
2022-08-08 13:01:28 +00:00
Timur Kristóf
dccd6f495a ac/nir/cull: Fix typo in bounding box culling.
Bounding box culling is only viable when the W of all
vertices are positive. Always accept triangles whose any
W is negative.

Fixes: 0d527bb1aa
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7018
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17929>
2022-08-08 11:16:04 +00:00
Samuel Pitoiset
1fb12d2cce radv: use ref counting for VS prologs and PS epilogs
With GPL, it will be possible to create VS prologs and PS epilogs
from libraries, so reference counting is useful here too.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17628>
2022-08-08 10:52:31 +00:00
Samuel Pitoiset
a2b8a92c72 radv: rework shaders ref counting
Introduce helpers like for descriptor set layouts. This will also
help graphics pipeline libraries.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17628>
2022-08-08 10:52:31 +00:00
Guilherme Gallo
6f4b6b4d11 ci/radeonsi: Add zork jobs and rules
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
2022-08-08 08:51:24 +00:00
Guilherme Gallo
6c22601048 ci/radeonsi: skqp: Add fail test files for raven
Lots of models are missing.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
2022-08-08 08:51:24 +00:00
Guilherme Gallo
d4338c9df0 ci/freedreno: skqp: run with new tests files
Settings as flakes tests that passed in the exhaustive run, to keep the
same state as it was before

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
2022-08-08 08:51:24 +00:00
Guilherme Gallo
7801a17b54 ci/skqp: Add gitlab sections for uncluttering
skqp output is verbose, as we are running multiple backends at the same
job, normally the trace will surpass the Gitlab UI line limit.

This commit wraps every skqp execution in a Gitlab section and removes
some `set -xtrace` from skqp-runner.sh for a cleaner output.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
2022-08-08 08:51:24 +00:00
Guilherme Gallo
d4dcee7a8a ci/skqp: Remove .baremetal-skqp-test in favor of .skqp-test
Both hidden jobs has the same content, let's reuse it.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
2022-08-08 08:51:24 +00:00
Guilherme Gallo
eece545d79 ci/skqp: Supress irrelevant shellcheck warnings
To fix some warnings, one should write a much complex bash code, such as
SC2086, so prefer to be simple and functional.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
2022-08-08 08:51:24 +00:00
Guilherme Gallo
57e7459377 ci/skqp: Put generated tests files in artifacts
Showing the resulting test file can help the developer to debug skqp
runs by coping this file locally.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
2022-08-08 08:51:24 +00:00
Guilherme Gallo
e50d461fec ci/skqp: Use SKQP_BIN_DIR instead of hardcoded /skqp dir
This will make skqp-runner.sh more generic, making it easier to test
locally.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
2022-08-08 08:51:24 +00:00
Guilherme Gallo
5001d818da ci/skqp: Add support for commenting tests files
The files are now separated in three: crashes, fails and flakes.

They should be located inside $INSTALL folder at:

- $GPU_VERSION_$SKQP_BACKEND_rendertests-$MODE.txt
- $GPU_VERSION_unittests-$MODE.txt

Where:

- $MODES can be crashes, fails, and flakes
- $SKQP_BACKEND can be gl, gles and vk

crashes and flakes removes tests from skqp, so they will not be run.
As skqp does not have support for flaky test detection, let's not run
them.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
2022-08-08 08:51:24 +00:00
Guilherme Gallo
f0f5601a9b ci/skqp: Build list_gpu_unit_tests and list_gms
These binaries are used to generate a list of tests that can be run in a
target device and are useful for testing new devices

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
2022-08-08 08:51:24 +00:00
Guilherme Gallo
23732b4732 ci/skqp: Fix Nima-Cpp fetching error
Nima-Cpp is not available anymore inside googlesource, revert to github
one

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
2022-08-08 08:51:24 +00:00
Guilherme Gallo
5c91397042 ci/skqp: Fix paths in skqp-runner
Default results directory was fixed via $PWD variable, but it is safer
to use the same as init-stage2.sh uses: $CI_PROJECT_DIR to indicate the
results folder.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
2022-08-08 08:51:24 +00:00
Guilherme Gallo
6f387b7848 ci/skqp: Show reports on crashes
Some skqp tests may crash the entire job run, assure that the reports
will be showed to the user after the test started to run.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
2022-08-08 08:51:24 +00:00
Guilherme Gallo
2d77c7f9c9 ci/skqp: Add an option to run all tests
When the skqp is introduced to a new driver, the best practice is to
run all available tests from skqp and classifying the
failing/crashing/flaking ones.
The default behavior of skqp is to run the tests from the commit where
the skqp built, which may not be adequate for the target driver.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17835>
2022-08-08 08:51:24 +00:00
Samuel Pitoiset
2012246075 radv: ignore out-of-order rasterization if stencil write mask is dynamic
This might break out-of-order rasterization on GFX8-GFX9 because it
relies on the stencil write mask which can be dynamic.

Found by inspection.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17673>
2022-08-08 08:49:19 +02:00
Timothy Arceri
8bffd601ed Revert "nir: Preserve offsets in lower_io_to_scalar_early"
This reverts commit 96fa23bca5.

The correct fix to the problem was a1bc152340, making this
change obsolete as the pass skips any vars marked with
always_active_io. There was no real advantage to allowing these
vars to be split because they can't be removed anyway. Also there
is no way to split varying arrays gracefully here due to the xfb
layout rules, and this change didn't handle arrays at all.

Removing this obsolete code also fixes an assert in the new CTS
test KHR-Single-GL45.enhanced_layouts.xfb_all_stages. The test
was legally adding xfb offsets to all vertex stages but since
we only mark the varyings in the final vertex stage with the
always_active_io flag the other stages were correctly lowering
to scalars but when an array with an offset hit this code it
asserted since it couldn't handle it.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

Fixes: a1bc152340 ("spirv: mark variables decorated with XfbBuffer as always active")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6928
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17878>
2022-08-08 01:37:20 +00:00
Alyssa Rosenzweig
3712609ee3 agx: Only emit the used components of gl_FragCoord
In case a shader only use gl_FragCoord.xy, this avoids wasting
coefficient registers for gl_FragCoord.zw which should be a small
optimization. It's also less work for DCE but I'm less worried about
that.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
2022-08-07 20:43:54 -04:00
Alyssa Rosenzweig
17168162fb agx: Remove p_extract
It's now unused. We didn't have coalescing for it anyway, splits are the
preferred alternative.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
2022-08-07 20:43:54 -04:00
Alyssa Rosenzweig
c1900cb951 agx: Handle type-changing splits
If we want to break down a 64-bit value into its 32-bit halves, we want
to be able to use a split for this:

   lo, hi = split long

Extend the RA to handle this case.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
2022-08-07 20:43:54 -04:00
Alyssa Rosenzweig
f86ad382c5 agx: Stop using broken idiv lowering
It is, as the name suggests, broken. Instruction count goes from 50->53
on the shader in
dEQP-GLES2.functional.shaders.operator.binary_operator.div.highp_int_fragment.
I'm happy to eat that cost in exchange for correct results!

There are lots more low-hanging opportunities for optimizations to that
shader:

- fuse double icmpsel for the b2i32(cmp) sequences
- promoting big immediates to uniforms
- fusing integer multiply+add

But for now this is acceptable and anyway I'm doing this on "fix broken
NIR lowering" time and not Asahi time.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
2022-08-07 20:43:54 -04:00
Alyssa Rosenzweig
f28c631a89 agx: Implement nir_op_umul_high
This is crucial to the efficiency of the accurate idiv path.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
2022-08-07 20:43:54 -04:00
Alyssa Rosenzweig
aab535ffda agx: Extract umul_high implementation
We can implement umul_high (for both 16-bit and 32-bit types)
efficiently by multiplying in the next larger type size and extracting
the upper word. We already have such an implementation (for instancing).
Extract it so we can use it for emit_alu too.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
2022-08-07 20:43:54 -04:00
Alyssa Rosenzweig
a8cea8679d agx: Assert that registers are naturally aligned
This seems to be an architectural constraint. Ensure that RA satisfies
it, because otherwise we're left with mysterious fails.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
2022-08-07 20:43:54 -04:00
Alyssa Rosenzweig
8c2e626064 agx: Align 64-bit register pairs
This seems to be necessary for correct operation.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
2022-08-07 20:43:54 -04:00