aco does not implement fpow, need nir to lower it
first. llvm will do by itself in the same way, so
we always lower fpow in nir now.
Remove the llvm fpow implementation that has special
handling for the muliplication. It's not used any
more and does not match GLSL spec as fpow(0,0)=NaN
but here we get 0.
There's some pixel changes for gl-radeonsi-stoney:
ror-default 2 (no tolerance), 0 (1% tol.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22573>
ACO only support nir_fsin/cos_amd.
There's some pixel changes for gl-radeonsi-stoney trace.
Different pixels:
furmark 61 (no tolerance), 0 (1% tol.)
gimark 93867 (no tolerance), 888 (1% tol.)
tessmark 39 (no tolerance), 0 (1% tol.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22573>
aco does not implement these idiv ops.
nir_lower_idiv is for idiv ops <= 32bit and ported from
llvm amdgpu, so llvm do the same.
nir_lower_divmod64 is for 64bit idiv ops.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22573>
Use case: radeonsi will generate internal tgsi shader
with 64bit udiv instruction, and we want all 64bit udiv
to be lowered in nir by lower_int64_options.
For GLSL shaders, this is done in glsl to nir, so we do
the same for tgsi here.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22573>
It's the output of ACO compiler. To share the si_shader_binary
struct with ELF type:
* add a type field to indicate RAW or ELF
* rename elf_buffer/size to code_buffer/size
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22573>
committed has to be a constant so there is no need to have a src and
depend on constant folding to remove the i2b.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22963>
For ALU ops where input types are known, we can store that info on
the input sources. This can be used to produce the correct overloads
of load instructions that don't immediately need to be followed by
bitcasts, or similarly to produce a constant value which can be directly
consumed by the relevant instruction without needing a bitcast.
Similarly for values that will be stored in an output, we know type
information. And using that info, we can use more-correct information
for phis instead of forcing all phi sources to be bitcast to int just
to be bitcast back to float on the other side for an alu or an output
store.
One missing piece is SSBO stores, where we can support int or float.
If the input is coming from a phi, we don't influence the phi's type,
so it'll be int, even though the incoming sources might've been float.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22972>
For each phi src, ensure that it's only used as a phi src.
This lets us give each phi their own unique types without worrying
about them stomping on each other. Also scalarize phis.
For each constant, ensure that it's only used once. The DXIL backend
will already dedupe these consts within the module, but this lets a
single load_const have multiple types depending on how it's used.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22972>
Use ACCESS_* flags in call sites instead of GLC/DLC/SLC.
ACCESS_* flags are extended to describe other aspects of memory instructions
like load/store/atomic/smem.
Then add a function that converts the access flags to GLC, DLC, SLC.
The new functions are also usable by ACO.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22770>
... and expect it to behave like ACCESS_NON_TEMPORAL.
Handling ACCESS_NON_TEMPORAL is sufficient.
This was copied from ac_nir_to_llvm.c.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22770>
Refactor framebuffer to renderpass to mirror previous INTEL_MEASURE
changes.
We dump hashes/pointers for shaders and framebuffer/renderpass.
Reduce from 64bit to 32bit pointers. We don't benefit from the
extra precision and reduced output size is convenient.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22723>
Per-RenderTarget analysis was removed from anv's INTEL_MEASURE
previously, probably after switching to dynamic rendering model.
Restore capability by tracking count of beginRenderPass calls.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22723>
nir_validate checks that the format of an atomic (if specified) is compatible
with the atomic operation. For example, we can't fadd R64_UINT texels. The logic
can't be extended as-is to unified atomics because it's split across different
switch cases for different atomic-op intrinsics. So we add our own validation
case, porting over the logic from the separate existing cases below.
(The redundant logic will be deleted once we delete legacy atomics.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22914>
This is the one place where using nir_atomic_op instead of nir_op directly is a
little annoying, since we need to translate between the two enums, but it's not
a big deal.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22914>
Lots of passes can be made unified-atomics-aware simply by adding extra cases in
their switch statements. This commit fixes a bunch of passes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22914>