Texel buffer could be arbitrary large, so the assumption being made in
the following comment is wrong:
"Zero-extension (u16) and sign-extension (i16) have
the same behavior here - txf returns 0 if bit 15 is set
because it's out of bounds and the higher bits don't matter."
Sign extension should matter for GLSL_SAMPLER_DIM_BUF.
This fixes the case of doing texelFetch with u16 offset:
uniform itextureBuffer s1;
uint16_t offset = some_ssbo.offset;
value = texelFetch(s1, offset).x;
If the offset is higher than s16 optimization incorrectly
left it as 16b.
In spirv the above glsl is translated into:
%22 = OpLoad %ushort %21
%23 = OpUConvert %uint %22
%24 = OpBitcast %int %23
%26 = OpImageFetch %v4int %16 %24
Cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31664>
If there is a 4-byte hole between 2 loads, drivers can now optionally
vectorize the loads by including the hole between them, e.g.:
4B load + 4B hole + 8B load --> 16B load
All vectorize callbacks already reject all holes, but AMD will want to
allow it.
radeonsi+ACO with the new vectorization callback:
TOTALS FROM AFFECTED SHADERS (25248/58918)
VGPRs: 871116 -> 871872 (0.09 %)
Spilled SGPRs: 397 -> 407 (2.52 %)
Code Size: 43074536 -> 42496352 (-1.34 %) bytes
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
New load merging transformations (first, second), examples:
(vec4, vec3) ==> vec8(read=0x7f) (because NIR doesn't have vec7)
(vec1, vec8(read=0x7f)) ==> vec8(read=0xff)
- the unused component at the end of vec8 is dropped
Not merged:
vec8(read=0xfe) + vec1
- unused components at the beginning are kept
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
If you allow an unsupported component count in the callback for loads,
nir_opt_load_store_vectorize will align num_components to the next supported
vector size, essentially overfetching.
This changes all callbacks to reject it. AMD will enable it in a later commit.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
It will be used to allow merging loads with a hole between them.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
We will represent vec6..vec7, vec9..vec15 loads with 8 and 16
components respectively, so we need to track how many components
we really use.
This is a prerequisite for optimal merging up to vec16. Example:
Step 1: vec4 + vec3 ==> vec7as8 (last component unused)
Step 2: vec1 + vec7as8 ==> vec8 (last unused component dropped)
Without using the number of components read, the same example would end up
doing:
Step 1: vec4 + vec3 ==> vec8
Step 2: vec1 + vec8 ==> vec9 (fail)
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
another nir pass will use this.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
We don't advertise this in rusticl (and probably shouldn't, at least
until we can honor the request) but DPC++ emits this regardless so we
may as well ignore it.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31592>
If the backend does not implement this too, or some other future transform
modifiess the phi so that this isn't the case (replace the phi with a
bcsel or replace undef with zero), then it will not actually be uniform.
This keeps it enabled to some degree for RADV/ACO.
fossil-db (navi31):
Totals from 76 (0.10% of 79395) affected shaders:
Instrs: 195008 -> 195282 (+0.14%)
CodeSize: 1012592 -> 1015884 (+0.33%)
Latency: 3892826 -> 3898843 (+0.15%); split: -0.00%, +0.15%
InvThroughput: 460681 -> 460964 (+0.06%)
Copies: 13508 -> 13516 (+0.06%)
Branches: 5244 -> 5412 (+3.20%)
PreVGPRs: 5092 -> 5096 (+0.08%)
VALU: 116177 -> 116197 (+0.02%)
SALU: 23449 -> 23785 (+1.43%)
fossil-db (navi21):
Totals from 76 (0.10% of 79395) affected shaders:
Instrs: 164471 -> 164981 (+0.31%)
CodeSize: 883988 -> 888420 (+0.50%)
Latency: 4074287 -> 4082043 (+0.19%)
InvThroughput: 783783 -> 784276 (+0.06%); split: -0.00%, +0.06%
Branches: 5262 -> 5430 (+3.19%)
PreVGPRs: 5100 -> 5104 (+0.08%)
VALU: 116375 -> 116381 (+0.01%)
SALU: 23589 -> 23925 (+1.42%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30211>
For enumerants and instruction names, instead of duplicating the values
now the grammar will use an aliases field to list the alternative names.
Update the Python scripts for that.
The new SPIR-V files correspond to d92cf88c371424591115a87499009dfad41b669c
("Add "aliases" fields to the grammar and remove duplicated (#447)")
in https://github.com/KhronosGroup/SPIRV-Headers.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31369>
The nir shader cache read call is just below this call now so the code
is easier to follow.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31500>
These functions are already defined in linker_util.h so moving them here
is logical.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31500>
We want to remove the old linker.cpp file in following patches so move
this util function to the util code file.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31500>
This is where all the other uniform related structs are and these were
the only structs used by both the compiler and gl api that were defined
in the compiler code so lets move them to where everything else is
defined.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31500>
V3D can use these too.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31480>
The error message in the linker that checked
gl_MaxCombinedClipAndCullDistances would never be issued because the
compiler was already doing the check. I think the compiler might have
been done this way in the original commit d656736b as the linker
only sets the size when the clip/cull outputs are written so the
piglit test for this wouldn't have been triggered as it does not
write to the outputs.
Here we move the error to the compiler and fix things up so the
correct messages are triggered.
Fixes: d656736bbf ("glsl: Add arb_cull_distance support (v3)")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31471>
Don't use glsl_get_explicit_stride as it may return 0 for vector types,
use nir_deref_instr_array_stride instead.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31460>
Otherwise, we could end up with phis of derefs.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 6b4b044739 ("nir/opt_loop: add loop peeling optimization")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31324>
Rename it to SmallVec, make it more generic and switch NAK
to it.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31409>
If ftrunc@64 is lowered by nir_lower_doubles it is turned into a
comparable long series of 32 bit operations. If the hardware
supports ffract@64 then nir_opt_algebraic can first lower ftrunc@64
to use some combinations with ffloor@64. They can then be turned
into a combination of fsub@64 and ffract@64 resulting in less
all-over instructions.
Fixes: 5218cff34b
nir/algebraic: avoid double lowering of some fp64 operations
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29281>
If the stride we're adding to our loop counter is larger than the total
amount of shared local memory we're trying to initialize, we know the
loop will run at most one time. So we can skip emitting a loop.
Loop unrolling appears to be unable to detect this currently.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31312>
This is what callers actually want, and it simplifies nir_opt_remove_phis
because we can assume dominance meta data is valid.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31031>
The problem with radeonsi+ACO is that UBO loads from vec4 uniforms using
only 1 component always load all 4 components. This fixes that.
We are only interested in shrinking UBO and SSBO loads, but I added more
intrinsics because why not.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29384>
This commit does 3 things at once (3 squashed commits) as required
to make sure the commit doesn't break things.
1. convert to nir at compile time
2. enable full nir linking
3. switch standalone compiler to nir linker
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137>