All of the affected shaders are in Mad Max. I noticed this while
looking at some other things. I tried a couple similar patterns, but
the affect on cycles was general negative. It may be worth revisiting
this later.
v2: Rebase on 1-bit Boolean changes.
All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282073 -> 15282053 (<.01%)
instructions in affected programs: 1192 -> 1172 (-1.68%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.43 x̃: 1
helped stats (rel) min: 1.16% max: 2.17% x̄: 1.65% x̃: 1.39%
95% mean confidence interval for instructions value: -1.73 -1.13
95% mean confidence interval for instructions %-change: -1.91% -1.38%
Instructions are helped.
total cycles in shared programs: 372595954 -> 372594532 (<.01%)
cycles in affected programs: 11477 -> 10055 (-12.39%)
helped: 14
HURT: 0
helped stats (abs) min: 76 max: 122 x̄: 101.57 x̃: 104
helped stats (rel) min: 7.76% max: 15.62% x̄: 12.94% x̃: 14.78%
95% mean confidence interval for cycles value: -111.05 -92.09
95% mean confidence interval for cycles %-change: -14.90% -10.98%
Cycles are helped.
No changes on any Gen6 or earlier platforms.
Reviewed-by: Matt Turner <mattst88@gmail.com>
All of the affected shaders are in Mad Max. The inner part of the
pattern is itself an open-coded sign(a). I tried using that as a
pattern, but the results were not good. A bunch of shaders were helped
for instructions, but overall cycles, spill, and fills were hurt.
v2: Rebase on 1-bit Boolean changes.
v3: Fix order of copysign() parameters in comments and commit message.
Noticed by Matt.
All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282141 -> 15282073 (<.01%)
instructions in affected programs: 6106 -> 6038 (-1.11%)
helped: 17
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 1.02% max: 2.20% x̄: 1.15% x̃: 1.06%
95% mean confidence interval for instructions value: -4.00 -4.00
95% mean confidence interval for instructions %-change: -1.30% -1.00%
Instructions are helped.
total cycles in shared programs: 372597886 -> 372595954 (<.01%)
cycles in affected programs: 32701 -> 30769 (-5.91%)
helped: 17
HURT: 0
helped stats (abs) min: 6 max: 216 x̄: 113.65 x̃: 118
helped stats (rel) min: 0.40% max: 21.86% x̄: 6.20% x̃: 5.83%
95% mean confidence interval for cycles value: -152.84 -74.45
95% mean confidence interval for cycles %-change: -8.89% -3.51%
Cycles are helped.
No changes on any Gen6 or earlier platforms.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Other nir_src_as_* functions just take a nir_src. It's not that much
more memory copying and the constness preserving really isn't worth the
cognitive dissonance.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
v2: When available, include the opcode name too. (Karol)
v3: Use more to_string helpers. (Karol)
Include the wrong bit_size in those failures.
Include the capability number in spv_check_supported.
Provide vtn_fail_with_* macros to avoid noise in the call sites.
v4: Provide macros only for opcode and decoration, which have enough
usages to justify them. (Jason)
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Also, use a set to identify repeated values. The previous arrangement
worked when the repetitions were one after another, but in some of the
new cases they are not.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
This takes the stupid simplest and most reliable approach to reducing
redundancy that I could come up with: Just use the struct declaration
as the cach key. This cuts the size of the generated C file to about
half and takes about 50 KiB off the .data section.
size before (release build):
text data bss dec hex filename
5363833 336880 13584 5714297 573179 _install/lib64/libvulkan_intel.so
size after (release build):
text data bss dec hex filename
5229017 285264 13584 5527865 545939 _install/lib64/libvulkan_intel.so
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Both Vulkan and OpenGL might be using glsl_types simultaneously or we
can also have multiple concurrent Vulkan instances using glsl_types.
Patch adds a one time init to track number of users and will release
types only when last user calls _glsl_type_singleton_decref().
This change fixes glsl_type memory leaks we have with anv driver.
v2: reuse hash_mutex, cleanup, apply fix also to radv driver and
rename helper functions (Jason)
v3: move init, destroy to happen on GL context init and destroy
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Until now, we were only doing this when linking a SSO
program. However, nothing avoids linking a non SSO program which
doesn't have both a VS and FS. In those cases, we also need to report
the usual linking errors, if happening.
v2: Use a better name for the renamed function (Timothy).
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Matt Turner <mattst88@gmail.com>
v3: rebase
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Seems it was missing the "/ ma + 0.5" and the order was swapped.
Fixes: a1a2a8dfda ('nir: add AMD_gcn_shader extended instructions')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
When gathering info for unmovable types we need to handle arrays.
While we dont support packing/moving arrays we do support packing
scalar components with these arrays.
Fixes piglit:
tests/spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-interleave-range.shader_test
Fixes: 5eb17506e1 ("nir: do not pack varying with different types")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order to pluck off components properly.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order copy components around properly.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order to swizzle them properly.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
v2: remove & operator in a couple of memsets
add some memsets
v3: fixup lima
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
we already assert above that there are no more than 3 sources, so it
doesn't make sense to use an array of 4 sources
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
While we're here, fix a typo which caused it to actually return a vec4
with the third and fourth components zero.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
On Mali hardware (supported by Panfrost and Lima), the fixed-function
transformation from world-space to screen-space coordinates is done in
the vertex shader prior to writing out the gl_Position varying, rather
than in dedicated hardware. This commit adds a shared NIR pass for
implementing coordinate transformation and lowering gl_Position writes
into screen-space gl_Position writes.
v2: Run directly on derefs before io/vars are lowered to cleanup the
code substantially. Thank you to Qiang for this suggestion!
v3: Bikeshed continues.
v4: Add to Makefile.sources (per Jason's comment). Bikeshed comment.
Ian and Qiang's reviews are from v3, but no real functional changes from
v4. Rob's review is from v4.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Suggested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
This commit adds new nir_load/store_scratch opcodes which read and write
a virtual scratch space. It's up to the back-end to figure out what to
do with it and where to put the actual scratch data.
v2: Drop const_index comments (by anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
The constant_index slots are named right there in the intrinsic
definition, and the comment is just a chance to get out of sync. Noticed
while reviewing the lower_to_scratch changes that copy-and-pasted wrong
comments, and load_ubo and load_per_vertex_output had incorrect comments
currently.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
gl_nir_lower_samplers_as_deref splits structure uniform variables,
creating new variables for individual fields. As part of that, it
calculates a new location. It then never set this on the new variables.
Thanks to Michael Fiano for finding this bug. Fixes crashes on i965
with Piglit's new tests/spec/glsl-1.10/execution/samplers/uniform-struct
test, which was reduced from the failing case in Michael's app.
Fixes: f003859f97 nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
v2: handle atomics as well
make use of nir_rewrite_image_intrinsic
v3: remove call to nir_remove_dead_derefs
v4: (Timothy Arceri) dont actually call lowering yet
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
fixes retrieving the sampler type for bindless images stored inside structs.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: add support for AMD
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes a couple of Coverity warnings CID 1444626.
Fixes: e30804c602 ("nir/radv: remove restrictions on opt_if_loop_last_continue()")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
In commit 3b3653c4cf we decided not to use bare types; hence do not use
bare type when comparing with interface type to find out if the xfb
variable is an array block.
This fixes dEQP-VK.transform_feedback.* tests.
Fixes: 3b3653c4cf ("nir/spirv: don't use bare types, remove assert in
split vars for testing")
CC: Dave Airlie <airlied@redhat.com>
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
also set some constants for SSBOs.
With that it can compile the shader from:
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.18
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Otherwise nir_lower_non_uniform_access crashes when it tries
to get the access of a load_ubo.
Fixes: 8ed583fe52 "spirv: Handle the NonUniformEXT decoration"
Fixes: e50ab2c0f2 "nir: Add access flags to deref and SSBO atomics"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
CTS: GL45-CTS.compute_shader.resources-max
Fixes: 4e1e8f684b "glsl: remember which SSBOs are not read-only and pass it to gallium"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
From the OpenGL 4.60.5 spec, section 4.4.1 Input Layout Qualifiers,
Page 67, (Location aliasing):
" Further, when location aliasing, the aliases sharing the location
must have the same underlying numerical type and bit
width (floating-point or integer, 32-bit versus 64-bit, etc.) and
the same auxiliary storage and interpolation qualification."
Additionally, we have improved the linker error descriptions.
Specifically, when taking structs into account we were producing a
linker error because we assumed that all components in each location
were used and that would cause component aliasing. This is not
accurate of the actual problem. Now, the failure specifies that the
underlying numerical type incompatibility is the cause for the
failure.
Fixes the following piglit test:
tests/spec/arb_enhanced_layouts/linker/component-layout/vs-to-fs-width-mismatch-double-float.shader_test
v2:
- Do not assert if we see invalid numerical types. These come
straight from shader code, so we should produce linker errors if
shaders attempt to do location aliasing on variables that are not
numerical such as records.
- While we are at it, improve error reporting for the case of
numerical type mismatch to include the shader stage.
v3:
- Allow location aliasing of images and samplers. If we get these
it means bindless support is active and they should be handled
as 64-bit integers (Ilia)
- Make sure we produce link errors for any non-numerical type
for which we attempt location aliasing, not just structs.
v4:
- Rebased with minor fixes (Andres).
- Added fixing tag to the commit log (Andres).
v5:
- Remove the helper function and check individually for the
underlying numerical type and bit width (Timothy).
- Implicitly, assume that any non-treated type which is checked for
its underlying numerical type is either integer or
float and has a defined bit width (Timothy).
- Implicitly, assume that structs are the only non-treated
non-numerical type (Timothy).
- Improve the linker error descriptions and commit log (Andres).
Fixes: 13652e7516 ("glsl/linker: Fix type checks for location aliasing")
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>