All of the tables are static const, so they only need to be validated
once. As noted in the previous commit, the compiler should be able to
eliminate all of this code when the assertions would pass. Even with
the help of the previous commit, this does not always occur.
-Og: -95.688 +/- 3.91935 (-24.9562% +/- 1.0222%) N=5
-O1: No difference proven at 95.0% confidence. N=5
-O2: -1.962 +/- 0.85001 (-0.860013% +/- 0.372589%) N=5
Reviewed-by: Eric Anholt <eric@anholt.net>
I was pretty liberal with these assertions when I wrote this code
because I had assumed that GCC would unroll the loops, inline the look ups
of static const arrays with now constant indices, and then elmininate
all the actuall assertions. It seems none of this happens even at -O3.
Adding the pragmas helps encourage loop unrolling at some optimization
levels. I tested by running shader-db with NIR_VALIDATE=false on a Core
i7 Haswell desktop system.
-Og: No difference proven at 95.0% confidence. N=5
-O1: -48.304 +/- 1.221 (-16.3343% +/- 0.412888%) N=5
-O2: -49.94 +/- 1.23521 (-17.9634% +/- 0.444303%) N=5
v2: Add a _Pragma to an inner loop that was accidentally dropped during
a rebase.
Reviewed-by: Eric Anholt <eric@anholt.net>
Varyings are similar to already handled cases. And "glsl_zero_init"
name of the workaround already looks like it should include varyings.
The issue was observed in GiMark subtest from GpuTest.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This loads in the <min_lod, max_lod, lod_bias> settings for a given
sampler, which is necessary for lowering clamps/biases on certain
Midgard chips.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
This adds the option to lower 64-bit ufind_msb opcodes.
v2: use split_x/y removes component loops (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This was apparently missed in 67b32190f3, which added support
for ARB_shading_language_include to #line, including the 'path'
field for the location.
Fixes crashes in CTS with all drivers as they attempt to access
an uninitialized path string during parsing.
Fixes: 67b32190f3 ("glsl: add ARB_shading_language_include support to #line")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2132
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jose Maria Casanova <jmcasanova@igalia.com>
also make 8 and 16 compoments invalid. We will enable that later again
when we actually support it.
v2: fix validation of nir_intrinsic_instr::num_components
correct validation of instr->num_components
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be useful as a deterministic identifier/index for the variable.
v2: fix comment style
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v1)
Doesn't shrink it (at least, on x86-64) and leaves space for more members.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Adds nir_type_bool8 as well as 8-bit versions of all the bool
opcodes.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Adds nir_type_bool16 as well as 16-bit versions of all the bool
opcodes.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Adds binop_reduce_all_sizes which generates both 1-bit and 32-bit
versions of the reduce operation. This reduces the code duplication a
bit and will make it easier to later add 16-bit versions as well.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Adds binop_compare_all_sizes which generates both 1-bit and 32-bit
versions of the comparison operation. This reduces the code
duplication a bit and will make it easier to later add 16-bit versions
as well.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This will allow us to continue searching the current path for
relative shader includes.
From the ARB_shading_language_include spec:
"If it is quoted with double quotes in a previously included
string, then the first search point will be the tree location
where the previously included string had been found."
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
If the shader contains an include when need to first run the
preprocessor before deciding if we can skip compilation based
on the shader cache.
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
From the ARB_shading_language_include spec:
"#line must have, after macro substitution, one of the following
forms:
#line <line>
#line <line> <source-string-number>
#line <line> "<path>"
where <line> and <source-string-number> are constant integer
expressions and <path> is a valid string for a path supplied in the
#include directive. After processing this directive (including its
new-line), the implementation will behave as if it is compiling at
line number <line> and source string number <source-string-number>
or <path> path. Subsequent source strings will be numbered
sequentially, until another #line directive overrides that
numbering."
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
It should rely on the source type, not on the return type which
is always a boolean anyways, so vote_feq was never selected. For
OpSubgroupAllEqualKHR it's always an integer comparison.
This fixes some VK_KHR_shader_subgroup_extended_types tests with RADV.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Store a flag stating if there was an implmentation, and use
fxn->impl as a temporary flag between deserializsation stages.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested by Jason.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Using a hash-table walk means that variables will get inserted in
different orders on different runs. Just walk the list of globals
instead, even if some of them can't be turned into locals.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
If the data is uniform, then it's really a uniform copy. If the index is
uniform, then it's really a read_invocation.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 9cc4c2c916 "spirv: Add a vtn_decorate_pointer helper"
Tested-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Otherwise, we fail validation and potentially generate invalid code.
Let's fix up the mode of the accesses to the variable.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Currently the linker do all the work then check for the limits, which
means num_textures and num_images in shader_info may have to store more
than the limit. This breaks down now since shader_info was packed and
doesn't expect to store larger invalid values.
To fix this, pull the check before we set the counts in shader_info.
Add necessary plumbing to make sure we bail once those errors are
found.
Fixes: 84a1a2578d ("compiler: pack shader_info from 160 bytes to 96 bytes")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Currently the linker do all the work then check for the limits, which
means num_ssbos and num_ubos in shader_info may have to store more
than the limit. This breaks down now since shader_info was packed and
doesn't expect to store larger invalid values.
To fix this, pull the check before we set the counts in shader_info.
One drawback of this approach is that for some cases we might not see
the collected errors from various stages, but bail as soon as a stage
breaks the limits.
Fixes: 84a1a2578d ("compiler: pack shader_info from 160 bytes to 96 bytes")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>