The callers already have this value, and we would like to make it
follow different rules other than stage that might not be visible to
the helper function, so just pass explicitly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9779>
The callers already have this value, and we would like to make it
follow different rules other than stage that might not be visible to
the helper function, so just pass explicitly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9779>
Make the check once in a variable, that can be reused for other parts.
Also add `unlikely` to the various conditionals depending on it
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9779>
Makes calling code more explicit about what is being set, and allows
take advantage of zero initialization for the ones the callsite don't
care.
Besides moving to the struct, two extra "ergonomic" changes were done:
- Add a new shader_time boolean, so shader_time_index is ignored when
unused -- this allow taking advantage of the zero initialization of
unset fields.
- Since we have a struct, provide space for the error_str pointer.
Both iris and i965 were using it, and the extra rstrdup in case of
failure shouldn't be a burden for the others.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9779>
On Intel platforms before Gen6, there is no min or max instruction.
Instead, a comparison instruction (*more on this below) and a SEL
instruction are used. Per other IEEE rules, the regular comparison
instruction, CMP, will always return false if either source is NaN. A
sequence like
cmp.l.f0.0(16) null<1>F g30<8,8,1>F g22<8,8,1>F
(+f0.0) sel(16) g8<1>F g30<8,8,1>F g22<8,8,1>F
will generate the wrong result for min if g22 is NaN. The CMP will
return false, and the SEL will pick g22.
To account for this, the hardware has a special comparison instruction
CMPN. This instruction behaves just like CMP, except if the second
source is NaN, it will return true. The intention is to use it for min
and max. This sequence will always generate the correct result:
cmpn.l.f0.0(16) null<1>F g30<8,8,1>F g22<8,8,1>F
(+f0.0) sel(16) g8<1>F g30<8,8,1>F g22<8,8,1>F
The problem is... for whatever reason, we don't emit CMPN. There was
even a comment in lower_minmax that calls out this very issue! The bug
is actually older than the "Fixes" below even implies. That's just when
the comment was added. That we know of, we never observed a failure
until #4254.
If src1 is known to be a number, either because it's not float or it's
an immediate number, use CMP. This allows cmod propagation to still do
its thing. Without this slight optimization, about 8,300 shaders from
shader-db are hurt on Iron Lake.
Fixes the following piglit tests (from piglit!475):
tests/spec/glsl-1.20/execution/fs-nan-builtin-max.shader_test
tests/spec/glsl-1.20/execution/fs-nan-builtin-min.shader_test
tests/spec/glsl-1.20/execution/vs-nan-builtin-max.shader_test
tests/spec/glsl-1.20/execution/vs-nan-builtin-min.shader_test
Closes: #4254
Fixes: 2f2c00c727 ("i965: Lower min/max after optimization on Gen4/5.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8115134 -> 8115135 (<.01%)
instructions in affected programs: 229 -> 230 (0.44%)
helped: 0
HURT: 1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9027>
It is currently a bitset on top of a uint64_t but there are already
more than 64 values. Change to use BITSET to cover all the
SYSTEM_VALUE_MAX bits.
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8585>
Motivation is to detect earlier certain bugs that can occur when
missing a check for the stage before using the downcast.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7540>
It's included in declaration of INTEL_DEBUG.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6732>
v2: Restore the gen == 10 hunk in brw_compile_vs (around line 2940).
This function is also used for scalar VS compiles. Squash in:
intel/vec4: Reindent after removing Gen8+ support
intel/vec4: Silence unused parameter warning in try_immediate_source
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6826>
These should be more accurate than the current cycle counts, since
among other things they consider the effect of post-scheduling passes
like the software scoreboard on TGL. In addition it will enable us to
clean up some of the now redundant cycle-count estimation
functionality in the instruction scheduler.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Passing shader_stats to the fs_generator constructor means that the
SIMD8 shader stats from the visitor (such as the scheduler mode) will be
reported out for the SIMD16/SIMD32 versions as well.
As you can see, we are now passing 'shader_stats' and 'stats' to
generate_code(), which is obviously odd looking. Ian rebased and
committed an old patch of mine which added the shader_stats struct on
July 30 in commit dabb5d4bee (i965/fs: Add a shader_stats struct.) and
shortly after on August 12 Jason added the brw_compile_stats struct in
commit 134607760a (intel/compiler: Fill a compiler statistics struct).
I'd like to combine the two, but I'm not sure how. shader_stats is an
input to generate_code() while brw_compile_stats is an output and is
only used by the Vulkan driver. Leave it as is for now...
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093>
This involves wrapping vec4_live_variables in a BRW_ANALYSIS object
and hooking it up to invalidate_analysis() so it's properly
invalidated. Seems like a lot of churn but it's fairly
straightforward. The vec4_visitor invalidate_ and
calculate_live_intervals() methods are no longer necessary after this
change.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This moves the following methods that are currently defined in
vec4_visitor (even though they are side products of the liveness
analysis computation) and are already implemented in
brw_vec4_live_variables.cpp:
> int var_range_start(unsigned v, unsigned n) const;
> int var_range_end(unsigned v, unsigned n) const;
> bool virtual_grf_interferes(int a, int b) const;
> int *virtual_grf_start;
> int *virtual_grf_end;
It makes sense for them to be part of the vec4_live_variables object,
because they have the same lifetime as other liveness analysis results
and because this will allow some extra validation to happen wherever
they are accessed in order to make sure that we only ever use
up-to-date liveness analysis results.
The naming of the virtual_grf_start/end arrays was rather misleading,
they were indexed by variable rather than by vgrf, this renames them
start/end to match the FS liveness analysis pass. The churn in the
definition of var_range_start/end is just in order to avoid a
collision between the start/end arrays and local variables declared
with the same name.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
Have fun reading through the whole back-end optimizer to verify
whether I've missed any dependency flags -- Or alternatively, just
trust that any mistake here will trigger an assertion failure during
analysis pass validation if it ever poses a problem for the
consistency of any of the analysis passes managed by the framework.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
The invalidate_analysis() method knows what analysis passes there are
in the back-end and calls their invalidate() method to report changes
in the IR. For the moment it just calls invalidate_live_intervals()
(which will eventually be fully replaced by this function) if anything
changed.
This makes all optimization passes invalidate DEPENDENCY_EVERYTHING,
which is clearly far from ideal -- The dependency classes passed to
invalidate_analysis() will be refined in a future commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
brw_vec4.h (in particular vec4_visitor) is logically a user of the
live variables analysis pass, not the other way around.
brw_vec4_live_variables.h requires the definition of some VEC4 IR data
structures to compile, but those can be obtained directly from
brw_ir_vec4.h without including brw_vec4.h.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
It's not used by anything anymore now that so much lowering has been
moved into NIR. Sadly, we still need on in brw_compile_gs() for
geometry shaders on Sandy Bridge. Short of a lot of pointless work,
that one's probably not going away.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit is all annoying plumbing work which just adds support for a
new brw_compile_stats struct. This struct provides a binary driver
readable form of the same statistics we dump out to stderr when we
INTEL_DEBUG is set with a shader stage.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It'll grow further, and we'd like to avoid adding an additional
parameter to fs_generator() for each new piece of data.
v2 (idr): Rebase on 17 months. Track a visitor instead of a cfg.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The rules for gl_SubgroupSize in Vulkan require that it be a constant
that can be queried through the API. However, all GL requires is that
it's a uniform. Instead of always claiming that the subgroup size in
the shader is 32 in GL like we have to do for Vulkan, claim 8 for
geometry stages, the maximum for fragment shaders, and the actual size
for compute.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Right now, all keys have two things in common: a program string ID and a
sampler_prog_key_data. I'd like to add another thing or two and need a
place to put it. This commit adds a new brw_base_prog_key struct which
contains those two common bits.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, an instruction like
mul(8) vgrf29.xy:F, vgrf25.yxxx:F, [-1F, 1F, 0F, 0F]
would get rewritten as
mul(8) vgrf0.yz:F, vgrf25.yyxx:F, [-1F, 1F, 0F, 0F]
The latter does not produce the correct result. The VF immediate in the
second should be either [-1F, -1F, 1F, 1F] or [0F, -1F, 1F, 0F]. This
commit produces the former.
Fixes: 1ee1d8ab46 ("i965/vec4: Reswizzle sources when necessary.")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that NIR_TEST_* doesn't swap the shader out from under us, it's
sufficient to just modify the shader rather than having to return in
case we're testing serialization or cloning.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If we write to the flag register changing the swizzle would change
what channels are written to the flag register.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110201
Fixes: 4cd1a0be
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: <ian.d.romanick@intel.com>
libintel_common depends on libintel_compiler, but it contains debug
functionality that is needed by libintel_compiler. Break the circular
dependency by moving gen_debug files to libintel_dev.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The scalar back-end uses SHADER_OPCODE_SEND for all surface messages so
we no longer need the non-logical opcodes there. Prefix them VEC4 so
it's clear that they're only used by the vec4 back-end.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The unused typed surface read/write support in the vec4 back-end has
been dropped and the fs back-end now uses SHADER_OPCODE_SEND for all
image and buffer ops. There's no reason to keep these opcodes around
anymore.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The next patch replaces an unsigned bitfield with a plain unsigned,
which triggers gcc to begin warning on signed/unsigned comparisons.
Keeping this patch separate from the actual move allows bisectablity and
generates no additional warnings temporarily.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The (-abs(x) >= 0) => (x == 0) optimization is removed from the vec4 and
scalar parts. In the VS part, adding the new pattern was not
helpful. The pattern that is removed is really old, and it has been
handled by NIR for ages.
All Gen7+ platforms had similar results. (Broadwell shown)
total instructions in shared programs: 14715715 -> 14715709 (<.01%)
instructions in affected programs: 474 -> 468 (-1.27%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.12% max: 1.35% x̄: 1.28% x̃: 1.35%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.40% -1.15%
Instructions are helped.
total cycles in shared programs: 559569911 -> 559569809 (<.01%)
cycles in affected programs: 5963 -> 5861 (-1.71%)
helped: 6
HURT: 0
helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
helped stats (rel) min: 1.45% max: 1.88% x̄: 1.73% x̃: 1.85%
95% mean confidence interval for cycles value: -18.15 -15.85
95% mean confidence interval for cycles %-change: -1.95% -1.51%
Cycles are helped.
Iron Lake and Sandy Bridge had similar results. (Iron Lake shown)
total instructions in shared programs: 7780915 -> 7780913 (<.01%)
instructions in affected programs: 246 -> 244 (-0.81%)
helped: 2
HURT: 0
total cycles in shared programs: 177876108 -> 177876106 (<.01%)
cycles in affected programs: 3636 -> 3634 (-0.06%)
helped: 1
HURT: 0
GM45
total instructions in shared programs: 4799152 -> 4799151 (<.01%)
instructions in affected programs: 126 -> 125 (-0.79%)
helped: 1
HURT: 0
total cycles in shared programs: 122052654 -> 122052652 (<.01%)
cycles in affected programs: 3640 -> 3638 (-0.05%)
helped: 1
HURT: 0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This moves nir_shader_clone() to the driver-specific compile function,
rather than the shared src/intel/compiler code. This allows i965 to do
key-specific passes before calling brw_compile_*. Vulkan should not
need this cloning as it doesn't compile multiple variants.
We do need to continue cloning in the compute shader code because we
lower various things in NIR based on the SIMD width.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This fixes a bug uncovered by my NIR integer division by constant
optimization series.
Fixes: 19f9cb72c8 "i965/fs: Add pass to propagate conditional..."
Fixes: 627f94b72e "i965/vec4: adding vec4_cmod_propagation..."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem
in nir for platforms that don't have popcount or popcountll, such as
32bit msvc.
v2: - Fix additional uses of _mesa_bitcount added after this was
originally written
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
At the time of commit 7bc6e455e2 (i965: Add support for saturating
immediates.) we thought mixed type saturates would be impossible. We
were only thinking about type converting moves from D to F, for
example. However, type converting moves w/saturate from F to DF are
definitely possible. This change minimally relaxes the restriction to
allow cases that I have been able trigger via piglit tests.
Fixes new piglit tests:
- arb_gpu_shader_fp64/execution/built-in-functions/fs-sign-sat-neg-abs.shader_test
- arb_gpu_shader_fp64/execution/built-in-functions/vs-sign-sat-neg-abs.shader_test
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
All of the affected shaders are geometry shaders... the same ones from
the similar fs changes.
The "No changes on any other platforms" comment below is not quite
right. Without the previous change to register coalescing, this
optimization caused quite a few regressions in tests that either used
gl_ClipVertex or used different interpolation modes. I observed that
with both patches applied,
glsl-1.10/execution/interpolation/interpolation-none-gl_BackSecondaryColor-smooth-vertex.shader_test
was one instruction shorter. I suspect other shaders would be similarly
affected. Since this is all based on NOS, shader-db does not reflect
it.
Haswell
total instructions in shared programs: 12954955 -> 12954918 (<.01%)
instructions in affected programs: 3603 -> 3566 (-1.03%)
helped: 37
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.21% max: 2.50% x̄: 1.99% x̃: 2.50%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.30% -1.69%
Instructions are helped.
total cycles in shared programs: 410012108 -> 410012098 (<.01%)
cycles in affected programs: 3540 -> 3530 (-0.28%)
helped: 5
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.28% max: 0.28% x̄: 0.28% x̃: 0.28%
95% mean confidence interval for cycles value: -2.00 -2.00
95% mean confidence interval for cycles %-change: -0.28% -0.28%
Cycles are helped.
Ivy Bridge
total instructions in shared programs: 11679387 -> 11679351 (<.01%)
instructions in affected programs: 3292 -> 3256 (-1.09%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.21% max: 2.50% x̄: 2.04% x̃: 2.50%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.34% -1.74%
Instructions are helped.
No changes on any other platforms.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This prevents regressions in a bunch of clipping and interpolation tests
caused by the next patch (i965/vec4: Optimize OR with 0 into a MOV).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The Vertex Elements are now:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <DrawID, is-indexed-draw, 0, 0>
VE1 is it kept as it was before, VE2 additionally contains the new
system value.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>