load_preamble is intended to be almost free (costing at most a move), and it
does not have special bounds checking requirement, so it's ok to select with it.
With this, drivers that use nir_opt_preamble together with a late call to
peephole_select can optimize sequences like:
if (x) {
<uniform-on-uniform calculation>
} else {
<different uniform-on-uniform calculation>
}
to simply
bcsel(x, <uniform register 0>, <uniform register 1>)
rather than emitting needless control flow / branching over some moves.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20597>
multiplying by the array size is always wrong for this case, and not
doing so allows for some simplification and better inlining, though
the output results are identical
the one corner case is clip/cull distance, which need special handling
since they're arrays with vec semantics
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20678>
This is heavily copy-pasted from a patch of Ian Romanick, including the
commit message.
Previously, this pass always generated fcsel for bcsel. This was the
only place that generate fcsel, so various drivers assumed (and needed!)
that src0 was a Boolean with 0.0 or 1.0 as the only values.
Specifically, many DX9 / GL_ARB_vertex_program platforms lack a CMP
instruction in vertex shaders. In those cases, they would use LRP to
implement fcsel. The bummer is that many plaforms have a real fcsel
instruction, and those platforms would benefit from other places
generating that opcode.
Instead of leaving assumptions in drivers about the sources of an opcode
that they can't really support, allow them to control the way the
lowering pass translates bcsel. Two flags are used to control this:
- If the driver sets has_fused_comp_and_csel in nir_options, fcsel_gt
will be used. Since the Boolean value is 0.0 or 1.0, this is
equivalent to fcsel.
- If the parameter has_fcsel_ne is set, fcsel will be used. This is the
old path.
- Otherwise, the lowering pass assumes we're on a crufty, old DX9 vertex
program, and it emits flrp.
With this, the assumptions about src0 of fcsel in NTT can be removed.
If a platform can't handle fcsel, it should ensure that the lowering
pass won't generate it.
No change in shader-db.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20162>
DX9 PS 1.x / GL_ARB_fragment_program shaders that have been converted to
GLSL are littered with patterns like
ps_r1.x = ((ps_t0.y >= 0.0) ? ps_r1.x : ps_c1.y);
This is because CMP is a fundamental opcode in those earlier shading
languages. i915 supports this opcode natively, but there's no way to
get it directly into the backend. Instead, NIR and NTT generate some
combination of fcsel and sge and hope for the best.
i915
total instructions in shared programs: 49032 -> 48897 (-0.28%)
instructions in affected programs: 4173 -> 4038 (-3.24%)
helped: 39
HURT: 0
total temps in shared programs: 2795 -> 2790 (-0.18%)
temps in affected programs: 22 -> 17 (-22.73%)
helped: 5
HURT: 0
total const in shared programs: 4976 -> 4967 (-0.18%)
const in affected programs: 203 -> 194 (-4.43%)
helped: 9
HURT: 0
GAINED: shaders/trine/fp-13.shader_test FS
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20162>
These match the TGSI CMP opcode very nicely. Every driver that uses NTT
for its fragment shader path should be able to enable these opcodes now.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20162>
This is a global register which isn't settable by userspace contexts.
It also shouldn't appear in any of our aubinator decodes from error
states or aub dumps, as no userspace batch should be setting it.
So it's not very valuable to have here. Just makes us think we can
set it. Plus, a lot of the field definitions changed a bunch, and
would need updating.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20627>
Some format modifiers change the number of planes used by an image.
For instance AMD DCC modifiers uses 2 or 3 planes. However the
format modifier was ignored in the PIPE_RESOURCE_PARAM_NPLANES
get_param hook.
Fix this by using get_dmabuf_modifier_planes() instead of
util_format_get_num_planes().
This fixes wlroots-based compositors under zink.
Signed-off-by: Simon Ser <contact@emersion.fr>
Fixes: c025cb9ee9 ("zink: fix dmabuf plane returns")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20395>
v2: Add a one line offset to the compute shader, to get the
correct output, as suggested by Suresh <Suresh.Guttula@amd.com>.
v3: Add `FALLTHROUGH` to fix a compilation error
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Tested-by: Suresh Guttula <suresh.guttula@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19915>
Fixes the scale and translate portion of the code to allow for
scale and crop to work properly.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Tested-by: Marcus Seyfarth <m.seyfarth@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19915>
These fields are allocated as they have to be taken into account for initScreen.
Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20663>
The given screen is already freed by the caller in case a NULL-pointer is
returned by the implementation.
Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20663>
Some of the code paths were not freeing the allocated strings,
also remove the unused ret variable as we are always returning -1 on failure.
Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20667>
Both wayland and X backends are covered by headless weston, with
X enabled through weston Xwayland.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Acked-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20392>
Restricting the state change only to newly used states results in problems,
because SSBOs, Images, and the framebuffers make use of the same limited
set of resources, and not properly unbinding un-used resoureces leads to
invalid rendering and GPU hangs.
Fixes: aaa4b0e6
st/mesa: move check_program_state code into _mesa_update_state
v2: use new cap name and switched meaning
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7969
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20493>
With aaa4b0e6 state validation is no longer called for all changed states,
but only for states that will be active with a new shader program.
Not all drivers support this and might prefer if the state validation
is emitted for all states that might be changed. So add a cap that the
driver can signal one or the other preference, and default to the new
behavior.
Fixes: aaa4b0e6
st/mesa: move check_program_state code into _mesa_update_state
v2: - Rename cap and and invert its meaning, query the cap
only once and store it in st, handle the mask update
when updating the shader i.e. not in st_validate_state (Marek)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20493>
This was always meant to be at the bottom of the page. To reduce the
risk of more driver-specific environment variables being added below,
let's add a horizontal rule to mark the difference. This should make it
more clear that this paragraph doesn't belong to the previous heading.
Fixes: c70c5ecd2e ("docs: move generic gallium envvars to root doc")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20644>
Gather outputs in advance to save both output data and type. Output data
is used for streamout and gfx11 param export. Output type is used for
streamout latter.
The output info will also be used for nir vertex export in the future.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Singed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20350>
When slot has no param offset, we should not emit store output for
them on gfx11.
Fixes: abe2e99e9e ("ac/nir/ngg: gs support 16bit outputs")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20350>
Follow the spec, all outputs even not this stream need to be
reset after emit vertex.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20350>
Because we called nir_lower_io_to_temporaries which ensure the
store output and emit vertex in the same block.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20350>