Enabling clamping in the opcode here doesn't do quite what we need. This
makes the HW clamp to the max LOD specified in the sampler, but we need
to clamp to the maximum available LOD instead, which is the minimum
of the max-lod of the sampler and the max level in the texture itself.
We also need to take the mipmap mode into account when computing the
level of detail. This is not something the TEX_GRADIENT instruction
does, so we need to do this manually.
Now that we no longer modify the flags in the loop, we can get rid of
the loop alltogether, and only issue a single TEX_GRADIENT instruction.
While we're at it, clean up some naming to better match the phrasing
from the spec.
This only applies to Valhall for now.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/14867
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
This bit is reserved and should be zero on V9, so we should report an
illegal instruction if we ever encounter it while packing.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
If we use the texture coordinates mode for TEX_GRADIENT we need valid
texture coordinates on disabled lanes to compute correct lods across all
pixels on a triangle, otherwise pixels along triangle edges will read
garbage when computing coordinate deltas and produce bogus results.
We previously tried to solve this by setting the force_delta_enable bit,
but that doesn't always work... and worse, this bit isn't supported on
V9, which means we sometimes end up generating illegal instructions.
Fixes Piglit:
shaders/zero-tex-coord texturequerylod
Fixes: 4e58029dc0 ("pan/va: fix base-level for nir_texop_lod")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Per Vulkan spec, the pipeline sample mask applies to all rasterization
sample counts, including single-sample. Drop the msaa-conditional clamp
that forced the sample mask to UINT16_MAX when rasterizationSamples == 1
and just use vk_dynamic_graphics_state's value directly. The default
when no static pSampleMask is provided is already all-ones, so existing
behaviour is preserved for pipelines that don't set the mask.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40882>
It's possible to use a shader that has ViewIndex input when multiview
isn't enabled. According to the Vulkan specification, when multiview
isn't enabled in a renderpass, the value of the ViewIndex input should
be 0.
However currently the driver does not emit execution of the PDS code
setting up view index, which leads to stale value to remain in
ViewIndex.
Setup the PDS code for setting view index and emit the command stream
for executing that PDS code when the shader wants ViewIndex, even if
multiview isn't enabled.
Fixes: 9d48088428 ("pvr: add view index support for vertex shaders")
Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Reviewed-by: Nick Hamilton <nick.hamilton@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40972>
I originally added this mechanism to have the first (SIMD8) compile
note that certain features were in use which would prevent SIMD16/32
from compiling, so we could skip the work of trying those.
But these days, there aren't many cases, and the ones we have are
easily detectable based on the NIR. We can detect it earlier without
even having to do the SIMD8 compile.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
We checked that ver is 11 or 12. It can't be >= 20. This is dead code.
Dual source blending on Xe2 does not have native SIMD32 RT write message
support, but SIMD splitting is currently lowering it to low/high SIMD16
message pairs when using SIMD32 dispatch. I'm not aware of any of the
hardware errata from previous platform still applying.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
The new FRAG_RESULT_DUAL_SRC_BLEND option is easier to work with than
looking for FRAG_RESULT_DATA0 with an index of 1. This also means we
no longer care about the dual source blend index, and can just use the
FRAG_RESULT location. That cascades to meaning we no longer have to
store a tuple in driver_location. And, if we just need location, we
can avoid populating that at all and use nir_io_semantics to get it.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
Detecting dual source blending is currently annoying: you can either
look at info->fs.color_is_dual_source, or FRAG_RESULT_DUAL_SRC_BLEND
being in the info->outputs_written bitfield.
The former is only set if nir_shader_gather_info runs prior to
nir_lower_io lowering it to FRAG_RESULT_DUAL_SRC_BLEND.
The latter is only set if nir_shader_gather_info runs after the
nir_lower_io lowering.
Just make the IO lowering also set the outputs_written flag so if
you're trying to use FRAG_RESULT_DUAL_SRC_BLEND, you can always
check outputs_written without worrying about pass ordering.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
This older code really only exists for Gen8 and elk. On brw platforms,
we moved to handling this in brw_nir_lower_fs_load_output, and don't
need to lower FS outputs before iris_setup_binding_tables.
Call the right compiler's lowering so that things continue working
on elk once we change brw_lower_fs_outputs in the next commit.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
We can just have iris look at its own program key and change the
fragment shader output variable's location/index in the NIR. By
doing this before lowering fragment shader outputs, the rest of
the output lowering does the right thing, and the backend no longer
has to consider hacks for broken OpenGL apps.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>
To verify that some GPUs are compatible and that shader binaries can be
shared to avoid precompiling twice for SteamOS.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41346>
Otherwise we'll hit a LLVM assert when handling load_const:
llvm/include/llvm/ADT/APInt.h:121: llvm::APInt::APInt(unsigned int, uint64_t, bool, bool): Assertion `llvm::isIntN(BitWidth, val) && "Value is not an N-bit signed value"' failed.
Cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41404>
llvmpipe supports real function calls, so we need to call nir_progress on
every function, not just the entry point.
Cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41404>
The panfrost compiler is now able to handle these on v9+ and we don't
need to lower them ourselves anymore. We only need the lowering on
Bifrost because we don't have the magic LD_PKA there.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41352>
This new pass, pan_nir_lower_image(), will eventually subsume all image
lowering. For now, though, it only lowers image_size and only on
Valhall.
Reviewed-by: Lorenzo Rossi <lorenzo.rossi@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41352>
Some vulkancts tests rely on vkGetImageMemoryRequirements to return the same
exact size after exporting and importing an image. This broke when we started
adding padding to sampled surfaces to manage overfetch, because the texture
usage flag does not get applied to the ISL surface when the image is recreated
using an explicit layout.
Fixes: 8d13628f7 ("isl: Add additional alignment/padding requirements to prevent overfetch")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41376>
This is less heavy handed, avoiding unnecessary stalls after SENDs in a
bunch of common cases. The stats (SIMD32) are:
Totals:
CodeSize: 70345392 -> 71674272 (+1.89%)
Totals from 1774 (67.02% of 2647) affected shaders:
CodeSize: 67359248 -> 68688128 (+1.97%)
What's happening here is we are inserting extra SYNC.nop instructions in a
bunch of cases for the .src preceding the eventual .dst. However, putting aside
the i-cache impact for a moment, this is showing the optimization doing what it
should (deferring dst syncs and inserting cheaper src syncs first). So this
should be positive in reality despite the negative stat impact.
The most hurt shaders are pooling up SYNC.nop's at the end of blocks due to
local-only SWSB and lack of SYNC.allwr optimization. The latter is added later
in this MR. The former is planned.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
update the tracking with what we actually waited on, not what we ideally wanted
to wait on. reduces extra annotations in some cases.
SIMD32:
Totals from 194 (7.33% of 2647) affected shaders:
CodeSize: 14473840 -> 14469088 (-0.03%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
IGC does these optimizations and I think they should be safe given my mental
model. Given a sequence like:
r0 = add.f32 r1, r2
r1 = add.f32 r3, r4
Each ALU pipe is pipelined but in-order. Therefore, the second add cannot
possibly complete before the first add, so it cannot write r1 before the first
add reads r1, so we can elide the write-after-read dependency. That in term
avoids a pipeline bubble between the two instructions. Ditto for
write-after-write.
Similarly if the distance is too great within an in-order pipe since there is a
maximum pipeline length, it's not infinite.
Note that if there was cross-pipe dependencies we do need the annotation since
the pipes themselves are parallel.
SIMD32:
Totals from 58 (2.19% of 2647) affected shaders:
CodeSize: 3316592 -> 3315056 (-0.05%); split: -0.05%, +0.00%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
Greedy post-RA substitution pass, similar to IGC's AccSubstitution pass.
Stats together with the previous commits.
SIMD16:
Totals from 2209 (83.45% of 2647) affected shaders:
Instrs: 2701029 -> 2696350 (-0.17%)
CodeSize: 39166720 -> 40372272 (+3.08%); split: -0.36%, +3.44%
SIMD32:
Totals from 2211 (83.53% of 2647) affected shaders:
Instrs: 4691165 -> 4641188 (-1.07%)
CodeSize: 69365792 -> 69341616 (-0.03%); split: -0.50%, +0.47%
The instruction count reduction is from RA shuffle code getting coalesced via
accumulators. The code size changes are from:
* Fewer moves from the instr count reduction (helped)
* Smaller MADs encoded as MACs (helped)
* Fewer SYNC.nop due to fewer scoreboarding annotations (helped)
* Less compaction due to explicit accumulator operands (hurt)
I expect significant cycle count changes from this but we don't have a cycle
model wired up yet, so reading the assembly will have to do.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41398>
handle cases where attachments are remapped to higher indices than
the renderpass was created with
fixes:
dEQP-VK.renderpasses.dynamic_rendering.primary_cmd_buff.local_read.mapping_*_attachments_to_locs_from_*
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41407>
According to documents linked in HSD 1209977789, the push constant
allocation for PS stage is not applicable on Gfx12.5+ (removed). The
documents says push constant data is fetched by SBE in URB.
The HW must still parse the command and do nothing with it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39584>