Declaring them earlier will allow us to access them in NIR.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12773>
load_smem_gcn is similar to load_global/load_global_constant, but it's
guaranteed to use SMEM and makes it much easier to use the format's
32-bit offset source.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12773>
R8G8 has a different block width/height and height alignment than other
formats that would normally be compatible (like R16), so if we try to,
for example, sample R16 as R8G8, we need to demote to linear.
Follows the fix in Freedreno: b97e3bb2e1
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15465>
When creating a surface with VAConfigAttrib, check whether the modifier
from the attrib list is null.
Signed-off-by: shanshengwang <shansheng.wang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15483>
Replace the existing internal TGSI compute shader, which clears
a read-modify-write buffer, with its NIR equivalent. The shader
disassembly generated by the new NIR variant is identical to that of
the previous implementation. This removes the runtime TGSI-to-NIR
conversion step for the shader. Tested on a Navi 23 card.
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15356>
When CCS compression first came out on Skylake, we referred to it as
"renderbuffer compression", or RBC for short. However, that name has
long since fallen out of favor, and we refer to it as CCS nearly
everywhere.
This patch renames DEBUG_NO_RBC to DEBUG_NO_CCS inside the codebase
for clarity, and adds INTEL_DEBUG=noccs. The legacy INTEL_DEBUG=norbc
name continues to work, because it's one line of code and having both
names makes our lives easier in the interim.
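As a rough illustration of why keeping the legacy name costs a single line,
here is a minimal sketch of a name-to-flag table with two spellings mapped to
one bit. The struct, the table, the parse_debug_string() helper and the flag
value are hypothetical and are not taken from the actual INTEL_DEBUG code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical bit value; the real DEBUG_NO_CCS is defined by the driver. */
#define DEBUG_NO_CCS (1ull << 0)

struct debug_option {
   const char *name;
   uint64_t flag;
};

/* Both spellings map to the same bit, so the alias is one table entry. */
static const struct debug_option debug_options[] = {
   { "noccs", DEBUG_NO_CCS },
   { "norbc", DEBUG_NO_CCS }, /* legacy name, kept for the interim */
};

/* Parse a comma-separated option string into a flag mask.
 * Unknown names are ignored in this sketch.
 */
static uint64_t
parse_debug_string(const char *str)
{
   uint64_t flags = 0;

   if (!str)
      return 0;

   while (*str) {
      size_t len = strcspn(str, ",");

      for (size_t i = 0;
           i < sizeof(debug_options) / sizeof(debug_options[0]); i++) {
         if (strlen(debug_options[i].name) == len &&
             strncmp(str, debug_options[i].name, len) == 0)
            flags |= debug_options[i].flag;
      }

      str += len;
      if (*str == ',')
         str++;
   }

   return flags;
}
```

With such a table, parse_debug_string("noccs") and
parse_debug_string("norbc") yield the same mask.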
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15447>
After running divergence analysis, we include "div" or "con" for each
SSA def's divergence/convergence status:
    vec1 32 div ssa_35 = fddy ssa_34
    vec1 32 con ssa_36 = fddy ssa_6.x
We omit this before the first time divergence analysis has been run.
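The rule above can be sketched as follows. This is only an illustration:
struct ssa_def and format_ssa_def() are simplified, hypothetical stand-ins and
not the real NIR types or printer, and the "has divergence analysis run" state
is passed in as a plain boolean.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for an SSA def. */
struct ssa_def {
   unsigned index;
   unsigned num_components;
   unsigned bit_size;
   bool divergent;
};

/* Format the def header, e.g. "vec1 32 div ssa_35". The div/con tag is
 * emitted only once divergence analysis has been run; before that the
 * header is printed without it.
 */
static int
format_ssa_def(char *buf, size_t size, const struct ssa_def *def,
               bool divergence_analysis_run)
{
   const char *tag = "";

   if (divergence_analysis_run)
      tag = def->divergent ? "div " : "con ";

   return snprintf(buf, size, "vec%u %u %sssa_%u",
                   def->num_components, def->bit_size, tag, def->index);
}
```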
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15445>
this should've always been clipping invocations, but I got scared because
tests with rasterization_discard=1 then fail, and I never handled that case
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15392>
by disabling color and depth write, the side effects of force-disabling discard can
be mitigated
fixes:
KHR-GL46.tessellation_shader.single.isolines_tessellation
KHR-GL46.tessellation_shader.tessellation_control_to_tessellation_evaluation.data_pass_through
KHR-GL46.tessellation_shader.tessellation_invariance.invariance_rule3
KHR-GL46.tessellation_shader.tessellation_shader_point_mode.points_verification
KHR-GL46.tessellation_shader.tessellation_shader_quads_tessellation.degenerate_case
KHR-GL46.tessellation_shader.tessellation_shader_quads_tessellation.inner_tessellation_level_rounding
KHR-GL46.tessellation_shader.tessellation_shader_tessellation.gl_InvocationID_PatchVerticesIn_PrimitiveID
KHR-GL46.tessellation_shader.vertex.vertex_spacing
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15392>
for drivers where this is broken/missing, the same effect can be achieved
by feeding the renderpass a framebuffer with null/dummy attachments
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15392>
Flat varyings can save some rasterization compute cost
and reduce the registers needed by the shader.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15341>
This fixes a performance regression in SPECviewperf/Energy
on AMD GPUs. Other GPUs that pass varyings through memory may
choose to re-enable it as needed.
Fixes: 2604625043 ("nir/linker: support uniform when optimizing varying")
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15341>
We need it for emulating packed depth/stencil as separate depth/stencil
resources, populating separate_stencil for us as required.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15482>
Take a pipe_framebuffer_state and go from there. We need some care to
handle separate stencil, but the logic is largely routine.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15482>
This reverts commit d68b2db89c.
With this change, no regressions have been observed with the
dEQP-VK.synchronization* test group. There are regressions with
dEQP-VK.drm_format_modifiers.export_import.*, but those have been
root-caused to be test issues (see 3575).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6125
Fixes: 57445adc89 ("anv: Re-enable CCS_E on TGL+")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15420>
From docs:
    The PVS Instruction which uses the Input Vertex Memory for the last
    time. This value is used to free up the Input Vertex Slots ASAP.
    This field must be set to a valid instruction.
Right now it is set to the last instruction. When the last read is
inside a loop, set it on the outermost ENDLOOP. This could in theory
help performance, but none of my usual benchmarks including GLmark,
Unigine Sanctuary or Lightsmark show any measurable performance difference.
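The selection rule can be sketched roughly as below. The instruction model
(enum op, struct inst, the reads_input flag) and find_last_input_read() are
hypothetical simplifications for illustration, not the r300 compiler's data
structures, and the sketch assumes BGNLOOP/ENDLOOP are properly balanced.

```c
#include <stdbool.h>

/* Hypothetical, simplified instruction model. */
enum op { OP_ALU, OP_BGNLOOP, OP_ENDLOOP };

struct inst {
   enum op op;
   bool reads_input; /* instruction reads Input Vertex Memory */
};

/* Return the index of the instruction after which the input slots can be
 * freed, or -1 if nothing reads the inputs (a caller would then fall back
 * to some valid instruction, e.g. the last one).
 *
 * A read inside a loop may execute again on the next iteration, so it is
 * deferred to the ENDLOOP that closes the outermost enclosing loop.
 */
static int
find_last_input_read(const struct inst *insts, int count)
{
   int last = -1;
   int depth = 0;
   bool read_in_loop = false;

   for (int i = 0; i < count; i++) {
      if (insts[i].op == OP_BGNLOOP) {
         depth++;
      } else if (insts[i].op == OP_ENDLOOP) {
         depth--;
         if (read_in_loop && depth == 0) {
            last = i;
            read_in_loop = false;
         }
      }

      if (insts[i].reads_input) {
         if (depth == 0)
            last = i;
         else
            read_in_loop = true;
      }
   }

   return last;
}
```

For a program whose last read sits in a nested loop, the returned index is
that of the outer ENDLOOP, not the inner one.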
Suggested in: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6045
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15252>
We'll have to figure out the cross compiling strategy, in particular
for Android. But as it stands we can't have the target & host llvm
packages installed at the same time so we can't really compile it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13171>
This tool is currently only aimed at Gfx version 12.5+ with
COMPUTE_WALKER. We could make it work on earlier platforms but they
require pushing gl_SubgroupInvocation and the CLC code is missing the
back-end compiler set-up bits for that.
v2: Commit description by Jason
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13171>