Previously, an instruction like
    mul(8) vgrf29.xy:F, vgrf25.yxxx:F, [-1F, 1F, 0F, 0F]
would get rewritten as
    mul(8) vgrf0.yz:F, vgrf25.yyxx:F, [-1F, 1F, 0F, 0F]
The latter does not produce the correct result. The VF immediate in the
second instruction should be either [-1F, -1F, 1F, 1F] or [0F, -1F, 1F, 0F].
This commit produces the former.
Fixes: 1ee1d8ab46 ("i965/vec4: Reswizzle sources when necessary.")
Reviewed-by: Matt Turner <mattst88@gmail.com>
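Here is a minimal C sketch of the channel arithmetic. It assumes the rewrite is described by a remap `reswz`, where new channel c reads old channel reswz[c]. For the example above, that remap is xxyy: new .y reads old .x and new .z reads old .y. The helper names are hypothetical and are not the i965 functions. The sketch shows that the VF immediate has to go through the same remap as the register source's swizzle.

```c
#include <assert.h>

/* Channel indices used by this sketch: x = 0, y = 1, z = 2, w = 3. */
enum chan { CHAN_X = 0, CHAN_Y = 1, CHAN_Z = 2, CHAN_W = 3 };

/* Compose a register source swizzle with a channel remap: new channel c
 * reads old channel reswz[c], so it must use src_swz[reswz[c]]. */
static void reswizzle_src(const int reswz[4], const int src_swz[4], int out[4])
{
   for (int c = 0; c < 4; c++) {
      assert(reswz[c] >= CHAN_X && reswz[c] <= CHAN_W);
      out[c] = src_swz[reswz[c]];
   }
}

/* A VF immediate carries one value per channel, so it needs the same remap. */
static void reswizzle_vf(const int reswz[4], const float imm[4], float out[4])
{
   for (int c = 0; c < 4; c++) {
      assert(reswz[c] >= CHAN_X && reswz[c] <= CHAN_W);
      out[c] = imm[reswz[c]];
   }
}
```

With reswz = xxyy, src_swz = yxxx and imm = [-1F, 1F, 0F, 0F], reswizzle_src yields yyxx (matching the rewritten source) and reswizzle_vf yields [-1F, -1F, 1F, 1F].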
Set bit 15 (Disable Repacking for Compression) of the CACHE_MODE_0 register
if the gen attribute 'disable_ccs_repack' is set.
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
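Below is a minimal C sketch of the register value. It assumes CACHE_MODE_0 behaves as a masked register, where bit n of the low half is only updated when bit n + 16 is also set. The resulting value could then be written with, for example, an MI_LOAD_REGISTER_IMM. The struct and helper names are hypothetical stand-ins, not the real gen_device_info layout or genxml pack helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device-info struct; only the new flag is modeled. */
struct devinfo_sketch {
   bool disable_ccs_repack;
};

/* Bit 15 of CACHE_MODE_0: Disable Repacking for Compression. */
#define CACHE_MODE_0_DISABLE_REPACK_BIT 15u

/* Assumes a masked register: bit n is only written if bit n + 16 is also set. */
static uint32_t masked_bit_set(unsigned bit)
{
   assert(bit < 16);
   return (1u << bit) | (1u << (bit + 16));
}

/* Returns true and fills *value when a CACHE_MODE_0 write is needed. */
static bool cache_mode_0_value(const struct devinfo_sketch *devinfo, uint32_t *value)
{
   if (!devinfo->disable_ccs_repack)
      return false;
   *value = masked_bit_set(CACHE_MODE_0_DISABLE_REPACK_BIT);
   return true;
}
```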
Add a new attribute, 'disable_ccs_repack', to gen_device_info, which
indicates whether repacking of components in certain pixel formats
before compression needs to be disabled to keep compatibility
with the decompression capability of the display controller (gen11+).
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Correct the bit field information of the CACHE_MODE_0 register in the current gen11.xml.
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The "demote" intrinsic works like "discard" but doesn't change the
control flow, allowing derivative operations to work. These are the
semantics of D3D discard.
The "is_helper_invocation" intrinsic will return true for helper
invocations -- both the ones that started as helpers and the ones that
were demoted. This is needed to avoid changing the behavior of
gl_HelperInvocation, which is an input (so not expected to change
during shader execution).
v2: Emit the discard jump and comment why it is safe. (Jason)
Rework the is_helper_invocation() that was stomping f0.1. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
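The intended semantics can be modeled on the CPU as shown below. This is a sketch that assumes a 2x2 quad with lanes 0 and 1 on the top row, and it does not reflect how the backend implements demote. It only illustrates three points: a demoted lane still feeds derivatives, is_helper_invocation reports it as a helper, and the gl_HelperInvocation-style input stays unchanged. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* One 2x2 quad: lanes 0,1 are the top row, lanes 2,3 the bottom row. */
struct quad {
   float value[4];   /* per-lane input to the derivative */
   bool helper[4];   /* lane started as a helper invocation */
   bool demoted[4];  /* lane was demoted during execution */
};

/* demote: the lane keeps executing (so derivatives still see its value),
 * but it now counts as a helper and its output is dropped at the end. */
static void demote(struct quad *q, int lane)
{
   assert(lane >= 0 && lane < 4);
   q->demoted[lane] = true;
}

/* is_helper_invocation: true for original helpers AND demoted lanes. */
static bool is_helper_invocation(const struct quad *q, int lane)
{
   return q->helper[lane] || q->demoted[lane];
}

/* gl_HelperInvocation-style input: fixed at shader start, ignores demotion. */
static bool helper_input(const struct quad *q, int lane)
{
   return q->helper[lane];
}

/* Coarse horizontal derivative: still valid after a demote because every
 * lane in the quad is still running. */
static float ddx_coarse(const struct quad *q)
{
   return q->value[1] - q->value[0];
}
```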
Pretty much every driver using nir_lower_io_to_temporaries followed by
nir_lower_io is going to want this. In particular, radv and radeonsi in
the next commits.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
load_fragcoord is already handled in common code for radeonsi, so we
don't need to do anything to handle it. However, there were some passes
creating NIR with the varying, so we switch them over to the sysval. In
the case of nir_lower_input_attachments which is used by both radv and
anv, we add handling for both until intel switches to using a sysval.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It was reported as unsupported previously. It should be importable
and is compatible with itself.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Fixes: 69cc6272fb ("anv: Implement VK_EXT_external_memory_host")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Do not use the view format when filling the surface state.
Fixes dEQP-VK.image.texel_view_compatible.compute.extended.texture.*
Fixes: fb1350c76f ("intel: Add and use helpers for level0 extent")
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If they never get used, dead code should clean them up. Also, we rework
the at_offset and at_sample intrinsics so they return a proper vec2
instead of returning things in PLN layout. Fortunately, copy-prop is
pretty good at cleaning this up and it doesn't result in any actual
extra MOVs.
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: 1) Drop changes for vec4 backend as on Gen11+ we don't support
align16 mode (Matt Turner)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The simulator complains about using byte operands, and we also have
documentation saying so.
Note that add operations on bytes seem to work fine on HW (like ADD).
Using dword operands with CMP & SEL fixes the following tests:
dEQP-VK.spirv_assembly.type.vec*.i8.*
v2: Drop the GLK changes (Matt)
Add validator tests (Matt)
v3: Drop GLK ref (Matt)
Don't mix float/integer in MAD (Matt)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
BSpec: 3017
Cc: <mesa-stable@lists.freedesktop.org>
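The following CPU-side sketch shows why widening is safe. It uses signed byte max as a hypothetical example of a CMP+SEL pair. Sign-extending both operands to dwords preserves their ordering, so comparing and selecting at 32 bits and then narrowing gives the same result a byte CMP would have. Unsigned bytes would be zero-extended instead.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates a byte CMP+SEL (here: signed max) by first widening both
 * operands to 32 bits. Sign extension preserves signed ordering, so the
 * comparison is exact and the selected value always fits back in a byte. */
static int8_t max_i8_via_dwords(int8_t a, int8_t b)
{
   int32_t wa = a;                        /* sign-extend to a dword */
   int32_t wb = b;
   int32_t sel = (wa >= wb) ? wa : wb;    /* CMP + SEL on dwords */
   assert(sel >= INT8_MIN && sel <= INT8_MAX);
   return (int8_t)sel;                    /* narrow back to the byte destination */
}
```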
SLICE_COMMON_CHICKEN3 is a privileged register not accessible from userspace.
This patch silences a simulator warning about it.
We don't need to add this workaround in the Linux kernel as the WA description
says it's fixed on the latest stepping.
This reverts commit 2be60e0c73.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Aligning phys_level0_sa by the compression block dimension prior to
mipmap layout causes the layout of compressed surfaces to differ from
the sampler's expectations in certain cases. The hardware docs agree:
From the BDW PRM, Vol. 5, Compressed Mipmap Layout,
   The compressed mipmaps are stored in a similar fashion to
   uncompressed mipmaps [...]
   The following exceptions apply to the layout of compressed (vs.
   uncompressed) mipmaps:
   * [...]
   * The dimensions of the mip maps are first determined by applying
     the sizing algorithm presented in Non-Power-of-Two Mipmaps
     above. Then, if necessary, they are padded out to compression
     block boundaries.
The last bullet indicates that alignment should not be done for
calculating a miplevel's dimensions, but rather for determining miplevel
placement/padding. Comply with this text by removing the extra
alignment.
Fixes some fbo-generatemipmap-formats piglit failures on all tested
platforms (SNB-KBL).
v2:
- Note fixed platforms.
- Update some consumers via a helper function.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
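The following C sketch illustrates the difference. It assumes a block width of 4 (as for BCn formats) and a 9-pixel-wide level 0. Sizing level 1 from the unaligned width gives 4 px, which is 1 block. Aligning level 0 to 12 px first gives 6 px at level 1, which is 2 blocks and no longer matches the sampler. Helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Size of a miplevel in pixels: level-0 size shifted down, clamped to 1. */
static uint32_t minify(uint32_t px, uint32_t level)
{
   uint32_t r = px >> level;
   return r ? r : 1;
}

static uint32_t div_round_up(uint32_t n, uint32_t d)
{
   assert(d > 0);
   return (n + d - 1) / d;
}

/* PRM order: size the level from the unaligned level-0 width, then pad
 * to compression-block boundaries. Result is in blocks. */
static uint32_t level_width_blocks(uint32_t level0_px, uint32_t bw, uint32_t level)
{
   return div_round_up(minify(level0_px, level), bw);
}

/* The removed behavior: align level 0 to the block width before minifying. */
static uint32_t level_width_blocks_prealigned(uint32_t level0_px, uint32_t bw,
                                              uint32_t level)
{
   uint32_t aligned = div_round_up(level0_px, bw) * bw;
   return div_round_up(minify(aligned, level), bw);
}
```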
Prepare for a bug fix by adding and using helpers which convert
isl_surf::logical_level0_px and isl_surf::phys_level0_sa to units of
surface elements.
v2:
- Update iris (Ken).
- Update anv.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
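The conversion these helpers perform can be sketched as below. The sketch assumes that element extents are obtained by dividing each dimension by the format's block dimensions and rounding up. The struct and function names are hypothetical; the real helpers work on isl_surf fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for an extent and a format's block dimensions. */
struct extent3d { uint32_t w, h, d; };
struct block_dims { uint32_t bw, bh, bd; };

static uint32_t div_round_up(uint32_t n, uint32_t d)
{
   assert(d > 0);
   return (n + d - 1) / d;
}

/* Convert an extent in pixels (or samples) to surface elements, i.e. format
 * blocks. A partially covered block still occupies a whole element, so each
 * dimension rounds up. */
static struct extent3d extent_to_el(struct extent3d px, struct block_dims blk)
{
   struct extent3d el = {
      div_round_up(px.w, blk.bw),
      div_round_up(px.h, blk.bh),
      div_round_up(px.d, blk.bd),
   };
   return el;
}
```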
This rewrites the ddy in EXECUTE_4 mode with a loop to make it more
obvious what is going on, and also sets, for each of the 4 threads in
the group, the group it is supposed to execute.
Fixes the following CTS tests:
dEQP-VK.glsl.derivate.dfdyfine.dynamic_*
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Co-Authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 2134ea3800 ("intel/compiler/fs: Implement ddy without using align16 for Gen11+")
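The following is a CPU model of what the loop computes: fine ddy over a SIMD register, one 2x2 subspan per iteration. This mirrors how each EXECUTE_4 instruction covers one group of 4 channels. The channel layout within a subspan (0,1 top row; 2,3 bottom row) is an assumption of this sketch. It models only the arithmetic, not the register regioning or the group selection.

```c
#include <assert.h>
#include <stddef.h>

/* Fine ddy over `width` channels, one 2x2 subspan (4 channels) per
 * iteration. Assumed layout per subspan: 0 = top-left, 1 = top-right,
 * 2 = bottom-left, 3 = bottom-right. Each column gets bottom minus top. */
static void ddy_fine(const float *src, float *dst, size_t width)
{
   assert(width % 4 == 0);
   for (size_t g = 0; g < width / 4; g++) {
      const float *s = &src[4 * g];
      float *d = &dst[4 * g];
      d[0] = d[2] = s[2] - s[0];   /* left column */
      d[1] = d[3] = s[3] - s[1];   /* right column */
   }
}
```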
Sampler state prefetching is broken on Gen11, and WA_160668216 says
to disable it. Apparently sampler state prefetching also has basically
zero impact on performance, so we don't need to worry there.
i965, anv, and iris already handle this correctly, but we missed BLORP.
Ideally the kernel should globally disable this by writing SARCHKMD, at
which point we wouldn't have to worry about it. But let's be defensive
and handle it ourselves too.
v2: separate out from BTP workaround in case we change that eventually
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> [v1]
When immutable samplers are set we call write_image_view with a NULL
image view. This causes issues on IVB where we have to fake texture
swizzling.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110999
Fixes: d2aa65eb18 "anv: Emulate texture swizzle in the shader when..."
When HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED is used, then the platform
gralloc module will select a format based on the usage flags provided by
the camera device and the other endpoint of the stream.
The patch fixes a crash in Vulkan when the test is run with the camera stream
set to HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED.
Test: android.graphics.cts.CameraVulkanGpuTest#testCameraImportAndRendering
on chromebook with camera HAL3.
v2: use AHARDWAREBUFFER_FORMAT_IMPLEMENTATION_DEFINED and take
AHARDWAREBUFFER_USAGE_CAMERA_MASK into account (Gurchetan)
Fixes: f1654fa7e3 "anv/android: support creating images from external format"
Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This is the preferred clipping mode since, with it, points don't disappear
the moment part of the point crosses over the edge of the viewport, and
lines don't get weird endpoints at viewport edges. We've just never
bothered to hook it up until now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In workloads where there is a lot of geometry drawn that crosses over
the edge of the viewport, this should substantially improve clipper
performance. Not really sure why it's taken 3 years to turn it on; we
just never got around to it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This prints a log of every PIPE_CONTROL flush we emit, noting which bits
were set, and also the reason for the flush. That way we can see which
are caused by hardware workarounds, render-to-texture, buffer updates,
and so on. It should make it easier to determine whether we're doing
too many flushes and why.
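The commit doesn't spell out the log format, so the following is only a sketch of the idea: decode the set bits into names and append the reason. The bit names, bit values, and output format are all hypothetical; the real PIPE_CONTROL has many more fields.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flush bits for this sketch. */
enum {
   FLUSH_RENDER_TARGET = 1u << 0,
   FLUSH_DEPTH_CACHE   = 1u << 1,
   INVALIDATE_TEXTURE  = 1u << 2,
   CS_STALL            = 1u << 3,
};

static const struct { uint32_t bit; const char *name; } flush_names[] = {
   { FLUSH_RENDER_TARGET, "RT" },
   { FLUSH_DEPTH_CACHE,   "Depth" },
   { INVALIDATE_TEXTURE,  "TexInval" },
   { CS_STALL,            "CS_Stall" },
};

/* Formats "PIPE_CONTROL: <bit names> reason: <reason>" into buf,
 * truncating safely if buf is too small. */
static void format_flush(uint32_t bits, const char *reason, char *buf, size_t size)
{
   assert(size > 0);
   snprintf(buf, size, "PIPE_CONTROL:");
   for (size_t i = 0; i < sizeof(flush_names) / sizeof(flush_names[0]); i++) {
      if (bits & flush_names[i].bit) {
         size_t len = strlen(buf);
         snprintf(buf + len, size - len, " %s", flush_names[i].name);
      }
   }
   size_t len = strlen(buf);
   snprintf(buf + len, size - len, " reason: %s", reason);
}
```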
We can rely on only one kind of synchronization object (drm-syncobj)
when it is available. This reduces the number of file descriptors we
use in our implementation.
This will be required later for the timeline semaphores implementation, at
which point we won't ever want to use anything but syncobjs.
v2: Only use has_syncobj for semaphores (Jason)
v3: Only has_syncobj in assert on semaphores in QueueSubmit (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For the BLOCK_TEXEL_VIEW_COMPATIBLE case, this didn't matter
because the flags were already more-or-less what we wanted. However,
for gen7 stencil shadow images, it still had ISL_SURF_USAGE_STENCIL_BIT,
so we were getting W-tiling, which isn't what we want for the shadow. By
passing just ISL_SURF_USAGE_TEXTURE_BIT (and CUBE if we care), we now
get something that's actually texturable.
Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"
Copies to a shadow image happen during a vkCmdPipelineBarrier or at
subpass transitions. We could potentially be a bit more conservative
but these transitions shouldn't happen often and it's better to have our
bases covered.
Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"
Using the existing VK_EXT_debug_report extension.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This should fix floating-point border color on all gen7 HW. Integer is
still thoroughly busted on gen7 because it doesn't exist on IVB and it's
crazy on HSW.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>