Align16 is no longer a thing, so a new implementation is provided
using Align1 instead. Not all possible swizzles can be represented as
a single Align1 region, but some fast paths are provided for
frequently used swizzles that can be represented efficiently in Align1
mode.
Fixes ~90 subgroup quad swap Vulkan CTS tests.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
lower_integer_multiplication() implements 32x32-bit multiplication on
some platforms by bit-casting one of the 32-bit sources into two
16-bit unsigned integer portions. This can give incorrect results if
the original instruction specified a source modifier. Fix it by
emitting an additional MOV instruction implementing the source
modifiers where necessary.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Looks like it is impossible that 'last' variable is a null
because at least the get_vs_prog_data shouldn't return a null pointer.
So this check is unnecessary starts from commit:
99d497c5b6 "anv/pipeline: Replace get_fs_input_map with ..."
This small issue is found by cppcheck.
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
the naming is a bit confusing no matter how you look at it. Within SPIR-V
"global" memory is memory accessible from all threads. glsl "global" memory
normally refers to shader thread private memory declared at global scope. As
we already use "shared" for memory shared across all thrads of a work group
the solution where everybody could be happy with is to rename "global" to
"private" and use "global" later for memory usually stored within system
accessible memory (be it VRAM or system RAM if keeping SVM in mind).
glsl "local" memory is memory only accessible within a function, while SPIR-V
"local" memory is memory accessible within the same workgroup.
v2: rename local to function as well
v3: rename vtn_variable_mode_local as well
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In the following scenario :
1. Create image format R8G8B8A8_UNORM
2. Create image view format R8G8B8A8_SRGB
3. Clear the view through a sub pass to a particular color
4. Barrier on the image to from color attachment to source transfer
5. Copy the image into a linear buffer to check the content
The step 4 resolving the clear color is unaware of the SRGB format of
the view, because the blorp resolve operations operate on images the
color associated with the resolve will not operate on SRGB format but
UNORM. Leading to the wrong color being written into surfaces.
This change forces a clear color resolve at the end of the render pass
so following resolves won't have to deal with the clear color with a
format that doesn't match the image's format.
On gfxbench vulkan_5_normal 1280x720, this appear to cost us ~0.5fps,
from 49.316 down to 48.949.
v2: Only fast clear resolve when image & view have different formats
(Lionel)
v3: Update warning (Jason)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108911
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Resolve operations can happen when dealing with view (begin/end
subpasses) in which case the view's format needs to apply, not the
image's format.
v2: Relayout arguments of a ccs_op() call (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108911
Cc: mesa-stable@lists.freedesktop.org
For now, it's hidden behind a cap. Hopefully, we can eventually drop
that along with all the manual offset code in spirv_to_nir.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead of baking in uvec2 for UBO and SSBO pointers and uint for push
constant and shared memory pointers, make it configurable.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We're going to want to do more deref optimizations going forward and
this gives us a central place to do them. Also, cast propagation will
get a bit more complicated with the addition of ptr_as_array derefs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
SPIR-V allows for matrix and array types to be decorated with explicit
byte stride decorations and matrix types to be decorated row- or
column-major. This commit adds support to glsl_type to encode this
information. Because this doesn't work nicely with std430 and std140
alignments, we add asserts to ensure that we don't use any of the std430
or std140 layout functions with explicitly laid out types.
In SPIR-V, the layout information for matrices is applied to the parent
struct member instead of to the matrix type itself. However, this is
gets rather clumsy when you're walking derefs trying to compute offsets
because, the moment you hit a matrix, you have to crawl back the deref
chain and find the struct. Instead, we take the same path here as we've
taken in spirv_to_nir and put the decorations on the matrix type itself.
This also subtly adds support for strided vector types. These don't
come up in SPIR-V directly but you can get one as the result of taking a
column from a row-major matrix or a row from a column-major matrix.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The loop through instructions doesn't set the cursor for us so unless we
set it somewhere, we may end up emitting instructions in the wrong
place. The only reason why we haven't been bitten by this in the past
is that it only happens in a few variable pointers cases and the CTS
tests for those don't use much control flow so things were getting
emitted in the correct order by accident.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We do the ImageFormatProperties check already, and rejecting an usage
flag when both ImageFormatProperties and the WSI (which is Android)
support it is not allowed.
Intel does support storage for some of the support WSI formats, such
as R8G8B8A8_UNORM, and looking at the ISL_SURF_USAGE_DISABLE_AUX_BIT,
the imported images do not have any form of compression that would
prevent this fix.
v2: Also consider STORAGE bit for Gralloc usage bits.
(From Kevin Strasser <kevin.strasser@intel.com>)
Fixes: 053d4c328f "anv: Implement VK_ANDROID_native_buffer (v9)"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
In 92eb5bbc68 we attempted to avoid copying clear colors whenever
we weren't doing a resolve. However, this broke MSAA resolves because
we need the clear color in the source. This patch makes blorp much more
conservative such that it only avoids the clear color copy if either
aux_usage == NONE or it's explicitly doing a fast-clear.
Fixes: 92eb5bbc68 "intel/blorp: Only copy clear color when doing..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107728
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Probably no difference but it's nice to have i965 & blorp emit things
in the same order.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Without this, I get the following error when building the tests with
autotools on i686:
---8<---
src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’:
src/intel/common/gen_clflush.h:37:7: warning: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Wimplicit-function-declaration]
__builtin_ia32_clflush(p);
^~~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_pause
src/intel/common/gen_clflush.h: In function ‘gen_flush_range’:
src/intel/common/gen_clflush.h:45:4: warning: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Wimplicit-function-declaration]
__builtin_ia32_mfence();
^~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_fnclex
---8<---
The erros are generated for each of these files:
- mesa/src/intel/vulkan/tests/state_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool.c
- mesa/src/intel/vulkan/tests/block_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool_free_list_only.c
This is obviously because gen_clflush.h contains code that uses
intrinsics that are only available with SSE3. Since the driver already
uses SSE3, it seems reasonable to add this to the tests as well.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Eric Engeström <eric@engestrom.ch>
Without this, I get the following error when building the tests using
meson on i686:
---8<---
In file included from ../../../mesa/src/intel/vulkan/anv_private.h:46,
from ../../../mesa/src/intel/vulkan/tests/state_pool_no_free.c:26:
../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’:
../../../mesa/src/intel/common/gen_clflush.h:37:7: error: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Werror=implicit-function-declaration]
__builtin_ia32_clflush(p);
^~~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_pause
../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_flush_range’:
../../../mesa/src/intel/common/gen_clflush.h:45:4: error: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Werror=implicit-function-declaration]
__builtin_ia32_mfence();
^~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_fnclex
---8<---
The errors are generated for each of these files:
- mesa/src/intel/vulkan/tests/state_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool.c
- mesa/src/intel/vulkan/tests/block_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool_free_list_only.c
This is obviously because gen_clflush.h contains code that uses
intrinsics that are only available with SSE3. Since the driver already
uses SSE3, it seems reasonable to add this to the tests as well.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engeström <eric@engestrom.ch>
Makes things easier to read rather than a long block of text.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Not decoding the shader at the right offset.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Instruction addresses are always in ppgtt space.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
We've made the choice not to use fast clears on layer > 0 with
multilayer images. This is partly because we would need to store
multiple clear colors for each layer, making the existing memory
layout, already including aux surfaces, fast clear color, image state,
etc... even more complex.
Partial resolves are the operations transfering the clear colors into
the auxiliary buffers. This operation is currently implemented in
Blorp by loading the clear color from the image's BO, into a shader
that then samples from the auxiliary buffer and writes the color only
if it isn't there already.
The problem here is that because we store only one clear color for all
layers and it is used for partial resolves. If you trigger a partial
clear on a layer > 0, then you're likely to deal with a color that is
not what you actually want. In the particular issues below, we have
multiple layers, each cleared with a different color but the partial
resolve just writes the wrong color into the auxiliary buffers for
layers > 0.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108910
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108911
Cc: mesa-stable@lists.freedesktop.org
The former expects to see SSA-only things, but the latter injects registers.
The assertions in the lowering where not seeing this because they asserted
on the bit_size values only, not on the is_ssa field, so add that assertion
too.
Fixes: 11dc130779 "nir: Add a bool to int32 lowering pass"
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fulfills a requirement for clients that want to utilize same
code path for images with external formats (VK_FORMAT_UNDEFINED) and
"regular" RGBA images where format is known. This is similar to how
OES_EGL_image_external works.
To support this, we allow color conversion samplers for non-YUV
formats but skip setting up conversion when format does not have
can_ycbcr flag set.
v2: add comment and bundle can_ycbcr to the existing break
condition (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If a conversion struct was passed, then initialize view using
format from the conversion structure.
v2: use vk_format directly from the anv_format struct
v3: added some assertions (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If external format is used, we store the external format identifier in
conversion to be used later when creating VkImageView.
v2: rebase to b43f955037 changes
v3: added assert, ignore components when creating external
format conversion (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Since we don't know the exact format at creation time, some initialization
is done only when bound with memory in vkBindImageMemory.
v2: demand dedicated allocation in vkGetImageMemoryRequirements2 if
image has external format
v3: refactor prepare_ahw_image, support vkBindImageMemory2,
calculate stride correctly for rgb(x) surfaces, rename as
'resolve_ahw_image'
v4: rebase to b43f955037 changes
v5: add some assertions to verify input correctness (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2: have separate memory properties for android, set usage
flags for buffers correctly
v3: code cleanup (Jason)
+ limit maxArrayLayers to 1 for AHardwareBuffer based images
v4: rebase to b43f955037 changes
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2: add support for non-image buffers (AHARDWAREBUFFER_FORMAT_BLOB)
v3: properly handle usage bits when creating from image
v4: refactor, code cleanup (Jason)
v5: rebase to b43f955037 changes,
initialize bo flags as ANV_BO_EXTERNAL (Lionel)
v6: add assert that anv_bo_cache_import succeeds, add comment
about multi-bo support to clarify current implementation (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This makes it cleaner to introduce more cases where we import memory
from different types of external memory buffers.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Use the anv_format address in formats table as implementation-defined
external format identifier for now. When adding YUV format support this
might need to change.
v2: code cleanup (Jason)
v3: set anv_format address as identifier
v4: setup suggestedYcbcrModel and suggested[X|Y]ChromaOffset
as expected for HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL
v5: set linear tiling for GPU_DATA_BUFFER usage, add comment
about multi-bo support to clarify current implementation (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2: handle R8G8B8X8 as R8G8B8_UNORM (Jason)
v3: add HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL, we make it define
for now to avoid direct dependency to minigbm headers
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This will be utilized later by GetAndroidHardwareBufferPropertiesANDROID.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This will make it possible for next patch to rip
anv_image_create_info out from make_surface function.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive. On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.
This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).
v2: Remove stray #if block. Noticed by Thomas.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
That flow control may be trying to avoid invalid loads. On at least
some platforms, those loads can also be expensive.
No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").
v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested
by Rob. See also the big comment in src/intel/compiler/brw_nir.c.
v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).
v4: Fix inverted condition in brw_nir.c. Noticed by Lionel.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If there is a CMP.NZ that compares a single component (via a .zzzz
swizzle, for example) with 0, it can propagate its conditional modifier
back to a previous CMP that writes only that component. The specific
case that I saw was:
cmp.l.f0(8) g42<1>.xF g61<4>.xF (abs)g18<4>.zF
...
cmp.nz.f0(8) null<1>D g42<4>.xD 0D
In this case we can just delete the second CMP.
No changes on Broadwell or Skylake because they do not use the vec4
backend. Also no changes on GM45 or Iron Lake.
Sandy Bridge, Ivy Bridge, and Haswell had similar results. (Sandy Bridge shown)
total instructions in shared programs: 10856676 -> 10852569 (-0.04%)
instructions in affected programs: 228322 -> 224215 (-1.80%)
helped: 1331
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 3.09 x̃: 4
helped stats (rel) min: 0.11% max: 6.67% x̄: 1.88% x̃: 1.83%
95% mean confidence interval for instructions value: -3.19 -2.99
95% mean confidence interval for instructions %-change: -1.93% -1.83%
Instructions are helped.
total cycles in shared programs: 154788865 -> 154732047 (-0.04%)
cycles in affected programs: 2485892 -> 2429074 (-2.29%)
helped: 1097
HURT: 59
helped stats (abs) min: 2 max: 168 x̄: 51.96 x̃: 64
helped stats (rel) min: 0.12% max: 12.70% x̄: 3.44% x̃: 2.22%
HURT stats (abs) min: 2 max: 16 x̄: 3.02 x̃: 2
HURT stats (rel) min: 0.18% max: 0.83% x̄: 0.64% x̃: 0.71%
95% mean confidence interval for cycles value: -51.04 -47.26
95% mean confidence interval for cycles %-change: -3.40% -3.07%
Cycles are helped.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>