The rules for gl_SubgroupSize in Vulkan require that it be a constant
that can be queried through the API. However, all GL requires is that
it be uniform. Instead of always claiming that the subgroup size in
the shader is 32, as we have to do for Vulkan, in GL we claim 8 for
geometry stages, the maximum for fragment shaders, and the actual size
for compute.
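As a sketch (the helper name and exact values here are illustrative,
not the precise ones in the backend):

    /* Sketch: pick a per-stage value for gl_SubgroupSize in GL, where
     * it only has to be uniform rather than a queryable constant. */
    static unsigned
    gl_subgroup_size_for_stage(gl_shader_stage stage, unsigned cs_simd_width)
    {
       switch (stage) {
       case MESA_SHADER_FRAGMENT:
          return 32;             /* the maximum subgroup size */
       case MESA_SHADER_COMPUTE:
          return cs_simd_width;  /* the actual dispatch width */
       default:
          return 8;              /* geometry stages */
       }
    }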
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Right now, all keys have two things in common: a program string ID and a
sampler_prog_key_data. I'd like to add another thing or two and need a
place to put it. This commit adds a new brw_base_prog_key struct which
contains those two common bits.
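A minimal sketch of the resulting layout (field names are assumptions
based on the description above):

    /* Sketch: hoist the two common bits into a base struct that every
     * stage-specific key embeds as its first member. */
    struct brw_base_prog_key {
       unsigned program_string_id;
       struct brw_sampler_prog_key_data tex;
    };

    struct brw_wm_prog_key {
       struct brw_base_prog_key base;
       /* ... stage-specific fields ... */
    };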
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
load_fragcoord is already handled in common code for radeonsi, so we
don't need to do anything to handle it. However, there were some passes
creating NIR with the varying, so we switch them over to the sysval. In
the case of nir_lower_input_attachments which is used by both radv and
anv, we add handling for both until intel switches to using a sysval.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Using the existing VK_EXT_debug_report extension.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Since NIR_PASS no longer swaps out the NIR pointer when NIR_TEST_* is
enabled, we can just take a single pointer and not a pointer to pointer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that NIR_TEST_* doesn't swap the shader out from under us, it's
sufficient to just modify the shader rather than having to return in
case we're testing serialization or cloning.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
spirv_to_nir() used to return the nir_function corresponding to the
entrypoint, as a way to identify it. There's now a bool is_entrypoint
in nir_function and also a helper function to get the entrypoint from
a nir_shader, so return the nir_shader instead.
The new return type better reflects what the function name suggests. It
also helps drivers avoid the mistake of reusing internal shader
references after running NIR_PASS on the shader. When using
NIR_TEST_CLONE or NIR_TEST_SERIALIZE, those references would be
invalidated right in the first pass executed.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When running with NIR_TEST_CLONE=1, the pointer will not be valid, as
the whole shader is going to be recreated every pass. Prefer using
is_entrypoint (to query when looping) and nir_shader_get_entrypoint()
instead.
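For example (a sketch using the helpers named above):

    /* Sketch: find the entrypoint without keeping a nir_function
     * pointer alive across passes, since NIR_TEST_CLONE recreates the
     * whole shader. */
    nir_foreach_function(func, shader) {
       if (!func->is_entrypoint)
          continue;
       /* ... process the entrypoint function ... */
    }

    /* Or, when only the impl is needed: */
    nir_function_impl *impl = nir_shader_get_entrypoint(shader);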
Fixes the Vulkan Piglit tests
- vulkan/glsl450/frexp-double
- vulkan/glsl450/isinf-double
- vulkan/shaders/fs-multiple-large-local-array
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Instead of setting the glsl types of the pointers for each resource,
set the nir_address_format, from which we can derive the glsl_type,
and in the future the bit pattern representing a NULL pointer.
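Roughly (a sketch; the derivation helper named here is an assumption):

    /* Sketch: record one nir_address_format per resource kind and
     * derive everything else from it. */
    nir_address_format format = nir_address_format_64bit_global;
    const struct glsl_type *ptr_type =
       nir_address_format_to_glsl_type(format);
    /* Later, the same format can also supply the bit pattern that
     * represents a NULL pointer. */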
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
From the Vulkan 1.1.107 spec:
Sample shading is enabled for a graphics pipeline:
- If the interface of the fragment shader entry point of the
graphics pipeline includes an input variable decorated with
SampleId or SamplePosition. In this case minSampleShadingFactor
takes the value 1.0.
- Else if the sampleShadingEnable member of the
VkPipelineMultisampleStateCreateInfo structure specified when
creating the graphics pipeline is set to VK_TRUE. In this case
minSampleShadingFactor takes the value of
VkPipelineMultisampleStateCreateInfo::minSampleShading.
Otherwise, sample shading is considered disabled.
In other words, if sampleShadingEnable is set to VK_FALSE, we should
ignore minSampleShading.
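In code the fix is essentially (sketch; variable names illustrative):

    /* Sketch: ignore minSampleShading unless sample shading is
     * actually enabled. */
    float min_sample_shading = 0.0f;
    if (ms_info && ms_info->sampleShadingEnable)
       min_sample_shading = ms_info->minSampleShading;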
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
There are tests in CTS for alpha to coverage without a color attachment
that are failing. This happens because we remove the shader color
outputs when we don't have a valid color attachment for them, but when
alpha to coverage is enabled we still want to preserve the output at
location 0 since we need its alpha component. In that case we will
also need to create a null render target for RT 0.
v2:
- We already create a null rt when we don't have any, so reuse that
for this case (Jason)
- Simplify the code a bit (Iago)
v3:
- Take alpha to coverage from the key and don't tie this to depth-only
rendering, since we want the same behavior if we have multiple render
targets but the one at location 0 is not used. (Jason)
- Rewrite commit message (Iago)
v4:
- Make sure we take into account the array length of the shader outputs,
which we were not handling correctly either, and make sure we also
create null render targets for any invalid array entries.
v5:
- Simplify removal of unused outputs by using rt_used[] so we don't have
to special case alpha to coverage there too.
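A sketch of the v5 approach (names assumed):

    /* Sketch: treat RT 0 as used when alpha-to-coverage is on so the
     * output at location 0 (and its alpha) survives, then back any
     * used slot without a valid attachment with a null render target. */
    bool rt_used[MAX_RTS] = { 0 };
    /* ... mark slots actually written and backed by attachments ... */
    if (key->alpha_to_coverage)
       rt_used[0] = true;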
Fixes the following CTS tests:
dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.*
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This enables the remaining capabilities in SPV_EXT_descriptor_indexing.
Fixes: 6e230d7607 "anv: Implement VK_EXT_descriptor_indexing"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Now that everything is in place to do bindless for all resource types
except input attachments and UBOs, VK_EXT_descriptor_indexing is
"trivial".
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This commit adds a new way for ANV to do SSBO bindings by just passing a
GPU address in through the descriptor buffer and using the A64 messages
to access the GPU address directly. This means that our variable
pointers are now "real" pointers instead of a vec2(BTI, offset) pair.
This carries a few advantages:
1. It lets us support a virtually unbounded number of SSBO bindings.
2. It lets us implement VK_KHR_shader_atomic_int64 which we couldn't
implement before because those atomic messages are only available
in the bindless A64 form.
3. It's way better than messing around with bindless handles for SSBOs
which is the only other option for VK_EXT_descriptor_indexing.
4. It's more future-looking, maybe? At the least, this is what NVIDIA
does (they don't have binding-based SSBOs at all). This doesn't a
priori mean it's better, it just means it's probably not terrible.
The big disadvantage, of course, is that we have to start doing our own
bounds checking for robustBufferAccess again and have to push in dynamic
offsets.
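The descriptor data is conceptually just (sketch; the exact struct name
and layout are assumptions):

    /* Sketch: what the descriptor buffer hands the shader for an SSBO
     * under A64 access: a raw GPU address plus a range for the manual
     * robustBufferAccess bounds check. */
    struct anv_ssbo_descriptor_sketch {
       uint64_t address;  /* 64-bit GPU address of the buffer */
       uint64_t range;    /* size in bytes, for bounds checking */
    };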
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This is more descriptive and a bit nicer than checking for gen >= 8 &&
use_softpin everywhere.
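i.e., roughly (sketch; the flag name is an assumption):

    /* Sketch: compute the condition once at physical-device init
     * instead of open-coding gen >= 8 && use_softpin at each use site. */
    pdevice->has_a64_buffer_access =
       pdevice->info.gen >= 8 && pdevice->use_softpin;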
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
v2:
- Merge Float16 and Int8 capabilities into a single patch (Jason)
- Merged patch that enabled SPIR-V front-end checks for these caps
(except for Int8, which was already merged)
v3:
- Keep capabilities sorted (Jason)
v4:
- SpvCapabilityFloat16 support already added in master (Juan)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
From "Alpha Coverage" section of SKL PRM Volume 7:
"If Pixel Shader outputs oMask, AlphaToCoverage is disabled in
hardware, regardless of the state setting for this feature."
From OpenGL spec 4.6, "15.2 Shader Execution":
"The built-in integer array gl_SampleMask can be used to change
the sample coverage for a fragment from within the shader."
From OpenGL spec 4.6, "17.3.1 Alpha To Coverage":
"If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value
is generated where each bit is determined by the alpha value at the
corresponding sample location. The temporary coverage value is then
ANDed with the fragment coverage value to generate a new fragment
coverage value."
Similar wording can be found in the Vulkan spec 1.1.100,
"25.6. Multisample Coverage".
Thus we need to compute the alpha-to-coverage dithering manually in the
shader and replace the sample mask store with the bitwise AND of the
sample mask and the alpha-to-coverage dither mask.
The following formula is used to compute final sample mask:
m = int(16.0 * clamp(src0_alpha, 0.0, 1.0))
dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) |
0x0808 * (m & 2) | 0x0100 * (m & 1)
sample_mask = sample_mask & dither_mask
Credits to Francisco Jerez <currojerez@riseup.net> for creating it.
It gives a number of ones proportional to the alpha for 2, 4, 8 or 16
least significant bits of the result.
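For example, with src0_alpha = 0.5 we get m = int(16.0 * 0.5) = 8, so
(0xfea80 >> 8) & 0xf = 0xa and dither_mask = 0x1111 * 0xa = 0xaaaa (the
m & 2 and m & 1 terms are zero): a mask with exactly 8 of its 16 bits
set, i.e. half coverage, as expected.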
GEN6 hardware does not have an issue with the simultaneous usage of
sample mask and alpha to coverage; however, due to the wrong send order
of oMask and src0_alpha, it is still affected by it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This lowering isn't needed for RADV because AMDGCN has two
instructions. It will be disabled for RADV in an upcoming series.
While we are at it, factorize the code a little bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
An extension reporting cache hits in the user-supplied pipeline cache
as well as timing information for creating the pipelines & stages.
v2: Don't consider no cache for cache hits (Jason)
Rework duration accumulation (Jason)
v3: Fold feedback creation writing into pipeline compile functions (Jason/Lionel)
v4: Get cache hit information from anv_device_search_for_kernel() (Jason)
Only set cache hit from the whole pipeline if all stages also have that bit (Lionel)
v5: Always set user_cache_hit in anv_device_search_for_kernel() (Jason)
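On the reporting side this boils down to filling the structures the
extension defines, e.g. (sketch; variable names illustrative):

    /* Sketch: whole-pipeline feedback; per-stage feedback is analogous.
     * Durations are in nanoseconds. */
    VkPipelineCreationFeedbackEXT feedback = {
       .flags = VK_PIPELINE_CREATION_FEEDBACK_VALID_BIT_EXT,
       .duration = pipeline_end_ns - pipeline_start_ns,
    };
    if (all_stages_hit_user_cache)  /* hypothetical flag */
       feedback.flags |=
          VK_PIPELINE_CREATION_FEEDBACK_APPLICATION_PIPELINE_CACHE_HIT_BIT_EXT;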
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It's just a 32-bit index and offset. We're going to want to use it in
GL as well, so stop talking about Vulkan.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Now that nir_opt_copy_prop_vars can properly handle array derefs on
vectors, it's safe to move UBO and SSBO lowering to late in the
pipeline. This should allow NIR to actually start optimizing SSBO
access.
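The resulting pass ordering is roughly (sketch, heavily abbreviated):

    /* Sketch: keep UBO/SSBO access as derefs through the optimization
     * loop, and only lower to explicit offsets at the very end. */
    NIR_PASS_V(nir, nir_opt_copy_prop_vars);
    /* ... the rest of the optimization loop ... */
    NIR_PASS_V(nir, nir_lower_explicit_io,
               nir_var_mem_ubo | nir_var_mem_ssbo,
               nir_address_format_32bit_index_offset);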
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Instead of trusting the caller to already have created a softfp64
function shader and added all its functions to our shader, we simply
take the softfp64 shader as an argument and do the function inlining
ourselves. This means that there are no more nasty functions lying
around that the caller needs to worry about cleaning up.
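So the call ends up looking something like (sketch; the exact signature
may differ):

    /* Sketch: the caller passes the softfp64 library shader in, and
     * the lowering inlines the functions it needs by itself. */
    NIR_PASS_V(nir, nir_lower_doubles, softfp64,
               nir->options->lower_doubles_options);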
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We zero out the prog data anyway and, now that bias is always zero, this
function is accomplishing nothing.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This commit moves our handling of gl_NumWorkgroups over to work like our
handling of other special bindings in the Vulkan driver. We give it a
magic descriptor set number and teach emit_binding_tables to handle it.
This is better than the bias mechanism we were using because it allows
us to do proper accounting through the bind map mechanism.
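Conceptually (sketch; the actual set number is an implementation
detail):

    /* Sketch: a reserved, otherwise-impossible descriptor set number
     * marks gl_NumWorkgroups; emit_binding_tables() emits a surface
     * for it like any other binding, so bind map accounting works. */
    #define ANV_DESCRIPTOR_SET_NUM_WORK_GROUPS (UINT8_MAX - 1)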
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This adds a second level of caching for the pre-lowered NIR that's
based only on the shader module, entrypoint and specialization
constants.
This is enough for spirv_to_nir as well as our first round of lowering
and optimization. Caching at this level should allow for faster shader
recompiles due to state changes.
The NIR caching does not get serialized to disk via either the
VkPipelineCache serialization mechanism or the transparent on-disk
cache. We could, but it's usually not that expensive to fall back to
SPIR-V for the odd cache miss, especially if it only happens once for
several misses, and it simplifies the cache.
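The lookup then becomes two-tiered, roughly (sketch; helper names are
hypothetical):

    /* Sketch: try the in-memory NIR cache first, fall back to
     * translating the SPIR-V. The NIR-level key covers only module,
     * entrypoint and spec constants; it is never written to disk. */
    nir_shader *nir = anv_device_search_nir_cache(device, nir_sha1);
    if (nir == NULL) {
       nir = anv_shader_compile_to_nir(/* module, entrypoint, spec info */);
       anv_device_upload_nir_cache(device, nir_sha1, nir);
    }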
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The stuff hashed by anv_pipeline_hash_shader is exactly the inputs to
anv_shader_compile_to_nir, so it can be used for NIR caching.
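That hash covers just those inputs, e.g. (sketch; sha1_out is a
hypothetical 20-byte output buffer):

    /* Sketch: hash exactly the inputs of SPIR-V -> NIR translation. */
    struct mesa_sha1 ctx;
    _mesa_sha1_init(&ctx);
    _mesa_sha1_update(&ctx, module->sha1, sizeof(module->sha1));
    _mesa_sha1_update(&ctx, entrypoint, strlen(entrypoint));
    if (spec_info) {
       _mesa_sha1_update(&ctx, spec_info->pMapEntries,
                         spec_info->mapEntryCount *
                            sizeof(*spec_info->pMapEntries));
       _mesa_sha1_update(&ctx, spec_info->pData, spec_info->dataSize);
    }
    _mesa_sha1_final(&ctx, sha1_out);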
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
With the new NIR load_descriptor intrinsic added by the UBO/SSBO
lowering series, we stopped getting UBO pushing because the UBO range
detection pass couldn't see the constants it needed. This fixes that
problem with a quick round of constant folding. Because we're folding,
we no longer need to go out of our way to generate constants when we
lower the vulkan_resource_index intrinsic, so we can make it a bit
simpler.
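i.e. roughly (sketch):

    /* Sketch: fold the descriptor index/offset math into constants so
     * brw_nir_analyze_ubo_ranges() can see what it needs to push UBOs. */
    NIR_PASS_V(nir, nir_opt_constant_folding);
    brw_nir_analyze_ubo_ranges(compiler, nir, NULL, prog_data->ubo_ranges);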
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The naming is a bit confusing no matter how you look at it. Within
SPIR-V, "global" memory is memory accessible from all threads. GLSL
"global" memory normally refers to shader-thread-private memory
declared at global scope. As we already use "shared" for memory shared
across all threads of a work group, the solution everybody can be happy
with is to rename "global" to "private" and use "global" later for
memory usually stored within system-accessible memory (be it VRAM, or
system RAM if keeping SVM in mind).
GLSL "local" memory is memory only accessible within a function, while
SPIR-V "local" memory is memory accessible within the same workgroup.
v2: rename local to function as well
v3: rename vtn_variable_mode_local as well
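In identifier terms, that is (the nir_var_/vtn_ prefixes as used in the
tree):

    nir_var_global          -> nir_var_private
    nir_var_local           -> nir_var_function
    vtn_variable_mode_local -> vtn_variable_mode_function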
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For now, it's hidden behind a cap. Hopefully, we can eventually drop
that along with all the manual offset code in spirv_to_nir.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>