v2 (Jason Ekstrand):
- Separate the anv_gem interface from anv_queue internals
- Rework on top of the new anv_queue_family stuff
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8667>
v2 (Jason Ekstrand):
- Don't take an anv_physical_device in anv_gem_get_engine_info()
- Return the engine info from anv_gem_get_engine_info()
- Free the engine info in anv_physical_device_destroy()
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8667>
This may vary based on the newer kernel engines based contexts.
v2 (Jason Ekstrand):
- Initialize anv_queue::exec_flags in anv_queue_init
- Don't conflate this with refactors to get_reset_stats
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8667>
This is modeled on anv_memory_type and anv_memory_heap which we already
use for managing memory types. Each anv_queue_family contains some data
which is returned by vkGetPhysicalDeviceQueueFamilyProperties() verbatim
as well as some internal book-keeping bits. An array of queue families
along with a count is stored in the physical device. Each anv_queue
then contains a pointer to the anv_queue_family to which it belongs.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8667>
By moving vk_object_base_finish() to the end and putting the thread
clean-up in an if block we both better mimic anv_queue_init() and have a
more correct object destruction order. It comes at the cost of a level
of indentation but that seems to actually make the function more clear.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8667>
I don't know if this is a typo or an artifact of ancient versions of the
Vulkan API. In any case, it's wrong.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8667>
I originally wrote this several years ago to aid in app debugging. Now
that we have nice tools like RenderDoc, it's no longer needed. I don't
think anyone's really used it in 4 years or more.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8667>
I meant to do this years ago when I first added SHADER_OPCODE_SEND. At
the time, the only use for the extended descriptor was bindless handles
which were always one thing and never non-constant. However, it doesn't
actually require any extra instructions because we have to OR in ex_mlen
anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8748>
Enabled the VK_EXT_sample_locations for Intel Gen >= 7.
v2: Replaced device.info->gen >= 7 with True, as Anv doesn't support
anything below Gen7. (Lionel Landwerlin)
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1887>
In src/intel/vulkan/genX_blorp_exec.c we included the file:
common/gen_sample_positions.h but not use it. Removed.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1887>
Allowing the user to set custom sample locations, by filling the
extension structs and chaining them to the pipeline structs according
to the Vulkan specification section [26.5. Custom Sample Locations]
for the following structures:
'VkPipelineSampleLocationsStateCreateInfoEXT'
'VkSampleLocationsInfoEXT'
'VkSampleLocationEXT'
Once custom locations are used, the default locations are lost and need
to be re-emitted again in the next pipeline creation. For that, we emit
the 3DSTATE_SAMPLE_PATTERN at every pipeline creation.
v2: In v1, we used the custom anv_sample struct to store the location
and the distance from the pixel center because we would then use
this distance to sort the locations and send them in increasing
monotonical order to the GPU. That was because the Skylake PRM Vol.
2a "3DSTATE_SAMPLE_PATTERN" says that the samples must have
monotonically increasing distance from the pixel center to get the
correct centroid computation in the device. However, the Vulkan
spec seems to require that the samples occur in the order provided
through the API and this requirement is only for the standard
locations. As long as this only affects centroid calculations as
the docs say, we should be ok because OpenGL and Vulkan only
require that the centroid be some lit sample and that it's the same
for all samples in a pixel; they have no requirement that it be the
one closest to center. (Jason Ekstrand)
For that we made the following changes:
1- We removed the custom structs and functions from anv_private.h
and anv_sample_locations.h and anv_sample_locations.c (the last
two files were removed). (Jason Ekstrand)
2- We modified the macros used to take also the array as parameter
and we renamed them to start by GEN_. (Jason Ekstrand)
3- We don't sort the samples anymore. (Jason Ekstrand)
v3 (Jason Ekstrand):
Break the refactoring out into multiple commits
v4: Merge dynamic/non-dynamic changes into a single commit (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1887>
Allows to extract the values in different ways than just the genxml
format.
v2 (Jason Ekstrand):
- Add a struct gen_sample_location so that we can re-use the array
macros from the earlier commit.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1887>
This commit adds a "locations" parameter to emit_multisample and
emit_sample_pattern which, if provided, will override the default
sample locations.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1887>
The VkPhysicalDeviceSampleLocationPropertiesEXT struct is filled with
implementation dependent values and according to the table from the
Vulkan Specification section [36.1. Limit Requirements]:
pname | max | min
pname:sampleLocationSampleCounts |- |ename:VK_SAMPLE_COUNT_4_BIT
pname:maxSampleLocationGridSize |- |(1, 1)
pname:sampleLocationCoordinateRange|(0.0, 0.9375)|(0.0, 0.9375)
pname:sampleLocationSubPixelBits |- |4
pname:variableSampleLocations | true |implementation dependent
The hardware only supports setting the same sample location for all the
pixels, so we only support 1x1 grids.
Also, variableSampleLocations is set to true because we can set sample
locations per draw.
Implement the vkGetPhysicalDeviceMultisamplePropertiesEXT according to
the Vulkan Specification section [36.2. Additional Multisampling
Capabilities].
v2: 1- Replaced false with VK_FALSE for consistency. (Sagar Ghuge)
2- Used the isl_device_sample_count to take the number of samples
per platform to avoid extra checks. (Sagar Ghuge)
v3: 1- Replaced VK_FALSE with false as Jason has sent a patch to replace
VK_FALSE with false in other places. (Jason Ekstrand)
2- Removed unecessary defines and set the grid size to 1 (Jason Ekstrand)
v4: Fix properties reporting in GetPhysicalDeviceProperties2, not
GetPhysicalDeviceFeatures2 (Lionel)
Use same alignment as other functions (Lionel)
Report variableSampleLocations=true (Lionel)
v5: Don't overwrite the pNext in GetPhysicalDeviceMultisamplerPropertiesEXT
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> (v3)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1887>
Added the VK_EXT_sample_locations to the anv_extensions.py list to
generate the related entrypoints.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1887>
When calculating a URB configuration, we start with a notion of how
much space each stage /wants/ (to achieve the maximum amount of
concurrency), but sometimes fall back to giving it less than that,
because we don't have enough space. (Typically, this happens when
the per-stage size is large, or there are many stages, or both.)
We now output a "constrained" boolean which is true if we weren't
able to satisfy all the "wants" due to a lack of space.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8721>
Below Deqp CTS failure is seen on KBL GT1(tested on 0x5906) only ,
GT2 all test passes, changing the max shader geometry to 256
(previous 640) fixes all failure tests.Similar issues on
CML GT1 (Gen9) is fixed
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8550
dEQP-GLES31.functional.geometry_shading.layered.
render_with_default_layer_cubemap
render_with_default_layer_3d
render_with_default_layer_2d_array
Signed-off-by: Abhishek Kumar <abhishek4.kumar@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8731>
It is currently a bitset on top of a uint64_t but there are already
more than 64 values. Change to use BITSET to cover all the
SYSTEM_VALUE_MAX bits.
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8585>
We can skip CCS ambiguate if followed by a fast clear within render
pass.
v2: (Jason)
- Check array layer as well since we only fast clear first layer and
first LOD.
- Don't drop fast clear check while doing resolve operation.
Fixes: d5849bc840 "anv: Skip HiZ and CCS ambiguates which preceed fast-clears"
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6988>
According to the BSpec page for MEDIA_VFE_STATE, on Gen12 platforms
"if a fused configuration has fewer threads than the native POR
configuration, the scratch space allocation is based on the number of
threads in the base native POR configuration". However we currently
use the subslice count from devinfo->num_subslices[0], which only
includes the subslices currently enabled by the platform fusing. This
leads to scratch space underallocation and occasional hangs.
The problem is likely to affect most Gen12 GPUs with less than 96 EUs.
GFXBench5 Aztec Ruins is able to reproduce the issue fairly reliably.
Fixes: 9e5ce30da7 "intel: fix the gen 12 compute shader scratch IDs"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8636>
While invalidating the AUX-TT entries, we have to consider the surface
offset as well otherwise, we will end up invalidating another surface's
CCS portion.
For eg. when we have HiZ+CCS and STC_CCS enabled, both will use the CCS
portion allocated at the end of BO. While invalidating the CCS portion
of stencil buffer, we will end up invalidating the CCS portion that
belongs to the depth main surface and vice-versa, if the surface offset
is not considered.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4123
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Acked-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8677>
We were using nir_tex_instr::dest_type to a glsl_type, then passing it
to emit_texture(), only to just check the number of components. Just
pass the number of components directly. This lets us delete
brw_glsl_base_type_for_nir_type, which was asserting with
nir_texop_all_samples_equal because it didn't handle bool32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7989>
On Gen11, they took away our hardware int64 support. We have lowering
for all of it in NIR except for subgroup ops. Now that all the subgroup
ops are implemented, we can enable the feature.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>
This adds an emit_scan_step helper which gives us a place to do
something a bit more interesting than emitting a single op.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>
There are two problems this commit solves: First, is that the 64x64 MUL
lowering generates a Q MOV which, because of how late it runs in the
compile pipeline, it never gets removed. Second, it generates 32x32
MULs and we have to run it a second time to lower those.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>
We could probably support some strides if we tried hard enough but the
whole point of this opcode is to accelerate things with crazy Align16 or
crazy regions. It's ok if we have to emit an extra MOV to get a packed
source.
Fixes: 8b4a5e641b "intel/fs: Add support for subgroup quad operations"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>
Previously, we were returning 2 whenever the source was a Q type. As
far as I can tell, the only reason why this hasn't blown up before is
that it was only ever used for VGRFs until the SWSB pass landed which
uses it for everything. This wasn't a problem because Q types generally
aren't a thing on TGL. However, they are for a small handful of
instructions.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>
If we don't have any dynamic state, pipeline, or descriptor changes,
we can do a very quick early-exit instead of checking for a bunch of
stuff bit-by-bit.
Tested-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8594>
Previously, if we had a pipeline transition from something which used,
say, tessellation to something which didn't and we ended up with
tessellation descriptors dirty, we could end up re-emitting far more
than necessary. With this commit, we mask off unused stages so we only
update when necessary.
Tested-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8594>