For this particular case only it doesn't matter. Fixes some new CTS
tests with small inline uniform sizes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28040>
Xe KMD needs VMs to be created to work.
Setting this on Xe KMD code path allow us to simply a feature check
in init_queue_families().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28161>
Don't make sense to only set it in VkGetPhysicalDeviceQueueFamilyProperties2().
Not setting it to the code path without pdevice->engine_info because
the protected support landed on i915 after DRM_I915_QUERY_ENGINE_INFO.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28161>
Having just one place to check the Sparse type is less error prone.
For example in i915 it was always setting sparse_uses_trtt to true
even if running in gfx 9 that don't support sparse.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28161>
The spv cap has the correct requirements to be satisfied before an app
can use it, so we can drop the redundant check here to be more robust.
Either of below is needed:
- VkPhysicalDeviceFeatures::shaderStorageImageReadWithoutFormat
- VK_VERSION_1_3
- VK_KHR_format_feature_flags2
v2: dropped unused variable
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28117>
The `stride` and `offset` attributes are meaningful for the "virtual"
register files (VGRFs, UNIFORMs and ATTRs). Accumulator is an ARF so
validation should check `hstride` (part of the <V,W,H> triple) and `subnr`
instead.
Fixes: 12d7aaf2b8 ("intel/compiler: add more validation for acc register usage")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28059>
This ensure the region triple <V,W,H> is set correctly, in this case the
desired region is a sequential like <8,8,1>. Without the helper the
sequence we get is <0,1,0> -- which the generator currently partially
adjusts when emitting code, but is not sufficient when doing validation
earlier.
The code generated code is slightly modified. From crucible test
func.shader.subtractSaturate.uint in the fragment shader for SIMD8, the
diff looks like
```
mov(8) acc0<1>UD g21<8,8,1>UD { align1 1Q $0.dst };
-add.sat(8) g22<1>UD -acc0<0,1,0>UD g16<8,8,1>UD { align1 1Q @1 $0.dst };
+add.sat(8) g22<1>UD -acc0<8,8,1>UD g16<8,8,1>UD { align1 1Q @1 $0.dst };
```
Note that without the patch generator adjusted the hstride for acc0 used
as destination (see brw_set_dest), but kept the src region as is. For
the source, it is not clear to me why the <0,1,0> would work correctly
here since it is a scalar, but using <8,8,1> it is correct.
Fixes: 58907568ec ("intel/fs: Add SHADER_OPCODE_[IU]SUB_SAT pseudo-ops")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28059>
This register has changed a little bit for LNL.
While this fixes sparse with TR-TT, it is worth remembering that LNL
is using sparse with vm_bind by default.
v2: Use the proper value instead of hardcoding 0xF (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27316>
Xe KMD landed on drm-next, uAPI is now stable and we can remove
the build time parameter to enable support to it but platforms
older than Lunar lake will have experimental support with Xe KMD.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20418>
For something as basic as read_invocation(x, 0), we were emitting:
mov(8) vgrf67:D, 0d
find_live_channel(8) vgrf236:UD, NoMask
broadcast(8) vgrf237:D, vgrf67:D, vgrf236+0.0<0>:UD NoMask
broadcast(8) vgrf235+0.0:W, vgrf197+0.0:W, vgrf237+0.0<0>:D NoMask
mov(8) vgrf234+0.0:W, vgrf235+0.0<0>:W
This is way overcomplicated - if the invocation is a constant, we can
simply emit a single MOV which reads the desired channel index. Not
only that, but it's difficult to clean up:
1. If this expression appears multiple times, CSE will find all the
redundant emit_uniformize(invocation) and get rid of the duplicate
(find_live_channel+broadcast) on future instructions.
2. Copy propagation will put the 0d directly in the first broadcast.
3. Dead code elimination will get rid of the vgrf67 temp holding 0.
4. Algebraic will replace the first broadcast(x, 0) with a MOV.
5. Copy propagation will put the 0d directly in the second broadcast.
6. Dead code elimination will get rid of the vgrf237 temp.
7. Algebraic will replace the second broadcast(x, 0) with a MOV.
8. Copy propagation will finally combine the two MOVs
That's at least 7-8 optimization passes and several loops through the
same passes just to clean up something we can do trivially.
Cuts 25% of the of the optimizer steps in pipeline 22200210259a2c9c
of fossil-db/google-meet-clvk/BgBlur.1f58fdf742c27594.1 (31 to 23).
Shortens compilation time of the google-meet-clvk/Relight pipeline by
-2.87717% +/- 0.509162% (n=150).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28097>
I tried this when I was working on MR !7698, and it didn't have much
affect back then. Maybe I've added more stuff to my fossil-db?
Gfx12 platforms (Tiger Lake and DG2) are unaffected because the POW
instruction was removed.
shader-db:
Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20301933 -> 20301900 (<.01%)
instructions in affected programs: 9077 -> 9044 (-0.36%)
helped: 33 / HURT: 0
total cycles in shared programs: 842797624 -> 842799471 (<.01%)
cycles in affected programs: 1361911 -> 1363758 (0.14%)
helped: 35 / HURT: 111
LOST: 0
GAINED: 9
fossil-db:
Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 165510222 -> 165510163 (-0.00%)
Cycles: 15125195835 -> 15125194484 (-0.00%); split: -0.00%, +0.00%
Spill count: 45204 -> 45196 (-0.02%)
Fill count: 74157 -> 74149 (-0.01%)
Totals from 65 (0.01% of 656118) affected shaders:
Instrs: 57426 -> 57367 (-0.10%)
Cycles: 1667918 -> 1666567 (-0.08%); split: -0.11%, +0.03%
Spill count: 137 -> 129 (-5.84%)
Fill count: 515 -> 507 (-1.55%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>
The majority of cases that would have been affected by this actually
had both sources as integer constants. The earlier commit "intel/rt:
Don't directly generate umul_32x16" allowed those to be constant
folded.
v2: Move the a*-1 block to be near the existing a*-1 block.
No shader-db changes on any Intel platform.
fossil-db results:
All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165510246 -> 165510222 (-0.00%)
Cycles: 15125198238 -> 15125195835 (-0.00%); split: -0.00%, +0.00%
Totals from 46 (0.01% of 656118) affected shaders:
Instrs: 36010 -> 35986 (-0.07%)
Cycles: 2613658 -> 2611255 (-0.09%); split: -0.17%, +0.07%
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>
The DW source must be first on all platforms since Gfx7. On previous
platforms it's the other way around.
Unsurprisingly, no shader-db or fossil-db changes. This change is
necessary for the next commit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>
src/intel/compiler/brw_lower_logical_sends.cpp: In member function ‘bool fs_visitor::lower_logical_sends()’:
src/intel/compiler/brw_lower_logical_sends.cpp:3170:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
3170 | if (devinfo->has_lsc) {
| ^~
src/intel/compiler/brw_lower_logical_sends.cpp:3174:7: note: here
3174 | case SHADER_OPCODE_DWORD_SCATTERED_READ_LOGICAL:
| ^~~~
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27552>
Workaround describes that we need to set instance level distribution
granularity when primitive id is used by the draw.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27955>
If you look at the sampler message header on Gfx9+, you'll see that we
mostly only use 2 dwords (dw2 & dw3). DW2 has a bunch of sampler
parameters, DW3 is the sampler handle.
On Gfx9 we can micro optimize by copying r0 into the header because
the HW mostly doesn't care about other DWs. We just have to clear dw2
on non VS/FS stages.
On Gfx11+, we always have to do a careful copy of the r0.3 bits to
mask out the bottom unrelated bits. So there, just clearing the entire
header makes more sense.
On Xe2+, the dw4 of the header references the sampler feedback surface
handle and bit0 is a boolean to know whether to use that surface or
not. So it *REALLY* matters to have that as 0. If we copy r0, we'll
get random bits in dw4, leading to enable that surface.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28082>
The default scan order of scaling lists is up-right-diagonal
according to the spec. But the device requires raster order,
so we need to convert from the passed scaling lists.
Fixes: 8d519eb ("anv: add initial video decode support for h265")
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28063>
Userptr don't have a valid gem fd so it can't use DRM_XE_VM_BIND_OP_UNMAP_ALL.
Current code was unbinding workaround_bo or returning error when
workaround_bo size was smaller than userptr address.
So here doing a regular DRM_XE_VM_BIND_OP_UNMAP, without setting
xe_bind->obj and setting xe_bind->range and xe_bind->addr.
Fixes: 19439624 ("anv: Use DRM_XE_VM_BIND_OP_UNMAP_ALL to unbind whole bos")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28114>
We're changing the memory address translation tables, we should
invalidate their cache.
It seems i915.ko is already doing this for us in between batches. The
xe.ko driver only adds invalidates to the ring before submissions if
scratch page is enabled in the VM (which it is today, but may change
in the future), and after some vm_bind and all vm_unbind ioctls, but
we don't use vm_bind for TR-TT. Still, it won't hurt to have it here
righ tnow.
v2: Use PIPE_CONTROL_length (José).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27928>
Workaround tool was already updated with MTL production stepping so no
need to return any stepping value for MTL.
For TGL it was also updated a long time ago, so no need to check for
revision 0.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27399>
This should drop it from MTL as there it should apply only for a0
stepping.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28047>
All these vfuncs funnel down to either stubs or the xe_vm_bind_op()
function. By returning int we're shifting VkResult generation to the
callers, which are simply not doing the correct job. If they get
VkResult they can simply throw the errors up the stack without having
to erroneously try to figure out what really happened.
Today the callers are returning either VK_ERROR_UNKNOWN or
VK_ERROR_OUT_OF_DEVICE_MEMORY, but after the patch we're returning
either VK_ERROR_OUT_OF_HOST_MEMORY or VK_ERROR_DEVICE_LOST.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27926>
The bind_timeline is used to guarantee that non-sparse objects will
be bound when batches use them (although any batch will wait on the
most recent bind, even if that's not necessary). For sparse binding
resources, it's up to the user to guarantee synchronization: do not
force every single batch buffer to wait on the latest sparse binding
operation, as that adds unnecessary synchronization points.
v2: Document how each of the vfuncs interacts with bind_timeline
(José).
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27926>
We can now finally leave the semaphore waits and signals to the
vm_bind ioctl, making vm_bind operations truly asynchronous.
This was previously done for TR-TT in 18bd00c024 ("anv/trtt: don't
wait/signal syncobjs using the CPU anymore").
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27926>
The xe.ko driver finally fixed bug 746, which means we can finally
pass multiple bind operations in a single ioctl. There's a dEQP test
that issues 960 bind operations in a single call, so our gains here
have potential, although most real-world apps are not even remotely
close to this.
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/746
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27926>
Blitter and video engines don't support PIPE_CONTROL and
3DSTATE_BINDING_TABLE_POOL_ALLOC.
I'm not 100% sure if something else should be called instead but this
is doing the same as cmd_buffer_emit_state_base_address() and this
fixes the test that was crashing in
unreachable("Trying to emit unsupported PIPE_CONTROL command.");
Fixes: dEQP-VK.pipeline.monolithic.timestamp.misc_tests.two_cmd_buffers_secondary_transfer_queue_with_availability_bit
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28053>
These 2 compute code paths were checking for
anv_cmd_buffer_is_render_queue() before calling
flush_pipeline_select_gpgpu() causing cmd_buffer->state.current_pipeline
to never to be set to GPGPU, trigerring
assert(cmd_buffer->state.current_pipeline == GPGPU) when running in
the compute engine.
So here just dropping the anv_cmd_buffer_is_render_queue() check.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28053>
Add up to four reasons per flush to perfetto flushes. PC reasons
will help debuggers understand why flushes were required, and
perhaps provide hints as to how they can be avoided.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27400>
DebugMarkerSetObjectNameEXT is just a less powerful version of
SetDebugUtilsObjectNameEXT. Fixes the objectType cast warning as well.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27975>
The spec for vkCreateDescriptorPool says:
If VkMutableDescriptorTypeCreateInfoEXT does not exist in the pNext
chain, or VkMutableDescriptorTypeCreateInfoEXT::pMutableDescriptorTypeLists[i]
is out of range, the descriptor pool allocates enough memory to be
able to allocate a VK_DESCRIPTOR_TYPE_MUTABLE_EXT descriptor with any
supported VkDescriptorType as a mutable descriptor.
So check that mutableDescriptorTypeListCount is in range of the binding
we are asking for instead of just 0.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28031>