fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-19 22:18:18 +02:00

Author	SHA1	Message	Date
Kenneth Graunke	32cce2f397	intel/brw: Set appropriate types for 16-bit sampler trailing components 16-bit SIMD8 sampler writeback messages come with a bit of padding in them, requiring us to emit a LOAD_PAYLOAD to reorganize the data into the padding-free format expected by NIR. Additionally, we may reduce the response length on the sampler messages based on which components of the (always vec4) NIR destination are actually in use. When we do that, dest_size > read_size, and the trailing components are all empty BAD_FILE registers, indicating the contents are undefined. Unfortunately, we can't ignore those trailing components entirely. In the past, we left them default-initialized, giving us a BAD_FILE register with UD type (which didn't matter, since all sampler returns were 32-bit). But with 16-bit, this was confusing the LOAD_PAYLOAD. For example, writing RGB and skipping A (without sparse) would produce read_size = 3 and dest_size = 4 and nir_dest[5] containing: nir_dest[] = <R:hf, G:hf, B:hf, blank-A:ud, blank-sparse:ud> We'd then call LOAD_PAYLOAD on the first 4 sources, causing it to see 3 HF's and a UD, and try to copy the full 32-bit value at the end, instead of 16-bits of pad like we intended. This meant it would overflow the destination register's size, triggering validation errors. Thanks to Ian Romanick for noticing this, writing a test, and also coming up with a nearly identical fix. Fixes: `0116430d39` ("intel/brw: Handle 16-bit sampler return payloads") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11617 References: https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/152 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30529>	2024-08-06 17:26:05 +00:00
Alyssa Rosenzweig	d99c2ef059	nir/opt_uniform_atomics: add fs atomics predicated? flag on agx (and mali), we predicate atomics on "if (!helper)", so doing so again in this pass is redundant. and would cause a problem since we'd then have to lower the "is helper inv?" flag late. so just skip the extra lowering code. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>	2024-08-06 11:48:17 -04:00
Alvin Wong	0413e1f7dc	hasvk: Conditionally expose VK_KHR_present_wait Gate it behind driconf query for now. Co-authored-by: Hans-Kristian Arntzen <post@arntzen-software.no> Acked-by: Hans-Kristian Arntzen <post@arntzen-software.no> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30480>	2024-08-06 11:39:38 +08:00
Kenneth Graunke	c19e5a0a75	intel/brw: Replace predicated break optimization with a simple peephole We can achieve most of what brw_fs_opt_predicated_break() does with simple peepholes at NIR -> BRW conversion time. For predicated break and continue, we can simply look at an IF ... ENDIF sequence after emitting it. If there's a single instruction between the two, and it's a BREAK or CONTINUE, then we can move the predicate from the IF onto the jump, and delete the IF/ENDIF. Because we haven't built the CFG at this stage, we only need to remove them from the linked list of instructions, which is trivial to do. For the predicated while optimization, we can rely on the fact that we already did the predicated break optimization, and simply look for a predicated BREAK just before the WHILE. If so, we move the predicate onto the WHILE, invert it, and remove the BREAK. There are a few cases where this approach does a worse job than the old one: nir_convert_from_ssa may introduce load_reg and store_reg in blocks containing break, and nir_trivialize_registers may decide it needs to insert movs into those blocks. So, at NIR -> BRW time, we'll actually emit some MOVs there, which might have been possible to copy propagate out after later optimizations. However, the fossil-db results show that it's still pretty competitive. For instructions, 1017 shaders were helped (average -1.87 instructions), while only 62 were hurt (average +2.19 instructions). In affected shaders, it was -0.08% for instructions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Kenneth Graunke	fad63d6483	intel/brw: Delete the brw_fs_opt_dead_control_flow_eliminate() pass With the select peephole gone, this no longer does much of anything. No instruction changes in fossil-db on Alchemist. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Kenneth Graunke	06e8335e11	intel/brw: Delete the brw_fs_opt_peephole_select() pass Now that we can handle load_ubo in NIR's peephole select pass, the backend pass isn't really useful anymore. fossil-db results on Alchemist show almost no impact: Totals: Instrs: 150646561 -> 150647106 (+0.00%); split: -0.00%, +0.00% Cycles: 12633748945 -> 12633760459 (+0.00%) Totals from 261 (0.04% of 630008) affected shaders: Instrs: 404946 -> 405491 (+0.13%); split: -0.00%, +0.14% Cycles: 23947172 -> 23958686 (+0.05%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Kenneth Graunke	7c579f448f	intel/brw: Mark all UBO access with a direct buffer index as speculative UBO loads with a non-indirect buffer index should be safe to perform speculatively. With a direct offset, we may sometimes turn them into push constants, at which point it's just reading a register with no cost at all. Otherwise, we access them via messages that use surface state, and automatically perform bounds checking. So we shouldn't have any issues with reading out of bounds and page faulting, for example. This allows nir_opt_peephole_sel() to operate on load_ubo intrinsics, so we can turn simple if's with loads on both sides to bcsels. In some cases this can collapse a surprising amount of control flow, allowing other optimizations to work better. The i965 OpenGL driver used load_uniform intrinsics, which are allowed in NIR's peephole select pass. But iris uses the Gallium NIR pass that translates uniforms to loads from UBO 0, so we haven't been able to take advantage of NIR's peephole select pass there. The backend pass was still able to handle this to some extent, however. fossil-db results on Alchemist: Totals: Instrs: 150656329 -> 150645307 (-0.01%); split: -0.01%, +0.00% Cycles: 12635230179 -> 12633696811 (-0.01%); split: -0.02%, +0.00% Send messages: 7416330 -> 7416261 (-0.00%) Spill count: 52471 -> 52473 (+0.00%) Fill count: 100818 -> 100803 (-0.01%); split: -0.02%, +0.00% Scratch Memory Size: 3197952 -> 3198976 (+0.03%) Totals from 1848 (0.29% of 630003) affected shaders: Instrs: 1412300 -> 1401278 (-0.78%); split: -0.80%, +0.02% Cycles: 1809789567 -> 1808256199 (-0.08%); split: -0.11%, +0.03% Send messages: 59829 -> 59760 (-0.12%) Spill count: 3870 -> 3872 (+0.05%) Fill count: 9693 -> 9678 (-0.15%); split: -0.18%, +0.02% Scratch Memory Size: 174080 -> 175104 (+0.59%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Felix DeGrood	0eed818588	anv/measure: ignore events from reused command buffers INTEL_MEASURE currently does not support measuring events in parallel from reused command buffers. When this case is detected, warn user and disallow. Fixes observed segfaults in such apps. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30523>	2024-08-05 23:45:41 +00:00
Iván Briano	f8553f56ac	intel/rt: fix terminateOnFirstHit handling If TraceRay() is called with the TerminateOnFirstHit flag, we need to terminate the ray on the first confirmed intersection. This is handled by the lowering of accept_ray_intersection and it's working fine for the case of multiple instances of the intersection shader being called. But if the shader calls reportIntersection() more than once, we were handling them all and accepting the closest one regardless of the flag. Check for the flag on every confirmed intersection and, if set, accept it right there. The subsequent lowering will take care of terminating handling the ray termination if necessary. Fixes new test dEQP-VK.ray_tracing_pipeline.amber.flags-accept-first Cc: mesa-stable Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30418>	2024-08-05 21:43:36 +00:00
Lionel Landwerlin	c6bf1f02c4	anv: reuse object string for RMV token The current code is not handling the potential NULL pointer in VkDebugUtilsObjectNameInfoEXT::pObjectName Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `e1b9a6e4f3` ("anv: initial RMV support") Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30516>	2024-08-05 21:12:59 +00:00
Juston Li	43cb986d9e	anv/android: resolve ANB swapchain images on bind Like AHB, we don't know the layout for an image backed by gralloc swapchain memory until bind when gralloc information is passed by the platform. Signed-off-by: Juston Li <justonli@google.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29850>	2024-08-05 20:06:06 +00:00
Juston Li	bcb17acab9	anv/android: refactor out u_gralloc tiling query Refactor out shared code for the u_gralloc tiling query so it can also be used by ahw and later anb resolves. Signed-off-by: Juston Li <justonli@google.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29850>	2024-08-05 20:06:06 +00:00
Paulo Zanoni	644dcc0337	anv: disable CCS for Source2 games on Xe2 Dota 2 and Counter-Strike 2 really want to be able to allocate memory for both VkImages and VkBuffers from the same memory type. Xe2's special compression-only memory type does not support buffers, which makes these games crash. Disable CCS on these games as a workaround. This is a temporary workaround as we're still working towards a long-term solution (either by fixing the engine or finding a way better expose our memory types). Backport-to: 24.2 Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11520 Fixes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11521 Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30481>	2024-08-05 18:36:46 +00:00
Paulo Zanoni	b4f5a04223	anv: don't expose the compressed memory types when DEBUG_NO_CCS These memory types are useless when CCS is disabled, don't leave them there so they don't confuse applications. Backport-to: 24.2 Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30481>	2024-08-05 18:36:46 +00:00
Caio Oliveira	ba3fd5dc57	intel/brw: Don't retype load_subgroup_invocation result to signed The values are small unsigned integers, so their signed representation will be the same -- the sign conversion is not needed. As a result the extra MOV can be elided by the optimizations. Fossil-db results for DG2 ``` Totals: Instrs: 151779000 -> 151761591 (-0.01%) Cycle count: 12743968649 -> 12742826024 (-0.01%); split: -0.01%, +0.00% Max live registers: 31834993 -> 31834996 (+0.00%) Totals from 17018 (2.70% of 631450) affected shaders: Instrs: 2381740 -> 2364331 (-0.73%) Cycle count: 76798588 -> 75655963 (-1.49%); split: -1.70%, +0.22% Max live registers: 378921 -> 378924 (+0.00%) ``` and TGL ``` Totals: Instrs: 149812033 -> 149794080 (-0.01%); split: -0.01%, +0.00% Cycle count: 11534727002 -> 11534929834 (+0.00%); split: -0.01%, +0.01% Spill count: 42510 -> 42511 (+0.00%); split: -0.00%, +0.01% Fill count: 75100 -> 75101 (+0.00%); split: -0.00%, +0.00% Max live registers: 31727318 -> 31727321 (+0.00%) Totals from 17421 (2.76% of 630458) affected shaders: Instrs: 3092614 -> 3074661 (-0.58%); split: -0.58%, +0.00% Cycle count: 286061417 -> 286264249 (+0.07%); split: -0.32%, +0.39% Spill count: 11538 -> 11539 (+0.01%); split: -0.02%, +0.03% Fill count: 21359 -> 21360 (+0.00%); split: -0.01%, +0.01% Max live registers: 418954 -> 418957 (+0.00%) ``` Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30508>	2024-08-05 18:05:45 +00:00
Felix DeGrood	86c4e89aa2	anv: fix src_hash dumping for compute shaders Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30495>	2024-08-05 16:46:42 +00:00
José Roberto de Souza	132c5cdeb9	intel/dev: Support new topology type with SIMD16 EUs Xe KMD will now report the different topology mask types based on the type of the EU of running platform. With this we don't need to divide the EU count by 2 in intel_perf. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30127>	2024-08-05 07:01:47 -07:00
Lionel Landwerlin	97f6a296e3	anv: better signal new frames to utrace Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>	2024-08-03 16:03:15 +03:00
Lionel Landwerlin	78ae7ab856	anv/hasvk: add indirect tracepoint arguments Gives visibility on some indirect parameter dispatches : - draw count - compute dispatch size Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>	2024-08-03 16:03:04 +03:00
Lionel Landwerlin	0a17035b5c	u_trace: add support for indirect data Allows a driver to declare indirect arguments for its tracepoints and pass an address. u_trace will request a copy of the data which should be implemented on the command processor. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Co-Authored-by: Danylo Piliaiev <dpiliaiev@igalia.com> Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>	2024-08-03 16:03:00 +03:00
Lionel Landwerlin	cb27b9541b	u_trace: remove timestamp reference in allocations We want to reduce the buffer allocations for other type of data than timestamps. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>	2024-08-03 16:02:56 +03:00
Lionel Landwerlin	4347ccbe57	u_trace: rework tracepoint argument declaration We're about to add indirect arguments, having a better way to describe arguments (as capture/storage) will be useful. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29944>	2024-08-03 16:02:53 +03:00
Jordan Justen	58469620d3	intel/brw/validate: Convert access mask to be grf based Our validation code doesn't need to know which bytes are accessed. It only needs to know which grfs were accessed by an element. This also helps to easily handle the Xe2 register size change. Backport-to: 24.2 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>	2024-08-02 22:18:51 +00:00
Jordan Justen	e62606b2ec	intel/brw/validate: Update dst grf crossing check for Xe2 Rework: * Update grf_size_shift calculation (s-b Ken) Backport-to: 24.2 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>	2024-08-02 22:18:51 +00:00
Jordan Justen	f2800deacb	intel/brw/validate: Simplify grf span validation check by not using a mask Previously this check would create a mask of the bytes used in the grf, and then shift the mask. This worked well when there was 32 bytes in the register because a 64-bit uint64_t could easily detect that bytes were used in the next regiter. (The next register was the high 32-bits of the `access_mask` variable.) With Xe2, the register size becomes 64 bytes, meaning this strategy doesn't work. Instead of a mask, we can just check to see if more than 1 grfs are used during each loop iteration. (Suggested by Ken.) This will make it easier to extend for Xe2 in a follow on commit. Verified this with dEQP-VK.subgroups.arithmetic.compute.subgroupexclusivemul_u64vec4_requiredsubgroupsize on Xe2, which otherwise would cause the program to fail to validate because it assumed a grf was 32 bytes. Backport-to: 24.2 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28479>	2024-08-02 22:18:51 +00:00
Hyunjun Ko	78ff100a52	anv: support h265 encoding Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Hyunjun Ko	eefa886b01	anv/video: initial support for h265 encoding Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Hyunjun Ko	3bd46afac1	anv/query: consider codec when querying the encoding status. Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Hyunjun Ko	52f678004f	intel/decoder: Handle HCP_PAK_INSERT_OBJECT Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Hyunjun Ko	46e02ee861	intel/genxml: adds a value of reference pic to HCP_SURFACE_STATE Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Hyunjun Ko	7f280e1e93	intel/genxml: fix some length of HCP_FQM_STATE Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Hyunjun Ko	663f9eb740	intel/genxml: Adds more VDENC commands Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Hyunjun Ko	3eb69b9577	intel/genxml: fix the length of VDENC_DS_REF_SURFACE_STATE Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Hyunjun Ko	e79cad5af0	intel/genxml: Add missing fields for HCP_SLICE_STATE Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Hyunjun Ko	e28a299863	anv: enable VK_KHR_video_encode_queue and VK_KHR_video_encode_h264 Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Dave Airlie	3fbcd95b20	anv/video: add mode costs for h264 encoding Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Hyunjun Ko	3ec8f7f995	anv/video: initial support for h264 encoding Co-authored-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Hyunjun Ko	f6c3e82201	anv/video: implemnt VkGetEncodedVideoSessionParametersKHR Also add a stub for VkGetPhysicalDeviceVideoEncodeQualityLevelPropertiesKHR Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Hyunjun Ko	f25cf314b3	anv/video: remove unnecessary macros Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Hyunjun Ko	a660bd9471	anv/query: handle VK_QUERY_TYPE_VIDEO_ENCODE_FEEDBACK_KHR Anv supports VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BUFFER_OFFSET_BIT_KHR and VK_VIDEO_ENCODE_FEEDBACK_BITSTREAM_BYTES_WRITTEN_BIT_KHR. Also add to handle the VK_QUERY_RESULT_WITH_STATUS_BIT_KHR flag. Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Hyunjun Ko	9425ba6f2b	intel/genxml: update VDENC instructions Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:59 +00:00
Hyunjun Ko	b97d440bc5	intel/genxml: change the length of MFX_QM_STATE Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:58 +00:00
Hyunjun Ko	5057a33fe3	intel/genxml: add a missing value for MFX_SURFACE_STATE Signed-off-by: Hyunjun Ko <zzoon@igalia.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27810>	2024-08-02 07:15:58 +00:00
Kenneth Graunke	8bca7e520c	intel/brw: Only force g0's liveness to be the whole program if spilling We don't actually need to extend g0's live range to the EOT message generally - most messages that end a shader are headerless. The main implicit use of g0 is for constructing scratch headers. With the last two patches, we now consider scratch access that may exist in the IR and already extend the liveness appropriately. There is one remaining problem: spilling. The register allocator will create new scratch messages when spilling a register, which need to create scratch headers, which need g0. So, every new spill or fill might extend the live range of g0, which would create new interference, altering the graph. This can be problematic. However, when compiling SIMD16 or SIMD32 fragment shaders, we don't allow spilling anyway. So, why not use allow g0? Also, when trying various scheduling modes, we first try allocation without spilling. If it works, great, if not, we try a (hopefully) less aggressive schedule, and only allow spilling on the lowest-pressure schedule. So, even for regular SIMD8 shaders, we can potentially gain the use of g0 on the first few tries at scheduling+allocation. Once we try to allocate with spilling, we go back to reserving g0 for the entire program, so that we can construct scratch headers at any point. We could possibly do better here, but this is simple and reliable with some benefit. Thanks to Ian Romanick for suggesting I try this approach. fossil-db on Alchemist shows some more spill/fill improvements: Totals: Instrs: 149062395 -> 149053010 (-0.01%); split: -0.01%, +0.00% Cycles: 12609496913 -> 12611652181 (+0.02%); split: -0.45%, +0.47% Spill count: 52891 -> 52471 (-0.79%) Fill count: 101599 -> 100818 (-0.77%) Scratch Memory Size: 3292160 -> 3197952 (-2.86%) Totals from 416541 (66.59% of 625484) affected shaders: Instrs: 124058587 -> 124049202 (-0.01%); split: -0.01%, +0.01% Cycles: 3567164271 -> 3569319539 (+0.06%); split: -1.61%, +1.67% Spill count: 420 -> 0 (-inf%) Fill count: 781 -> 0 (-inf%) Scratch Memory Size: 94208 -> 0 (-inf%) Witcher 3 shows a 33% reduction in scratch memory size, for example. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:34 -07:00
Kenneth Graunke	4ca4b064cf	intel/brw: Record g0 as live for sends with send_ex_desc_scratch set brw_send_indirect_split_message() implicitly reads g0 to construct the extended message descriptor for certain send messages when this is set. Record that liveness explicitly. Thanks to Francisco Jerez for reminding me about this use of g0. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:32 -07:00
Kenneth Graunke	9200fb966c	intel/brw: Record that SHADER_OPCODE_SCRATCH_HEADER uses g0 The generator code for emitting legacy scratch headers was implicitly using g0 as a source. But the IR wasn't indicating any usage of g0, which means the liveness isn't properly tracked at the IR level. It works because we reserve g0 as permanently live for the whole program. In order to stop doing that, we need to record it properly. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:31 -07:00
Kenneth Graunke	545f20419f	intel/brw: Delete fs_reg_alloc::discard_interference_graph() Unused since commit `50519598ff`. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30319>	2024-08-01 16:37:28 -07:00
Juston Li	ef58f2408f	anv/android: handle R8G8B8X8 as R8G8B8A8 Fall through to common vk_ahb_format_to_image_format() to handle R8G8B8X8 as R8G8B8A8. Fixes issues with querying for format feature support when its handled as R8G8B8. Signed-off-by: Juston Li <justonli@google.com> Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30080>	2024-08-01 17:20:18 +00:00
Lionel Landwerlin	fafa0d5abb	anv: fix check on pipeline mode to track buffer writes We want to check the current mode of the pipeline, not the queue type (since graphics can toggle between 3D & gpgpu modes). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `455a13fb7f` ("anv: limit ANV_PIPE_RENDER_TARGET_BUFFER_WRITES to blorp operations using 3D") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11607 Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30469>	2024-08-01 12:20:52 +00:00
Sushma Venkatesh Reddy	0116430d39	intel/brw: Handle 16-bit sampler return payloads API requires samplers to return 32-bit even though hardware can handle 16-bit floating point, so we detect that case and make more efficient use of memory BW. This is helping improve performance of encode and decode tokens during LLM by at least 5% across multiple platforms. Thank you Kenneth Graunke for suggesting and guiding me throughout this implementation. Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30447>	2024-07-31 21:26:46 +00:00

1 2 3 4 5 ...

12482 commits