fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 20:00:10 +01:00

Author	SHA1	Message	Date
Jason Ekstrand	43ca7f4178	intel/compiler: Convert brw_wm_aa_enable to brw_sometimes There are other cases where we want a tri-state logic like this. May as well have one enum for all the cases. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>	2023-02-06 09:12:17 +00:00
Jason Ekstrand	5d1c538449	intel/fs: Return early in a couple builtin setup helpers Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>	2023-02-06 09:12:17 +00:00
Jason Ekstrand	714a291673	intel/compiler: Use SHADER_OPCODE_SEND for PI messages Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>	2023-02-06 09:12:17 +00:00
Jason Ekstrand	d25e5310bc	intel/nir: Lower barycentrics to per-sample in a dedicated pass This is more similar to what we do for single-sample and it should be more clear going forward once our lowering gets more complex. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>	2023-02-06 09:12:17 +00:00
Jason Ekstrand	991d546102	intel/compiler: Document wm_prog_key::persample_interp Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>	2023-02-06 09:12:17 +00:00
Ian Romanick	ea413e826b	nir: Eliminate nir_op_f2b Builds on the work of !15121. This gets to delete even more code because many drivers shared a lot of code for i2b and f2b. No shader-db or fossil-db changes on any Intel platform. v2: Rebase on `1a35acd8d9`. v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin. v4: Another rebase. Remove f2b stuff from Midgard. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>	2023-02-03 22:39:57 +00:00
Sagar Ghuge	0c083d29a5	intel/fs: Always stall between the fences on Gen11+ Be conservative in Gfx11+ and always stall in a fence. Since there are two different fences, and shader might want to synchronize between them. This change also brings back the original code block for the stall between the fence and comment from the commit `b390ff3517`. v2: (Caio) - Re-arrange code block. - Adjust comment. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6958 Fixes: `f7262462` ("intel/fs: Rework fence handling in brw_fs_nir.cpp") Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Tested-by: Mark Janes <markjanes@swizzler.org> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20996>	2023-02-02 00:21:21 +00:00
Amber	ab4c2990ed	intel/compiler: use lower_image_samples_to_one Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Amber Amber <amber@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20813>	2023-02-01 19:52:49 +00:00
Marcin Ślusarz	af9e2b8bf1	intel/compiler/mesh: remove dead code path supporting >4 dword writes Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20858>	2023-01-31 18:28:21 +00:00
Marcin Ślusarz	be82ed28f0	intel/compiler/mesh: support longer write messages Allowing longer writes reduces the number of send messages needed to support unaligned 4-component writes. Note: nothing currently generates 8-component writes, so this change makes "second_mask" code path in emit_urb_direct_writes and emit_urb_indirect_writes_mod dead. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20858>	2023-01-31 18:28:21 +00:00
Marcin Ślusarz	3131c2fc7a	intel/compiler/mesh: optimize indirect writes Our hardware requires that we write to URB using full vec4s at aligned addresses. It gives us an ability to mask-off dwords within vec4 we don't want to write, but we have to know their positions at compile time. Let's assume that: - V represents one dword we want to write - ? is an uninitialized value - "|" is a vec4 boundary. When we want to write 2-dword value at offset 0 we generate 1 write message: | V1 V2 ? ? | with mask: | 1 1 0 0 | When we want to write 4-dword value at offset 2 we generate 2 write messages: | ? ? V1 V2 | V3 V4 ? ? | with mask: | 0 0 1 1 | 1 1 0 0 | However if we don't know the offset within vec4 at compile time we currently generate 4 write messages: | V1 V1 V1 V1 | | 0 0 1 0 | | V2 V2 V2 V2 | | 0 0 0 1 | | V3 V3 V3 V3 | | 1 0 0 0 | | V4 V4 V4 V4 | | 0 1 0 0 | where masks are determined at run time. This is quite wasteful and slow. However, if we could determine the offset modulo 4 statically at compile time, we could generate only 1 or 2 write messages (1 if modulo is 0) instead of 4. This is what this patch does: it analyzes the addressing expression for modulo 4 value and if it can determine it at compile time, we generate 1 or 2 writes, and if it can't we fallback to the old 4 writes method. In mesh shader, the value of offset modulo 4 should be known for all outputs, with an exception of primitive indices. The modulo value should be known because of MUE layout restrictions, which require that user per-primitive and per-vertex data start at address aligned to 8 dwords and we should statically always know the offset from this base. There can be some cases where the offset from the base is more dynamic (e.g. indirect array access inside a per-vertex value), so we always do the analysis. Primitive indices are an exception, because they form vec3s (for triangles), which means that the offset will not be easy to analyse. When U888X index format lands, primitive indices will use only one dword per triangle, which means that we'll always write them using one message. Task shaders don't have any predetermined structure of output memory, so always do the analysis. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20050>	2023-01-31 13:50:08 +00:00
Marcin Ślusarz	432e263284	intel/compiler: fine-grained control of dispatch widths Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> [v2] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20854>	2023-01-27 11:00:41 +00:00
Lionel Landwerlin	13cca48920	intel/fs: drop FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GFX7 We can lower FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD into other more generic sends and drop this internal opcode. The idea behind this change is to allow bindless surfaces to be used for UBO pulls and why it's interesting to be able to reuse setup_surface_descriptors(). But that will come in a later change. No shader-db changes on TGL & DG2. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20416>	2023-01-26 11:26:53 +00:00
Francisco Jerez	7b5e933629	intel/fs: Fix src and dst types of LOAD_PAYLOAD ACP entries during copy propagation. The ACP entries created by copy propagation to track the implied copies of LOAD_PAYLOAD instructions don't model the behavior of LOAD_PAYLOAD correctly, since (as of `41868bb682`) header moves are implicitly retyped to UD and the destination of non-header copies implicitly uses the same type as the corresponding source, even though the ACP entries created for such copies could incorrectly represent a type conversion, which can lead to mis-optimization of the program. According to Marcin, this fixes the func.mesh.ext.workgroup_id.task.q0 crucible test. Fixes: `41868bb682` ("i965/fs: Rework the fs_visitor LOAD_PAYLOAD instruction") Reported-by: Marcin Ślusarz <marcin.slusarz@intel.com> Tested-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18980>	2023-01-25 22:22:12 +00:00
Marcin Ślusarz	536a2acfc2	intel/compiler/mesh: handle const data in task & mesh programs Started showing up when nir_opt_large_constants call was moved in `88756cee8d`. Fixes dEQP-VK.mesh_shader.ext.smoke.monolithic.fullscreen_gradient* Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Fixes: `88756cee8d` ("intel/compiler: Run nir_opt_large_constants before scalarizing consts") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20876>	2023-01-24 14:47:21 +00:00
Marcin Ślusarz	9bb18a4f9e	intel/compiler: fix generation of vec8/vec16 alu instruction I stumbled on this when I inserted some suboptimal lowering code after all optimizations. Adding certain subset of optimizations after my lowering code actually avoided this bug, so I think it's not possible to hit this on upstream. Let's fix this for the next person generating suboptimal code... Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20857>	2023-01-24 13:15:58 +00:00
Lionel Landwerlin	a50d2fdb46	intel/fs: avoid cmod optimization on instruction with different write_mask I've been running into failures with tests like : dEQP-VK.robustness.robustness2.bind.notemplate.rgba32i.unroll.nonvolatile.uniform_buffer_dynamic.no_fmt_qual.len_4.samples_1.1d.frag With the load_global_const_block_intel NIR intrinsic, you can load a vec8/vec16 with a predicate. The predicate is correctly uniformized to feed into the SEND instruction's flag register. The problem is that a series of optimization first remove the find_live_channel and then changes the broadcast into a simple MOV instruction, on the assumption that the first channel is always active if there is not control flow. This is correct. But after that the cmod optimization will remove this instruction : mov.nz.f0.0(16) null:D, vgrf16+0.0<0>:D NoMask because it seems to be equivalent to : cmp.g.f0.0(16) vgrf16:D, vgrf12:D, 63d In this case vgrf16 is the predicate to the load block SEND instruction. Since the execution mask is different between both, some of the channels of the SEND instruction end up not being loaded or loaded with the wrong predication and we end up with incorrect UBO data. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20852>	2023-01-24 07:35:42 +00:00
Kenneth Graunke	7092c1218a	intel/compiler: Use more symbolic source names in components_read() Rather than hardcoding source 1, source 2, etc. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	16b66ab659	intel/compiler: Drop dest checking in atomic code NIR atomic operation intrinsics all have destinations. This is just copy and pasted from other generic intrinsic handling where that may or may not be the case. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	780f3e2e6b	intel/compiler: Delete all the A64 atomic variants for type sizes These are handled identically in almost all cases. There is one place in the legacy surface lowering that was obtaining the bitsize from the opcode, but the LSC-based lowering uses (type_sz(inst->dst.type) * 8) for that and works just fine. If we just do that in the legacy lowering too, then we don't need this plethora of opcodes. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	03ddde1230	intel/compiler: Combine nir_emit_{ssbo,shared}_atomic into one helper These are basically identical save for: - shared has surface hardcoded to SLM rather than an SSBO index - shared has to handle adding the 'base' const_index (SSBO have none) - the NIR source index for data is shifted by one It's not worth copy and pasting the entire function for this. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	b84939c678	intel/compiler: Delete fs_visitor::nir_emit_{ssbo,shared}_atomic_float() These are now basically identical to their non-float counterparts. The only thing that differed was the opcode checking to determine which operands existed. Now that we have a unified opcode enum and a helper for the number of data operands, we can just use that. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	f7b29d7924	intel/compiler: Drop redundant 32-bit expansion for shared float atomics We already expanded data to 32-bit a few lines earlier, so this is just redundantly doing it a second time. Fixes: `43169dbbe5` ("intel/compiler: Support 16 bit float ops") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	02129eee3a	intel/compiler: Eliminate SHADER_OPCODE_UNTYPED_ATOMIC_FLOAT The only reason for the separate opcode was because of the overlapping BRW_AOP_* enums, making it impossible to tell whether a particular AOP was the integer or float operation. Now that we use the lsc_opcode enums, we can just have the legacy lowering inspect the opcode and select the right descriptor. No need for a separate opcode. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	284f0c9a57	intel/compiler: Add an lsc_op_num_data_values() helper There are a number of places that need to know how many operands an LSC atomic takes (0 for inc/dec, 1 for most things, 2 for cmpxchg). We can add a helper for that and eliminate some code (with more to come). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	90a2137cd5	intel/compiler: Use LSC opcode enum rather than legacy BRW_AOPs This gets our logical atomic messages using the lsc_opcode enum rather than the legacy BRW_AOP_* defines. We have to translate one way or another, and using the modern set makes sense going forward. One advantage is that the lsc_opcode encoding has opcodes for both integer and floating point atomics in the same enum, whereas the legacy encoding used overlapping values (BRW_AOP_AND == 1 == BRW_AOP_FMAX), which made it impossible to handle both sensibly in common code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Kenneth Graunke	8d2dc52a14	intel/compiler: Move atomic op translation into emit_*_atomic() There's no need to pass both the intrinsic and an opcode computed from that same intrinsic. Just do it in the functions themselves. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rohan Garg <rohan.garg@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>	2023-01-19 08:42:22 +00:00
Francisco Jerez	f40e17059a	intel/fs/gfx12+: Drop redundant handling of SHADER_OPCODE_BROADCAST in exec pipe inference. Commit `c80c0ed943` introduced handling of SHADER_OPCODE_BROADCAST into inferred_exec_pipe(), but it was already being handled, drop the redundant handling. Shouldn't lead to any functional changes. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20543>	2023-01-19 06:14:03 +00:00
Francisco Jerez	b867d1b851	intel/eu/gfx12+: Implement decoding of 64-bit immediates. C.f. `a12533f2ce`. The corresponding change for the decoding path was never implemented so the disassembler was printing incorrect immediate values. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20543>	2023-01-19 06:14:03 +00:00
Francisco Jerez	f80f29dc4b	intel/disasm/gfx12+: Fix print out of non-existing condmod field with 64-bit immediate. The conditional mode field doesn't exist for instructions with a 64-bit immediate, so this would currently print garbage. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20543>	2023-01-19 06:14:03 +00:00
Francisco Jerez	f3352745ad	intel/disasm/gfx12+: Use helper instead of hardcoded bit access for 64-bit immediates. So we don't have to duplicate code to handle differences in the encoding of 64-bit immediates across platforms. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20543>	2023-01-19 06:14:03 +00:00
Francisco Jerez	4a2e7306dd	intel/fs/gfx12: Ensure that prior reads have executed before barrier with acquire semantics. This avoids a violation of the Vulkan memory model that was leading to intermittent failures of at least 8k test-cases of the Vulkan CTS (within the group dEQP-VK.memory_model.) on TGL and DG2 platforms. In theory the issue may be reproducible on earlier platforms like IVB and ICL, but the SYNC.ALLWR instruction is not available on those platforms so a different (likely costlier) fix will be needed. The issue occurs within the sequence we emit for a NIR memory barrier with acquire semantics requiring the synchronization of multiple caches, e.g. in pseudocode for a barrier involving the TGM and UGM caches on DG2: x <- load.ugm // Atomic read sequenced-before the barrier y <- fence.ugm z <- fence.tgm wait(y, z) w <- load.tgm // Read sequenced-after the barrier In the example we must provide the guarantee that the memory load for x is completed before the one for w, however this ordering can be reversed with the intervention of a concurrent thread, since the UGM fence will block on the prior UGM load and potentially take a long time, while the TGM fence may complete and invalidate the TGM cache immediately, so a concurrent thread could pollute the TGM cache with stale contents for the w location before the UGM load has completed, leading to an inversion of the expected memory ordering. v2: Apply the workaround regardless of whether the NIR barrier intrinsic specifies multiple storage classes or a single one, since an acquire barrier is required to order subsequent requests relative to previous atomic requests of unknown storage class not necessarily specified by the memory scope information of the intrinsic. Cc: mesa-stable Reviewed-by: Ivan Briano <ivan.briano@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20690>	2023-01-18 21:34:33 -08:00
Tapani Pälli	53de48f1c4	intel/compiler: add cpp_std=c++17 when building tests Otherwise build fails: "../src/intel/compiler/brw_private.h:40:4: note: ‘std::variant’ is only available from C++17 onwards" Fixes: `6c194ddd18` ("intel/compiler: Prepare SIMD selection helpers to handle different prog_datas") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20725>	2023-01-17 13:58:03 +00:00
Nico Cortes	29adbb132f	Revert "intel/compiler: fine-grained control of dispatch widths" This reverts commit `bed18ab3e2`. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8063 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20654>	2023-01-12 00:33:25 +00:00
Marcin Ślusarz	bed18ab3e2	intel/compiler: fine-grained control of dispatch widths Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20535>	2023-01-11 08:17:12 +00:00
Ian Romanick	51be623372	intel/eu/validate: Check predication and cmod for SEL, CMP, and CMPN Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>	2023-01-09 19:15:19 +00:00
Ian Romanick	e0f409c5d8	intel/eu/validate: Add validation for csel v2: Also check the condition modifier. Suggested by Lionel. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>	2023-01-09 19:15:19 +00:00
Ian Romanick	3a7c23973b	intel/eu/validate: Add validation for bfi2 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>	2023-01-09 19:15:19 +00:00
Ian Romanick	f34821d998	intel/eu/validate: More validation for logic ops v2: Use number of source to condition validating src1 instead of using the opcode. Suggested by Lionel. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>	2023-01-09 19:15:19 +00:00
Ian Romanick	8be7406c81	intel/compiler: Assert that ARF used is the accumulator v2: Move the new check to be with similar existing checks. Suggested by Lionel. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>	2023-01-09 19:15:19 +00:00
Ian Romanick	3b579a2ea8	intel/compiler: Validate 3-source instruction source strides Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>	2023-01-09 19:15:19 +00:00
Ian Romanick	c5684019f6	intel/compiler: Validate 3-source instruction sources have same base type This can't be checked in EU validation because the bits to describe the base type of the individual sources no longer exist. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>	2023-01-09 19:15:19 +00:00
Lionel Landwerlin	6b494745be	intel/fs: only avoid SIMD32 if strictly inferior in throughput This enabled SIMD32 in blorp shaders and seems to give a small FPS bump when using a DG2 GPU as secondary (requires copies to linear buffers to exchange with main GPU). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19341>	2023-01-09 08:41:47 +00:00
Ian Romanick	8ab7ec0129	intel/compiler: Enable lower_bitfield_extract_to_shifts and lower_bitfield_insert_to_shifts for pre-Gfx7 GLSL IR opcodes generated for bitfieldExtract and bitfieldInsert are lowered by lower_instructions. `4dff3ff005` ("nir/opt_algebraic: Optimize open coded bfm.") adds an optimization that can rematerialize nir_op_bfm that was prevented by the GLSL IR lowering. It appears that every piece of hardware, except older Intel GPUs, that has real integers (i.e., lower_bitops is not set) also sets lower_bitfield_extract_to_shifts and lower_bitfield_insert_to_shifts. Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `4dff3ff005` ("nir/opt_algebraic: Optimize open coded bfm.") Closes: #7874 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20323>	2023-01-03 18:37:53 -08:00
Lionel Landwerlin	25608659a0	intel/compiler: mark shader_record_ptr as uniform Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20413>	2022-12-23 09:22:13 +00:00
Jordan Justen	78a75e0d25	intel/common/intel_genX_state.h: Add intel_set_ps_dispatch_state() This replaces brw_fs_get_dispatch_enables(), which was added in `b9403b1c47` ("intel: factor out dispatch PS enabling logic"), but this function will not work well for future changes to 3DSTATE_PS. So, instead, this moves the related code into a "genX" file which can directly update 3DSTATE_PS for the given platform. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20329>	2022-12-15 00:54:59 -08:00
Ian Romanick	eb76cee9f8	nir: Eliminate nir_op_i2b There are a lot of optimizations in opt_algebraic that match ('ine', a, 0), but there are almost none that match i2b. Instead of adding a huge pile of additional patterns (including variations that include both ine and i2b), always lower i2b to a != 0. At this point in the series, it should be impossible for anything to generate i2b, so there /should not/ be any changes. The failing test on d3d12 is a pre-existing bug that is triggered by this change. I talked to Jesse about it, and, after some analysis, he suggested just adding it to the list of known failures. v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b. v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py. v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after nir_lower_doubles makes progress. The latter can generate b2i instructions, but nir_lower_int64 can't handle them (anymore). v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I had accidentally removed the f2b(bf2(x)) optimization. v6: Just eliminate the i2b instruction. v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused) emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this function was still used. 🤷 No shader-db changes on any Intel platform. All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 141165875 -> 141165873 (-0.0%) Instructions helped: 2 Cycles in all programs: 9098956382 -> 9098956350 (-0.0%) Cycles helped: 2 The two Vulkan shaders are helped because of the "new" (('b2i32', ('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern. Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version] Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:21 +00:00
Ian Romanick	edae161d98	intel/fs: Use nir_type_convert instead of nir_type_conversion_op In a future commit, nir_type_conversion_op won't be able to handle i2b (and in a much later commit f2b), so switch many users to the fully featured function. No shader-db or fossil-db changes on any Intel platform. Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>	2022-12-14 06:23:21 +00:00
Lionel Landwerlin	94bb4a13fa	intel/fs: make Wa_1806565034 conditional to non robust access Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20280>	2022-12-13 18:05:19 +00:00
Marcin Ślusarz	75375233f6	intel/compiler/mesh: extract emit_urb_direct_vec4_write No functional changes. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20292>	2022-12-13 13:00:49 +00:00
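
Commit 3131c2fc7a explains that URB writes must cover full, aligned vec4s and use a per-dword write mask. The sketch below shows how a write of `len` dwords at a statically known dword `offset` splits into messages and masks. It assumes one vec4 per message and uses the hypothetical names `urb_write_msg` and `split_aligned_write()`, which are not Mesa's actual code. It reproduces the 1-message and 2-message cases from that commit message.

```c
#include <stdint.h>

#define MAX_MSGS 8 /* enough for len <= 28 at any offset */

/* Hypothetical model of one vec4-aligned URB write message. */
struct urb_write_msg {
   unsigned vec4_slot; /* index of the aligned vec4 (dword offset / 4) */
   uint8_t mask;       /* bit i set = write dword i of that vec4 */
};

/* Split a write of `len` dwords starting at dword `offset` into
 * vec4-aligned messages. Returns the number of messages stored in msgs[].
 * Only possible when offset % 4 is known at compile time. */
static unsigned
split_aligned_write(unsigned offset, unsigned len,
                    struct urb_write_msg msgs[MAX_MSGS])
{
   unsigned n = 0;
   for (unsigned dw = offset; dw < offset + len; dw++) {
      unsigned slot = dw / 4;
      /* Start a new message whenever we cross a vec4 boundary. */
      if (n == 0 || msgs[n - 1].vec4_slot != slot) {
         msgs[n].vec4_slot = slot;
         msgs[n].mask = 0;
         n++;
      }
      msgs[n - 1].mask |= (uint8_t)(1u << (dw % 4));
   }
   return n;
}
```

With `offset = 2, len = 4` this yields two messages with masks 0xC and 0x3. Reading bit 0 first, those are `| 0 0 1 1 | 1 1 0 0 |`. When `offset % 4` is unknown at compile time the split cannot be computed, so the fallback is one masked message per dword.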

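Commit 3131c2fc7a depends on proving `offset % 4` at compile time. One conservative way to do this requires every run-time term's coefficient to be a multiple of 4. It assumes the offset has already been reduced to an affine form `constant + c0*x0 + c1*x1 + ...`. That form and the names below are hypothetical, not the actual NIR analysis.

```c
#include <stdbool.h>

/* Hypothetical dword offset of the form
 *   constant + coeff[0]*x0 + coeff[1]*x1 + ...
 * where each xi is only known at run time. */
struct affine_offset {
   unsigned constant;
   unsigned num_terms;
   const unsigned *coeff;
};

/* If offset % 4 is the same for every run-time value of the xi, store it
 * in *mod4 and return true. This holds when every coefficient is a
 * multiple of 4 (unsigned wrap-around is mod 2^32, which preserves mod 4).
 * Otherwise return false so the caller falls back to per-dword writes. */
static bool
offset_mod4_is_static(const struct affine_offset *off, unsigned *mod4)
{
   for (unsigned i = 0; i < off->num_terms; i++) {
      if (off->coeff[i] % 4 != 0)
         return false;
   }
   *mod4 = off->constant % 4;
   return true;
}
```

A per-vertex output addressed as `vertex * 8 + 2` reports 2. A triangle index addressed as `prim * 3 + k` has coefficient 3 and is rejected, matching the commit's note about primitive indices. The check is sufficient but not necessary: it gives up on some offsets that a more precise analysis could prove.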

2433 commits
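
Commits 284f0c9a57 and 90a2137cd5 describe two related changes to atomic handling. The first is a helper returning how many data operands an atomic takes: 0 for inc/dec, 1 for most ops, 2 for cmpxchg. The second is an opcode enum in which integer and float atomics have distinct values. The minimal sketch below uses a hypothetical `example_atomic_op` enum, not Mesa's real `lsc_opcode` definitions.

```c
#include <stdbool.h>

/* Hypothetical unified atomic opcode enum: integer and float ops
 * never share a value, unlike the legacy BRW_AOP_* encoding. */
enum example_atomic_op {
   EX_ATOMIC_INC,
   EX_ATOMIC_DEC,
   EX_ATOMIC_ADD,
   EX_ATOMIC_AND,
   EX_ATOMIC_FMAX,
   EX_ATOMIC_CMPXCHG,
   EX_ATOMIC_FCMPXCHG,
};

/* Number of data operands the atomic consumes besides the address. */
static unsigned
example_num_data_values(enum example_atomic_op op)
{
   switch (op) {
   case EX_ATOMIC_INC:
   case EX_ATOMIC_DEC:
      return 0; /* implicit +1 / -1 */
   case EX_ATOMIC_CMPXCHG:
   case EX_ATOMIC_FCMPXCHG:
      return 2; /* compare value and new value */
   default:
      return 1;
   }
}

/* Distinct enumerators make this check trivial for common code. */
static bool
example_is_float_op(enum example_atomic_op op)
{
   return op == EX_ATOMIC_FMAX || op == EX_ATOMIC_FCMPXCHG;
}
```

Because `EX_ATOMIC_AND` and `EX_ATOMIC_FMAX` are distinct enumerators, shared code can switch on the opcode directly. The overlapping legacy encoding (`BRW_AOP_AND == 1 == BRW_AOP_FMAX`) did not allow this.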