fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00

Author	SHA1	Message	Date
Lionel Landwerlin	4deb8e86df	nir: change intel dss_id intrinsic to topology_id This will allow to reuse the same intrinsic for various topology based ID. v2: fix intrinsic comment (Caio) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13719>	2022-02-08 12:55:24 +00:00
Marcin Ślusarz	18e628135d	anv: Add support for UBOs, SSBOs and push constants in Mesh pipeline Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13662>	2022-02-02 18:17:57 +00:00
Lionel Landwerlin	0cd93c59ef	intel/compiler: add primitive rate output support Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13739>	2022-02-02 17:09:46 +00:00
Lionel Landwerlin	cebf284ac1	intel/compiler: add a new pass to lower shading rate into HW format Rework: * Jason: Modernize brw_nir_lower_shading_rate_output: 1. Use nir_shader_instructions_pass() 2. Use *_imm builder helpers. 3. Use nir_intrinsic_base() instead of ->const_index[0] v2: Also lower loads (Caio) v3: Update stage check to trigger lowering (Caio) v4: Assert on != MESH (Caio) v5: Fixup instruction insertion (Caio) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13739>	2022-02-02 17:09:46 +00:00
Caio Oliveira	8bab8f6422	compiler, intel: Add gl_shader_stage_is_mesh() And replace the previous Intel-specific function. Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14823>	2022-02-01 17:41:25 +00:00
Connor Abbott	913bec10c4	nir/lower_subgroups: Rename lower_shuffle to lower_relative_shuffle This option only applies to relative shuffles (up/down/xor), and in a moment we're going to add an option to lower normal shuffles, so rename it. While we're here, rename lower_shuffle() to lower_to_shuffle() for similar reasons. Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14412>	2022-02-01 16:27:45 +00:00
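Relative shuffles (up/down/xor) can each be rewritten as an ordinary shuffle with a computed source index, which is what lowering them to plain shuffles amounts to. The sketch below models a subgroup as a C array. `SUBGROUP_SIZE`, `shuffle()` and the helper names are hypothetical and are not NIR's API. The modulo in `shuffle()` is there only to keep the toy model defined for out-of-range indices.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one value per invocation in a fixed-size subgroup. */
#define SUBGROUP_SIZE 8

/* Ordinary shuffle: read the value held by invocation idx. */
static uint32_t shuffle(const uint32_t v[], uint32_t idx)
{
   return v[idx % SUBGROUP_SIZE]; /* wrap only to keep the toy defined */
}

/* Relative shuffles expressed as ordinary shuffles with a computed index. */
static uint32_t shuffle_xor(const uint32_t v[], uint32_t id, uint32_t mask)
{
   return shuffle(v, id ^ mask);
}

static uint32_t shuffle_up(const uint32_t v[], uint32_t id, uint32_t delta)
{
   return shuffle(v, id - delta);
}

static uint32_t shuffle_down(const uint32_t v[], uint32_t id, uint32_t delta)
{
   return shuffle(v, id + delta);
}
```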
Marcin Ślusarz	24fef8f33d	intel/compiler: Use Task/Mesh InlineData for the first few push constants Replace load_mesh_global_arg_addr_intel with a more general intrinsic load_mesh_inline_data_intel, since inline data now hold both a pointer descriptor information and the first few push constants. Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14788>	2022-01-29 06:32:19 +00:00
Marcin Ślusarz	1d9f47325b	intel/compiler: handle gl_[Clip|Cull]Distance from mesh in fragment shaders Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14788>	2022-01-29 06:32:19 +00:00
Marcin Ślusarz	baa17865de	intel/compiler: handle gl_[Clip|Cull]Distance in mesh shaders Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14788>	2022-01-29 06:32:19 +00:00
Caio Oliveira	856a0cacb1	intel/compiler: Merge Per-Primitive attribute handling in Mesh case Just a refactor, no behavior change. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14788>	2022-01-29 06:32:19 +00:00
Caio Oliveira	2b8b884bcd	intel/compiler: Have specific mesh handling in calculate_urb_setup() Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14788>	2022-01-29 06:32:19 +00:00
Caio Oliveira	448a840b39	intel/fs/xehp: Add unit test for handling of RaR deps across multiple pipelines. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
Paulo Zanoni	d107a0bff8	intel/fs: Assert the GPU supports 64bit ops if present at lower_scoreboard time. On platforms where we don't support 64 bit instructions we shouldn't pass such instructions for the code generator to lower into supported instructions, because this makes their execution pipeline unpredictable to the scoreboard lowering pass on XeHP+ platforms. We really should be reducing all these 64 bit instructions before code generation, so here we add an assert to help us catch and fix these cases more easily. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> [ Francisco Jerez: Also allow has_integer_dword_mul. ] Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
Francisco Jerez	79fb7f9de8	intel/fs: Perform 64-bit CLUSTER_BROADCAST lowering in the lower_regioning pass. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
Francisco Jerez	bdf8ac2466	intel/fs: Honor strided source regions specified by the IR for CLUSTER_BROADCAST. This fixes a bug in the CLUSTER_BROADCAST code generation that causes the original IR region to be ignored, this will be a problem when we start lowering 64-bit CLUSTER_BROADCAST instructions at the IR level, since it will lead to instructions with non-trivial regioning. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
Francisco Jerez	6c8782c135	intel/fs: Perform 64-bit SEL_EXEC lowering in the lower_regioning pass. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
Francisco Jerez	9449b71bdd	intel/fs: Perform 64-bit SHUFFLE lowering in the lower_regioning pass. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
Francisco Jerez	d2d72fccf1	intel/fs: Fix destination suboffset calculations for non-trivial strides in SHUFFLE codegen. One of the two SHUFFLE implementations wasn't taking into account the destination stride at all, and the other (more commonly used) one was taking it into account incorrectly since brw_reg::hstride represents the stride logarithmically, so we need to use a left-shift operator instead of product. Found by inspection. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
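The following sketch illustrates why a logarithmic stride field cannot simply be multiplied. It assumes the register-region encoding in which `hstride` values 0, 1, 2 and 3 mean element strides of 0, 1, 2 and 4. The helper names are hypothetical, not Mesa's. Multiplying by the raw field happens to give the right answer for encodings 0 to 2, but encoding 3 means a stride of 4, not 3.

```c
#include <assert.h>

/* Decode the (assumed) horizontal stride field into an element stride:
 * 0 -> 0, 1 -> 1, 2 -> 2, 3 -> 4. */
static unsigned hstride_to_elements(unsigned hstride)
{
   return hstride ? 1u << (hstride - 1) : 0;
}

/* Byte offset of element i in a destination region whose elements are
 * type_sz bytes wide. */
static unsigned dst_byte_offset(unsigned i, unsigned hstride, unsigned type_sz)
{
   return i * hstride_to_elements(hstride) * type_sz;
}
```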
Francisco Jerez	d1038197f3	intel/fs: Take into account region strides during SIMD lowering decision of SHUFFLE. This fixes a bug in the handcrafted SIMD lowering done by the SHUFFLE code generation, which wasn't taking into account the source and destination region strides while deciding whether it needs to split an instruction. v2: Use new element_sz() helper instead of left shift. (Lionel) Fixes: `90c9f29518` ("i965/fs: Add support for nir_intrinsic_shuffle") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
Francisco Jerez	44e48751d2	intel/fs: Teach the lower_regioning pass how to split instructions of unsupported exec type. This adds some generic infrastructure that allows splitting any instruction into a number of instructions of a smaller legal execution type. This is meant to replace several instances of handcrafted 64bit type lowering done manually in the code generator, which is rather error-prone, prevents scheduling of the lowered instructions, and makes them invisible to the SWSB pass on Gfx12+ platforms, which will become especially problematic on Gfx12.5+ since the EUs introduce multiple asynchronous execution pipelines which the SWSB pass needs to be able to synchronize to one another, so it's critical for the real execution type of the instruction to be visible to the SWSB pass. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
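The following is a minimal sketch of the simplest case of such a split, not the pass itself: a 64-bit integer move emitted as two 32-bit moves, one per dword. `mov_32()` is a hypothetical stand-in for a 32-bit MOV instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a legal 32-bit MOV. */
static uint32_t mov_32(uint32_t src)
{
   return src;
}

/* A 64-bit MOV lowered into two 32-bit MOVs on the low and high dwords. */
static uint64_t lowered_mov_64(uint64_t src)
{
   uint32_t lo = mov_32((uint32_t)src);         /* low dword */
   uint32_t hi = mov_32((uint32_t)(src >> 32)); /* high dword */
   return ((uint64_t)hi << 32) | lo;
}
```

A plain per-dword split like this only works for operations that treat each dword independently, such as MOV, bitwise logic or SEL. Arithmetic like a 64-bit ADD also needs carry propagation between the halves.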
Francisco Jerez	539c879a6b	intel/fs: Move legal exec type calculation into helper function in lower_regioning pass. Right now the execution type lowering functionality of this pass assumes that an integer type of the original bit size is always acceptable, however we'll want more complex behavior than that in order to leverage this pass to automate the lowering of unsupported 64-bit operations into multiple 32-bit operations. In order to do that calculate the closest legal execution type from a new helper function, and take advantage of that function from the has_invalid_exec_type() helper, along the lines of other lower_regioning() helpers structured as a pair of has_invalid_foo() + required_foo() functions. This shouldn't have any functional changes. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
Francisco Jerez	3886e63033	intel/fs/xehp: Merge repeated in-order read dependencies instead of replacement. Previously the software scoreboard structure would drop previous dependencies for a given register and replace them with the most recent one for the same register when a new instruction (or set of instructions) is processed. This worked correctly on the Gfx12LP platforms this code was originally designed for, because a repeated dependency on the same register would either require the second instruction to synchronize against the first (so the first dependency could be disregarded from that point on) or require the dependency to be RaR and in-order, which allows the synchronization to be optimized out (the first dependency could still be disregarded as well, since the pipeline is in-order). However the latter assumption will break on upcoming Gfx12HP platforms, because they have multiple asynchronous FPU pipelines, so whenever we hit a RaR dependency we need to propagate forward both dependencies, since the order in which both reads will complete is not guaranteed by the hardware in cases where they occur from different asynchronous pipelines. Note that this dependency propagation change requires us to change the definition of dependency::done as well, since that constant is defined to discard any previous dependency information when used as argument for shadow(). This has been reported to fix the following conformance failures on DG2: KHR-GL46.shaders.uniform_block.random.all_per_block_buffers.19 dEQP-GLES3.functional.shaders.derivate.fwidth.* Reported-by: Tapani Pälli <tapani.palli@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5670 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>	2022-01-25 22:40:44 +00:00
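The sketch below is a toy model of the difference between replacing and merging read dependencies. Each register tracks a bitmask of pipelines that may still be reading it. The pipeline names and structures are hypothetical, and Mesa's scoreboard pass tracks considerably more state than this.

```c
#include <assert.h>

/* Hypothetical asynchronous pipelines, one bit each. */
enum pipe { PIPE_FLOAT = 1u << 0, PIPE_INT = 1u << 1, PIPE_LONG = 1u << 2 };

/* Pipelines that may still be reading one register. */
struct reg_deps { unsigned pending_read_pipes; };

/* Old behavior: a new read replaces earlier read dependencies. This is only
 * safe if all reads complete in order on a single pipeline. */
static void record_read_replace(struct reg_deps *d, enum pipe p)
{
   d->pending_read_pipes = p;
}

/* New behavior: reads issued to different asynchronous pipelines may
 * complete in any order, so every one of them is kept. */
static void record_read_merge(struct reg_deps *d, enum pipe p)
{
   d->pending_read_pipes |= p;
}

/* A later write to the register (WaR) must wait for these pipelines. */
static unsigned pipes_to_sync_before_write(const struct reg_deps *d)
{
   return d->pending_read_pipes;
}
```

With replacement, a write following a float read and then an int read would only wait on the int pipeline and could clobber the register while the float read is still pending.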
Ian Romanick	945fb51fb5	intel/fs: Fix gl_FrontFacing optimization on Gfx12+ It's not obvious why the (gl_FrontFacing ? -1.0 : 1.0) case was handled different for Gfx12+ than for previous generations, and it's not correct. It tries to negate the result as an integer, and it does this before the mask operation that clears the other bits in the value. When we eventually support dual-SIMD8 dispatch, the other front-facing bit is in g1.6 at bit 15, so similar code should be possible there. Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `c92fb60007` ("intel/fs/gen12: Implement gl_FrontFacing on gen12+.") Closes: #5876 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14625>	2022-01-20 22:37:18 +00:00
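The bit trick behind this kind of front-facing optimization can be modeled in C. This sketch assumes a 16-bit payload word whose bit 15 is set for back-facing primitives. That bit layout is an assumption for illustration, and this is not Mesa's code. The sketch shows why the negated variant has to act on the finished float pattern, here by flipping the sign bit after the mask. Integer-negating the pattern would not yield -1.0f.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static float bits_to_float(uint32_t bits)
{
   float f;
   memcpy(&f, &bits, sizeof(f));
   return f;
}

/* gl_FrontFacing ? 1.0 : -1.0 */
static float facing_pos_one(uint16_t payload)
{
   /* OR in the exponent bits of 1.0f (0x3f80 in the high half), move the
    * word into bits 16..31 so bit 15 becomes the float sign bit, then keep
    * only the sign and exponent bits. */
   uint32_t bits = ((uint32_t)(payload | 0x3f80u) << 16) & 0xbf800000u;
   return bits_to_float(bits);
}

/* gl_FrontFacing ? -1.0 : 1.0: flip the sign bit of the masked result. */
static float facing_neg_one(uint16_t payload)
{
   uint32_t bits = ((uint32_t)(payload | 0x3f80u) << 16) & 0xbf800000u;
   return bits_to_float(bits ^ 0x80000000u);
}
```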
Dave Airlie	f83f72be8e	intel/brw: drop gl header from the brw backend. This shouldn't be used anywhere now once we drop the GLbitfield64 types. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>	2022-01-19 21:54:58 +00:00
Dave Airlie	1352e0ba0c	mesa/*: add a shader primitive type to get away from GL types. This creates an internal shader_prim enum, I've fixed up most users to use it instead of GL types. don't store the enum in shader_info as it changes size, and confuses other things. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>	2022-01-19 21:54:58 +00:00
Dave Airlie	d54c07b4c4	mesa/*: use an internal enum for tessellation primitive types. To avoid dragging gl.h into places it has no business being, defined tessellation primitive mode to an enum. This has a lot of fallout all over the place. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>	2022-01-19 21:54:58 +00:00
Kenneth Graunke	d475e839da	intel/fs: Reuse the same FS input slot for VUE header fields. VARYING_SLOT_{VIEWPORT,LAYER,PSIZ} all live in the same VUE header slot, and the FS is already set up to read the x/y/z/w component of that vec4. However, we were setting up the SBE to pass each of those items as a separate FS input, so hypothetically if a shader read all three, we would burn 3 FS inputs with redundant data. Not only was this passing extra data to the FS, but it would count as extra input slots for the "Do we have 16 or fewer attributes?" check for using SBE swizzling to rearrange them in a convenient manner. Now we make them share a single FS attribute and only count them once. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14210>	2022-01-19 01:31:47 +00:00
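The sketch below shows the slot accounting described above, using hypothetical varying identifiers loosely modeled on `VARYING_SLOT_*`. Reads of PSIZ, LAYER and VIEWPORT all resolve to one header slot, so together they count as a single FS input.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical varyings; the first three live in the VUE header slot. */
enum varying { V_PSIZ, V_LAYER, V_VIEWPORT, V_COLOR0, V_TEX0, V_COUNT };

/* Map a varying to an FS input slot: the header fields share slot 0. */
static int fs_input_slot(enum varying v)
{
   switch (v) {
   case V_PSIZ:
   case V_LAYER:
   case V_VIEWPORT:
      return 0;
   default:
      return (int)v - V_VIEWPORT; /* V_COLOR0 -> 1, V_TEX0 -> 2 */
   }
}

/* Count distinct FS input slots needed for a list of read varyings. */
static int count_fs_inputs(const enum varying *reads, int n)
{
   bool used[V_COUNT] = { false };
   int count = 0;
   for (int i = 0; i < n; i++) {
      int slot = fs_input_slot(reads[i]);
      if (!used[slot]) {
         used[slot] = true;
         count++;
      }
   }
   return count;
}
```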
Dave Airlie	f9f7f326fa	intel/compiler: add clamp_pointside to vs/tcs/tes keys. This will be used by crocus and iris to clamp pointsizes only on the last stage of the shader compile. Fixes: `3077d96856` ("crocus: Clamp VS point sizes to the HW limits as required.") Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14359>	2022-01-18 22:53:45 +00:00
Lionel Landwerlin	30a8b8d2df	intel/fs: disable VRS when omask is written As indicated by VkPhysicalDeviceFragmentShadingRatePropertiesKHR::fragmentShadingRateWithShaderSampleMask our implementation will clamp to 1x1 when reading samplemask or writing to samplemask. This fixes vkd3d-proton tests test_sample_mask_dxbc & test_sample_mask_dxil Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `b6332fc4a8` ("intel/compiler: handle coarse pixel in render target writes descriptors") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14553>	2022-01-14 19:14:06 +00:00
Jason Ekstrand	a1de102479	intel/fs: Use compare_func for wm_prog_key::alpha_test_func Because 0 is no longer a recognizable value (it's NEVER, which isn't a good default), we add an emit_alpha_test bool to tell the back-end when to bother alpha testing. This lets us only touch crocus with the change. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14157>	2022-01-14 15:08:09 +00:00
Jason Ekstrand	460a953df5	intel/compiler: Stop using GLuint in brw_compiler.h Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14157>	2022-01-14 15:08:09 +00:00
Francisco Jerez	c6455cfec9	intel/fs: Don't assume packed dispatch for fragment shaders on XeHP. The current packed dispatch assumptions for fragment shaders seem to be the reason that the fs-readFirstInvocation-uint-loop Piglit test-case for the ARB_shader_ballot extension fails on DG2 in combination with the patches in this series that enable pixel pipe hashing (thanks Jordan for reporting the regression). I've confirmed that the brw_fs_test_dispatch_packing() test fails on DG2 hardware for fragment shaders, while it succeeds for other shader stages, indicating that the PSD hardware no longer guarantees packed dispatch. Disable it. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>	2022-01-10 18:27:41 -08:00
Danylo Piliaiev	b8d486f298	nir/algebraic: Separate has_dot_4x8 into has_sdot_4x8 and has_udot_4x8 Adreno GPUs have native instructions for unsigned and mixed dot_4x8 but not for signed dot product. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>	2022-01-10 13:20:39 +02:00
Rohan Garg	af13119993	intel/fs: OpImageQueryLod does not support arrayed images as an operand When we lower SPIR-V to NIR for textures in vtn_handle_texture, we only bump the number of coordinate components when the op is not a lod query. Update the assert to take this into account. This fixes: - dEQP-VK.robustness.robustness2.bind.template.r32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.r32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.r32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.r32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.r32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.r32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rg32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rg32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rg32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rg32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rg32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rg32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - 
dEQP-VK.robustness.robustness2.bind.template.rgba32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rgba32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rgba32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rgba32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rgba32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.bind.template.rgba32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.r32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.r32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.r32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.r32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.r32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.r32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rg32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - 
dEQP-VK.robustness.robustness2.push.notemplate.rg32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rg32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rg32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rg32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rg32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rgba32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rgba32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rgba32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rgba32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rgba32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag - dEQP-VK.robustness.robustness2.push.notemplate.rgba32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag Fixes: `231337a1` ("intel/fs/xehp: Assert that the compiler is sending all 3 coords for cubemaps.") Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13925>	2022-01-07 10:53:35 +00:00
Jordan Justen	d57b10ab98	intel/compiler: Adjust TCS instance-id for dg2+ Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14385>	2022-01-05 16:13:28 -08:00
Henry Goffin	fe617bcca0	intel/compiler/test: Fix build with GCC 7 Without this change, test_fs_scoreboard.cpp does not compile on GCC 7 due to the use of C99 initializers in a C++ source file. Fixes: `c847bfaaf5` ("intel/fs/gen12: Add tests for scoreboard pass") Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14349>	2021-12-30 19:59:52 +00:00
Dave Airlie	4392c24844	intel/compiler: drop unused declaration Acked-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14202>	2021-12-22 21:37:55 +00:00
Dave Airlie	2692a5f8db	intel/compiler: don't lower swizzles in backend. These are lowered by crocus in the frontend, the key entries are still used. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14202>	2021-12-22 21:37:55 +00:00
Dave Airlie	e12b0d0d60	intel/compiler: remove gfx6 gather wa from backend. Crocus lowers this in the frontend, the key member is still used but reset prior to the backend. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14202>	2021-12-22 21:37:55 +00:00
Marcin Ślusarz	a48f1d51e2	intel/compiler: disable workaround not applicable to gfx >= 11 There's nothing in bspec that would suggest this is still needed. It only affected gfx 9 and 10. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14280>	2021-12-22 10:13:25 +00:00
Francisco Jerez	e7470a40c5	intel/fs: Add physical fall-through CFG edge for unconditional BREAK instruction. This adds a missing CFG edge that represents a possible physical control flow path the EU might take under some conditions which isn't part of the logical CFG of the program. This possibility shouldn't have led to problems on platforms prior to Gfx12, since the missing control flow edge cannot possibly influence liveness intervals. However on Gfx12+ it becomes the compiler's responsibility to resolve data dependencies across instructions, and the missing physical control flow paths may lead to a WaR data hazard currently not visible to the software scoreboard pass, which could lead to data corruption. Worse, the possibility for this path to be taken by the EU increases on Gfx12+ due to a hardware bug affecting EU fusion. However the same physical path can be potentially taken on earlier platforms as well, so this patch extends the CFG on all platforms for consistency, even though the lack of this edge shouldn't lead to any functional issues on platforms earlier than Gfx12. There are no shader-db changes on earlier platforms, so there seems to be no disadvantage from using the same CFG representation as on later platforms. This issue has been reported on TGL with the following conformance test, thanks to Ian for bringing the FULSIM dependency check warning to my attention: dEQP-VK.graphicsfuzz.spv-stable-pillars-volatile-nontemporal-store Fixes: `4d1959e693` ("intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution.") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4940 Reported-by: Tapani Pälli <tapani.palli@intel.com> Reported-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14248>	2021-12-21 00:43:29 +00:00
Jason Ekstrand	eebb2dedb2	intel/fs: Add a NONE scheduling mode While our LIFO scheduling mode attempts to optimize for register pressure, it's often hard for a scheduling algorithm to do better than the instruction order provided by the shader author. Shader authors often do perfectly reasonable things like using texture results immediately after fetching them or constructing texture coordinates immediately before the texture op. When we throw all the instruction ordering information away, we lose any help the author may have given us. Avoid that by attempting NONE before we fall back to the worst-case LIFO mode. And, yes, I tried this with NONE both before and after LIFO and doing NONE before LIFO is substantially better, according to shader-db. total instructions in shared programs: 19673152 -> 19665202 (-0.04%) instructions in affected programs: 33669 -> 25719 (-23.61%) helped: 20 HURT: 0 helped stats (abs) min: 15 max: 4609 x̄: 397.50 x̃: 107 helped stats (rel) min: 2.33% max: 67.50% x̄: 14.60% x̃: 9.12% 95% mean confidence interval for instructions value: -867.61 72.61 95% mean confidence interval for instructions %-change: -21.74% -7.46% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 935562500 -> 935020920 (-0.06%) cycles in affected programs: 18620349 -> 18078769 (-2.91%) helped: 104 HURT: 48 helped stats (abs) min: 88 max: 60986 x̄: 8031.48 x̃: 3680 helped stats (rel) min: 0.61% max: 51.44% x̄: 14.95% x̃: 8.87% HURT stats (abs) min: 10 max: 54724 x̄: 6118.62 x̃: 1530 HURT stats (rel) min: 0.13% max: 46.45% x̄: 10.28% x̃: 6.46% 95% mean confidence interval for cycles value: -5724.34 -1401.71 95% mean confidence interval for cycles %-change: -9.86% -4.10% Cycles are helped. 
total spills in shared programs: 12158 -> 10327 (-15.06%) spills in affected programs: 1831 -> 0 helped: 20 HURT: 0 total fills in shared programs: 14749 -> 12635 (-14.33%) fills in affected programs: 2114 -> 0 helped: 20 HURT: 0 LOST: 8 GAINED: 649 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>	2021-12-18 01:46:19 +00:00
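The sketch below models the mode loop, assuming modes are tried in order until register allocation succeeds without spilling, with LIFO as the last resort. The mode names echo the PRE and PRE_NON_LIFO modes mentioned in the commit below. The enum, `pick_schedule()` and `toy_try_allocate()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scheduling modes, in the order they are attempted. */
enum sched_mode { SCHED_PRE, SCHED_PRE_NON_LIFO, SCHED_NONE, SCHED_LIFO };

static const enum sched_mode mode_order[] = {
   SCHED_PRE, SCHED_PRE_NON_LIFO, SCHED_NONE, SCHED_LIFO,
};

/* Return the first mode whose schedule allocates without spilling, or the
 * last mode if none does (the caller would then have to spill). */
static enum sched_mode
pick_schedule(bool (*try_allocate)(enum sched_mode, void *), void *ctx)
{
   size_t n = sizeof(mode_order) / sizeof(mode_order[0]);
   for (size_t i = 0; i < n; i++) {
      if (try_allocate(mode_order[i], ctx))
         return mode_order[i];
   }
   return mode_order[n - 1];
}

/* Toy stand-in for register allocation: ctx points to a bitmask of the
 * modes that would allocate without spilling. */
static bool toy_try_allocate(enum sched_mode mode, void *ctx)
{
   return (*(const unsigned *)ctx >> mode) & 1u;
}
```

Placing NONE ahead of LIFO means the author's original ordering gets a chance before the most aggressive register-pressure heuristic rewrites it.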
Jason Ekstrand	e6ddee764e	intel/fs: Reset instruction order before re-scheduling The way the current scheduler loop is implemented, each scheduling pass starts with what the previous pass had. This means that, if PRE screwed everything up majorly, PRE_NON_LIFO would have to try to fix it. It also meant that tiny changes to one pass would affect every later pass. Instead, reset the order of the instructions before each scheduling pass. This makes the passes entirely independent of each other. Shader-db results on Ice Lake: total instructions in shared programs: 19670486 -> 19670648 (<.01%) instructions in affected programs: 25317 -> 25479 (0.64%) helped: 2 HURT: 7 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.07% max: 0.07% x̄: 0.07% x̃: 0.07% HURT stats (abs) min: 8 max: 70 x̄: 24.29 x̃: 12 HURT stats (rel) min: 0.41% max: 4.95% x̄: 1.47% x̃: 0.87% 95% mean confidence interval for instructions value: -1.28 37.28 95% mean confidence interval for instructions %-change: -0.04% 2.30% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 935535948 -> 935490243 (<.01%) cycles in affected programs: 421994824 -> 421949119 (-0.01%) helped: 1269 HURT: 879 helped stats (abs) min: 1 max: 12008 x̄: 259.38 x̃: 52 helped stats (rel) min: <.01% max: 28.02% x̄: 1.12% x̃: 0.14% HURT stats (abs) min: 1 max: 29931 x̄: 322.46 x̃: 20 HURT stats (rel) min: <.01% max: 32.17% x̄: 1.74% x̃: 0.22% 95% mean confidence interval for cycles value: -71.37 28.81 95% mean confidence interval for cycles %-change: -0.11% 0.21% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 12403 -> 12430 (0.22%) spills in affected programs: 1355 -> 1382 (1.99%) helped: 2 HURT: 7 total fills in shared programs: 15128 -> 15182 (0.36%) fills in affected programs: 3294 -> 3348 (1.64%) helped: 2 HURT: 7 LOST: 21 GAINED: 28 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>	2021-12-18 01:46:19 +00:00
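The reset can be sketched by having every pass start from a copy of the original order. With the toy `reverse_pass()`, chaining two passes would undo the reversal. Resetting before each pass gives both the same result. All names here are hypothetical.

```c
#include <assert.h>

#define MAX_INSTS 64

/* Toy program: instruction IDs in their current order. */
struct program { int n; int order[MAX_INSTS]; };

/* A scheduling pass permutes prog->order in place. */
typedef void (*sched_pass)(struct program *prog);

/* Example pass: reverse the instruction order. */
static void reverse_pass(struct program *prog)
{
   for (int i = 0, j = prog->n - 1; i < j; i++, j--) {
      int tmp = prog->order[i];
      prog->order[i] = prog->order[j];
      prog->order[j] = tmp;
   }
}

/* Run each pass from the original order, not from the previous pass's
 * output, so passes are independent. The result of pass i goes in out[i]. */
static void run_passes_independently(const struct program *orig,
                                     const sched_pass *passes, int npasses,
                                     struct program *out)
{
   for (int i = 0; i < npasses; i++) {
      out[i] = *orig; /* reset to the original instruction order */
      passes[i](&out[i]);
   }
}
```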
Jason Ekstrand	d49d092259	Revert "intel/fs: Do cmod prop again after scheduling" This reverts commit `ba2fa1ceaf`. Doing optimizations after scheduling but before RA means doing them in the middle of the scheduling loop which introduces additional dependencies between one scheduling iteration and the next. That won't work if we want to make the scheduling modes independent, at least not unless we have some way of fully cloning the IR. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>	2021-12-18 01:46:19 +00:00
Jason Ekstrand	e6f0def97d	intel/eu: Don't double-loop as often in brw_set_uip_jip brw_find_next_block_end() scans through the instructions to find the end of the block. We were calling it for every instruction in the program which, if you have a single basic block, makes the whole mess a nice clean O(n^2) when it really doesn't need to be. Instead, only call brw_find_next_block_end() as-needed. This brings it back to O(n) like it should have been. This cuts the runtime of the following Vulkan CTS on my SKL box by 5% from 1:51 to 1:45: dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>	2021-12-18 01:46:19 +00:00
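The sketch below illustrates the complexity difference. It uses a hypothetical `is_block_end[]` array and a step counter for instrumentation. Scanning for every instruction costs a quadratic number of steps on one large block. Scanning only for instructions that actually need a jump target does not. It models the idea rather than `brw_set_uip_jip()` itself.

```c
#include <assert.h>
#include <stdbool.h>

static long scan_steps; /* instrumentation for this example only */

/* Hypothetical stand-in for brw_find_next_block_end(). */
static int find_next_block_end(const bool *is_block_end, int n, int start)
{
   for (int i = start; i < n; i++) {
      scan_steps++;
      if (is_block_end[i])
         return i;
   }
   return n;
}

/* Before: scan for every instruction, even ones that don't need a target. */
static void targets_eager(const bool *is_block_end, const bool *needs_target,
                          int n, int *target)
{
   for (int i = 0; i < n; i++) {
      int end = find_next_block_end(is_block_end, n, i);
      target[i] = needs_target[i] ? end : -1;
   }
}

/* After: only instructions that need a jump target pay for the scan. */
static void targets_as_needed(const bool *is_block_end, const bool *needs_target,
                              int n, int *target)
{
   for (int i = 0; i < n; i++)
      target[i] = needs_target[i] ? find_next_block_end(is_block_end, n, i) : -1;
}
```

For a single 100-instruction block with one control-flow instruction halfway through, the eager version takes 5050 scan steps and the as-needed version takes 50.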
Jason Ekstrand	cf98a3cc19	intel/fs: Use OPT() for split_virtual_grfs Now that we're being conservative in the pass, it's easy to tell when it makes progress and we can put it in the OPT() macro. This way, we get nice INTEL_DEBUG=optimizer dumps for it. While we're here, fix the header comment which is massively out-of-date. Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>	2021-12-18 01:46:19 +00:00
Jason Ekstrand	38fa18a7a3	intel/fs: Be more conservative in split_virtual_grfs Instead of modifying every single instruction, keep track of which VGRFs are actually split in a bit-set, and only modify the instructions that actually touch split regs. This cuts the runtime of the following Vulkan CTS on my SKL box by 45% from 3:21 to 1:51: dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>	2021-12-18 01:46:19 +00:00
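The sketch below shows the bit-set approach using a toy bitset and instruction type. These are hypothetical, since Mesa has its own BITSET helpers and IR. Only instructions whose destination or sources are marked as split get selected for rewriting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy bitset over virtual GRF numbers. */
#define MAX_VGRFS 256
struct vgrf_set { uint32_t words[MAX_VGRFS / 32]; };

static void set_add(struct vgrf_set *s, unsigned vgrf)
{
   s->words[vgrf / 32] |= 1u << (vgrf % 32);
}

static bool set_has(const struct vgrf_set *s, unsigned vgrf)
{
   return (s->words[vgrf / 32] >> (vgrf % 32)) & 1u;
}

/* Toy instruction with one destination and two source VGRFs. */
struct inst { unsigned dst, src[2]; };

/* Count instructions that touch a split VGRF; all others are left alone
 * instead of being rewritten unconditionally. */
static int count_insts_to_rewrite(const struct inst *insts, int n,
                                  const struct vgrf_set *split)
{
   int count = 0;
   for (int i = 0; i < n; i++) {
      if (set_has(split, insts[i].dst) ||
          set_has(split, insts[i].src[0]) ||
          set_has(split, insts[i].src[1]))
         count++;
   }
   return count;
}
```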
Jason Ekstrand	3c89dbdbfe	intel/fs: Implement the sample_pos_or_center system value Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>	2021-12-17 16:02:16 +00:00
Jason Ekstrand	a580fd55e1	intel/fs: Rework emit_samplepos_setup() This rolls compute_sample_position into emit_samplepos_setup, its only caller, by using a loop instead of calling it twice. We also early-return for the !persample_dispatch case instead of doing it as part of the sample calculation. This means that we don't call fetch_payload_reg() to get sample_pos_reg unless we're actually going to use it so the function is safe to call even if we haven't set up sample_pos_reg. This will be important for the next commit. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>	2021-12-17 16:02:16 +00:00
Jason Ekstrand	ac7255ed1e	intel/fs: Return fs_reg directly from builtin setup helpers There's no good reason why we're allocating them on the heap and returning a pointer. Return the fs_reg directly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>	2021-12-17 16:02:16 +00:00


1976 commits