fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 20:10:14 +01:00

Author	SHA1	Message	Date
Kristian H. Kristensen	8e16fb1528	freedreno/ir3: Implement lowering passes for VS and GS This introduces two new lowering passes. One to lower VS to explicit outputs using STLW and one to lower GS to load input using LDLW and implement the GS specific functionality. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-10-17 13:43:53 -07:00
Kristian H. Kristensen	0324706764	freedreno/ir3: Add intrinsics that map to LDLW/STLW These intrinsics will let us do all the offset calculations in nir, which is nicer to work with and lets nir_opt_algebraic eat it all up. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-10-17 13:43:53 -07:00
Erik Faye-Lund	e8095f2af0	nir: drop unused alpha_ref_float Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-10-17 10:41:36 +02:00
Jason Ekstrand	951cf94521	nir: Add explicit signs to image min/max intrinsics This better matches all the other atomic intrinsics such as those for SSBOs and shared variables where the sign is part of the intrinsic opcode. Both generators (GLSL and SPIR-V) know the sign from the type of the image variable or handle. In SPIR-V, signed min/max are separate opcodes from unsigned. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-21 17:19:55 +00:00
Marek Olšák	9c7746ceae	compiler: add SYSTEM_VALUE_TESS_LEVEL_OUTER/INNER_DEFAULT TCS system values for internal passthru TCS, needed by radeonsi NIR support Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-08-12 14:52:17 -04:00
Marek Olšák	1b881852bc	compiler: add SYSTEM_VALUE_USER_DATA_AMD for internal radeonsi shaders	2019-08-12 14:52:17 -04:00
Pierre-Eric Pelloux-Prayer	a9ec718652	nir: add atomic_inc_wrap/atomic_dec_wrap image intrinsics Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:41:02 -04:00
Daniel Schürmann	e272fdd508	nir,intel: lower if (cond) demote() to new intrinsic demote_if(cond) This will effectively enable the optimization in anv. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-24 13:02:18 -05:00
Andreas Baierl	f5804f1768	nir: Add gl_PointCoord system value gl_PointCoord handling needs some special bits set in lima/ppir code generation. Treating gl_PointCoord as a system value makes it easier to distinguish from a regular varying. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 13:20:39 +00:00
Iago Toral Quiroga	50016d7718	nir: add a V3D-specific intrinsic for per-sample color writes For per-sample color writes we need the output intrinsic to pack the sample index, which is not provided with regular store_output intrinsics unless we figured out a way to encode it into the base or the offset. v2: - Drop the writemask (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 08:59:35 +02:00
Iago Toral Quiroga	b0eec9e27d	nir: add a new v3d-specific intrinsic for tile buffer color reads This is intended to be used, for example, with OpenGL logic operations. It takes a render target as source and a sample index in the base index for MSAA color reads. v2: drop the CAN_ELIMINATE and CAN_REORDER flags (Eric). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Alyssa Rosenzweig	15000c79da	nir: Add Panfrost-specific blending intrinsic This gives more flexibility than the normal store_deref/store_output versions (particularly, it allows us to abuse the type system in awful ways, which is necessary for efficient format conversion in blend shaders.) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Karol Herbst <kherbst@redhat.com>	2019-07-09 14:07:23 -07:00
Caio Marcelo de Oliveira Filho	a42e8f0ed1	nir: Add demote and is_helper_invocation intrinsics From SPV_EXT_demote_to_helper_invocation. Demote will be implemented as a variant of discard, so mark uses_discard if it is used. v2: Add CAN_ELIMINATE flag to the new intrinsic. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-08 08:57:25 -07:00
Connor Abbott	e5536aa584	compiler: Add color system value This is nice to have with radeonsi, where color varyings are handled specially to avoid recompiles. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-08 14:18:34 +02:00
Rob Clark	5787a2dfe3	nir: add pass to lower load_interpolated_input Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-02 16:15:25 +00:00
Connor Abbott	6f20643b47	nir: Allow qualifiers on copy_deref and image instructions In the next commit, we'll properly handle access qualifiers on struct members by propagating them to load/store instructions, but these instructions had no way to specify the qualifier. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-19 14:08:27 +02:00
Daniel Schürmann	ea51275e07	nir: add intrinsics for AMD_shader_ballot Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-13 12:44:23 +00:00
Jonathan Marek	c12750527b	nir: add type information to load uniform/input and store output intrinsics This type information will be used by gather_ssa_types to get usable results Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 21:35:26 +00:00
Alyssa Rosenzweig	006cafc243	nir: Add blend_const_color_rgba sysval This represents a float vec4 constant color, as passed to glBlendColor. While the existing 4 shader sysvals are retained to minimize code churn, a single vectorized intrinsic is required for efficient blending on vector architectures. (This may also apply to architectures like Bifrost where ALU is scalar but load/store is vector; it largely depends on how blending is implemented per-driver.) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-10 15:49:28 +00:00
Rob Clark	2f0b9d2249	freedreno/ir3: lower load_barycentric_at_offset Calculates i,j at specified offset within a pixel. A new load_size_ir3 intrinsic is used in conjunction with fddx/fddy to translate the offset into primitive space and adjust the i,j from load_barycentric_pixel accordingly. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	c4f423aa36	freedreno/ir3: lower load_barycentric_at_sample This lowers load_barycentric_at_sample to load_sample_pos_from_id plus load_barycentric_at_offset. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Jason Ekstrand	18ed82b084	nir: Add a pass for selectively lowering variables to scratch space This commit adds new nir_load/store_scratch opcodes which read and write a virtual scratch space. It's up to the back-end to figure out what to do with it and where to put the actual scratch data. v2: Drop const_index comments (by anholt) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-12 15:59:31 -07:00
Eric Anholt	b88ef3bd76	nir: Add a comment about how intrinsic definitions work. I was thinking about a refactor, and needed to read this first. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-12 15:56:12 -07:00
Eric Anholt	35355b4860	nir: Drop remaining references to const_index in favor of the call to use. Please don't make me read a const_index[] expression ever again. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-12 15:56:04 -07:00
Eric Anholt	6e4d3d0a2f	nir: Drop comments about the constant_index slots for load/stores. The constant_index slots are named right there in the intrinsic definition, and the comment is just a chance to get out of sync. Noticed while reviewing the lower_to_scratch changes that copy-and-pasted wrong comments, and load_ubo and load_per_vertex_output had incorrect comments currently. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-12 15:55:55 -07:00
Bas Nieuwenhuizen	282bacab4a	nir: Add access qualifiers on load_ubo intrinsic. Otherwise nir_lower_non_uniform_access crashes when it tries to get the access of a load_ubo. Fixes: `8ed583fe52` "spirv: Handle the NonUniformEXT decoration" Fixes: `e50ab2c0f2` "nir: Add access flags to deref and SSBO atomics" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-10 02:04:04 +02:00
Alyssa Rosenzweig	a83862754e	nir: Add "viewport vector" system values While a partial set of viewport system values exist, these are scalar values, which is a poor fit for viewport transformations on vector ISAs like Midgard (where the vec3 values for scale and offset each need to be coherent in a vec4 uniform slot to take advantage of vectorized transform math). This patch adds vec3 scale/offset fields corresponding to the 3D Gallium viewport / glViewport+depth Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-04 03:44:09 +00:00
Jason Ekstrand	e50ab2c0f2	nir: Add access flags to deref and SSBO atomics We will need them for a new ACCESS_NON_UNIFORM flag that's about to be added in the next commit. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-25 16:12:09 -05:00
Jason Ekstrand	40074ebf74	nir: Add texture sources and intrinsics for bindless On Intel, we have both bindless and bindful and we'd like to use them at the same time if we can so we need to be able to distinguish at the NIR level between the two. This also fixes nir_lower_tex to properly handle bindless in its tex_texture_size and get_texture_lod helpers. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-25 16:12:09 -05:00
Karol Herbst	d0ba326f23	nir/spirv: support physical pointers v2: add load_kernel_input Signed-off-by: Karol Herbst <kherbst@redhat.com> squash! nir/spirv: support physical pointers	2019-03-19 04:08:07 +00:00
Jason Ekstrand	3c11fc7654	nir/lower_io: Add a new buffer_array_length intrinsic and lowering Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Eduardo Lima Mitev	6ff50a488a	nir: Add ir3-specific version of most SSBO intrinsics These are ir3 specific versions of SSBO intrinsics that add an extra source to hold the element offset (dword), which is what the backend instructions need. The original byte-offset source provided by NIR is not replaced because on a4xx and a5xx the backend still needs it. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-13 21:19:44 +01:00
Karol Herbst	d0b47ec4df	nir/vtn: add support for SpvBuiltInGlobalLinearId v2: use formula with fewer operations Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-05 22:28:29 +01:00
Karol Herbst	f48c672965	nir: add support for address bit sized system values v2: add assert in else clause make local group intrinsics 32 bit wide v3: always use 32 bit constant for local_size v4: add comment by Jason Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 22:28:29 +01:00
Eric Anholt	2780a99ff8	v3d: Move the stores for fixed function VS output reads into NIR. This lets us emit the VPM_WRITEs directly from nir_intrinsic_store_output() (useful once NIR scheduling is in place so that we can reduce register pressure), and lets future NIR scheduling schedule the math to generate them. Even in the meantime, it looks like this lets NIR DCE some more code and make better decisions. total instructions in shared programs: 6429246 -> 6412976 (-0.25%) total threads in shared programs: 153924 -> 153934 (<.01%) total loops in shared programs: 486 -> 483 (-0.62%) total uniforms in shared programs: 2385436 -> 2388195 (0.12%) Acked-by: Ian Romanick <ian.d.romanick@intel.com> (nir)	2019-03-05 10:59:40 -08:00
Jason Ekstrand	61e009d2c4	spirv: Use the same types for resource indices as pointers We need more space than just a 32-bit scalar and we have to burn all that space anyway so we may as well expose it to the driver. This also fixes a subtle bug when UBOs and SSBOs have different pointer types. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	e461926ef2	nir: Add load/store/atomic global intrinsics These correspond roughly to reading/writing OpenCL global pointers. The idea is that they just take a bare address and load/store from it. Of course, exactly what this address means is driver-dependent. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-01-26 13:39:18 -06:00
Karol Herbst	4125211e9c	nir: add legal bit_sizes to intrinsics With OpenCL some system values match the address bits, but in GLSL we also have some system values being 64 bit like subgroup masks. With this it is possible to adjust the builder functions so that depending on the bit_sizes the correct bit_size is used or an additional argument is added in case of multiple possible values. v2: validate dest bit_size v3: generate hex values in python code remove useless imports rename and move bit_sizes v4: add 1 to legal bit_sizes for front_face Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 00:16:51 +01:00
Jason Ekstrand	63b9aa2e25	spirv: Add support for using derefs for UBO/SSBO access For now, it's hidden behind a cap. Hopefully, we can eventually drop that along with all the manual offset code in spirv_to_nir. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	e90b738f20	nir/vulkan: Add a descriptor type to vulkan resource intrinsics Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	013ee5732b	nir/intrinsics: Add access flags to load/store_deref Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	7755171e4c	nir/intrinsics: Allow deref sources to consume anything This commit adds a new num_components value for intrinsic sources of -1 which means that it consumes everything and the number of components effectively isn't validated. This is useful for deref sources which just take the result of the deref and we leave it up to the driver to decide what that size should be. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Matt Turner	017199d2d2	mesa: Revert INTEL_fragment_shader_ordering support This extension is not properly tested (testing for GL_ARB_fragment_shader_interlock is not sufficient), and since this was noted in review on August 28th no tests have been sent. Revert "i965: Add INTEL_fragment_shader_ordering support." Revert "mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering" This reverts commit `03ecec9ed2`. This reverts commit `119435c877`. Cc: mesa-stable@lists.freedesktop.org Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Eric Anholt <eric@anholt.net>	2018-12-03 15:37:37 -08:00
Jason Ekstrand	d34fd81e76	nir: Add alignment parameters to SSBO, UBO, and shared access This also changes spirv_to_nir and glsl_to_nir to set them. The one place that doesn't set them is shared memory access lowering in nir_lower_io. That will have to be updated before any consumers of it can effectively use these new alignments. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Acked-by: Karol Herbst <kherbst@redhat.com>	2018-11-15 19:59:42 -06:00
Samuel Pitoiset	4b74f05f6b	spirv/nir: handle memory access qualifiers for SSBO loads/stores v2: - change how the access qualifiers are accumulated v3: - duplicate members in struct_member_decoration_cb() - handle access qualifiers on variables - remove access qualifiers handling in _vtn_variable_load_store() - fix setting access qualifiers on type->array_element Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-12 08:42:08 +02:00
Jason Ekstrand	09f1de97a7	anv,i965: Lower away image derefs in the driver Previously, the back-end compiler turned image access into magic uniform reads and there was a complex contract between back-end compiler and driver about setting up and filling out those params. As of this commit, both drivers now lower image_deref_load_param_intel intrinsics to load_uniform intrinsics controlled by the driver and lower the other image_deref_* intrinsics to image_* intrinsics which take an actual binding table index. There are still "magic" uniforms but they are now added and controlled entirely by the driver and that contract no longer spans components. This also has the side-effect of making most image use compile-time binding table indices. Previously, all image access pulled the binding table index from a uniform. Part of the reason for this was that the magic uniforms made it difficult to decouple binding table indices from the uniforms and, since they are indexed completely differently (especially in Vulkan), it was hard to pull them apart. Now that the driver is handling both, it's trivial to decouple the two and provide actual binding table indices. Shader-db results on Kaby Lake: total instructions in shared programs: 15166872 -> 15164293 (-0.02%) instructions in affected programs: 115834 -> 113255 (-2.23%) helped: 191 HURT: 0 total cycles in shared programs: 571311495 -> 571196465 (-0.02%) cycles in affected programs: 4757115 -> 4642085 (-2.42%) helped: 73 HURT: 67 total spills in shared programs: 10951 -> 10926 (-0.23%) spills in affected programs: 742 -> 717 (-3.37%) helped: 7 HURT: 0 total fills in shared programs: 22226 -> 22201 (-0.11%) fills in affected programs: 1146 -> 1121 (-2.18%) helped: 7 HURT: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:03 -05:00
Jason Ekstrand	0de003be03	nir: Add handle/index-based image intrinsics Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	37f7983bcc	intel/compiler: Do image load/store lowering to NIR This commit moves our storage image format conversion codegen into NIR instead of doing it in the back-end. This has the advantage of letting us run it through NIR's optimizer which is pretty effective at shrinking things down. In the common case of rgba8, the number of instructions emitted after NIR is done with it is half of what it was with the lowering happening in the back-end. On the downside, the back-end's lowering is able to directly use predicates and the NIR lowering has to use IFs. Shader-db results on Kaby Lake: total instructions in shared programs: 15166910 -> 15166872 (<.01%) instructions in affected programs: 5895 -> 5857 (-0.64%) helped: 15 HURT: 0 Clearly, we don't have that much image_load_store happening in the shaders in shader-db.... Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	15d39f474b	nir: Make image load/store intrinsics variable-width Instead of requiring 4 components, this allows them to potentially use fewer. Both the SPIR-V and GLSL paths still generate vec4 intrinsics so drivers which assume 4 components should be safe. However, we want to be able to shrink them for i965. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Kevin Rogovin	119435c877	mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering This extension provides a new GLSL built-in function beginFragmentShaderOrderingINTEL() that guarantees (taking wording of GL_INTEL_fragment_shader_ordering extension) that any memory transactions issued by shader invocations from previous primitives mapped to same xy window coordinates (and same sample when per-sample shading is active), complete and are visible to the shader invocation that called beginFragmentShaderOrderingINTEL(). One advantage of INTEL_fragment_shader_ordering over ARB_fragment_shader_interlock is that it provides a function that operates as a memory barrier (instead of defining a critical section) that can be called under arbitrary control flow from any function (in contrast the begin/end of ARB_fragment_shader_interlock may only be called once, from main(), under no control flow). Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>	2018-08-28 17:15:10 +03:00


65 commits