fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-24 02:20:11 +01:00

Author	SHA1	Message	Date
Dave Airlie	7b46e2a74b	ac/nir: assign argument param pointers in one place. Instead of having the fragile code to do a second pass, just give the pointers you want params in to the initial code, then call a later pass to assign them. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-06-07 06:00:23 +01:00
Dave Airlie	b19cafd441	ac/nir: consolidate setting userdata location Just pass a pointer and increment inside the function, makes the code less error prone. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-06-07 05:59:57 +01:00
Dave Airlie	72f0830ecd	ac/nir: set workgroup size attribute to correct value. This ports: `55445ff189` from radeonsi radeonsi: tell LLVM not to remove s_barrier instructions LLVM 5.0 removes s_barrier instructions if the max-work-group-size attribute is not set. What a surprise. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-06-05 01:37:44 +01:00
Marek Olšák	e019ea8f4b	radeonsi: move building llvm.SI.load.const into ac_build_buffer_load Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-29 01:52:16 +02:00
Jason Ekstrand	b86dba8a0e	nir: Embed the shader_info in the nir_shader again Commit `e1af20f18a` changed the shader_info from being embedded into being just a pointer. The idea was that sharing the shader_info between NIR and GLSL would be easier if it were a pointer pointing to the same shader_info struct. This, however, has caused a few problems: 1) There are many things which generate NIR without GLSL. This means we have to support both NIR shaders which come from GLSL and ones that don't and need to have an info elsewhere. 2) The solution to (1) raises all sorts of ownership issues which have to be resolved with ralloc_parent checks. 3) Ever since `00620782c9`, we've been using nir_gather_info to fill out the final shader_info. Thanks to cloning and the above ownership issues, the nir_shader::info may not point back to the gl_shader anymore and so we have to do a copy of the shader_info from NIR back to GLSL anyway. All of these issues go away if we just embed the shader_info in the nir_shader. There's a little downside of having to copy it back after calling nir_gather_info but, as explained above, we have to do that anyway. Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-05-09 15:07:47 -07:00
Marek Olšák	7647e90b15	ac: rename ac_eliminate_const_vs_outputs -> ac_optimize_vs_outputs Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-05-03 20:55:00 +02:00
Dave Airlie	3bf3f9866c	radv/ac: canonicalize the output for 32-bit float min/max. This fixes: dEQP-VK.glsl.builtin.precision.min.* dEQP-VK.glsl.builtin.precision.max.* dEQP-VK.glsl.builtin.precision.clamp.* The problem is the hw doesn't compare denorms properly, so we have to flush them, even though the spec says flushing is optional, if you don't flush the results should be correct. The -pro driver changes the shader float mode, it would be nice if llvm could grow that perhaps. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-05-03 12:55:34 +10:00
Dave Airlie	83e58b036e	radv: flush f32->f16 conversion denormals to zero. (v2) SPIR-V defines the f32->f16 operation as flushing denormals to 0, this compares the class using amd class opcode. Thanks to Matt Arsenault for figuring it out. This fix is VI+ only, add a TODO for SI/CIK. This fixes: dEQP-VK.spirv_assembly.instruction.compute.opquantize.flush_to_zero Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-05-03 12:55:34 +10:00
Dave Airlie	f205e19e4f	radv/ac: eliminate unused vertex shader outputs. (v2) This is ported from radeonsi, and I can see at least one Talos shader drops an export due to this, and saves some VGPR usage. v2: use shared code. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-27 05:18:52 +01:00
Dave Airlie	7f77554b5b	radv/ac: setup mrt exports then export them in one go. (v2) Noticed while looking at Sascha Willems deferred shaders. This is a bit of an llvm workaround, llvm was producing this: v_cvt_pkrtz_f16_f32_e64 v4, v7, v8 ; D2960004 00021107 v_cvt_pkrtz_f16_f32_e64 v6, v9, 1.0 ; D2960006 0001E509 s_waitcnt vmcnt(0) ; BF8C0F70 exp mrt0 v4, v4, v6, v6 compr ; C400040F 00000604 s_waitcnt expcnt(0) ; BF8C0F0F v_cvt_pkrtz_f16_f32_e64 v4, v12, v5 ; D2960004 00020B0C v_cvt_pkrtz_f16_f32_e64 v5, v14, 1.0 ; D2960005 0001E50E exp mrt1 v4, v4, v5, v5 compr ; C400041F 00000504 s_waitcnt expcnt(0) ; BF8C0F0F v_cvt_pkrtz_f16_f32_e64 v0, v0, v1 ; D2960000 00020300 v_cvt_pkrtz_f16_f32_e64 v1, v2, v3 ; D2960001 00020702 exp mrt2 v0, v0, v1, v1 done compr vm ; C4001C2F 00000100 After this change: v_cvt_pkrtz_f16_f32_e64 v4, v7, v8 ; D2960004 00021107 s_waitcnt vmcnt(0) ; BF8C0F70 v_cvt_pkrtz_f16_f32_e64 v0, v0, v1 ; D2960000 00020300 v_cvt_pkrtz_f16_f32_e64 v6, v9, 1.0 ; D2960006 0001E509 v_cvt_pkrtz_f16_f32_e64 v5, v12, v5 ; D2960005 00020B0C v_cvt_pkrtz_f16_f32_e64 v7, v14, 1.0 ; D2960007 0001E50E exp mrt0 v4, v4, v6, v6 compr ; C400040F 00000604 v_cvt_pkrtz_f16_f32_e64 v1, v2, v3 ; D2960001 00020702 exp mrt1 v5, v5, v7, v7 compr ; C400041F 00000705 exp mrt2 v0, v0, v1, v1 done compr vm ; C4001C2F 00000100 No waitcnt for exports are emitted. v2: fixup index->mrt mapping (Bas). Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-25 23:26:11 +01:00
Dave Airlie	b2cedb3ea9	radv/ac: overhaul vs output/ps input routing In order to cleanly eliminate exports rewrite the code first to mirror how radeonsi works for now. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-25 23:24:39 +01:00
Dave Airlie	35ea0c07a1	radv/ac: use tex_lz if we can. Looking at some Talos shaders vs radeonsi, I noticed they use tex_lz in a few places, so we should be able to as well. Reviewed-by: Bas Nieuwenhuizen <basni@google.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-20 22:00:13 +01:00
Dave Airlie	60a93e11ba	radv: drop debugging leftovers code in descriptor set patches. Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-19 09:31:14 +10:00
Dave Airlie	25a5ee391d	radv/ac: add support for indirect access of descriptor sets. We want to expose more descriptor sets to the applications, but currently we have a 1:1 mapping between shader descriptor sets and 2 user sgprs, limiting us to 4 per stage. This commit check if we don't have enough user sgprs for the number of bound sets for this shader, we can ask for them to be indirected. Two sgprs are then used to point to a buffer or 64-bit pointers to the number of allocated descriptor sets. All shaders point to the same buffer. We can use some user sgprs to inline one or two descriptor sets in future, but until we have a workload that needs this I don't think we should spend too much time on it. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-19 09:00:43 +10:00
Dave Airlie	d0991b135b	radv: start allocating user sgprs This adds an initial implementation to allocate the user sgprs and make sure we don't run out if we try to bind a bunch of descriptor sets. This can be enhanced further in the future if we add support for inlining push constants. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-19 09:00:43 +10:00
Dave Airlie	0b62669c8d	radv/ac: frag shader only needs ring offsets if sample positions enabled mostly documenting things, since with modern llvm we always have the spill enabled. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-19 09:00:42 +10:00
Dave Airlie	ec4785afb7	radv/ac: move needs_push_constants to shader info. First step to optimising push constants. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-19 09:00:42 +10:00
Dave Airlie	ec15e0d301	radv: optimise compute shader grid size emission. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-19 09:00:42 +10:00
Dave Airlie	31174069d2	radv: start conditionalising vertex inputs. (v2) In practice this will probably just drop draw id in a few places. v2: just do draw_id for now. (Bas) it might be possible to do something more if we need it in the future. (nha) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-19 09:00:42 +10:00
Dave Airlie	224cf2906a	radv/ac: add initial pre-pass for shader info gathering There is some radv specific info we need to gather from shaders before we get into converting nir->llvm, so we can make better decisions especially around user sgpr allocation. This is just an initial placeholder to gather if sample positions are required in the frag shader. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-19 09:00:42 +10:00
Bas Nieuwenhuizen	bd91caf863	radv: Use an offset instead of pointers for immutable samplers. Makes more sense when we hash the layout for the pipeline cache. Signed-off-by: Bas Nieuwenhuizen <basni@google.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-04-12 07:43:25 +02:00
Dave Airlie	22b116171f	radv: fix interp at sample code. Interp at sample needs to use the center, since the sample positions it retrieves are relative to the center. This fixes a bunch of CTS tests with multisample_interpolation. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-04 05:55:21 +10:00
Dave Airlie	1171b304f3	radv: overhaul fragment shader sample positions. The current code was broken, and I decided to redesign it instead. This puts the sample positions for all samples into the queue constant descriptor buffer after all the spill/ring descriptors. It then uses a single offset register to point how far into the samples the samples for num_samples are. This saves one user sgpr and means we only generate the sample position data in the rare single case where we need it currently. This doesn't fix the failing CTS tests without the followup fix. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-04 05:55:15 +10:00
Dave Airlie	1e9e747d00	radv/ac: fix texture derivative ordering The ordering NIR gives us is correct for the hw, this fixes: dEQP-VK.glsl.texture_functions.texturegrad.* (mainly trigged on isampler/usampler 3d textures.). Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-04 05:39:10 +10:00
Dave Airlie	303d22f319	radv/ac: round cube array coordinate before fixup. This fixes: dEQP-VK.glsl.texture_functions.texture.samplercubearray* Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-04 05:39:07 +10:00
Dave Airlie	5821f676ee	radv: move to using common buffer load format. Get rid of usage of SI.vs.load.input. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-04 05:37:52 +10:00
Dave Airlie	cb1518e96b	radv/ac: setup lds for tessellation This seems to get lost in the rebases, should fix the tessellation demos, crash in llvm. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-01 07:17:15 +10:00
Dave Airlie	aaabdd6bc6	radv/ac: handle writing out tess factors. This ports the code from radeonsi to build the if/endif, and ports the tess factor emission code. This code has an optimisation TODO that we can deal with later. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-01 07:16:47 +10:00
Dave Airlie	94f9591995	radv/ac: add support for TCS/TES inputs/outputs. This adds support for the tessellation inputs/outputs to the shader compiler, this is one of the main pieces of the patch. It is very similiar to the radeonsi code (post merge we should consider if there are better sharing opportunities). The main differences from radeonsi, is that we can have "compact" varyings for clip/cull/tess factors, and we have to add special handling for these. This consists of treating the const index from the deref different depending on the compactness. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-01 07:16:42 +10:00
Dave Airlie	5ab1289b48	radv/ac: add clip support for tess eval shader. As this may be the last shader to emit clip distances. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-01 07:16:37 +10:00
Dave Airlie	326b9bc6dc	radv/ac: hook up tessellation intrinsics. This just adds support for the nir intrinsics that tessellation uses. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-01 07:16:32 +10:00
Dave Airlie	d8ab71b207	radv/ac: hook up shader information handling for tessellation This hooks up the tessellation shader info to the nir values and ctx generated ones. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-01 07:16:27 +10:00
Dave Airlie	5b40eab00a	radv: add tess ctrl stage barrier workaround for SI. This just ports the workaround from radeonsi. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-01 07:16:04 +10:00
Dave Airlie	3a633cc2cb	radv/ac: add support for patch inputs to unique index code. This add support for tessellation patch inputs to the code that finds the unique parameter index. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-01 07:15:57 +10:00
Dave Airlie	60326a7afc	radv/ac: setup tessellation shader inputs. This just configures all the register inputs for the tessellation related stages. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-01 07:15:41 +10:00
Dave Airlie	3968162751	radv/ac: setup tess rings on compiler side. This just sets up the necessary pointers on the compiler side for the rings needed for tessellation. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-01 07:15:35 +10:00
Dave Airlie	a5136a97f7	radv: use defines for ring descriptor offsets. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-01 07:15:12 +10:00
Dave Airlie	97e0ff30c0	radv: handle clip dist in es outputs. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-01 07:14:53 +10:00
Dave Airlie	6279646306	radv: drop unneeded start Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-01 07:14:39 +10:00
Dave Airlie	a58d03a5a2	radv: fixup geometry clip emission since using the geom pass Fixes: `2b35b60d`: radv: move to using nir clip/cull merge pass. Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-04-01 07:14:38 +10:00
Marek Olšák	00e777b61c	amd: add texture format definitions for GFX9 the DATA_FORMAT and NUM_FORMAT fields are the same, but some of the enums differ, thus add GFX6 and GFX9 suffixes, so that the IB parser can show enums for both. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-03-30 14:44:33 +02:00
Dave Airlie	a930c2c612	radv: fix mask attribs properly. some days it just doesn't pay to get out of bed. Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-03-30 13:09:30 +10:00
Dave Airlie	aa27a9f687	radv: fix regression with mask attrib setting code. Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-03-30 12:07:32 +10:00
Dave Airlie	2b35b60df1	radv: move to using nir clip/cull merge pass. Doing this before tessellation makes doing some bits of tessellation a bit cleaner. It also cleans up a bit of the llvm generator code. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-03-30 11:04:56 +10:00
Dave Airlie	d43691ce77	radv: add parameter to emit_waitcnt. This is just a precursor for tess support, which needs to pass different values here. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-03-28 17:40:03 +10:00
Dave Airlie	931a8d0c9a	radv: rework vertex/export shader output handling In order to faciliate adding tess support, split the vs/es output info into a separate block, so we make it easier to have the tess shaders export the same info. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-03-28 17:39:59 +10:00
Alex Smith	ce4058dafd	radv/ac: Fix shared memory offset calculation The index passed to get_shared_memory_ptr is an attribute slot index, i.e. the index of a vec4 within LDS. Therefore this must be scaled by sizeof(vec4) to give the LDS byte offset. Fixes: `f4e499ec79` ("radv: add initial non-conformant radv vulkan driver") Signed-off-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> CC: <mesa-stable@lists.freedesktop.org>	2017-03-17 09:35:48 +01:00
Dave Airlie	3ece76f03d	radv/ac: gather4 cube workaround integer This fix is extracted from amdgpu-pro shader traces. It appears the gather4 workaround for integer types doesn't work for cubes, so instead if forces a float scaled sample, then converts to integer. It modifies the descriptor before calling the gather. This also produces some ugly asm code for reasons specified in the patch, llvm could probably do better than dumping sgprs to vgprs. This fixes: dEQP-VK.glsl.texture_gather.basic.cube.rgba8* Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-03-15 09:51:53 +10:00
Jason Ekstrand	762a6333f2	nir: Rework conversion opcodes The NIR story on conversion opcodes is a mess. We've had way too many of them, naming is inconsistent, and which ones have explicit sizes was sort-of random. This commit re-organizes things and makes them all consistent: - All non-bool conversion opcodes now have the explicit size in the destination and are named <src_type>2<dst_type><size>. - Integer <-> integer conversion opcodes now only come in i2i and u2u forms (i2u and u2i have been removed) since the only difference between the different integer conversions is whether or not they sign-extend when up-converting. - Boolean conversion opcodes all have the explicit size on the bool and are named <src_type>2<dst_type>. Making things consistent also allows nir_type_conversion_op to be moved to nir_opcodes.c and auto-generated using mako. This will make adding int8, int16, and float16 versions much easier when the time comes. Reviewed-by: Eric Anholt <eric@anholt.net>	2017-03-14 07:36:40 -07:00
Dave Airlie	b8ee70384a	radv: setup llvm target data layout Ported from radeonsi, pointed out by Tom. "This prevents LLVM from using sext instructions for local memory offsets and allows the backend to fold immediate offsets into the instruction. This also prevents some incorrect code generation for ptrtoint and inttoptr instructions." Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Tom Stellard <tstellar@redhat.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-03-14 10:33:59 +10:00

1 2 3 4

174 commits