fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 07:18:06 +02:00

Author	SHA1	Message	Date
Alejandro Piñeiro	2e22879115	v3d: refactor some code from v3d40_vir_emit_image_load_store Moved it to a new auxiliary method, v3d40_image_load_store_tmu_op, equivalent to the nir_to_nir v3d_general_tmu_op, to clean up a little. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 11:49:29 +02:00
Alejandro Piñeiro	934ce48db8	v3d: use inc/dec tmu operation with atomic sub/add of 1 Among other things, this avoids the need to load 1/-1 constants (so one less operation). The removed comment suggests the option of adding support on NIR for inc/dec. Intel just uses an auxiliary method to get which hw operation is needed, so no lowering is needed. At the same time, being so small, it seems unreasonable to try to add a general one on NIR itself. It is easier to just adapt the method here (that is what the patch does right now). It is worth noting that we are not getting any change on shader-db stats because all those methods are used on the usual shader-db set with shaders needing GLSL > 4.2. In general there aren't too many GLSL ES 3.1 tests. As an alternative, we captured the GLES3/GLSL31/GLS32 used on vk-gl-cts, even if that is not a real life usage of shaders. With those we get the following: total instructions in shared programs: 1217022 -> 1217013 (<.01%) instructions in affected programs: 117 -> 108 (-7.69%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.50 x̃: 1 helped stats (rel) min: 3.57% max: 10.00% x̄: 8.09% x̃: 9.09% 95% mean confidence interval for instructions value: -2.07 -0.93 95% mean confidence interval for instructions %-change: -10.54% -5.64% Instructions are helped. Note that the number of shaders helped is really low because most of the vk-gl-cts tests using AtomicInc/Dec/Add use them in compute shaders. Although right now there is a branch around with CS support, the usual practice is to run the stats against master. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 11:48:40 +02:00
Alejandro Piñeiro	3912a32a79	v3d: remove redefinition of tmu operations on nir_to_vir They are already defined in the generated packet headers, although in a slightly different format, so it was necessary to change how they are used on nir_to_vir. In addition to allowing the removal of some duplicated headers, this will allow defining just one get_op_for_atomic_add aux method later to support using inc/dec instead of add of 1/-1. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 11:48:17 +02:00
Alejandro Piñeiro	c2ff38d2df	v3d: tweak initial comment on pack generator script The files it mentions as references have slightly different names. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 11:48:09 +02:00
Iago Toral Quiroga	10d50f2904	v3d: remove unused definitions Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	8e50a9f6cf	v3d: move implementation of some intrinsics to separate helpers Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	d69184204e	v3d: emit correct lowering for logic ops with RGB10A2 render targets Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	7bf3676845	v3d: emit correct lowering for logic ops with integer render targets Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	e540775f0c	v3d: add lowering for OpenGL logic operations This implements support for OpenGL logic operations by emitting code to read from the TLB if needed and blending the fragment output accordingly. It is similar to VC4's blend lowering pass, but exclusive to logic operations, since blending is otherwise supported in hardware. The pass doesn't handle MSAA targets yet. Fixes the following piglit tests: spec/!opengl 1.0/gl-1.0-logicop/* spec/!opengl 1.1/gl-1.1-xor spec/!opengl 1.1/gl-1.1-xor-copypixels It also fixes text cursor rendering in Libreoffice with the GTK+2 theme, which is rendered via glamor using the XOR logic operation. v2: fix checks for allowed variable location and maximum render target (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	7c1d708911	v3d: acquire scoreboard lock before first tlb read Until now we have always been emitting our scoreboard locks on the last thread switch to improve parallelism. We did this by emitting our last thread switch right before our tlb writes at the very end of the program, where we know that we are outside control flow. Unfortunately, this strategy is not valid when we have tlb color reads too, as these will happen before this point in the program and can happen inside control flow. To fix this we always emit a thread switch before the first tlb load and if we see additional thread switches after that point, we change the strategy to lock on the first thread switch. v2: change the solution so it is expected to work in more scenarios (Eric). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	47d7c80dc7	v3d: implement tile buffer color read intrinsic We will be emitting this intrinsic to signal TLB color loads when we implement OpenGL logic operations, where we need to blend the fragment shader color output with the existing color in the render target. Per-sample TLB reads are not supported yet. v2: fix the offset into the color_reads array (Eric). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	6af1bdefa9	v3d: fix size of color_reads and sample_colors arrays We need to scale the size of these arrays to consider up to V3D_MAX_DRAW_BUFFERS render targets and 4 components per color. v2: we want to store each color component separately, so scale by 4 too. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	0279ac6e51	v3d: add color formats and swizzles to the fragment shader key We are going to need these very soon to emit correct reads from the tlb to implement logic operations. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	d26b35ba44	v3d: add helpers to emit ldtlb and ldtlbu signals The ldtlbu version will read an implicit uniform with the TLB read specifier and should be used for the first read in a sequence of TLB reads (unless the default configuration is valid, in which case we can use ldtlb). The ldtlb version is used for any subsequent TLB read in the sequence. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	aff8885cf9	v3d: handle tlb read dependency tracking as if they were writes Tile buffer reads are emitted as ordered sequences and cannot be reordered. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	4793e2c888	v3d: instructions with the ldtlb and ldtlbu signals are tlb instructions Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	83a66e10de	v3d: tlb loads cannot be removed Loads from the tile buffer are emitted in ordered sequences so we cannot eliminate or reorder any of them. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	08f4dc3adc	v3d: the ldtlbu signal reads an implicit uniform Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	271bc8acfb	v3d: handle ldtlb and ldtlbu signals during disassembly We already have code to print these signals but the early return in the code that checks if any signals are present was missing the checks for them, so it would skip printing them unless they were paired with other signals. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Sagar Ghuge	456557a837	nir: Add lower_rotate flag and set to true in all drivers Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Daniel Schürmann	165b7f3a44	nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec. That is: the five least significant bits provide the values of 'bits' and 'offset' which is the case for all hardware currently supported by NIR and using the bfm/bfe instructions. This patch also changes the lowering of bitfield_insert/extract using shifts to not use bfm and removes the flag 'lower_bfm'. Tested-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
Iago Toral Quiroga	79a30543ee	v3d: implement simultaneous peripheral access exceptions for V3D 4.1+ Shader-db results: total instructions in shared programs: 9117550 -> 9102719 (-0.16%) instructions in affected programs: 1752873 -> 1738042 (-0.85%) helped: 7076 HURT: 478 helped stats (abs) min: 1 max: 22 x̄: 2.19 x̃: 2 helped stats (rel) min: 0.07% max: 13.89% x̄: 1.70% x̃: 1.07% HURT stats (abs) min: 1 max: 7 x̄: 1.41 x̃: 1 HURT stats (rel) min: 0.09% max: 10.17% x̄: 0.86% x̃: 0.54% 95% mean confidence interval for instructions value: -2.00 -1.92 95% mean confidence interval for instructions %-change: -1.58% -1.50% Instructions are helped. total max-temps in shared programs: 1327774 -> 1327728 (<.01%) max-temps in affected programs: 1025 -> 979 (-4.49%) helped: 47 HURT: 2 helped stats (abs) min: 1 max: 2 x̄: 1.02 x̃: 1 helped stats (rel) min: 2.63% max: 20.00% x̄: 7.67% x̃: 5.26% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 4.17% max: 4.17% x̄: 4.17% x̃: 4.17% 95% mean confidence interval for max-temps value: -1.06 -0.82 95% mean confidence interval for max-temps %-change: -8.89% -5.49% Max-temps are helped. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-18 08:09:03 +02:00
Iago Toral Quiroga	360b832c58	v3d: do not setup execute flags for else block in uniform control flow Either all channels executed the 'then' block, in which case all channels will directly jump to the 'endif' block at the end of the 'then' block, or all channels execute the 'else' block (so no execution masking is necessary). Shader-db results: total instructions in shared programs: 9119238 -> 9117550 (-0.02%) instructions in affected programs: 401252 -> 399564 (-0.42%) helped: 855 HURT: 77 total uniforms in shared programs: 3022622 -> 3022605 (<.01%) uniforms in affected programs: 3566 -> 3549 (-0.48%) helped: 17 HURT: 0 total max-temps in shared programs: 1327762 -> 1327774 (<.01%) max-temps in affected programs: 619 -> 631 (1.94%) helped: 2 HURT: 15 Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-14 08:00:52 +02:00
Alejandro Piñeiro	17c2c9cd67	v3d: fix checking twice auf flag Seems a C&P error, and should check for auf/muf. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110902 Fixes: `8f065596d2` "v3d: Add an optimization pass for redundant flags updates." Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-13 11:45:18 +02:00
Iago Toral Quiroga	9b96ae69bc	v3d: don't emit point coordinates varyings if the FS doesn't read them We still need to emit them in V3D 3.x since there is no mechanism to disable them there. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 08:29:42 +02:00
Iago Toral Quiroga	5e26e55e72	v3d: add a helper to track variables that need point coordinates Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 08:26:52 +02:00
Iago Toral Quiroga	09d230c6cf	v3d: fix scheduling dependency tracking for ALU with small immediates We were not accounting for small immediates in the B mux so the scheduler was interpreting these as regular register file accesses, which could lead to additional (incorrect) write-read dependencies. Shader-db changes: total instructions in shared programs: 9163664 -> 9137263 (-0.29%) instructions in affected programs: 3931035 -> 3904634 (-0.67%) helped: 12457 HURT: 2563 total max-temps in shared programs: 1325787 -> 1325597 (-0.01%) max-temps in affected programs: 5746 -> 5556 (-3.31%) helped: 186 HURT: 16 helped stats (abs) min: 1 max: 4 x̄: 1.12 x̃: 1 helped stats (rel) min: 1.45% max: 22.22% x̄: 4.42% x̃: 3.28% HURT stats (abs) min: 1 max: 3 x̄: 1.12 x̃: 1 HURT stats (rel) min: 2.86% max: 10.00% x̄: 5.76% x̃: 5.88% 95% mean confidence interval for max-temps value: -1.04 -0.84 95% mean confidence interval for max-temps %-change: -4.16% -3.07% Max-temps are helped. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-06 08:16:43 +02:00
Kenneth Graunke	b0e3bd79dc	v3d: Enable NIR's lower_fmod option. Currently, st/mesa is always calling the GLSL IR lower_instructions() pass with MOD_TO_FLOOR set, so mod operations will be lowered before ever reaching NIR. This enables the same lowering at the NIR level, which will let me shut off the GLSL IR path for NIR-based drivers. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Eric Anholt <eric@anholt.net>	2019-06-05 16:45:12 -07:00
Jason Ekstrand	f2dc0f2872	nir: Drop imov/fmov in favor of one mov instruction The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Acked-by: Rob Clark <robdclark@chromium.org>	2019-05-24 08:38:11 -05:00
Jonathan Marek	d0bff89159	nir: allow specifying a set of opcodes in lower_alu_to_scalar This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-10 15:10:41 +00:00
Ian Romanick	1f1007a4ed	nir: Initialize lower_flrp_progress everywhere I don't know why I thought NIR_PASS always set the progress variable. Derp. Fixes: `d41cdef2a5` ("nir: Use the flrp lowering pass instead of nir_opt_algebraic") Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Coverity CID: 1444996 Coverity CID: 1444995 Coverity CID: 1444994 Coverity CID: 1444993 Coverity CID: 1444991 Coverity CID: 1444989	2019-05-09 10:03:51 -07:00
Ian Romanick	d41cdef2a5	nir: Use the flrp lowering pass instead of nir_opt_algebraic I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :( i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method. No changes on any other Intel platforms. v2: Add panfrost changes. Iron Lake and GM45 had similar results. (Iron Lake shown) total cycles in shared programs: 188647754 -> 188647748 (<.01%) cycles in affected programs: 5096 -> 5090 (-0.12%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Christian Gmeiner	4e110eca42	nir: nir_shader_compiler_options: drop native_integers Drivers that do not support native integers should use a lowering pass to go from integers to floats. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-07 07:35:52 +02:00
Eric Engestrom	7ca8ba199f	delete autotools .gitignore files One special case, `src/util/xmlpool/.gitignore` is not entirely deleted, as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`). Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-29 21:17:19 +00:00
Eric Anholt	fb0611df3d	v3d: Fix detection of TMU write sequences in register spilling. We can't use the QPU functions to detect this until register allocation is done and we've moved inst->dst into inst->qpu. Fixes bad TMU sequences from register spilling in KHR-GLES31.core.compute_shader.shared-max.	2019-04-26 12:42:30 -07:00
Eric Anholt	18894a5e5a	v3d: Fix detection of the last ldtmu before a new TMU op. We were looking at the start instruction, instead of scanning through the list of following instructions to find any more ldtmus.	2019-04-26 12:42:30 -07:00
Eric Anholt	575caab895	v3d: Re-add support for memory_barrier_shared. Looks like I lost it in a rebase conflict resolution. We'd hit the unknown intrinsic assertion in KHR-GLES31.core.compute_shader.shared-struct. Fixes: `6b1c659825` ("v3d: Add Compute Shader compilation support.")	2019-04-26 12:42:30 -07:00
Eric Anholt	4358904c06	v3d: Add a note about i/o indirection for future performance work.	2019-04-26 12:42:30 -07:00
Eric Anholt	24587ae8ae	v3d: Assert that we do request the normal texturing return data. An unused tex should be DCEed, but if it wasn't we'd run into trouble with not doing a TMUWT.	2019-04-26 12:42:30 -07:00
Eric Anholt	12f6c34806	v3d: Fix atomic cmpxchg in shaders on hardware. In what might be my first case of finding a divergence between hardware and simpenrose for v3d 4.x, it seems that despite what the spec claims, you actually need specific values in the TYPE field for atomic ops. Fixes dEQP-GLES31.functional..compswap.	2019-04-18 13:24:55 -07:00
Eric Anholt	1ce143ca19	v3d: Fix an invalid reuse of flags generation from before a thrsw. Noticed while debugging the last GLES 3.1 failure, though it doesn't seem to affect that bug.	2019-04-18 13:24:55 -07:00
Eric Anholt	697e2e1f26	v3d: Always set up the qregs for CSD payload. We were failing to set up payload[1] for use by LocalInvocationIndex/ID and shared variable accesses if gl_WorkGroupID/gl_GlobalInvocationID wasn't used (possibly because you only have one workgroup). You're always going to use payload[1], and payload[0] is common enough and we have DCE in the backend to clean it up if it happens to not be used.	2019-04-16 12:10:39 -07:00
Eric Anholt	1bc71e8b65	v3d: Only look up the 3rd texture gather offset for non-arrays. Fixes assertion failures in the CTS since Karol's cleanup when NIR started noticing that we were reading an invalid component. Fixes: `5450f1c9fb` ("v3d: prefer using nir_src_comp_as_int over nir_src_as_const_value")	2019-04-16 12:07:59 -07:00
Dylan Baker	95aefc94a9	Delete autotools Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Matt Turner <mattst88@gmail.com>	2019-04-15 13:44:29 -07:00
Karol Herbst	14531d676b	nir: make nir_const_value scalar v2: remove & operator in a couple of memsets add some memsets v3: fixup lima Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2019-04-14 22:25:56 +02:00
Eric Anholt	dc402be73e	v3d: Use the new lower_to_scratch implementation for indirects on temps. We can use the same register spilling infrastructure for our loads/stores of indirect access of temp variables, instead of doing an if ladder. Cuts 50% of instructions and max-temps from 2 KSP shaders in shader-db. Also causes several other KSP shaders with large bodies and large loop counts to not be force-unrolled. The change was originally motivated by NOLTIS slightly modifying register pressure in piglit temp mat4 array read/write tests, triggering register allocation failures.	2019-04-12 16:16:58 -07:00
Eric Anholt	8a2d91e124	v3d: Detect the correct number of QPUs and use it to fix the spill size. We were missing a * 4 even if the particular hardware matched our assumption.	2019-04-12 15:59:31 -07:00
Eric Anholt	11ba8a46e4	v3d: Add missing dumping for the spill offset/size uniforms.	2019-04-12 15:59:31 -07:00
Eric Anholt	42cf57f186	v3d: Add missing base offset to CS shared memory accesses. This code is so touchy, trying to emit the minimum amount of address math. Some day we'll move it all to NIR, I hope.	2019-04-12 15:59:31 -07:00
Eric Anholt	6b1c659825	v3d: Add Compute Shader compilation support. While waiting for the CSD UABI to get reviewed, I keep having to rebase the CS patch. Just land the compiler side for now to keep it from diverging. For now this covers just GLES 3.1 compute shaders, not CL kernels.	2019-04-12 15:59:31 -07:00
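
Commit 934ce48db8 above describes picking a dedicated inc/dec TMU operation when an atomic add has a constant operand of 1 or -1, so that the constant never has to be loaded. A minimal sketch of that selection follows. The enum values and both function names are hypothetical and chosen for illustration only; the real operation codes live in the generated packet headers and are not reproduced here. An atomic sub of a constant can be treated as an add of the negated constant before calling the helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operation codes, NOT the values used by the hardware or
 * by the generated packet headers. */
enum tmu_atomic_op {
    TMU_ATOMIC_ADD,
    TMU_ATOMIC_INC,
    TMU_ATOMIC_DEC,
};

/* Choose the operation for an atomic add of `value`.
 * `is_const` says whether the operand is a compile-time constant; only
 * then can the add be replaced by an increment or decrement. */
enum tmu_atomic_op
pick_atomic_add_op(bool is_const, int32_t value)
{
    if (is_const && value == 1)
        return TMU_ATOMIC_INC;
    if (is_const && value == -1)
        return TMU_ATOMIC_DEC;
    return TMU_ATOMIC_ADD;
}

/* Inc/dec carry their operand implicitly, which is where the saved
 * instruction comes from: only a general add needs the value emitted. */
bool
atomic_op_needs_operand(enum tmu_atomic_op op)
{
    switch (op) {
    case TMU_ATOMIC_ADD:
        return true;
    case TMU_ATOMIC_INC:
    case TMU_ATOMIC_DEC:
        return false;
    default:
        assert(!"unknown atomic op");
        return true;
    }
}
```

Keeping the choice in a small backend helper, rather than adding inc/dec opcodes to the shared IR, mirrors the reasoning given in the commit message.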

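Commit 165b7f3a44 above defines nir_op_bfm and nir_op_u/ibfe so that only the five least significant bits of 'bits' and 'offset' are used, following the SM5 spec. The sketch below illustrates that masking for the unsigned extract and the mask-building operation on 32-bit values. The function names ubfe32 and bfm32 are hypothetical, and this is an illustration of the described semantics under those assumptions, not the NIR implementation.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint32_t) == 4, "sketch assumes 32-bit values");

/* Unsigned bitfield extract: return `bits` bits of `base` starting at
 * bit `offset`. Only the low five bits of each operand are honoured. */
uint32_t
ubfe32(uint32_t base, uint32_t offset, uint32_t bits)
{
    offset &= 0x1f;
    bits &= 0x1f;

    if (bits == 0)
        return 0;
    if (offset + bits < 32) {
        /* Shift the field to the top, then back down, zero-filling. */
        return (base << (32 - bits - offset)) >> (32 - bits);
    }
    /* The field runs past bit 31: keep whatever is left above offset. */
    return base >> offset;
}

/* Bitfield mask: `bits` ones starting at bit `offset`, with the same
 * five-bit masking of both operands. */
uint32_t
bfm32(uint32_t bits, uint32_t offset)
{
    bits &= 0x1f;
    offset &= 0x1f;
    return ((1u << bits) - 1u) << offset;
}
```

One consequence of the masking is that a width of 32 behaves like a width of 0, so ubfe32(x, 0, 32) yields 0 rather than x.
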

3596 commits