fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-17 22:38:06 +02:00

Author	SHA1	Message	Date
Eric Anholt	3c08ecf147	v3d: Whitespace consistency fix.	2019-02-05 15:46:42 -08:00
Eric Anholt	940501a446	v3d: Fix copy-propagation of input unpacks. I had a single function for "does this do float input unpacking" with two major flaws: It was missing the most common thing to try to copy propagate a f32 input unpack to (the VFPACK to an FP16 render target) along with several other ALU ops, and also would try to propagate an f32 unpack into a VFMUL which only does f16 unpacks. instructions in affected programs: 659232 -> 655895 (-0.51%) uniforms in affected programs: 132613 -> 135336 (2.05%) and a couple of programs increase their thread counts. The uniforms hit appears to be a pattern in generated code of doing (-a >= a) comparisons, which when a is abs(b) can result in the abs instruction being copy propagated once but not fully DCEed.	2019-02-05 15:46:04 -08:00
Eric Anholt	d0fdbd4211	v3d: Fix dumping of shaders with alpha test. We were trying to print a NULL entry from the table.	2019-02-05 15:42:14 -08:00
Eric Anholt	bdef17b052	v3d: Store the actual mask of color buffers present in the key. If you only bound rt 1+, we'd still emit a write to the rt0 that isn't present (noticed while debugging an ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero regression in another change).	2019-02-05 15:42:04 -08:00
Eric Anholt	ab4d5775b0	v3d: Fix image_load_store clamping of signed integer stores. This was copy-and-paste fail, that oddly showed up in the CTS's reinterprets of r32f, rgba8, and srgba8 to rgba8i, but not r32ui and r32i to rgba8i or reinterprets to other signed int formats. Fixes: `6281f26f06` ("v3d: Add support for shader_image_load_store.")	2019-01-31 08:39:40 -08:00
Eric Anholt	f7769b5121	v3d: Fix the autotools build. Noticed while looking at the gitlab-CI MR.	2019-01-29 14:00:27 -08:00
Eric Anholt	3e743d8cd8	v3d: Avoid duplicating limits defines between gallium and v3d core. We don't want to pull the compiler into every include in the gallium driver, so just make a new little header to store the limits.	2019-01-27 08:30:03 -08:00
Eric Anholt	fe6a21c867	v3d: Fix overly-large vattr_sizes structs. We want one vector size per vector, not per component.	2019-01-27 08:30:03 -08:00
Eric Anholt	f72820c851	v3d: Add support for CS barrier() intrinsics.	2019-01-14 15:40:55 -08:00
Eric Anholt	9b45b06d7c	v3d: Add support for CS shared variable load/store/atomics. CS shared variables are handled effectively as SSBO access to a temporary buffer that will be allocated at CS dispatch time.	2019-01-14 15:40:55 -08:00
Eric Anholt	01d913cf90	v3d: Add support for CS workgroup/invocation id intrinsics. We get a payload for the ivec3 workgroup and an int local invocation index, and we use the core lowering to turn into the global invocation id and the local invocation id ivec3s.	2019-01-14 15:40:55 -08:00
Eric Anholt	6281f26f06	v3d: Add support for shader_image_load_store. This is only exposed on V3D 4.1+, because we didn't have the TMU write operations for images on 3.3 (To do GLES 3.1 there, you have to lower it to SSBO load/stores, which is a problem to solve later).	2019-01-14 15:40:55 -08:00
Eric Anholt	5932c2f0b9	v3d: Add SSBO/atomic counters support. So far I assume that all the buffers get written. If they weren't, you'd probably be using UBOs instead.	2019-01-14 15:40:55 -08:00
Eric Anholt	1a63227ea0	v3d: Add support for matrix inputs to the FS. We've been relying on linking splitting up our varying matrices into separate vectors, but with SSO that doesn't happen. Supporting matrix inputs isn't too hard, though.	2019-01-14 13:18:02 -08:00
Eric Anholt	3790ee07e6	v3d: Fix txf_ms 2D_ARRAY array index. We need to pass the array index through our coordinate transform unchanged. Fixes dEQP-GLES31.functional.texture.multisample.samples_1.*_2d_array	2019-01-14 13:18:02 -08:00
Eric Anholt	051a41d3d5	v3d: Add support for the early_fragment_tests flag. If this flag hasn't been set by the shader and it has some visible side effects, then we need to disable EZ.	2019-01-14 13:18:02 -08:00
Eric Anholt	6051c11d17	nir: Add nir_lower_tex support for Broadcom's swizzled TG4 results. V3D returns the texels in a different order in the resulting vec4 from what GLSL wants, so we need to put in a swizzle. Fixes dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.base_level.level_1 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 13:03:41 -08:00
Eric Anholt	8847370424	v3d: Use the core tex lowering. Even without any clever optimization on the unpack operations, this gives us a useful value for the channels read field, which we can use to avoid ldtmu instructions to the no-op register. instructions in affected programs: 890712 -> 881974 (-0.98%)	2019-01-04 15:59:59 -08:00
Eric Anholt	81b9361b68	v3d: Stop scalarizing our uniform loads. We can pull a whole vector in a single indirect load. This saves a bunch of round-trips to the TMU, instructions for setting up multiple loads, references to the UBO base in the uniforms, and apparently manages to reduce register pressure as well. instructions in affected programs: 3086665 -> 2454967 (-20.47%) uniforms in affected programs: 919581 -> 721039 (-21.59%) threads in affected programs: 1710 -> 3420 (100.00%) spills in affected programs: 596 -> 522 (-12.42%) fills in affected programs: 680 -> 562 (-17.35%) Improves 3dmmes performance by 2.29312% +/- 0.139825% (n=5)	2019-01-04 15:41:23 -08:00
Eric Anholt	f8a8de8b9a	v3d: Do UBO loads a vector at a time. In the process of adding support for SSBOs and CS shared vars, I ended up needing a helper function for doing TMU general ops. This helper can be that starting point, and saves us a bunch of round-trips to the TMU by loading a vector at a time.	2019-01-04 15:41:23 -08:00
Eric Anholt	b0e0086257	v3d: Remove dead switch cases and comments from v3d_nir_lower_io. Moving things to NIR left this mess around. All we lower now is uniforms.	2019-01-04 15:41:23 -08:00
Eric Anholt	e1385e879d	v3d: Reinstate the new shader-db output after v3d_compile() refactor. I misplaced it in the rebase conflicts.	2019-01-04 15:26:19 -08:00
Eric Anholt	d2b899c0ec	v3d: Refactor compiler entrypoints. Before, I had per-stage entrypoints with some helpers shared between them. As I extended for compute shaders and shader-db, it turned out that the other common code in the middle wanted to be shared too.	2019-01-02 14:12:29 -08:00
Eric Anholt	0805060573	v3d: Handle dynamically uniform IF statements with uniform control flow. Loops will be trickier, since we need some analysis to figure out if the breaks/continues inside are uniform. Until we get that in NIR, this gets us some quick wins. total instructions in shared programs: 6192844 -> 6174162 (-0.30%) instructions in affected programs: 487781 -> 469099 (-3.83%)	2019-01-02 14:12:29 -08:00
Eric Anholt	5e9ee6e841	v3d: Fold comparisons for IF conditions into the flags for the IF. total instructions in shared programs: 6193810 -> 6192844 (-0.02%) instructions in affected programs: 800373 -> 799407 (-0.12%)	2019-01-02 14:12:29 -08:00
Eric Anholt	078dc176bc	v3d: Don't try to fold non-SSA-src comparisons into bcsels. There could have been a write of a src in between the comparison and the bcsel that would invalidate the comparison.	2019-01-02 14:12:29 -08:00
Eric Anholt	2e0433b687	v3d: Move the "Find the ALU instruction generating our bool" out of bcsel. This will be reused for if statements.	2019-01-02 14:12:29 -08:00
Eric Anholt	c3ae0aa264	v3d: Simplify the emission of comparisons for the bcsel optimization. I wanted to reuse the comparison stuff for nir_ifs, but for that I just want the flags and no destination value. Splitting the conditions from the destinations ended up cleaning the existing code up, anyway.	2019-01-02 14:12:29 -08:00
Eric Anholt	ad1e59cf8d	v3d: Add support for gl_HelperInvocation. We can just look at the MSF flags -- if they're unset, then we're definitely in a helper invocation. Fixes dEQP-GLES31.functional.shaders.helper_invocation.* with GLES3.1 enabled.	2018-12-30 08:05:11 -08:00
Eric Anholt	20021e3473	v3d: Add support for textureSize() on MSAA textures. Fixes failures in dEQP-GLES31.functional.shaders.builtin_functions.texture_size.samples_1_texture_2d in the GLES3.1 suite.	2018-12-30 08:05:11 -08:00
Eric Anholt	906fca1b4b	v3d: Add support for non-constant texture offsets. Fixes dEQP-GLES31.functional.texture.gather.offset_dynamic.min_required_offset.2d.rgba8.size_pot.clamp_to_edge_repeat and others.	2018-12-30 08:05:11 -08:00
Eric Anholt	47caefc7b4	v3d: Force sampling from base level for tg4. This is what the GLSL ES 310 spec tells us to do, but apparently the "gather mode" flag doesn't imply it in the HW. Fixes dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.filter_mode.min_nearest_mipmap_linear_mag_linear	2018-12-30 08:05:11 -08:00
Eric Anholt	f9bdce9966	v3d: Add a note for a potential performance win on multop/umul24. Noticed while debugging a testcase.	2018-12-30 08:05:11 -08:00
Eric Anholt	b36757448d	v3d: Dead-code eliminate unused flags updates. The greedy comparison folding in bcsel means that we may have left the original bool-generating NIR ALU instruction dead, but DCE wasn't eliminating the VIR code for it because of the flags updates. total instructions in shared programs: 5186024 -> 5100894 (-1.64%) instructions in affected programs: 1448695 -> 1363565 (-5.88%)	2018-12-30 08:05:11 -08:00
Eric Anholt	20e3526298	v3d: Don't generate temps for comparisons. This was just generated work for vir_opt_dead_code and cluttered up the dumps.	2018-12-30 08:04:54 -08:00
Eric Anholt	ebde5afb93	v3d: Move "does this instruction have flags" from sched to generic helpers. I wanted to reuse it for DCE of flags updates.	2018-12-30 08:03:51 -08:00
Eric Anholt	39b1112189	v3d: Drop incorrect dependency for flpop. It is just shifting probably-means-flags bits out of a value, it doesn't actually update the flags on its own.	2018-12-30 08:03:51 -08:00
Eric Anholt	a7c9fd7573	v3d: Drop unused count_nir_instrs() helper. This was for shader-db, but I haven't cared about NIR instruction counts in a long time.	2018-12-30 08:03:51 -08:00
Eric Anholt	696f63f1b4	v3d: Hook up some shader-db output to GL_ARB_debug_output. This allows the original shader-db project's run.c runner to parse things easily, and is probably a good thing to have for GL_ARB_debug_output in general. I formatted it more like Intel's so I can mostly reuse their report script.	2018-12-30 08:03:51 -08:00
Eric Anholt	9ec6a3d621	v3d: Fix uniform pretty printing assertion failure with branches. Fixes: `248a7fb392` ("v3d: Do uniform pretty-printing in the QPU dump.")	2018-12-29 13:52:09 -08:00
Eric Anholt	d80761b8f3	v3d: Drop shadow comparison state from shader variant key. The shadow state is now in the sampler.	2018-12-20 11:29:30 -08:00
Ian Romanick	378f996771	nir/opt_peephole_select: Don't peephole_select expensive math instructions. On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	09b7e1d8e4	nir/opt_peephole_select: Don't try to remove flow control around indirect loads. That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Eric Anholt	00e2cbc049	v3d: Fix the argument type for vir_BRANCH(). Apparently this has been spewing warnings for Jason's clang, but not my gcc.	2018-12-17 09:52:23 -08:00
Jason Ekstrand	11dc130779	nir: Add a bool to int32 lowering pass. We also enable it in all of the NIR drivers. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	80e8dfe9de	nir: Rename Boolean-related opcodes to include 32 in the name This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Eric Anholt	2977c77758	v3d: Use the original bit size when scalarizing uniform loads. Prevents a regression in jekstrand's 1-bit series. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 21:03:01 +00:00
Eric Anholt	29927e7524	v3d: Drop in a bunch of notes about performance improvement opportunities. These have all been floating in my head, and while I've thought about encoding them in issues on gitlab once they're enabled, they also make sense to just have in the area of the code you'll need to work in.	2018-12-14 17:48:01 -08:00
Eric Anholt	248a7fb392	v3d: Do uniform pretty-printing in the QPU dump. If you're trying to trace what's going on in a QPU dump, this will definitely help you find your way.	2018-12-14 17:48:01 -08:00
Eric Anholt	532b6c5671	v3d: Move uniform pretty-printing to its own helper function. I want to reuse it in the QPU dump.	2018-12-14 17:48:01 -08:00
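
The 01d913cf90 entry above describes receiving an ivec3 workgroup id plus a flat local invocation index and relying on the core lowering to produce the local and global invocation id vectors. A minimal sketch of that arithmetic follows, using the GLSL definitions of gl_LocalInvocationIndex and gl_GlobalInvocationID. The `uvec3` struct and both function names are hypothetical and are not Mesa's API; the real lowering operates on NIR values rather than plain integers.

```c
#include <stdint.h>

/* Hypothetical helper type for illustration only (not a Mesa type). */
struct uvec3 {
    uint32_t x, y, z;
};

/*
 * Invert the GLSL definition
 *   gl_LocalInvocationIndex = z * size.x * size.y + y * size.x + x
 * to recover the 3D local invocation id from the flat index.
 * Assumes every component of size is nonzero.
 */
static struct uvec3
local_id_from_index(uint32_t index, struct uvec3 size)
{
    struct uvec3 id;
    id.x = index % size.x;
    id.y = (index / size.x) % size.y;
    id.z = index / (size.x * size.y);
    return id;
}

/*
 * GLSL definition:
 *   gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize
 *                         + gl_LocalInvocationID
 */
static struct uvec3
global_id(struct uvec3 workgroup, struct uvec3 size, struct uvec3 local)
{
    struct uvec3 id;
    id.x = workgroup.x * size.x + local.x;
    id.y = workgroup.y * size.y + local.y;
    id.z = workgroup.z * size.z + local.z;
    return id;
}
```

With a workgroup size of (4, 2, 2), local index 13 maps to local id (1, 1, 1); in workgroup (2, 0, 1) that invocation has global id (9, 1, 3).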


175 commits