fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-28 01:18:15 +02:00

Author	SHA1	Message	Date
Eric Anholt	078dc176bc	v3d: Don't try to fold non-SSA-src comparisons into bcsels. There could have been a write of a src in between the comparison and the bcsel that would invalidate the comparison.	2019-01-02 14:12:29 -08:00
Eric Anholt	2e0433b687	v3d: Move the "Find the ALU instruction generating our bool" out of bcsel. This will be reused for if statements.	2019-01-02 14:12:29 -08:00
Eric Anholt	c3ae0aa264	v3d: Simplify the emission of comparisons for the bcsel optimization. I wanted to reuse the comparison stuff for nir_ifs, but for that I just want the flags and no destination value. Splitting the conditions from the destinations ended up cleaning the existing code up, anyway.	2019-01-02 14:12:29 -08:00
Eric Anholt	ad1e59cf8d	v3d: Add support for gl_HelperInvocation. We can just look at the MSF flags -- if they're unset, then we're definitely in a helper invocation. Fixes dEQP-GLES31.functional.shaders.helper_invocation.* with GLES3.1 enabled.	2018-12-30 08:05:11 -08:00
Eric Anholt	20021e3473	v3d: Add support for textureSize() on MSAA textures. Fixes failures in dEQP-GLES31.functional.shaders.builtin_functions.texture_size.samples_1_texture_2d in the GLES3.1 suite.	2018-12-30 08:05:11 -08:00
Eric Anholt	20e3526298	v3d: Don't generate temps for comparisons. This was just generated work for vir_opt_dead_code and cluttered up the dumps.	2018-12-30 08:04:54 -08:00
Eric Anholt	a7c9fd7573	v3d: Drop unused count_nir_instrs() helper. This was for shader-db, but I haven't cared about NIR instruction counts in a long time.	2018-12-30 08:03:51 -08:00
Eric Anholt	696f63f1b4	v3d: Hook up some shader-db output to GL_ARB_debug_output. This allows the original shader-db project's run.c runner to parse things easily, and is probably a good thing to have for GL_ARB_debug_output in general. I formatted it more like Intel's so I can mostly reuse their report script.	2018-12-30 08:03:51 -08:00
Ian Romanick	378f996771	nir/opt_peephole_select: Don't peephole_select expensive math instructions On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	09b7e1d8e4	nir/opt_peephole_select: Don't try to remove flow control around indirect loads That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Jason Ekstrand	80e8dfe9de	nir: Rename Boolean-related opcodes to include 32 in the name This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Eric Anholt	29927e7524	v3d: Drop in a bunch of notes about performance improvement opportunities. These have all been floating in my head, and while I've thought about encoding them in issues on gitlab once they're enabled, they also make sense to just have in the area of the code you'll need to work in.	2018-12-14 17:48:01 -08:00
Eric Anholt	acecee4c2d	v3d: Return the right gl_SampleMaskIn[] value. It's supposed to be the dispatched sample mask for this pixel, not the GL state's sample mask.	2018-12-07 16:48:23 -08:00
Eric Anholt	ca0e4ae4bc	v3d: Convert to using nir_src_as_uint() from const_value derefs. Follows `16870de8a0` ("nir: Use nir_src_is_const and nir_src_as_* in core code") to clean up v3d.	2018-12-07 16:48:23 -08:00
Jason Ekstrand	dca6cd9ce6	nir: Make boolean conversions sized just like the others Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one of 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:07 -06:00
Eric Anholt	4e1b163eed	v3d: Update the TLB config for depth writes on V3D 4.2. Fixes 311 piglit cases on the simulator.	2018-11-01 13:56:30 -07:00
Eric Anholt	c152c79d5e	v3d: Only add output slot tracking for the current varying slot. We always emit 4 slots per slot because things like color output and position processing in the epilogue will potentially look up more values than the variable declaration had. However, when we get a .location_frac != 0, we don't want to overwrite components of the following .driver_location.	2018-10-30 10:46:52 -07:00
Eric Anholt	fc85f7cfdc	v3d: Don't rely on sorting input vars for VPM read setup. For supporting scalar VPM i/o at the NIR level, we need to do a pass over the vars to figure out how big each attribute is after DCE. Once we've done that, we can just walk over c->vattr_sizes[] instead of bothering with vars.	2018-10-30 10:46:52 -07:00
Eric Anholt	cc78676030	v3d: Split out NIR input setup between FS and VPM. They don't share much code, and I'm about to rewrite the remaining shared code for the VPM case.	2018-10-30 10:46:52 -07:00
Eric Anholt	8ec83dc51e	v3d: Add support for hardware pack/unpack of half floats. Cuts the formerly 7-minute simulation time of fs-packHalf2x16.shader_test in half.	2018-10-15 17:16:44 -07:00
Eric Anholt	d934d3206e	nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform. This is controlled by a new nir_shader_compiler_options flag, and fixes dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-26 11:00:34 -07:00
Eric Anholt	e7ae900341	v3d: Switch to using the new SFU instructions on V3D 4.x. These instructions let us write directly to the phys regfile, instead of just R4. That lets us avoid moving out of R4 to avoid conflicting with other SFU results, and to avoid conflicting with thread switches. There is still an extra instruction of latency, which is not represented in the scheduler at the moment. If you use the result before it's ready, the QPU will just stall, unlike the magic R4 mode where you'd read the previous value. That means that the following shader-db results aren't quite representative (since we now cause some stalls instead of emitting nops), but they're impressive enough that I'm happy with the change. total instructions in shared programs: 95669 -> 91275 (-4.59%) instructions in affected programs: 82590 -> 78196 (-5.32%)	2018-07-23 10:21:43 -07:00
Eric Anholt	a1beb333d8	v3d: Drop unused vir_SAT() operation. We lower saturates in NIR.	2018-07-23 10:21:42 -07:00
Eric Anholt	beeb94402f	v3d: Implement noperspective varyings on V3D 4.x. Fixes a bunch of piglit interpolation tests, and reduces my concern about some MSAA blit shaders with noperspective varyings.	2018-07-09 11:48:32 -07:00
Eric Anholt	5601ab3981	v3d: Add support for GL_SAMPLE_ALPHA_TO_ONE. Fixes piglit ext_framebuffer_multisample-draw-buffers-alpha-to-one	2018-07-05 12:39:36 -07:00
Eric Anholt	7b63371420	v3d: Respect swap_color_rb for the f32_color_rb case. We don't actually set the two flags together, but I want to use the r/g/b/a reordered fields in the next commit.	2018-07-05 12:39:36 -07:00
Eric Anholt	f49d112a01	v3d: Implement ALPHA_TO_COVERAGE. There's a convenient "FTOC" instruction for generating the coverage now, unlike vc4. This fixes dEQP-GLES3.functional.multisample.fbo_4_samples.proportionality_alpha_to_coverage	2018-06-20 09:30:46 -07:00
Eric Anholt	e130ada243	v3d: Fix shaders using pixel center W but no varyings. The docs called this field "uses both center W and centroid W", but actually it's "do you need center W even if varyings don't obviously call for it?" Fixes dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_w	2018-06-15 16:09:39 -07:00
Eric Anholt	d91e06a065	v3d: Fix configuration setup of mixed f32 and f16 render targets. Fixes dEQP-GLES3.functional.fragment_out.random.26 and 6 others.	2018-06-14 16:52:25 -07:00
Eric Anholt	a40bc33b11	v3d: Fix undefined results for a swap_color_rb RT from a float shader output. Fixes segfaults and undefined behavior in dEQP-GLES3.functional.fragment_out.basic.fixed.srgb8_alpha8_lowp_float	2018-06-14 16:52:25 -07:00
Eric Anholt	9d5860310d	v3d: Enable the new NIR bitfield operation lowering paths. These together get the GLSL 3.00 unorm/snorm pack functions and MESA_shader_integer operations working. v2: Fix commit message typo. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-06-06 13:44:28 -07:00
Eric Anholt	76ee9edcb4	broadcom/vc5: Add support for centroid varyings. It would be nice to share the flags packet emit logic with flat shade flags, but I couldn't come up with a good way while still using our pack macros. We need to refactor this to shader record setup at compile time, anyway. Fixes ext_framebuffer_multisample-interpolation * centroid-*	2018-04-26 11:30:22 -07:00
Ian Romanick	d76c204d05	util: Move util_is_power_of_two to bitscan.h and rename to util_is_power_of_two_or_zero The new name makes the zero-input behavior more obvious. The next patch adds a new function with different zero-input behavior. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2018-03-29 14:09:23 -07:00
Eric Anholt	81f82ecc56	broadcom/vc5: Start using nir_opt_move_load_ubo(). In the absence of a general NIR or VIR-level scheduler, this at least avoids spilling in GTF-GLES3.gtf.GL3Tests.uniform_buffer_object.uniform_buffer_object_storage_layouts	2018-03-28 17:48:41 -07:00
Eric Anholt	ba29b89dc7	broadcom/vc5: Set up a vertex position if the shader doesn't. Our backend needs some sort of vertex position value to emit the scaled viewport values and such. Fixes potential segfaults in KHR-GLES3.copy_tex_image_conversions.required.cubemap_negx_cubemap_negx	2018-03-22 15:12:21 -07:00
Eric Anholt	facc3c6f58	broadcom/vc5: Add support for register spilling. Our register spilling support is nice to have since vc4 couldn't at all, but we're still very restricted due to needing to not spill during a TMU operation, or during the last segment of the program (which would be nice to spill a value of, when there's a long-lived value being passed through with little modification from the start to the end). We could do better by emitting unspills for the last-segment values just before the last thrsw, since the last segment is probably not the maximum interference area. Fixes GTF uniform_buffer_object_arrays_of_all_valid_basic_types and 3 others.	2018-03-19 16:44:06 -07:00
Eric Anholt	d721348dcd	broadcom/vc5: Add cursors to the compiler infrastructure, like NIR's. This will let me do lowering late in compilation using the same instruction builder as we use in nir_to_vir.	2018-03-19 16:42:59 -07:00
Eric Anholt	c81d681742	broadcom/vc5: Move the umul macro to a header. Anywhere we want to multiply, we probably want this.	2018-03-19 16:42:59 -07:00
Eric Anholt	55bf298333	broadcom/vc5: Re-do live variables after removing thrsws. Otherwise our start/ends ips won't line up with the actual instructions.	2018-03-19 16:42:59 -07:00
Timothy Arceri	a050ea60ee	nir: add lower_ldexp to nir compiler options Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-02-28 09:23:49 +11:00
Eric Anholt	353b42ccc7	broadcom/vc5: Fix a segfault on mix of booleans. We don't have a src1 to look up if the compare instruction is "i2b".	2018-02-01 11:02:29 -08:00
Timothy Arceri	9a2e085680	nir: add lower_all_io_to_temps flag This will be used for freedreno and vc4 which require all inputs and outputs to be copied to temps. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-01-31 09:14:08 +11:00
Eric Anholt	91f899cbc1	broadcom/vc5: Update the compiler for V3D 4.2.	2018-01-27 19:04:21 +11:00
Eric Anholt	5bc0b63799	broadcom/vc5: Use MSF to ignore discards/non-dispatched channels in loops. Prevents potential infinite loops when a non-dispatched or discarded channel never triggers the loop break condition.	2018-01-12 21:58:24 -08:00
Eric Anholt	762dd52951	broadcom/vc5: Use XOR instead of SUB for execute flags comparisons. I think this should be equivalent other than power, and it's the kind of comparison we use for nir_op_ieq.	2018-01-12 21:58:18 -08:00
Eric Anholt	368bab43fd	broadcom/vc5: Add support for loading varyings in V3D 4.1. The LDVARY signal now writes an arbitrary register, so I took out the magic src register file and replaced it with an instruction with LDVARY set so we have somewhere to hang a QFILE_TEMP destination for register allocation.	2018-01-12 21:57:21 -08:00
Eric Anholt	5aaea3c4a0	broadcom/vc5: Add compiler support for V3D 4.x texturing.	2018-01-12 21:56:57 -08:00
Eric Anholt	42a35da96d	broadcom/vc5: Move V3D 3.3 texturing to a separate file. V3D 4.x texturing changes enough that #ifdefs would just make a mess of it.	2018-01-12 21:56:37 -08:00
Eric Anholt	acf30e4916	broadcom/vc5: Move V3D 3.3 VPM write setup to a separate file. For V4.1 texturing, I need the V4.1 XML, so the main compiler needs to stop including V3.3 XML.	2018-01-12 21:56:24 -08:00
Eric Anholt	90269ba353	broadcom/vc5: Use THRSW to enable multi-threaded shaders. This is a major performance boost on all of V3D, but is required on V3D 4.x where shaders are always either 2- or 4-threaded.	2018-01-12 21:55:30 -08:00
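
Commit d76c204d05 above renames a power-of-two helper so that its zero-input behavior is visible in the name. The sketch below illustrates that distinction in plain C. It is not copied from Mesa's bitscan.h: the function names are hypothetical, and the bit trick `(v & (v - 1)) == 0` is the common idiom, assumed here only for illustration.

```c
#include <stdbool.h>

/* Hypothetical helper: true for 0 and for every power of two.
 * v & (v - 1) clears the lowest set bit, so the result is 0 when
 * at most one bit is set. For v == 0 the unsigned subtraction wraps,
 * and 0 & anything is still 0. */
static inline bool
is_power_of_two_or_zero(unsigned v)
{
    return (v & (v - 1)) == 0;
}

/* Hypothetical helper: true only for real powers of two (1, 2, 4, ...).
 * The explicit v != 0 check is the only difference. */
static inline bool
is_power_of_two_nonzero(unsigned v)
{
    return v != 0 && (v & (v - 1)) == 0;
}
```

The two helpers differ only for an input of 0, which is the case the rename is meant to call out.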

72 commits
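
Commit 762dd52951 in the list above states that XOR should be equivalent to SUB for the execute-flags comparison. The sketch below shows the underlying arithmetic fact in plain C for 32-bit unsigned values: both `a - b` and `a ^ b` are zero exactly when `a == b`. It does not model V3D QPU instructions or flag updates, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: equality test via subtraction.
 * Unsigned subtraction wraps modulo 2^32, so the difference is 0
 * only when both operands are identical. */
static inline bool
equal_via_sub(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b) == 0;
}

/* Hypothetical helper: equality test via XOR.
 * XOR sets a bit wherever the operands differ, so the result is 0
 * only when every bit matches. */
static inline bool
equal_via_xor(uint32_t a, uint32_t b)
{
    return (a ^ b) == 0;
}
```

Both helpers agree on every input when only the zero/non-zero result is consumed, which is the sense of "equivalent" sketched here.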