fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-18 13:48:06 +02:00

Author	SHA1	Message	Date
Eric Anholt	248a7fb392	v3d: Do uniform pretty-printing in the QPU dump. If you're trying to trace what's going on in a QPU dump, this will definitely help you find your way.	2018-12-14 17:48:01 -08:00
Eric Anholt	532b6c5671	v3d: Move uniform pretty-printing to its own helper function. I want to reuse it in the QPU dump.	2018-12-14 17:48:01 -08:00
Eric Anholt	a7e15a5086	v3d: Avoid assertion failures when removing end-of-shader instructions. After generating VIR, we leave c->cursor pointing at the end of the shader. If the shader had dead code at the end (for example from preamble instructions in a shader with no side effects), we would assertion fail that we were leaving the cursor pointing at freed memory. Since anything following DCE should be setting up a new cursor anyway, just clear the cursor at the start.	2018-12-14 17:48:01 -08:00
Eric Anholt	3f9bcf9136	v3d: Make sure that a thrsw doesn't split a multop from its umul24. The thrsw will invalidate rtop, just like accumulators and flags. Caught by simulator assertions in CS imulextended/umulextended tests. Fixes: `90269ba353` ("broadcom/vc5: Use THRSW to enable multi-threaded shaders.")	2018-12-14 17:48:01 -08:00
Eric Anholt	f1d98204c3	v3d: Fix a leak of the disassembled instruction string during debug dumps. Fixes: `ade416d023` ("broadcom: Add VC5 NIR compiler.")	2018-12-07 16:48:23 -08:00
Eric Anholt	bad95bb13c	v3d: Add VIR dumping of TMU config p0/p1. I had a bit of it for V3D 3.x, but didn't update it for 4.x.	2018-12-07 16:48:23 -08:00
Eric Anholt	1fc78ff3f1	v3d: Simplify VIR uniform dumping using a temporary.	2018-12-07 16:48:23 -08:00
Eric Anholt	5932575299	v3d: Garbage collect unused uniforms code.	2018-12-07 16:48:23 -08:00
Eric Anholt	acecee4c2d	v3d: Return the right gl_SampleMaskIn[] value. It's supposed to be the dispatched sample mask for this pixel, not the GL state's sample mask.	2018-12-07 16:48:23 -08:00
Eric Anholt	6870111051	v3d: Fix a comment typo	2018-12-07 16:48:23 -08:00
Eric Anholt	ca0e4ae4bc	v3d: Convert to using nir_src_as_uint() from const_value derefs. Follows `16870de8a0` ("nir: Use nir_src_is_const and nir_src_as_* in core code") to clean up v3d.	2018-12-07 16:48:23 -08:00
Eric Anholt	42652ea51e	v3d: Use combined input/output segments. The HW apparently has some issues (or at least a much more complicated VCM calculation) with non-combined segments, and the closed source driver also uses combined I/O. Until I get the last CTS failure resolved (which does look plausibly like some VPM stomping), let's use combined I/O too.	2018-12-07 16:48:23 -08:00
Jason Ekstrand	dca6cd9ce6	nir: Make boolean conversions sized just like the others. Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one of 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:07 -06:00
Kenneth Graunke	5b682143da	nir: Make nir_lower_clip_vs optionally work with variables. The way nir_lower_clip_vs() works with store_output intrinsics makes a ton of assumptions about the driver_location field. In i965 and iris, I'd rather do this lowering early and work with variables. v3d may want to switch to that as well, and ir3 could too, but I'm not sure exactly what would need updating. For now, handle both methods. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-19 14:33:16 -08:00
Eric Anholt	538bca78e2	v3d: Don't try to set PF flags on a LDTMU operation. We need an ALU op in order to set PF. Fixes a recent assertion failure in dEQP-GLES3.functional.ubo.single_basic_type.shared.bool_vertex	2018-11-15 11:12:54 -08:00
Eric Anholt	4e1b163eed	v3d: Update the TLB config for depth writes on V3D 4.2. Fixes 311 piglit cases on the simulator.	2018-11-01 13:56:30 -07:00
Eric Anholt	cc54e1acf9	v3d: Use nir_remove_unused_io_vars to handle binner shader output DCE. We were doing this late after nir_lower_io, but we can just reuse the core code. By doing it at this stage, we won't even set up the VS attributes as inputs, reducing our VPM size.	2018-10-30 10:46:52 -07:00
Eric Anholt	c152c79d5e	v3d: Only add output slot tracking for the current varying slot. We always emit 4 slots per slot because things like color output and position processing in the epilogue will potentially look up more values than the variable declaration had. However, when we get a .location_frac != 0, we don't want to overwrite components of the following .driver_location.	2018-10-30 10:46:52 -07:00
Eric Anholt	17c8198952	v3d: Use nir_lower_io_to_scalar_early to DCE unused VS input components. This lets us trim unused trailing components in the vertex attributes, reducing the size of our VPM allocations.	2018-10-30 10:46:52 -07:00
Eric Anholt	fc85f7cfdc	v3d: Don't rely on sorting input vars for VPM read setup. For supporting scalar VPM i/o at the NIR level, we need to do a pass over the vars to figure out how big each attribute is after DCE. Once we've done that, we can just walk over c->vattr_sizes[] instead of bothering with vars.	2018-10-30 10:46:52 -07:00
Eric Anholt	cc78676030	v3d: Split out NIR input setup between FS and VPM. They don't share much code, and I'm about to rewrite the remaining shared code for the VPM case.	2018-10-30 10:46:52 -07:00
Eric Engestrom	bb84fa146f	util: use C99 declaration in the for-loop hash_table_foreach() macro. Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-25 12:43:18 +01:00
Eric Anholt	8ec83dc51e	v3d: Add support for hardware pack/unpack of half floats. Cuts the formerly 7-minute simulation time of fs-packHalf2x16.shader_test in half.	2018-10-15 17:16:44 -07:00
Eric Anholt	a91b158bd9	v3d: Fix setup of the VCM cache size. There were two bugs working together to make things mostly work: I wasn't dividing the VPM output size available by the size of a batch (vertex), but I also had the size of the VPM reduced by a factor of 8. Fixes dEQP-GLES3.functional.vertex_array_objects.all_attributes and it seems also my intermittent varying failures. Fixes: `1561e4984e` ("v3d: Emit the VCM_CACHE_SIZE packet.")	2018-09-07 08:11:38 -07:00
Eric Anholt	1561e4984e	v3d: Emit the VCM_CACHE_SIZE packet. This is needed to ensure that we don't get blocked waiting for VPM space with bin/render overlapping. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	50a8713d4f	v3d: Avoid spilling that breaks the r5 usage after a ldvary. Fixes bad rendering when forcing 2 spills in glxgears. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	f2c0d310d6	v3d: Make sure that QPU instruction-has-a-dest matches VIR. Found when debugging register spilling -- we would try to spill the dest of a STVPMV, inserting spill code after entering the last segment. In fact, we were likely to choose to do this, given that the STVPMV "dest" temp was never read from, making it cheap to spill. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	3f9cb2eb05	v3d: Wait for TMU writes to complete before continuing after a spill. The simulator complained that we had write responses outstanding at shader end. It seems that a TMU read does not guarantee that previous TMU writes by the thread have completed, which surprised me. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	ccbe33af5b	v3d: Make sure we don't emit a thrsw before the last one finished. Found while forcing some spilling, which creates a lot of short tmua->thrsw->ldtmu sequences. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	f9d54dc3cf	v3d: Add some debug code for forcing register spilling. This is useful for periodically testing out register spilling to see how it goes on simple shaders, rather than only failing on insanely complicated ones.	2018-08-06 13:03:23 -07:00
Eric Anholt	3471ce9985	v3d: Add support for the TMUWT instruction. This instruction is used to ensure that TMU stores have been processed before moving on. In particular, you need any TMU ops to be done by the time the shader ends.	2018-07-31 16:05:04 -07:00
Eric Anholt	27f1bfe471	vc4: Fix meson build when enabled without v3d. Reported-by: Rob Clark <robdclark@gmail.com> Fixes: `e92959c4e0` ("v3d: Pass the whole clif_dump structure to v3d_print_group().")	2018-07-29 19:13:29 -07:00
Eric Anholt	d934d3206e	nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform. This is controlled by a new nir_shader_compiler_options flag, and fixes dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-26 11:00:34 -07:00
Eric Anholt	6b73a97f84	v3d: Implement a small immediates optimization, based on VC4's. We can do one per instruction, and we have to be careful not to overwrite raddr_b, but this greatly reduces the pressure on uniform loads (particularly around ldvpm/stvpm instructions). total instructions in shared programs: 90768 -> 88220 (-2.81%) instructions in affected programs: 82711 -> 80163 (-3.08%)	2018-07-23 10:21:43 -07:00
Eric Anholt	79e0f042bc	v3d: Return an invalid src number if asked for a missing implicit uniform. Sometimes when iterating over sources, we might want to check if it's the implicit one. We wouldn't want to match on a non-implicit src using this function.	2018-07-23 10:21:43 -07:00
Eric Anholt	f2ea936f48	v3d: Skip emitting texture config parameter 2 if it's just the defaults. shader-db: total instructions in shared programs: 91275 -> 90768 (-0.56%) instructions in affected programs: 20702 -> 20195 (-2.45%)	2018-07-23 10:21:43 -07:00
Eric Anholt	421e99d777	v3d: Update an XXX comment for a path we handled in HW on V3D 4.x.	2018-07-23 10:21:43 -07:00
Eric Anholt	e7ae900341	v3d: Switch to using the new SFU instructions on V3D 4.x. These instructions let us write directly to the phys regfile, instead of just R4. That lets us avoid moving out of R4 to avoid conflicting with other SFU results, and to avoid conflicting with thread switches. There is still an extra instruction of latency, which is not represented in the scheduler at the moment. If you use the result before it's ready, the QPU will just stall, unlike the magic R4 mode where you'd read the previous value. That means that the following shader-db results aren't quite representative (since we now cause some stalls instead of emitting nops), but they're impressive enough that I'm happy with the change. total instructions in shared programs: 95669 -> 91275 (-4.59%) instructions in affected programs: 82590 -> 78196 (-5.32%)	2018-07-23 10:21:43 -07:00
Eric Anholt	cdfa99657d	v3d: Fix the name of the "flpop" operation. Noticed while trying to sort a new op into the appropriate place to match the documentation.	2018-07-23 10:21:43 -07:00
Eric Anholt	a1beb333d8	v3d: Drop unused vir_SAT() operation. We lower saturates in NIR.	2018-07-23 10:21:42 -07:00
Eric Anholt	8dfc6ee317	v3d: Rotate through registers to improve post-RA scheduling options. Similarly to VC4's implementation, by not picking r0 immediately upon freeing it, we give the scheduler more of a chance to fit later writes in earlier. I'm not clear on whether there's any real cost to picking phys over accumulators, so keep that behavior for now. shader-db: total instructions in shared programs: 96831 -> 95669 (-1.20%) instructions in affected programs: 77254 -> 76092 (-1.50%)	2018-07-23 10:21:42 -07:00
Eric Anholt	1fb31819ae	v3d: Allow reading from physical regs written in the previous instruction. This restriction existed in V3D 2.x, but lifting it was a major change in 3.x. shader-db results: total instructions in shared programs: 98117 -> 96831 (-1.31%) instructions in affected programs: 48520 -> 47234 (-2.65%)	2018-07-23 10:21:23 -07:00
Eric Anholt	229836fb37	v3d: Disable shader-db cycle estimates until we sort out TMU estimates. I keep having to ignore these shader-db changes since I don't trust them, so just disable the reports entirely.	2018-07-16 14:39:59 -07:00
Eric Anholt	2baab6bf2a	v3d: Emit the lowered uniform just before its first use in a block. total instructions in shared programs: 98578 -> 98119 (-0.47%) instructions in affected programs: 27571 -> 27112 (-1.66%) and it also eliminates most spills/fills on the CTS's randomized uniform usage testcases.	2018-07-16 14:39:59 -07:00
Eric Anholt	26f830d9fc	v3d: Add an assert that we don't provide invalid texture return words. The docs had an update noting this restriction, so reflect it in the code.	2018-07-16 14:39:59 -07:00
Eric Anholt	d661d78464	v3d: Apply GFXH-1625 restriction on TMUWT in the end of the shader. This doesn't affect us yet since we're not doing TMUWTs, but I think we will for GLES 3.1.	2018-07-16 14:39:59 -07:00
Eric Anholt	beeb94402f	v3d: Implement noperspective varyings on V3D 4.x. Fixes a bunch of piglit interpolation tests, and reduces my concern about some MSAA blit shaders with noperspective varyings.	2018-07-09 11:48:32 -07:00
Eric Anholt	5601ab3981	v3d: Add support for GL_SAMPLE_ALPHA_TO_ONE. Fixes piglit ext_framebuffer_multisample-draw-buffers-alpha-to-one	2018-07-05 12:39:36 -07:00
Eric Anholt	7b63371420	v3d: Respect swap_color_rb for the f32_color_rb case. We don't actually set the two flags together, but I want to use the r/g/b/a reordered fields in the next commit.	2018-07-05 12:39:36 -07:00
Eric Anholt	f49d112a01	v3d: Implement ALPHA_TO_COVERAGE. There's a convenient "FTOC" instruction for generating the coverage now, unlike vc4. This fixes dEQP-GLES3.functional.multisample.fbo_4_samples.proportionality_alpha_to_coverage	2018-06-20 09:30:46 -07:00


127 commits