fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-19 07:08:05 +02:00

Author	SHA1	Message	Date
Iago Toral Quiroga	1d021539a2	broadcom/compiler: track pipelineable ldvary sequences If we have two (or more) smooth varyings like this: nop t3; ldvary.rf0 fmul t5, t3, t0 fadd t6, t5, r5 nop t7; ldvary.rf0 fmul t9, t7, t0 fadd t10, t9, r5 nop t11; ldvary.rf0 fmul t13, t11, t0 fadd t14, t13, r5 We may be able to pipeline them like this: nop ; nop ; ldvary.r4 nop ; fmul r0, r4, rf0 ; ldvary.r1 fadd rf13, r0, r5 ; fmul r2, r1, rf0 ; ldvary.r3 fadd rf12, r2, r5 ; fmul r4, r3, rf0 ; ldvary.r0 But in order to do this, we will need to manually tweak the QPU scheduling. This patch tracks information about ldvary sequences that are good candidates for pipelining, and a follow-up patch will use this information to pipeline them when we emit the QPU code. v2 (apinheiro): - Rename the v3d_compile fields to avoid confusion with the qinst fields. - Assert that a sequence's start instruction is not the same as the end. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>	2021-03-02 07:56:00 +01:00
Iago Toral Quiroga	c2c2cdc3d3	broadcom/compiler: fix indentation style Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>	2021-03-02 07:56:00 +01:00
Iago Toral Quiroga	b17ec53c81	broadcom/compiler: use nir_opt_sink total instructions in shared programs: 14072341 -> 14062334 (-0.07%) instructions in affected programs: 1996685 -> 1986678 (-0.50%) helped: 3038 HURT: 2432 Instructions are helped. total uniforms in shared programs: 3797720 -> 3794523 (-0.08%) uniforms in affected programs: 191711 -> 188514 (-1.67%) helped: 831 HURT: 449 Uniforms are helped. total max-temps in shared programs: 2340632 -> 2335124 (-0.24%) max-temps in affected programs: 113632 -> 108124 (-4.85%) helped: 2728 HURT: 436 Max-temps are helped. total spills in shared programs: 6050 -> 5931 (-1.97%) spills in affected programs: 2869 -> 2750 (-4.15%) helped: 14 HURT: 4 total fills in shared programs: 13970 -> 13371 (-4.29%) fills in affected programs: 8831 -> 8232 (-6.78%) helped: 14 HURT: 4 total inst-and-stalls in shared programs: 14103668 -> 14093712 (-0.07%) inst-and-stalls in affected programs: 2004035 -> 1994079 (-0.50%) helped: 3009 HURT: 2426 Inst-and-stalls are helped. LOST: 0 GAINED: 10 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9209>	2021-02-24 08:02:00 +01:00
Iago Toral Quiroga	54c17e45ae	broadcom/compiler: skip unnecessary unifa writes If a new UBO load happens to read exactly at the offset right after the previous UBO load (something that is fairly common, for example when reading a matrix), we can skip the unifa write (with its 3 delay slots) and just continue to call ldunifa to continue reading consecutive addresses. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>	2021-02-23 08:08:01 +00:00
Iago Toral Quiroga	e20ae14978	broadcom/compiler: fix ldunif optimization When we look back for a previous uniform definition we want to start looking from the current position of the cursor, not the end of the current block. The latter only works when translating from NIR, since in that case both always match, but any optimization pass may rewrite code and emit uniforms at any place in the middle of the program. Also, ntq_store_dest expects result to be written by the last instruction to handle the case where it is stored to a NIR register. That won't be the case if the result comes from an optimized uniform, so in that case we need to insert a MOV, like we do in non-uniform control flow. v2: fix ntq_store_dest for optimized uniforms. Fixes: `14af7b3085` ('broadcom/compiler: don't emit redundant ldunif') Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Acked-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>	2021-02-23 08:08:01 +00:00
Iago Toral Quiroga	82981ccbb1	broadcom/compiler: use unifa for UBO loads from uniform addresses This basically processes UBO loads as uniform loads by writing the load address to the unifa register and reading sequential values with ldunifa. This process is faster than going through the TMU, but we can only use it when the address we are reading from is uniform across all channels, since we are basically reading from the UBO address as if it was a uniform stream. This leads to better performance in the UE4 Shooter demo. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:22 +00:00
Arcady Goldmints-Orlov	9909fe6bac	broadcom/compiler: Skip bool_to_cond where possible This change keeps track of when a boolean temp is loaded into the flags by a comparison instruction and uses that information to skip emitting instructions to set the flags in ntq_emit_bool_to_cond when the flags already have the right contents. total instructions in shared programs: 11116502 -> 11112225 (-0.04%) instructions in affected programs: 631691 -> 627414 (-0.68%) helped: 1591 HURT: 754 helped stats (abs) min: 1 max: 94 x̄: 4.14 x̃: 3 helped stats (rel) min: 0.11% max: 13.46% x̄: 2.10% x̃: 1.58% HURT stats (abs) min: 1 max: 19 x̄: 3.07 x̃: 2 HURT stats (rel) min: 0.13% max: 19.67% x̄: 1.88% x̃: 1.15% 95% mean confidence interval for instructions value: -2.02 -1.63 95% mean confidence interval for instructions %-change: -0.94% -0.71% Instructions are helped. total uniforms in shared programs: 3281555 -> 3281513 (<.01%) uniforms in affected programs: 1754 -> 1712 (-2.39%) helped: 10 HURT: 5 helped stats (abs) min: 1 max: 19 x̄: 7.90 x̃: 5 helped stats (rel) min: 0.56% max: 11.11% x̄: 7.37% x̃: 11.05% HURT stats (abs) min: 1 max: 15 x̄: 7.40 x̃: 3 HURT stats (rel) min: 0.64% max: 9.55% x̄: 5.31% x̃: 3.41% 95% mean confidence interval for uniforms value: -8.57 2.97 95% mean confidence interval for uniforms %-change: -7.35% 1.07% Inconclusive result (value mean confidence interval includes 0). total max-temps in shared programs: 1758419 -> 1758174 (-0.01%) max-temps in affected programs: 7006 -> 6761 (-3.50%) helped: 290 HURT: 14 helped stats (abs) min: 1 max: 8 x̄: 1.13 x̃: 1 helped stats (rel) min: 0.79% max: 22.86% x̄: 6.61% x̃: 4.88% HURT stats (abs) min: 1 max: 13 x̄: 6.00 x̃: 3 HURT stats (rel) min: 1.54% max: 54.17% x̄: 23.99% x̃: 9.12% 95% mean confidence interval for max-temps value: -1.03 -0.58 95% mean confidence interval for max-temps %-change: -6.24% -4.16% Max-temps are helped. total sfu-stalls in shared programs: 23676 -> 23610 (-0.28%) sfu-stalls in affected programs: 1578 -> 1512 (-4.18%) helped: 257 HURT: 252 helped stats (abs) min: 1 max: 3 x̄: 1.37 x̃: 1 helped stats (rel) min: 11.11% max: 100.00% x̄: 46.70% x̃: 40.00% HURT stats (abs) min: 1 max: 2 x̄: 1.14 x̃: 1 HURT stats (rel) min: 0.00% max: 200.00% x̄: 41.65% x̃: 25.00% 95% mean confidence interval for sfu-stalls value: -0.25 -0.01 95% mean confidence interval for sfu-stalls %-change: -8.24% 2.33% Inconclusive result (%-change mean confidence interval includes 0). total inst-and-stalls in shared programs: 11140178 -> 11135835 (-0.04%) inst-and-stalls in affected programs: 633972 -> 629629 (-0.69%) helped: 1581 HURT: 755 helped stats (abs) min: 1 max: 94 x̄: 4.26 x̃: 3 helped stats (rel) min: 0.11% max: 13.46% x̄: 2.12% x̃: 1.59% HURT stats (abs) min: 1 max: 17 x̄: 3.17 x̃: 2 HURT stats (rel) min: 0.05% max: 19.67% x̄: 1.93% x̃: 1.20% 95% mean confidence interval for inst-and-stalls value: -2.06 -1.66 95% mean confidence interval for inst-and-stalls %-change: -0.93% -0.70% Inst-and-stalls are helped. Reviewed-by: Iago Toral Quioroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>	2021-02-12 07:05:33 +00:00
Arcady Goldmints-Orlov	8762f29e9c	broadcom/compiler: Add a v3d_compile argument to vir_set_[pu]f Reviewed-by: Iago Toral Quioroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>	2021-02-12 07:05:33 +00:00
Eric Anholt	bcb5f9f94a	v3d: Stop advertising support for flat shading. The GL frontend can lower this weird GL feature away for us. This should fix redeclaration of the gl_Color/SecondaryColor as centroid, since that case had been missed in the !flat special case here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Arcady Goldmints-Orlov	0b29a8a206	Revert "broadcom/compiler: improve generation of if conditions" This reverts commit `93f8f83a95`. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8903>	2021-02-08 06:52:59 +00:00
Iago Toral Quiroga	6630825dcf	broadcom/compiler: let QPUs stall on TMU input/config overflows We have been trying to avoid this by tracking fifo usages in the driver and flushing all outstanding TMU sequences if we overflowed any of these, however, this is actually not the most efficient strategy. Instead, we would like to flush only enough operations to get things going again, which is better for pipelining. Doing that in the driver would require some additional work, but thankfully, it is not required, since this seems to be what the hardware does automatically, so we can just remove overflow tracking for these two fifos and enjoy the benefits. This also further improves shader-db stats: total instructions in shared programs: 8975062 -> 8955145 (-0.22%) instructions in affected programs: 1637624 -> 1617707 (-1.22%) helped: 4050 HURT: 2241 Instructions are helped. total threads in shared programs: 236802 -> 237042 (0.10%) threads in affected programs: 252 -> 492 (95.24%) helped: 122 HURT: 2 Threads are helped. total sfu-stalls in shared programs: 19901 -> 19592 (-1.55%) sfu-stalls in affected programs: 4744 -> 4435 (-6.51%) helped: 1248 HURT: 1051 Sfu-stalls are helped. total inst-and-stalls in shared programs: 8994963 -> 8974737 (-0.22%) inst-and-stalls in affected programs: `1636184` -> 1615958 (-1.24%) helped: 4050 HURT: 2239 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	e18d6bbf2f	broadcom/compiler: disable TMU pipelining if we fail to register allocate TMU pipelining can severely reduce our capacity to emit TMU spills, causing us to fail to compile a shader we may otherwise be able to compile. This is because pipelining extends the liveness of TMU sequences by posponing the thread switch and LDTMU until a result is needed, and we can't emit TMU spills while in the middle of a TMU sequence. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	be45960d3e	broadcom/compiler: support pipelining of tex instructions This follows the same idea as for TMU general instructions of reusing the existing infrastructure to first count required register writes and flush outstanding TMU dependencies, and then emit the actual writes, which requires that we split the code that decides about register writes to a helper. We also need to start using a component mask instead of the number of components that we need to read with a particular TMU operation. v2: update tmu_writes for V3D_QPU_WADDR_TMUOFF Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	197090a3fc	broadcom/compiler: implement pipelining for general TMU operations This creates the basic infrastructure to implement TMU pipelining and applies it to general TMU. Follow-up patches will expand this to texture and image/load store operations. TMU pipelining means that we don't immediately end TMU sequences, and instead, we postpone the thread switch and LDTMU (for loads) or TMUWT (for stores) until we really need to do them. For loads, we may need to flush them if another instruction reads the result of a load operation. We can detect this because in that case ntq_get_src() will not find the definition for that ssa/reg (since we have not emitted the LDTMU instructions for it yet), so when that happens, we flush all pending TMU operations and then try again to find the definition for the source. We also need to flush pending TMU operations when we reach the end of a control flow block, to prevent the case where we emit a TMU operation in a block, but then we read the result in another block possibly under control flow. It is also required to flush across barriers and discards to honor their semantics. Since this change doesn't implement pipelining for texture and image load/store, we also need to flush outstanding TMU operations if we ever have to emit one of these. This will be corrected with follow-up patches. Finally, the TMU has 3 fifos where it can queue TMU operations. These fifos have limited capacity, depending on the number of threads used to compile the shader, so we also need to ensure that we don't have too many outstanding TMU requests and flush pending TMU operations if a new TMU operation would overflow any of these fifos. While overflowing the Input and Config fifos only leads to stalls (which we want to avoid anyway), overflowing the Output fifo is incorrect and would end up with a broken shader. This means that we need to know how many TMU register writes are required to emit a TMU operation and use that information to decide if we need to flush pending TMU operations before we emit any register writes for the new TMU operation. v2: fix TMU flushing for NIR registers reads (jasuarez) Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	3926030183	broadcom/compiler: fix indentation with TABs Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Arcady Goldmints-Orlov	93f8f83a95	broadcom/compiler: improve generation of if conditions Where it is safe to do so, avoid the generation of code to convert a condition code into a boolean which is then tested to generate a condition code. This is only done in uniform ifs, and only for condition values that are SSA and only used once (in that if statement). shader-db relative to MR 7726: total instructions in shared programs: 8985667 -> 8974151 (-0.13%) instructions in affected programs: 390140 -> 378624 (-2.95%) helped: 810 HURT: 276 helped stats (abs) min: 1 max: 49 x̄: 17.77 x̃: 16 helped stats (rel) min: 0.10% max: 33.63% x̄: 7.97% x̃: 6.45% HURT stats (abs) min: 1 max: 46 x̄: 10.42 x̃: 10 HURT stats (rel) min: 0.16% max: 21.54% x̄: 2.26% x̃: 2.03% 95% mean confidence interval for instructions value: -11.46 -9.75 95% mean confidence interval for instructions %-change: -5.76% -4.97% Instructions are helped. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8709>	2021-02-02 06:55:49 +00:00
Arcady Goldmints-Orlov	8f583df7b6	broadcom/compiler: Enable PER_QUAD TMU access only in uniform control flow PER_QUAD TMU lookups will partially override the predication mask on TMU writes. If some but not all lanes in a quad are predicated out, setting PER_QUAD will force them all to be enabled. This can result in TMU access to bogus addresses when in nonuniform control flow. Also, since PER_QUAD is needed to make sure derivatives work with helper invocations, and derivatives are undefined in nonuniform control flow, there is no reason to leave it enabled in this case. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>	2021-02-01 08:11:48 +00:00
Arcady Goldmints-Orlov	79bde75131	broadcom/compiler: Emit uniform loops using uniform control flow Similarly to if statements, uniform loops are now emitted without predication, using simple branches for breaks and continues. The uniformity of the loop is determined by running the nir_divergence_analysis pass. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>	2021-02-01 08:11:48 +00:00
Arcady Goldmints-Orlov	6643bdbd53	broadcom/compiler: Use ANYA for branches in uniform ifs Using ANYAP instead of ALLAP makes things work correctly in cases where all lanes are masked out. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>	2021-02-01 08:11:48 +00:00
Caio Marcelo de Oliveira Filho	9f3d5e99ea	compiler: Use util/bitset.h for system_values_read It is currently a bitset on top of a uint64_t but there are already more than 64 values. Change to use BITSET to cover all the SYSTEM_VALUE_MAX bits. Cc: mesa-stable Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8585>	2021-01-26 20:20:47 +00:00
Alejandro Piñeiro	212b1516df	v3d/compiler: enable lower_add_sat NIR option We are enabling this option for the Vulkan driver, so it makes sense to enable it for the OpenGL one. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8582>	2021-01-20 12:41:52 +00:00
Daniel Schürmann	bd8e84eb8d	nir: replace .lower_sub with .has_fsub and .has_isub This allows a more fine-grained control about whether a backend supports one of these instructions. Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6597>	2021-01-11 19:13:51 +00:00
Jason Ekstrand	630e54a08b	nir: Add a halt instruction type Halt is like a return for the entire shader or exit() if you prefer to think of it that way. Once an invocation hits a halt, it's 100% dead. Any writes to output variables which happened before the halt do, however, still apply. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>	2020-11-25 05:37:09 +00:00
Juan A. Suarez Romero	1e723745dd	v3d/compiler: extend swapping R/B support to all vertex attributes So far the support for R/B swapping in vertex attributes were for the generic attributes. But there are cases like glSecondaryColorPointer() supporting BGRA formats that require the R/B swapping to be also allowed in the non-generic vertex attributes (in this case, in the COLOR1 attribute). v2: - Don't split line (Iago) Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7196>	2020-11-05 12:15:28 +00:00
Iago Toral Quiroga	40788be134	v3d/compiler: fix BGRA vertex attributes for vec2/float size. We don't natively support BGRA format, instead we handle these as RGBA and we lower the loads to swap components R and B. However, the driver emits VPM loads based on the size of the input variables so when we have a vec2 or float BGRA input, it would only emit VPM loads for components 0 and 1, which is not correct since we emit a load of component 2 to swap with component 0. v2: handle GL legacy vertex inputs gracefully. Fixes: dEQP-VK.draw.output_location.array.b8g8r8a8-unorm-highp-output-vec2 dEQP-VK.draw.output_location.array.b8g8r8a8-unorm-mediump-output-vec2 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7271>	2020-10-23 09:19:02 +02:00
Erik Faye-Lund	8ad931808e	v3d: do not report alpha-test as supported This triggers lowering in the state-tracker, which makes things a bit simpler. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7251>	2020-10-21 16:33:43 +00:00
Iago Toral Quiroga	442f48f27b	v3d/compiler: implement load interpolated input intrinsics We will lower GLSL interpolateAt functions to these. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Acked-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7155>	2020-10-15 02:04:04 +02:00
Iago Toral Quiroga	3ec165bce9	broadcom/compiler: track partially interpolated fragment inputs We will need these to implement GLSL's interpolateAt*() functions where we are required to perform interpolation in the shader at arbitrary offsets. Acked-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7155>	2020-10-15 02:04:04 +02:00
Arcady Goldmints-Orlov	ac5f0ee19c	broadcom/compiler: support varyings with struct types This adds support for using structs as outputs from vertex shaders and inputs to fragment shaders. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6721>	2020-10-14 22:54:58 +00:00
Iago Toral Quiroga	7eb8eb10f6	v3d/compiler: allow to batch spills Some shaders that need to spill hundreds of registers can take very long times to compile as each allocation attempt spills a single register and restarts the allocation process. We can significantly cut down these times if we allow the compiler to spill in batches, which should be possible if we are spilling uniforms, which is in fact the kind of spills that we do first because they have lower cost than TMU spills. Doing this could cause us to slightly over spill in some cases (depending on the chosen batch size) leading to slightly worse performance, so we only enable this behavior after we have started to spill over a certain threshold, at which point we assume that performance won't be good and we want to favor compilation speed instead. v2: - Keep it simple and just try to spill a fixed amount of registers in a batch instead of trying to compute this dynamically based on accumulated spills and current register pressure. (Eric). v3: - Check if the node is valid before doing anything with it. - Drop the environment variable to select batch size and just fix it to 20. With this we can take this CTS test from 35 minutes down to about 3 minutes: dEQP-VK.ssbo.layout.random.all_shared_buffer.5 Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>	2020-10-13 21:21:33 +00:00
Iago Toral Quiroga	4401dde0e9	broadcom/compiler: rename QUNIFORM_GET_BUFFER_SIZE to QUNIFORM_GET_SSBO_SIZE Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>	2020-10-13 21:21:33 +00:00
Iago Toral Quiroga	d93d903a37	v3d/compiler: implement nir_intrinsic_get_ubo_size Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>	2020-10-13 21:21:33 +00:00
Alejandro Piñeiro	02b9670611	broadcom/compiler: allow GLSL_SAMPLER_DIM_BUF on txs emission Although we don't support texture buffers on the OpenGL driver, we are already doing that for the Vulkan driver. This would be needed for the OpenGL driver in any case. Fixes following tests on v3dv: dEQP-VK.memory.pipeline_barrier.host_write_uniform_texel_buffer.* dEQP-VK.memory.pipeline_barrier.transfer_dst_uniform_texel_buffer.* Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>	2020-10-13 21:21:33 +00:00
Iago Toral Quiroga	644a15e69e	v3dv: implement nir_texop_texture_samples Fixes: dEQP-VK.glsl.texture_functions.query.texturesamples.* Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>	2020-10-13 21:21:32 +00:00
Iago Toral Quiroga	1c4c7d95f7	broadcom/compiler: track if the fragment shader forces per-sample MSAA For example, regarding gl_SampleID, the GLSL spec states: "Any static use of this variable in a fragment shader causes the entire shader to be evaluated per-sample." So we need to track if the fragment shader does anything that implicitly enables per-sample shading in the compiler for the driver to auto-enable sample rate shading if needed. v2: - Instead of tracking reads of gl_SampleID, check SYSTEM_BIT_SAMPLE_ID and SYSTEM_BIT_SAMPLE_POS as well as the sample layout qualifier like other drivers are doing to activate this behavior (Eric). Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v1) Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>	2020-10-13 21:21:32 +00:00
Iago Toral Quiroga	531ea3596d	broadcom/compiler: implement nir_intrinsic_load_sample_pos This is intended to return the sample location within the pixel. Fixes: dEQP-VK.pipeline.multisample_shader_builtin.sample_position.* Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>	2020-10-13 21:21:32 +00:00
Iago Toral Quiroga	14d74c07aa	broadcom/compiler: handle gl_SampleMask writes in fragment shaders We didn't need this until now, since this was included with GLES 3.2, but we need it for Vulkan. Eric had already done the plumbing for it though, we just need to actually emit the mask. Fixes some tests in: dEQP-VK.renderpass.suballocation.multisample_resolve.* Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>	2020-10-13 21:21:32 +00:00
Iago Toral Quiroga	f41857eb48	v3d/compiler: implement nir_intrinsic_load_base_instance Vulkan lowers gl_InstanceIndex to load_base_instance + load_instance_id, so we need to implement loading the base instance in the compiler. The base instance is set by the BASE_VERTEX_BASE_INSTANCE command right before the instanced draw call and it is included in the VPM payload together with the InstanceID and VertexID if this is requested by the shader record. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>	2020-10-13 21:21:29 +00:00
Iago Toral Quiroga	1f41a128e0	v3d/compiler: implement nir_op_fquantize2f16 Reviewd-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>	2020-10-13 21:21:28 +00:00
Alejandro Piñeiro	c8212731e7	v3d/compiler: handle GL/Vulkan differences in uniform handling This also adds a v3d_execution_environment, so compiler could know if it is generating code for OpenGL or Vulkan needs. Reviewed-by: Iago Toral <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>	2020-10-13 21:21:27 +00:00
Alejandro Piñeiro	1c8226c682	v3d/compiler: update uses_vid/uses_iid check In order to take into account the vulkan specific system values SYSTEM_VALUE_INSTANCE_INDEX and SYSTEM_VALUE_VERTEX_ID_ZERO_BASE. Reviewed-by: Iago Toral <itoral@igalia.com> Reviewed-by: Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>	2020-10-13 21:21:26 +00:00
Alejandro Piñeiro	8de380d26a	broadcom/compiler: add V3D_DEBUG_RA option To ask to debug a registr allocation failure (V3D_DEBUG_REGISTER_ALLOCATION seemed too long to me). When a fallback register allocation algorithm was added, if the register allocation fails, it only dumpg the current vir with the register pressure info with the failed fallback. But if we want do debug the problem, we would be interested on both. Additionally, it was strange that we got the full vir dump with the failure even if no debug option was set. Additionally we add shaderdb like stats for those failures, to make easier to compare one and the other. v2: keep a small warning message in case both register allocation algorithms fails (Neil) Reviewed-by: Neil Roberts <nroberts@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6999>	2020-10-07 20:21:17 +00:00
Kenneth Graunke	140f53e646	Revert "nir: replace lower_ffma and fuse_ffma with has_ffma" This reverts commit `939ddf3f67`. Intel has a separate pass for fusing FFMAs selectively. We split these flags in commit `1b72c31e1f` and the reasoning still stands. The patch being reverted was just a cleanup, so there should be no issue with reverting it. Acked-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6849>	2020-09-24 13:11:50 -07:00
Marek Olšák	939ddf3f67	nir: replace lower_ffma and fuse_ffma with has_ffma Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6756>	2020-09-24 12:29:11 +00:00
Marek Olšák	771aad3027	nir: split lower_ffma into lower_ffma16/32/64 AMD wants different behavior for each bit size Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6756>	2020-09-24 12:29:11 +00:00
Jason Ekstrand	9750164c09	nir: Rename get_buffer_size to get_ssbo_size This makes it explicit that this intrinsic is only for SSBOs. For the v3dv driver, we'll be adding a get_ubo_size intrinsic and we want to be able to distinguish between the two. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6812>	2020-09-22 13:34:12 +00:00
Marek Olšák	ac55b1a9a6	nir: get ffma support from NIR options for nir_lower_flrp This also fixes the inverted last parameter of nir_lower_flrp in most drivers. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6599>	2020-09-04 17:06:22 +00:00
Karol Herbst	e5899c1e88	nir: rename nir_op_fne to nir_op_fneu It was always fneu but naming it fne causes confusion from time to time. So lets rename it. Later we also want to add other unordered and fne, this is a smaller preparation for that. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6377>	2020-08-21 17:26:21 +00:00
Jason Ekstrand	1ccd681109	nir: Add an LOD parameter to image_*_size The OpenCL image_width/height/depth functions have variants which can take an LOD parameter. More importantly, LLVM-SPIRV-Translator always generates OpImageQuerySizeLod even if the LOD is guaranteed to be zero. Given that over half the hardware out there has an LOD field for image size queries (based on a rudimentary scan through their NIR -> whatever code), we may as well just add the source to the NIR intrinsic. If this is ever a problem for anyone, the lowering is pretty trivial. I've also added asserts to everyone's drivers that should alert them if they ever see an LOD other than zero. This will never happen with GL or Vulkan so there's no need for panic. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6396>	2020-08-20 20:48:10 +00:00
Arcady Goldmints-Orlov	a104902590	broadcom/compiler: Enable PER_QUAD for UBO and SSBO loads. Helper invocations need to be able to read from UBOs since those values can be used for flow control, but writes from helper invocations need to be dropped. Fixes CTS tests: dEQP-VK.glsl.derivate..uniform_loop. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6356>	2020-08-20 20:14:14 +00:00

1 2 3 4 5

225 commits