fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-05 07:28:11 +02:00

Author	SHA1	Message	Date
Connor Abbott	b178fdf486	lima/gp: Fix problem with complex moves When writing the scheduler, we forgot that you can't read the complex unit in certain sources because it gets overwritten to 0 or 1. Fixing this turned out to be possible without giving up and reducing GPIR_VALUE_REG_NUM to 10, although it was difficult in a way I didn't expect. There can be at most 4 next-max nodes that can't have moves scheduled in the complex slot, so it actually isn't a problem for getting the number of next-max nodes at 5 or lower. However, it is a problem for stores. If a given node is a next-max node whose move cannot go in the complex slot and is used by a store that we decide to schedule, we have to reserve one of the non-complex slots for a move instead of all the slots, or we can wind up in a situation where only the complex slot is free and we fail the move. This means that we have to add another term to the reservation logic, for stores whose children cannot be in the complex slot. Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	54434fe670	lima/gpir: Rework the scheduler Now, we do scheduling at the same time as value register allocation. The ready list now acts similarly to the array of registers in value_regalloc, keeping us from running out of slots. Before this, the value register allocator wasn't aware of the scheduling constraints of the actual machine, which meant that it sometimes chose the wrong false dependencies to insert. Now, we assign value registers at the same time as we actually schedule instructions, making its choices reflect reality much better. It was also conservative in some cases where the new scheme doesn't have to be. For example, in something like: 1 = ld_att 2 = ld_uni 3 = add 1, 2 It's possible that one of 1 and 2 can't be scheduled in the same instruction as 3, meaning that a move needs to be inserted, so the value register allocator needs to assume that this sequence requires two registers. But when actually scheduling, we could discover that 1, 2, and 3 can all be scheduled together, so that they only require one register. The new scheduler speculatively inserts the instruction under consideration, as well as all of its child load instructions, and then counts the number of live value registers after all is said and done. This lets us be more aggressive with scheduling when we're close to the limit. With the new scheduler, the kmscube vertex shader is now scheduled in 40 instructions, versus 66 before. Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	12645e8714	lima/gp: Mark more add-only nodes as maybe-two-slot Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	16de3dd7a6	lima/gpir: Fix some bugs in instruction handling Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	cc78a42577	lima: Reintroduce the standalone compiler I used this to test things without needing to have a device handy. Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	4423552ff0	nir/lower_viewport: Check variable mode first The location is unused for shader_temp and function_temp variables, and due to the way nir_lower_io_to_temporaries demotes shader_out variables to shader_temp variables, it happened to equal VARYING_SLOT_POS for the gl_Position temporary, which made this pass fail with the offline compiler due to this coming before vars_to_ssa. Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:21:41 +02:00
Samuel Pitoiset	6e5e4bf050	radv/gfx10: set BREAK_WAVE_AT_EOI if TES or GS enable the primitive ID Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-18 10:37:10 +02:00
Samuel Pitoiset	8c692ff512	radv/gfx10: move emitting VGT_PRIMITIVEID_EN into the NGG path And do not emit VGT_GS_MODE which is unnecessary on GFX10. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-18 10:36:38 +02:00
Samuel Pitoiset	8315dbe419	radv/gfx10: do not always execute a barrier before the second shader With NGG, empty waves may still be required to export data. This fixes dEQP-VK.ycbcr.format._unorm.geometry_. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-18 10:06:34 +02:00
Samuel Pitoiset	63d670e350	radv: fix VGT_GS_MODE if VS uses the primitive ID Found by inspection. Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-18 10:03:12 +02:00
Iago Toral Quiroga	c23fa1ca07	v3d: emit correct lowering for logic operations with MSAA render targets v2: - Drop the writemask from the per-sample color intrinsic (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 08:59:35 +02:00
Iago Toral Quiroga	93d05c1c1f	v3d: handle nir_intrinsic_store_tlb_sample_color_v3d v2: - Move handling of output intrinsics to ntq_emit_intrinsic() (Eric). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 08:59:35 +02:00
Iago Toral Quiroga	50016d7718	nir: add a V3D-specific intrinsic for per-sample color writes For per-sample color writes we need the output intrinsic to pack the sample index, which is not provided with regular store_output intrinsics unless we figured out a way to encode it into the base or the offset. v2: - Drop the writemask (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 08:59:35 +02:00
Iago Toral Quiroga	ba520b00c4	v3d: implement per-sample tlb color writes Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 08:59:35 +02:00
Iago Toral Quiroga	b96c2219ca	v3d: refactor the tlb color write code We want to split the tlb specifier setup from the color writes, because when we implement per-sample color writes we want to do the latter for all the samples, but the former only once. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 08:59:35 +02:00
Iago Toral Quiroga	fd3ec6f55d	v3d: move tlb color write emission to a helper function We will soon be adding per-sample color writes which means additional complexity and more indentation (we will need another loop to emit the writes for each individual sample), so this will help keeping things simple and a bit more readable. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 08:59:35 +02:00
Iago Toral Quiroga	0c9919710e	v3d: implement per-sample tlb color reads Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 08:59:35 +02:00
Lionel Landwerlin	3adc32df92	anv: fix format mapping for depth/stencil formats anv_format is supposed to have a pointer back to the associated VkFormat, but we were missing this for depth/stencil formats. This doesn't fix anything afaict, but will be needed for future changes. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `465de47bad` ("anv: associate vulkan formats with aspects") Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-18 09:40:01 +03:00
Dave Airlie	a68f593a0e	radv: put back VGT_FLUSH at ring init on gfx10 I can find no evidence that removing this is a good idea. Fixes: `9b116173b6` ("radv: do not emit VGT_FLUSH on GFX10") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-18 16:24:44 +10:00
Gert Wollny	45951452aa	softpipe: Clamp border colors when needed unorm and snorm require that the border color values are clamped, so when picking the sampler view copy/clamp the border color from the sampler and use these adjusted values. Fixes: dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_compressed_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_snorm_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_srgb_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_unorm_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_compressed_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_snorm_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_srgb_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth_uint_stencil_sample_depth Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-18 05:49:00 +02:00
Gert Wollny	230b99ce2f	softpipe: set a lower minimum clamp value for texture coordinate border clamp The value of -0.5f is not small enough to produce negative coordinates, so lower the minimum clamp value to -1.0f. This fixes a number of tests from dEQP-GLES31.functional.texture.border_clamp.* Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-18 05:47:23 +02:00
Gert Wollny	eae4c6df8d	softpipe: Correct repeat-mirror evaluation When mirroring the texture coordinates, the indices must be mirrored as well and the half pixel shift must be applied in reverse. Fixes a number of tests from: dEQP-GLES31.functional.texture.gather.offset.* dEQP-GLES31.functional.texture.gather.offsets.* Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-18 05:47:23 +02:00
Gert Wollny	fff624fca4	softpipe: Also mark textures as dirty when updating the framebuffer state At this point all the draw caches are flushed to the old attached textures, so the read caches of these textures will need to be updated too. Fixes: dEQP-GLES3.functional.fbo.color.repeated_clear.sample.tex2d.* Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-18 05:33:59 +02:00
Jonathan Marek	08514a9721	etnaviv: set DITHER_MODE This fixes a rendering glitch observed in SDL testscale test, where alpha blending samples with value (1.0, 1.0, 1.0, 0.0) whitens the target instead of having no effect. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-17 23:07:50 -04:00
Jonathan Marek	aaf0c47c76	etnaviv: update headers from rnndb Update to etna_viv commit a16a418. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-17 23:07:50 -04:00
Jonathan Marek	76adf041f2	etnaviv: fix blend color on newer GPUs Newer GPUs use the half float ALPHA_COLOR_EXT register. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-17 23:07:50 -04:00
Jonathan Marek	5f73726013	etnaviv: fix alpha blending cases We need to check rgb_func/alpha_func when determining if blend or separate alpha is required. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-17 23:07:35 -04:00
Jonathan Marek	6c3c05dc38	etnaviv: fix polygon offset Dividing the fui result by 65535 is obviously wrong, and from testing, on GC7000L at least there is no division by 65535. Fixes dEQP-GLES2.functional.polygon_offset.fixed16_displacement_with_units Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-17 23:07:07 -04:00
Timothy Arceri	a20a9d0c5e	radv: don't store disasm string unless keep_shader_info flag set This fixes the memory use regression from bug 111107. Fixes: `726a31df70` ("radv: Add the concept of radv shader binaries.") Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111107	2019-07-18 00:25:55 +00:00
Dave Airlie	82a2f10529	radv/gfx10: set the pgm rsrc3/4 regs using index sh reg set This is ported from AMDVLK; it's probably not required unless we want to use "real time queues", but it might be nice to just have in place. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-18 10:24:26 +10:00
Dave Airlie	de524b2c37	radv: use correct register setter for ngg hw addr this shouldn't matter, but it's good to be correct. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-18 10:17:37 +10:00
Eric Anholt	9689407c54	freedreno/a6xx: Drop the WFI in the program update stateobj. Rob Clark thinks this was likely a workaround for our const buffer update bugs, and now that it's passing tests, we should be able to drop it. renderdoc-traces results: traces/android/clashofclans.rdc: +6.1% +/- 1.1% traces/android/candycrush.rdc: +5.2% +/- 1.6% Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-17 16:20:12 -07:00
Eric Anholt	2170822603	freedreno/a6xx: Drop the WFI in constant uploads. Now that the bin vs render constlen is fixed, we can skip these waits. Improves webgl aquarium performance at 10k fish from 27fps to 33. Some highlights from renderdoc-traces: traces/android/minecraft.rdc: +17.1% +/- 3.4% traces/glmark2/ideas-speed=duration.rdc: +11.6% +/- 2.4% traces/android/candycrush.rdc: +5.4% +/- 1.1% traces/android/clashofclans.rdc: +4.4% +/- 1.3% Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-17 16:20:12 -07:00
Eric Anholt	85bbdaff6c	freedreno: Assert that we don't exceed constlen. We actually could go up to vs->constlen in the binning shader on a6xx, but for sanity let's make sure that we're always under constlen. This would have caught the bug fixed in `572c76fd88` ("freedreno: Clamp UBO uploads to the constlen decided by the shader.") Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-17 16:20:12 -07:00
Eric Anholt	bc50ecfa7a	freedreno: Fix more constlen overflows. Fixes constlen overflow in dEQP-GLES31.functional.shaders.builtin_var.compute.num_work_groups and dEQP-GLES31.functional.image_load_store.buffer.image_size.readonly_32 and probably others. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-17 16:20:12 -07:00
Eric Anholt	b9f7f3e497	freedreno: Drop stale comment about skipping uploads. We already skip the upload if it's unused, due to the constlen > offset check. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-17 16:20:12 -07:00
Lepton Wu	6109df58e4	virgl: Set meta data for textures from handle. The set of meta data was removed by commit `8083464`. It broke lots of dEQP tests when running with pbuffer surface type. Fixes: `8083464013` ("virgl: remove dead code") Signed-off-by: Lepton Wu <lepton@chromium.org> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-07-17 16:17:48 -07:00
Bas Nieuwenhuizen	f1a8967344	radv: Only save the descriptor set if we have one. After reset, if valid does not contain the relevant bit the descriptor can be != NULL but still not be valid. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-18 00:49:43 +02:00
Lionel Landwerlin	ce4c5474af	anv: report timestampComputeAndGraphics true Spec says : "timestampComputeAndGraphics specifies support for timestamps on all graphics and compute queues. If this limit is set to VK_TRUE, all queues that advertise the VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT in the VkQueueFamilyProperties::queueFlags support VkQueueFamilyProperties::timestampValidBits of at least 36." On gen7+ this should be true (we only have 32bits of timestamp on gen6 and below). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `802f00219a` ("anv/device: Update features and limits") Reported-by: Timothy Strelchun <timothy.strelchun@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-17 22:46:58 +00:00
Rafael Antognolli	393f659ed8	iris: Enable fast clears on other miplevels and layers than 0. Until now we only supported fast clear colors on the first miplevel and layer. The main reason for it is that we can't have different fast clear values at different levels/layers, since the surface state only supports one clear value. We can, however, enable it if we make sure we only use the same value for all levels/layers, and if one of them changes, we resolve all the others. We already do that for depth fast clears so hopefully it will be fine for color fast clears too. v2: Add check for partial clear too (Ken). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-17 14:53:37 -07:00
Rafael Antognolli	8bbd4f32bf	iris: Allow resolving clear color of CCS_D surfaces. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-17 14:53:16 -07:00
Kenneth Graunke	df4c2ec5e1	iris: Make iris_has_color_unresolved non-static We want to use this in the transfer code and possibly for fast clears.	2019-07-17 13:43:04 -07:00
Andreas Bergmeier	f92290a8d9	broadcom: Move v3d_get_device_info to common In common we can use implementation for Vulkan.	2019-07-17 20:02:34 +00:00
Caio Marcelo de Oliveira Filho	891a232214	nir/large_constants: Use dominance information to find more constants Relax the restriction that all the writes need to be in the first block: now accept variables that have all the writes in the same block, and whose reads are all dominated by that block. This lets the pass identify large constants that are local to a helper function. The writes will be at the place that the function is inlined, possibly not in the first block (but still all in the same block). Results for vkpipeline-db in SKL: total instructions in shared programs: 3624891 -> 3623145 (-0.05%) instructions in affected programs: 79416 -> 77670 (-2.20%) helped: 16 HURT: 0 total cycles in shared programs: 1458149667 -> 1458147273 (<.01%) cycles in affected programs: 30154164 -> 30151770 (<.01%) helped: 14 HURT: 2 total loops in shared programs: 2437 -> 2437 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 8813 -> 8745 (-0.77%) spills in affected programs: 2894 -> 2826 (-2.35%) helped: 8 HURT: 0 total fills in shared programs: 23470 -> 23392 (-0.33%) fills in affected programs: 12248 -> 12170 (-0.64%) helped: 6 HURT: 2 LOST: 0 GAINED: 0 Results for shader-db in SKL with Iris: total instructions in shared programs: 15379442 -> 15379392 (<.01%) instructions in affected programs: 837 -> 787 (-5.97%) helped: 2 HURT: 2 helped stats (abs) min: 27 max: 27 x̄: 27.00 x̃: 27 helped stats (rel) min: 10.47% max: 10.67% x̄: 10.57% x̃: 10.57% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 1.23% max: 1.23% x̄: 1.23% x̃: 1.23% 95% mean confidence interval for instructions value: -39.14 14.14 95% mean confidence interval for instructions %-change: -15.51% 6.17% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 4880 -> 4880 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 370677237 -> 370676567 (<.01%) cycles in affected programs: 17852 -> 17182 (-3.75%) helped: 2 HURT: 1 helped stats (abs) min: 338 max: 356 x̄: 347.00 x̃: 347 helped stats (rel) min: 13.98% max: 14.64% x̄: 14.31% x̃: 14.31% HURT stats (abs) min: 24 max: 24 x̄: 24.00 x̃: 24 HURT stats (rel) min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18% total spills in shared programs: 11772 -> 11772 (0.00%) spills in affected programs: 0 -> 0 helped: 0 HURT: 0 total fills in shared programs: 24948 -> 24948 (0.00%) fills in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 0 GAINED: 0 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-17 12:50:32 -07:00
Jason Ekstrand	7ceec21b76	intel/fs: Use a strided MOV instead of a conversion for load_* destinations In many cases, the compiler can just copy-prop the strided MOV whereas the conversion is a bit trickier. This cuts 5% of the instructions off of one particular Vulkan CTS test which does lots of load_ssbo. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-17 18:44:35 +00:00
Jason Ekstrand	812b341578	nir/algebraic: Optimize comparisons and up-casts These seem like obvious enough optimizations in the world of multiple integer bit sizes. The only known thing which hits these at the moment is some Vulkan CTS tests for 16-bit SSBO values which like to up-cast and check for equality. However, it's something that's bound to come up as we start seeing more integers in shaders. The optimizations of comparisons of casted values with constants are something which we would ideally do with range analysis. However, lacking that, we can do it in opt_algebraic as long as one side is a constant. In dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13, this commit, along with the previous commit, reduces the number of instructions emitted on Skylake from 55328 to 44546, a reduction of 20%. Acked-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-07-17 18:44:35 +00:00
Jason Ekstrand	e8505e982a	nir/algebraic: Optimize comparing unpacked values We could, in theory, add the same optimization for 64-bit unpack operations but that's likely to fight with 64-bit integer lowering on platforms which require it so it will require more infrastructure before that will be a good idea. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-17 18:44:35 +00:00
Jason Ekstrand	9fed031e4e	nir/algebraic: Print out the list of transforms in the C file This helps greatly when debugging algebraic transform generators because you can now actually see the output and verify that your transforms are getting generated. Acked-by: Matt Turner <mattst88@gmail.com>	2019-07-17 18:44:35 +00:00
Jason Ekstrand	68a4c796d5	intel/fs: Properly stride NULL replacement regs in DCE This fixes some validation errors generated by certain D->W conversions but is likely not a full solution. Calculating an actual register stride is a far more complex problem in general and should probably be handled by the brw_fs_generator. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-17 18:44:35 +00:00
Eric Anholt	28a808a11b	nir: Fix nir_lower_alu_to_scalar's instr filtering. It was checking if the dest or src[0] SSA values were vectors, rather than whether the ALU op was using the source as a vector resulting in a nir_fdot4 making it through to vc4 and v3d: vec1 32 ssa_6 = fdot4 ssa_4.xxxx, ssa_5 Fixes: `c1cffa4249` ("nir/alu_to_scalar: Use the new NIR lowering framework") v2: Use Jason's recommendation to look at input_sizes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-17 10:30:43 -07:00


113254 commits