fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-07 13:38:06 +02:00

Author	SHA1	Message	Date
Nicolai Hähnle	4dd86631f4	radeonsi: update a comment for merged shaders Reviewed: Marek Olšák <marek.olsak@amd.com>	2017-07-27 21:16:45 +02:00
Nicolai Hähnle	4738dd9546	radeonsi/gfx9: dump previous stage LLVM IR for merged shaders Reviewed: Marek Olšák <marek.olsak@amd.com>	2017-07-27 21:16:45 +02:00
Nicolai Hähnle	760876a7b1	radeonsi: make sure TCS main output VGPRs don't alias inputs Avoids an unnecessary move introduced by "radeonsi/gfx9: always wrap GS and TCS in an if-block (v2)" Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-27 21:16:42 +02:00
Nicolai Hähnle	081ac6e5c6	radeonsi/gfx9: always wrap GS and TCS in an if-block (v2) With merged ESGS shaders, the GS part of a wave may be empty, and the hardware gets confused if any GS messages are sent from that wave. Since S_SENDMSG is executed even when EXEC = 0, we have to wrap even non-monolithic GS shaders in an if-block, so that the entire shader and hence the S_SENDMSG instructions are skipped in empty waves. This change is not required for TCS/HS, but applying it there as well simplifies the logic a bit. Fixes GL45-CTS.geometry_shader.rendering.rendering.* v2: ensure that the TCS epilog doesn't run for non-existing patches Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-27 21:16:32 +02:00
Nicolai Hähnle	873789002f	radeonsi/gfx9: fix vertex idx in ES with multiple waves per threadgroup Cc: mesa-stable@lists.freedesktop.org Reviewed: Marek Olšák <marek.olsak@amd.com>	2017-07-27 21:16:32 +02:00
George Kyriazis	194ff5eed1	swr: fix transform feedback logic The shader that is used to copy vertex data out of the vs/gs shaders to the user-specified buffer (streamout or SO shader) was not using the correct offsets. Adjust the offsets that are used just for the SO shader: - Make sure that position is handled in the same special way as in the vs/gs shaders - Use the correct offset to be passed in the core - consolidate register slot mapping logic into one function, since it's been calculated in 2 different places (one for calculating the slot mask, and one for the register offsets themselves) Also make room for all attributes in the backend vertex area. Fixes: - all vtk GL2PS tests - 18 piglit tests (16 ext_transform_feedback tests, arb-quads-follow-provoking-vertex and primitive-type gl_points) v2: - take care of more SGV slots in slot mapping logic - trim feState.vsVertexSize - fix GS interface and incorporate GS while calculating vsVertexSize Note that vsVertexSize is used in the core as the one parameter that controls vertex size between all stages, so it has to be adjusted appropriately for the whole vs/gs/fs pipeline. Also note that GS and SO is not fully implemented. This will be addressed later. fixes: - fixes total of 20 piglit tests CC: 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-07-27 13:54:19 -05:00
Tim Rowley	e21fc2c625	swr/rast: non-regex knob fallback code for gcc < 4.9 gcc prior to 4.9 didn't implement <regex>, causing a startup crash in the swr knob parameter reading code. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-07-27 08:31:21 -05:00
Timothy Arceri	2c34b49d9e	mesa: check that buffer object is not NULL before initializing it Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-07-27 22:19:52 +10:00
Timothy Arceri	6ee3323d7d	glsl: small builtin inline tidy up Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-07-27 22:14:37 +10:00
Dave Airlie	c4652a0a5b	virgl: encode index buffer offset. Fixes arb_vertex_buffer_object-combined-vertex-index Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-07-27 16:10:07 +10:00
Michel Dänzer	57132d126f	st/mesa: Fix inversed test in st_api_destroy_drawable Fixes a drawable leak. Fixes: `bbc29393d3` ("st/mesa: create framebuffer iface hash table per st manager") Bugzilla: https://bugs.freedesktop.org/101930 Tested-by: Nick Sarnie <commendsarnex@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2017-07-27 11:12:24 +09:00
Dave Airlie	e77ff11ffe	radv/ac: port SI TC L1 write corruption fix. This ports `72e46c988` to radv. radeonsi: apply a TC L1 write corruption workaround for SI Fixes: `f4e499ec7` (radv: add initial non-conformant radv vulkan driver) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-07-26 23:39:24 +01:00
Dave Airlie	d4b079e708	radv/winsys: fix padding command stream for SI We were adding pad to size after creating the object, so we could submit a CS bigger than the bo created for it. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-07-26 23:38:23 +01:00
Dave Airlie	a81e99f50a	radv/ac: realign SI workaround with radeonsi. This ports: `da7453666a` radeonsi: don't apply the Z export bug workaround to Hainan to radv. Just noticed in passing. Fixes: `f4e499ec7` (radv: add initial non-conformant radv vulkan driver) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-07-26 23:38:17 +01:00
Jason Ekstrand	f6e478c213	i965/clear: Don't perform redundant depth clears We already have this little optimization for color clears. Now that we're actually tracking whether or not a slice has any fast-clear blocks, it's easy enough to add for depth clears too. Improves performance of GFXBench 4 TRex at 1920x1080 by: - Skylake GT4: 0.905932% +/- 0.0620197% (n = 30) - Apollolake: 0.382434% +/- 0.1134730% (n = 25) v2: (by Ken) Rebase and drop intel_mipmap_tree.c changes, as they're no longer necessary (other patches already landed to do that part) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-07-26 14:43:01 -07:00
Jason Ekstrand	6db193701e	i965: Only do depth resolves prior to clearing when needed When changing the clear value, we need to resolve any fast cleared data. Previously, we were performing resolves on every slice with HiZ enabled. We only need to resolve slices that a) have fast clear data, and b) aren't about to be cleared to the new color. In the latter case, we were actually doing a resolve, and then a fast clear - when we could skip both, causing the existing fast cleared area to be updated to the new clear value for no additional work. This patch stops using intel_miptree_prepare_access in favor of a more optimal open coded loop that knows about our clear operation. v2: (by Ken) Rebase on islification, write a real commit message. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-07-26 14:43:01 -07:00
Kenneth Graunke	e1d4030b0b	i965: Expose get_num_logical_layers outside of intel_mipmap_tree.c. I want to use it in brw_clear.c. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-07-26 14:43:00 -07:00
Marek Olšák	5e81df0f10	ac/surface: fix hybrid graphics where APU=GFX9, dGPU=older v2: don't do it for compressed textures (bpp = 0) Cc: 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)	2017-07-26 19:53:26 +02:00
Marek Olšák	ed2b3f5c81	radeonsi: decrease the number of compiler threads Cc: 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-26 19:53:26 +02:00
Marek Olšák	433f6f7ac9	gallium/radeon: make S_FIXED function signed and move it to shared code This fixes a bug uncovered by: `2412c4c81e` util: Make CLAMP turn NaN into MIN. Cc: 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-26 19:53:26 +02:00
Marek Olšák	033b4e4340	st/mesa: also clamp and quantize per-unit lod bias Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-26 19:53:26 +02:00
Marek Olšák	914f11e75b	st/mesa: fix unconditional return in st_framebuffer_iface_remove Noticed by James Legg @ Feral. Cc: 17.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-07-26 16:47:17 +02:00
Marek Olšák	a7617a49fb	drirc: whitelist glthread for Mount and Blade Warband From 25-26 min fps to 31, used the game in conjunction with a mod (full invasion 2) beaumaris castle map and 200 bots.	2017-07-26 15:23:00 +02:00
Grigori Goronzy	39bf7756b9	egl: move KHR_no_error vs debug/robustness check further down We'll fail to flag an error if the context flags appear after the no-error attribute in the context attribute list. Delay the check to after attribute parsing to fix this. Fixes: `4909519a66` ("egl: Add EGL_KHR_create_context_no_error support") Cc: mesa-stable@lists.freedesktop.org [Emil Velikov: add fixes/stable tags, commit message polish] Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-07-26 11:50:32 +01:00
Andres Rodriguez	a973b9a9f8	radv: rename physical_device->uuid[] to cache_uuid[] We have a few UUIDs, so let's be more specific. Signed-off-by: Andres Rodriguez <andresx7@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2017-07-26 20:42:36 +10:00
Nicolai Hähnle	a0e6b9a2db	radeonsi/gfx9: reduce max threads per block to 1024 on gfx9+ The number of supported waves per thread group has been reduced to 16 with gfx9. Trying to use 32 waves causes hangs, and barriers might not work correctly with > 16 waves. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-26 11:51:00 +02:00
Nicolai Hähnle	65fbaab0b7	radeonsi: fix detection of DRAW_INDIRECT_MULTI on SI The firmware version numbers for SI were wrong. The new numbers are probably too conservative (we don't have a definitive answer by the firmware team), but DRAW_INDIRECT_MULTI has been confirmed to work with these versions on Tahiti (by Gustaw) and on Verde (by myself). While this is technically adding a feature, it's a feature we thought we had for a long time. The change is small enough and we're early enough in the 17.2 release cycle that it should still go in. Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com> Cc: 17.2 <mesa-stable@lists.freedesktop.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-26 11:48:32 +02:00
Iago Toral Quiroga	31f1863ace	anv: only expose up to 28 vertex attributes The EU limit of 128 GRFs should allow 32 vertex elements of 4 GRFs. However, the maximum allowed value of "Vertex URB Entry Read Length" in SIMD8 is 15. And 15 * 8 = 120 gives us a limit of 30 vertex elements. Because we also need to reserve a vertex buffer to upload VertexIndex/InstanceIndex and another to upload DrawID when needed, we can only expose 28. Cc: "17.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-07-26 08:16:43 +02:00
Iago Toral Quiroga	a848e693ef	anv/cmd_buffer: fix off by one error in assertion Cc: "17.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-07-26 08:02:06 +02:00
Kenneth Graunke	445367242a	i965: Shut up Coverity warning about HiZ buffers. Here the AUX_USAGE_* mode indicates that we have HiZ, so we will have a HiZ buffer. But Coverity doesn't know that, so it thinks it might be NULL because we checked hiz_buf != NULL earlier. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-07-25 22:14:21 -07:00
Kenneth Graunke	698636cc97	i965: Fix = vs == in MCS aux usage assert. Caught by Coverity (CID 1415680). Cc: "17.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-07-25 22:14:21 -07:00
Kenneth Graunke	f6e674fa51	i965: Fix offset addition in get_isl_surf. Increase the value, not the pointer to the stack variable. Caught by Coverity (CID 1415574). Not shipped in a real release. Cc: "17.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-07-25 22:14:21 -07:00
Andres Rodriguez	7b48163d7c	mesa/st: fix inconsistent indentation of st_cb_bufferobjects.c No changes, just re-indent. Signed-off-by: Andres Rodriguez <andresx7@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2017-07-26 14:54:46 +10:00
Timothy Arceri	b0333e55b7	compiler: move glsl_interface_packing enum to shader_enums.h This allows us to drop the duplicate gl_uniform_block_packing enum. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-26 10:39:52 +10:00
Timothy Arceri	7ee383669f	mesa/st: fix unused variable warnings Reviewed-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-26 10:39:52 +10:00
Timothy Arceri	87e5f39cf1	mesa/st: move st_pipe_format_to_mesa_format() call to where its used Reviewed-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-26 10:39:52 +10:00
Timothy Arceri	17f05e52e7	gallium/util: fix unused variable warning Reviewed-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-07-26 10:39:52 +10:00
Timothy Arceri	5fac8c116e	mesa: drop useless assert NewBufferObj() is called when the shared state is allocated so we wouldn't get this far if it was NULL. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-07-26 10:16:20 +10:00
Timothy Arceri	6be1c69b97	mesa: call binding functions directly from glDeleteBuffers This avoids useless error checking. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-07-26 10:16:20 +10:00
Timothy Arceri	003c8b1167	mesa: move static binding functions above _mesa_DeleteBuffers() Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-07-26 10:16:20 +10:00
Timothy Arceri	4943353bff	mesa: don't try to re-generate the default buffer It should have been created by this point. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-07-26 10:16:20 +10:00
Eric Anholt	4d4872708e	broadcom/vc4: Switch the V3D 2.1 XML over to restricted address fields. This keeps the flags out of v3d_decode.c's output. In the generated code, only the unpack functions see any change (where they now get the restricted start value), and vc4 doesn't use the unpack functions yet.	2017-07-25 14:55:12 -07:00
Eric Anholt	82fdc10606	broadcom/genxml: Support address fields with <32 bits I was writing the XML such that the address field overlapped various flags in the alignment bits, which caused pain when trying to unpack for decode. Instead, keep the XML matching the docs (address fields don't overlap), and just infer the appropriate shift value during decode. During pack, the address is just applied to the appropriate bits already, ignoring the sub-byte start/end fields.	2017-07-25 14:55:12 -07:00
Eric Anholt	53492917e2	broadcom/vc4: Use the RA callback to improve register selection's choices. We simply pick r4 if available (anything else would force a MOV), then round-robin through accumulators (avoids physical regfile RAW delay slots), then round-robin through the physical regfile. The effect on instruction count is pretty impressive: total instructions in shared programs: 76563 -> 74526 (-2.66%) instructions in affected programs: 66463 -> 64426 (-3.06%) and we could probably do better with a little heuristic of "if we're going to choose a physical reg, and other operands of instructions using this as a src have the same physical regfile, then use the other regfile".	2017-07-25 14:55:10 -07:00
Eric Anholt	7a34a0e890	ra: Add a callback for selecting a register from what's available. VC4 has had a tension, similar to pre-Sandybridge Intel, where we want to use low-numbered registers (more parallelism on Intel, fewer delay slots on vc4), but in order to give instruction scheduling the most freedom to avoid delays we want to round-robin between registers of the same cost. Our two heuristics so far have chosen one end or the other of that tradeoff. The callback, instead, hands the driver the set of registers that are available, and the driver gets to make its own choice. This will be used in vc4 to round-robin between registers of the same cost, and might be used in the future for improving bank selection. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-25 14:44:52 -07:00
Eric Anholt	3dae034423	ra: Don't put a node in its own adjacency set. All the paths looping over adjacency had guards against considering themselves (the non-obvious one was ra_any_neighbors_conflict(), which has in_stack set). Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-25 14:44:52 -07:00
Eric Anholt	30146f29a7	ra: Pull the body of a loop out to a helper function. I was going to indent this code another level, and decided it would be easier to read as a helper. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-07-25 14:44:52 -07:00
Eric Anholt	16e17ce04b	broadcom/vc4: Scissor blits performed using the rendering engine. Without this, a BlitFramebuffer would mark the whole framebuffer as being changed (so we emit loads/stores of all of it) rather than just the modified subset.	2017-07-25 14:44:52 -07:00
Eric Anholt	93fec49a75	broadcom/vc4: Prefer blit via rendering to the software fallback. I don't know how I managed to leave this here for so long. Found when working on a 1:1 overlapping blit extension for X11. Cc: mesa-stable@lists.freedesktop.org	2017-07-25 14:44:52 -07:00
Eric Anholt	b3c78a51f3	broadcom/vc4: Switch the Viewport Center fields to a fixed-point representation. This gets us automatic CL decoding to a floating-point value, and drops a magic number from the emit code. 250x250 shader runner tests now say they have a center of 125.0 instead of 2000.	2017-07-25 14:44:52 -07:00

94688 commits
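
The arithmetic behind commit `31f1863ace` ("anv: only expose up to 28 vertex attributes") can be sketched as below. This is only an illustration of the reasoning in the commit message, not code from the driver: the macro and function names are hypothetical, and the factor of 8 is taken directly from the "15 * 8 = 120" step in the message, which does not spell out its unit.

```c
/* Illustrative only: all names here are hypothetical, not Mesa identifiers.
 * The numbers come from the commit message of 31f1863ace. */

#define EU_GRF_COUNT              128 /* GRFs available per EU thread */
#define GRFS_PER_VERTEX_ELEMENT     4 /* one vertex element occupies 4 GRFs */
#define MAX_URB_READ_LENGTH_SIMD8  15 /* max "Vertex URB Entry Read Length" */
#define READ_LENGTH_FACTOR          8 /* the commit multiplies by 8 (15 * 8 = 120) */
#define RESERVED_ELEMENTS           2 /* VertexIndex/InstanceIndex and DrawID */

static unsigned min_u(unsigned a, unsigned b)
{
    return a < b ? a : b;
}

/* Returns the number of vertex attributes that can be exposed. */
unsigned max_exposed_vertex_attribs(void)
{
    /* Limit from the register file alone: 128 / 4 = 32 elements. */
    unsigned by_grf = EU_GRF_COUNT / GRFS_PER_VERTEX_ELEMENT;

    /* Limit from the URB read length: 15 * 8 = 120, and 120 / 4 = 30. */
    unsigned by_urb = MAX_URB_READ_LENGTH_SIMD8 * READ_LENGTH_FACTOR /
                      GRFS_PER_VERTEX_ELEMENT;

    /* The tighter limit applies, minus the two reserved elements. */
    return min_u(by_grf, by_urb) - RESERVED_ELEMENTS;
}
```

Calling `max_exposed_vertex_attribs()` reproduces the limit named in the commit subject: the URB read length, not the GRF count, is the binding constraint.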