fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-04-05 06:00:36 +02:00

Author	SHA1	Message	Date
Tapani Pälli	ac557b4c12	mesa: fix error reported on glTexSubImage2D when level not valid Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>	2014-10-10 15:01:51 +03:00
Kenneth Graunke	94841b6d5d	i965: Fix register write checks. When mapping the buffer a second time, we need to use the new pointer, not the one from the previous mapping. Otherwise, we will most likely crash. Apparently, we've just been getting lucky and getting the same bo->virtual pointer in both cases. libdrm probably has a hand in that. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2014-10-10 00:04:39 +02:00
Eric Anholt	7e67ea994c	vc4: Optimize out adds of 0.	2014-10-09 21:47:06 +02:00
Eric Anholt	0401f55fff	vc4: Optimize fmul(x, 0) and fmul(x, 1). This was being generated frequently by matrix multiplies of 2 and 3-channel vertex attributes (which have the 0 or 1 loaded in the shader).	2014-10-09 21:47:06 +02:00
Eric Anholt	1cd8c1aab0	vc4: Factor out the turn-it-into-a-mov in opt_algebraic. This will be used more in the next commits.	2014-10-09 21:47:06 +02:00
Eric Anholt	40748cf8d9	vc4: Eliminate unused texture instructions.	2014-10-09 21:47:06 +02:00
Eric Anholt	b73cab6826	vc4: Dead code eliminate unused SF instructions.	2014-10-09 21:47:06 +02:00
Eric Anholt	93cac2637b	vc4: Prevent copy propagating out the MOVs from r4. Copy propagating these might result in reading the r4 after some other instruction has written r4. Just prevent all copy propagation of this for now. Fixes bad rendering with upcoming indirect register access support, where the copy propagation was consistently happening across another read.	2014-10-09 21:47:06 +02:00
Eric Anholt	c4b0dd5356	vc4: Split the coordinate shader to its own vc4_compiled_shader. Merging VS and CS into the same struct wasn't winning us anything except for not allocating a separate BO (but if we want to pack programs into BOs, we should pack not just those 2 programs together). What it was getting us was a bunch of code duplication about hash table lookups and propagating vc4_compile contents into a vc4_compiled_shader. I was about to make the situation worse with indirect uniform buffer access.	2014-10-09 21:47:06 +02:00
Eric Anholt	5c72d7706c	vc4: Add #defines for the texture uniform fields. I wanted to make another set of texture uploads for handling reladdr constants, and duplicating all the bitshifting looked like a terrible idea. In the process, this fixes a swap of the s/t texture wrap modes.	2014-10-09 21:47:06 +02:00
Eric Anholt	5cfab07639	vc4: Initialize undefined temporaries to 0. Under the simulator, reading registers before writing them triggers an assertion failure. c->undef gets treated as r0, which will usually be written, but not if it's used in the first instruction. We should definitely not be aborting in this case, and return some sort of undefined value instead. Fixes glsl-user-varying-ff.	2014-10-09 21:47:06 +02:00
Kenneth Graunke	4ce11de4ae	i965: Skip uploading border color when unnecessary. The border color is only needed when using the GL_CLAMP_TO_BORDER or (deprecated) GL_CLAMP wrap modes; all others ignore it, including the common GL_CLAMP_TO_EDGE and GL_REPEAT wrap modes. In those cases, we can skip uploading it entirely, saving a bit of space in the batchbuffer. Instead, we just point it at the start of the batch (offset 0); we have to program something, and that address is safe to read. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>	2014-10-09 15:43:18 +02:00
Kenneth Graunke	b7844d1248	i965: Use BDW_MOCS_PTE for renderbuffers. Write-back caching cannot be used for buffers being scanned out by the display engine; surfaces used for scan-out must be write-through or uncached. I originally chose WT for render targets because it works in all cases. However, we really want to use write-back caching where possible, as it is more efficient. Most renderbuffers are not used for scanout - off-screen FBOs certainly are fine, and non-pageflipped backbuffers should be fine as well. So in most cases WB will work. However, we don't know what will be used for scan-out, so we instead simply use the PTE value specified by the kernel, as it knows these things. This matches our MOCS choice on Haswell. Fixes performance regressions since commit `ee4484be3d` in a microbenchmark (spotted by Eero Tamminen). Improves performance in GLBenchmark 2.7/EgyptHD by 7.44362% +/- 0.496939% (n=55) on a Broadwell GT2. Improves performance in a bunch of other microbenchmarks by ~15% or so. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reported-by: Eero Tamminen <eero.t.tamminen@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Cc: mesa-stable@lists.freedesktop.org	2014-10-09 15:03:32 +02:00
Kenneth Graunke	d489a8a73d	i965: Add a BDW_MOCS_PTE #define. Like BDW_MOCS_WB and BDW_MOCS_WT, this specifies that we want to use all three caches (L3, LLC, and eLLC where available), but leaves the LLC caching mode up to the kernel's page table entry. This allows the kernel to pick WB/WT/UC based on whether it's using a buffer for scanout. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Cc: mesa-stable@lists.freedesktop.org	2014-10-09 15:02:55 +02:00
Kenneth Graunke	8892a6b79c	mesa: Make _mesa_print_arrays use stderr. These days, most driver debug output happens via stderr, not stdout. Some applications (such as Xephyr) also appear to close stdout which makes these messages go nowhere. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2014-10-09 15:02:42 +02:00
Michel Dänzer	7b4276d7ac	r600g,radeonsi: Always use GTT again for PIPE_USAGE_STREAM buffers Putting those in VRAM can cause long pauses due to buffers being moved into / out of VRAM. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84662 Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2014-10-09 18:11:44 +09:00
Eric Anholt	5a13522898	vc4: Optimize SF(ITOF(x)) -> SF(x). This is a common production of st_glsl_to_tgsi, because CMP takes a float argument.	2014-10-09 11:01:18 +02:00
Eric Anholt	00a9aebfe0	vc4: Add some optimization of FADD(FSUB(0, x)). This is a common production of st_glsl_to_tgsi, which uses negate flags on source arguments to handle subtraction.	2014-10-09 11:01:18 +02:00
Eric Anholt	67aea92964	vc4: Mostly fix offset calculation for NPOT mipmap levels. The non-base NPOT levels are stored as POT-aligned images. We get that POT alignment by minifying the POT-aligned base level. This means that level strides are also POT aligned, so we have to tell the rendering mode config that our resource is larger than the actual requested area. Fixes the fbo-generatemipmap-formats NPOT cases. Regresses depthstencil-render-miplevels 273 * -- the texture presentation now works (where it was completely broken before), it looks like there's some overflow of image bounds happening at the lower miplevels.	2014-10-09 11:01:09 +02:00
Eric Anholt	0b96a086cb	vc4: Move the mirrored kernel code to a kernel/ directory. Now this whole setup matches the kernel's file layout much more closely.	2014-10-09 09:46:39 +02:00
Eric Anholt	ef9914aa74	vc4: Enable LIT lowering in TGSI instead of our own code. This brings us the -128/128 clamping on the w component.	2014-10-08 22:47:39 +02:00
Eric Anholt	9773d45908	vc4: Fix scalar math opcodes to replicate their result from the X channel. Thanks to robclark for pointing out that I was probably failing to do this when I reported a "bug" in his lowering code.	2014-10-08 22:47:39 +02:00
Chia-I Wu	4e50a32be6	ilo: fix rectlist on GEN7+ It was broken by `343b014b57`. Signed-off-by: Chia-I Wu <olvaffe@gmail.com>	2014-10-09 03:37:04 +08:00
Eric Anholt	581418585e	vc4: Add support for two-sided color. It's fairly easy, thanks to Rob Clark's lowering code. Fixes two-sided-lighting and 4 vertex-program-two-side testcases, while regressing 8 testcases that involve enabling two-sided color while only initializing one of the two colors in the VS. If you're enabling two sided color, it's of course expected that you really do set up both colors, so this is still an improvement (and when we set up a linker for TGSI, we'll hopefully fix those 8 fails).	2014-10-08 17:45:16 +02:00
Eric Anholt	4dccdbf5cb	vc4: Enable POW lowering in TGSI instead of our own code.	2014-10-08 17:42:59 +02:00
Eric Anholt	1aef5a337f	vc4: Enable DP lowering in TGSI instead of our own code.	2014-10-08 17:42:59 +02:00
Eric Anholt	4f6e4c7370	vc4: Start using tgsi_lowering for opcodes we haven't supported before.	2014-10-08 17:42:59 +02:00
Eric Anholt	f9854e169f	gallium: Rename freedreno parts of tgsi_lowering.[ch]. Acked-by: Rob Clark <robclark@freedesktop.org>	2014-10-08 17:42:59 +02:00
Eric Anholt	19df602b39	gallium: Reformat tgsi_lowering.c for the normal style. Acked-by: Rob Clark <robclark@freedesktop.org>	2014-10-08 17:42:59 +02:00
Eric Anholt	3141dc8e87	gallium: Copy fd_lowering.[ch] to tgsi_lowering.[ch] for code sharing. Lots of drivers need to transform the weird instructions in TGSI into reasonable scalar ops, and this code can make those translations canonical. Acked-by: Rob Clark <robclark@freedesktop.org>	2014-10-08 17:42:59 +02:00
Eric Anholt	84caf5a861	vc4: Set unused raddr fields to QPU_R_NOP. The simulator assertion fails if you have a write to a reg and then a read (for example, in the NOP side of an instruction), even if the read isn't used for anything. By setting unused raddrs to NOP, we avoid the problem (since only the physical registers are tracked).	2014-10-08 17:42:59 +02:00
Eric Anholt	48af7426f2	vc4: Abstract out the field-merging logic for instructions. I'm going to be doing the same logic for some more fields next.	2014-10-08 17:42:59 +02:00
Niels Ole Salscheider	acdcef6788	r600: Use DMA transfers in r600_copy_global_buffer v2: Do not demote items that are already in the pool Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>	2014-10-07 15:59:43 -04:00
Iago Toral Quiroga	fd31628c49	glsl: Optimize min/max expression trees Original patch by Petri Latvala <petri.latvala@intel.com>: Add an optimization pass that drops min/max expression operands that can be proven to not contribute to the final result. The algorithm is similar to alpha-beta pruning on a minmax search, from the field of AI. This optimization pass can optimize min/max expressions where operands are min/max expressions. Such code can appear in shaders by itself, or as the result of clamp() or AMD_shader_trinary_minmax functions. This optimization pass improves the generated code for piglit's AMD_shader_trinary_minmax tests as follows: total instructions in shared programs: 75 -> 67 (-10.67%) instructions in affected programs: 60 -> 52 (-13.33%) GAINED: 0 LOST: 0 All tests (max3, min3, mid3) improved. A full shader-db run: total instructions in shared programs: 4293603 -> 4293575 (-0.00%) instructions in affected programs: 1188 -> 1160 (-2.36%) GAINED: 0 LOST: 0 Improvements happen in Guacamelee and Serious Sam 3. One shader from Dungeon Defenders is hurt by shader-db metrics (26 -> 28), because of dropping of a (constant float (0.00000)) operand, which was compiled to a saturate modifier. Version 2 by Iago Toral Quiroga <itoral@igalia.com>: Changes from review feedback: - Squashed various cosmetic changes sent by Matt Turner. - Make less_all_components return an enum rather than setting a class member. (Suggested by Matt Turner). Also, renamed it to compare_components. - Make less_all_components, smaller_constant and larger_constant static. (Suggested by Matt Turner) - Change minmax_range to call its limits "low" and "high" instead of "range[0]" and "range[1]". (Suggested by Connor Abbott). - Use ir_builder swizzle helpers in swizzle_if_required(). (Suggested by Connor Abbott). - Make the logic clearer by rearranging the code and commenting. (Suggested by Connor Abbott). - Added comment to explain why we need to recurse twice. (Suggested by Connor Abbott). - If we cannot prune an expression, do not return early. Instead, attempt to prune its children. (Suggested by Connor Abbott). Other changes: - Instead of having a global "valid" visitor member, let the various functions that can determine this status return a boolean and check for its value to decide what to do in each case. This is more flexible and allows recursing into children of parents that could not be pruned due to invalid ranges (so related to the last bullet in the review feedback). - Make sure we always check if a range is valid before working with it. Since any use of get_range, combine_range or range_intersection can invalidate a range we should check for this situation every time we use any of these functions. Version 3 by Iago Toral Quiroga <itoral@igalia.com>: Changes from review feedback: - Now we can make get_range, combine_range and range_intersection static too (suggested by Connor Abbott). - Do not return NULL when looking for the larger or greater constant into mixed vector constants. Instead, produce a new constant by doing a component-wise minmax. With this we can also remove of the validations when we call into these functions (suggested by Connor Abbott). - Add a comment explaining the meaning of the baserange argument in prune_expression (suggested by Connor Abbott). Other changes: - Eliminate minmax expressions operating on constant vectors with mixed values by resolving them. No piglit regressions observed with Version 3. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76861 Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2014-10-07 12:37:51 +02:00
Tapani Pälli	16b53005a7	glsl: do not emit error for non written varyings on OpenGL ES Patch fixes following test case from 'shaders-with-varyings' WebGL conformance suite: "vertex shader with unused varying and fragment shader with used varying must succeed" v2: still emit a warning if the condition happens (Ian) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-10-07 08:28:51 +03:00
Michel Dänzer	be0a994fb8	radeonsi: Use dummy pixel shader if compilation of the real shader failed Instead of crashing. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79155#c5 Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2014-10-07 12:07:13 +09:00
Chia-I Wu	f358462640	ilo: let shaders determine surface counts When a shader needs N surfaces, we should upload N surfaces and not depend on how many are bound. This commit is larger than it should be because we did not export how many surfaces a surface uses before. Signed-off-by: Chia-I Wu <olvaffe@gmail.com>	2014-10-06 15:10:30 +08:00
Chia-I Wu	ca824e6940	ilo: let shaders determine sampler counts When a shader needs N samplers, we should upload N samplers and not depend on how many are bound. Signed-off-by: Chia-I Wu <olvaffe@gmail.com>	2014-10-04 23:18:51 +08:00
Marek Olšák	0c4bc1e292	tgsi: change tgsi_shader_info::properties to a one-dimensional array Reviewed-by: Roland Scheidegger <sroland@vmware.com> v2: fix svga too	2014-10-04 15:36:39 +02:00
Marek Olšák	1f6c0b55df	radeonsi: set number of userdata SGPRs of GS copy shader to 4 It only needs the constant buffer with clip planes and read-write resources for the GS->VS ring and streamout. That's 2 pointers. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:15 +02:00
Marek Olšák	68d36c0bb5	radeonsi: pass the GS shader directly to si_generate_gs_copy_shader Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:15 +02:00
Marek Olšák	aeb05f011e	radeonsi: set LLVMByValAttribute for all descriptor arrays I hope this is correct. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:15 +02:00
Marek Olšák	91f1a79f78	radeonsi: make the vertex shader key smaller We only support 16 vertex attribs, not 32. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:14 +02:00
Marek Olšák	90611297fa	radeonsi: don't flush shader caches when building PM4 shader states This is a wrong place to flush caches to say the least. I don't think we need to flush the instruction caches if we don't patch shaders with DMA. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:14 +02:00
Marek Olšák	10e386f4aa	radeonsi: remove interp_at_sample from the key, use TGSI_INTERPOLATE_LOC_SAMPLE st/mesa has the same flag in its shader key, we don't need to do it in the driver anymore. Instead, use TGSI_INTERPOLATE_LOC_SAMPLE, which is what st/mesa sets. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:14 +02:00
Marek Olšák	0a2d6f0c4e	radeonsi: move geometry shader properties from si_shader to si_shader_selector Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:14 +02:00
Marek Olšák	54de709911	radeonsi: always compile shaders on demand The first compiled shader is sometimes useless, because the key doesn't match the key for the draw call where it's used. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:14 +02:00
Marek Olšák	6c9f61c97e	radeonsi: remove unused variable si_shader::gs_input_prim Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:14 +02:00
Marek Olšák	7dc0164192	tgsi: remove some not so useful variables from tgsi_shader_info	2014-10-04 15:16:14 +02:00
Marek Olšák	8860584045	radeonsi: get fs_write_all from tgsi_shader_info directly Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2014-10-04 15:16:14 +02:00
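
Several of the vc4 commits listed above (7e67ea994c, 0401f55fff, 1cd8c1aab0) describe algebraic peephole rewrites: an add of 0 or a multiply by 0 or 1 is replaced by a plain move, with the rewrite factored into a shared "turn it into a mov" helper. The following is a minimal sketch of that idea in Python, not the driver's actual code. The `Inst` type, the opcode names, the operand model, and the function names `to_mov`, `is_const` and `opt_algebraic` are all assumptions made for illustration. Note that folding `fmul(x, 0)` to 0 ignores NaN and infinity inputs, which this sketch accepts as a simplification.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical operand model: a float is a constant, a str names a temporary.
Operand = Union[float, str]


@dataclass(frozen=True)
class Inst:
    op: str                    # illustrative opcode names: "mov", "fadd", "fmul"
    dst: str
    src: Tuple[Operand, ...]


def to_mov(inst: Inst, value: Operand) -> Inst:
    """Shared helper: rewrite an instruction as a move of `value` into its dst."""
    return Inst("mov", inst.dst, (value,))


def is_const(operand: Operand, value: float) -> bool:
    """True if the operand is a constant equal to `value`."""
    return isinstance(operand, float) and operand == value


def opt_algebraic(inst: Inst) -> Inst:
    """Return a simplified instruction, or the original if no rule applies."""
    if inst.op == "fadd":
        a, b = inst.src
        if is_const(b, 0.0):
            return to_mov(inst, a)      # x + 0 -> x
        if is_const(a, 0.0):
            return to_mov(inst, b)      # 0 + x -> x
    elif inst.op == "fmul":
        a, b = inst.src
        if is_const(a, 0.0) or is_const(b, 0.0):
            return to_mov(inst, 0.0)    # x * 0 -> 0 (ignores NaN/inf)
        if is_const(b, 1.0):
            return to_mov(inst, a)      # x * 1 -> x
        if is_const(a, 1.0):
            return to_mov(inst, b)      # 1 * x -> x
    return inst
```

Keeping the rewrite in one helper means each new rule only has to decide which value survives; a later copy-propagation or dead-code pass (also mentioned in the log above) can then remove the resulting moves.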


65954 commits