fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-07 11:28:05 +02:00

Author	SHA1	Message	Date
Rob Clark	9a9f2a893b	freedreno/ir3: simplify RA Group inputs/outputs, in addition to fanin/fanout, as they must also exist in sequential scalar registers. This lets us simplify RA by working in terms of neighbor groups. NOTE: has the slight problem that it can't optimize out mov's for things like: MOV OUT[n], IN[m] To avoid this, instead of trying to figure out what mov's we can eliminate, we first remove all mov's prior to grouping, and then re-insert mov's as needed while grouping inputs/outputs/fanins. Eventually we'd prefer the frontend to not insert extra mov's in the first place (so we don't have to bother removing them). This is the plan for an eventual NIR based frontend, so separate out the instr grouping (which will still be needed for NIR frontend) from the mov elimination (which won't). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	dddfe6c21e	freedreno/ir3: regmask support for relative addr For temp arrays, a 32bit mask won't be sufficient.. but otoh we don't need to support an arbitrary mask. So for this case use a simple size field rather than a bitmask. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	9bb865b3cf	freedreno/ir3: split up ssa_src Slight bit of refactoring that will be needed for indirect gpr addressing (TEMP[ADDR[]]). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	d15db9e7c0	freedreno/ir3: drop instr_clone() stuff Unnecessary and overly complicated. And gets in the way for temp arrays (TEMP[ADDR[]]). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	212b909643	freedreno/ir3: runtime enable RA debug for DEBUG builds Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	8c3952051e	freedreno/ir3: handle relative addr in ir3_dump Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	56370b9feb	freedreno/ir3: legalize vs unused sam dst components We probably could be more clever elsewhere and mask out components that are not used. But either way, legalize should realize that there is also a write-after-write hazard with texture sample instructions. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	063e2ef76a	freedreno/ir3: hack for old compiler Old compiler doesn't have ir3_block's.. so we need a special path. This hack can be dropped when ir3_compiler_old is retired. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	18899d1b80	tgsi: track max array per file NOTE IN[] and OUT[] don't need (have?) ArrayID's.. and TEMP[] can optionally have them. So we implicitly assume that ArrayID==0 always exists for each file. This is why array_max[file] is never less than zero. You can tell from indirect_files(_read/written) if the legacy array- id zero was actually used. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	49b4a6331f	tgsi: keep track of read vs written indirects At least temporarily, I need to fallback to old compiler still for relative dest (for freedreno), but I can do relative src temp. Only a temporary situation, but seems easy/reasonable for tgsi-scan to track this. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2015-01-07 19:37:28 -05:00
Marek Olšák	d7cd9bfc7f	Revert "radeonsi: reduce the size of si_pm4_state" This reverts commit `9141d88555`. It broke OpenCL.	2015-01-08 00:10:36 +01:00
Tom Stellard	e28f9d0e60	radeonsi: Fix crash when destroying si_screen We were invalidating si_screen:tm by calling r600_destroy_common_screen() which frees the si_screen object. This caused the driver to crash in LLVMDisposeTargetMachine() since we were passing it an invalid pointer. https://bugs.freedesktop.org/show_bug.cgi?id=88170	2015-01-07 16:28:40 -05:00
José Fonseca	2b7fd5b11d	mesa: Don't use _mesa_generic_nop on Windows. It doesn't work on Windows because of STDCALL calling convention -- it's the callee responsibility to pop the arguments, and the number of arguments vary with the prototype --, so the stack pointer ends up getting corrupted. This is just a non-invasive stop-gap fix. A proper fix would be more elaborate, and require either: - a variation of __glapi_noop_table which sets GL_INVALID_OPERATION error - stop using APIENTRY on all internal _mesa_* functions. Tested with piglit gl-1.0-beginend-coverage (it now fails instead of crashing). VMware PR1350505 Reviewed-by: Brian Paul <brianp@vmware.com>	2015-01-07 19:35:35 +00:00
José Fonseca	fd1f79f7dd	glapi: Force frame pointer elimination on Windows. To catch mismatches in cdecl vs stdcall calling convention. See code comment for more detailed explanation. Tested with piglit gl-1.0-beginend-coverage (it now also crashes on debug builds.) VMware PR1350505. Reviewed-by: Brian Paul <brianp@vmware.com>	2015-01-07 19:35:34 +00:00
Marek Olšák	1829f9c928	radeonsi: enable LLVM optimizations that assume no NaNs for non-compute shaders v2: complete rewrite Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-01-07 18:27:54 +01:00
Marek Olšák	d8185aa9a8	radeonsi: emit SURFACE_SYNC last This fixes a case where a transform feedback buffer is fed back as an index buffer, because SURFACE_SYNC must be after VS_PARTIAL_FLUSH. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	7c9ec6ca7e	radeonsi: flush all CB/DB caches unconditionally when changing the framebuffer This is easier to read and will work better with shader image stores. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	a1bbccf521	radeonsi: change TC cache flushing strategy for textures Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	ca9c5b2be5	radeonsi: improve and fix streamout flushing - we don't usually need to flush TC L2 - we should flush KCACHE (not really an issue now since we always flush KCACHE when updating descriptors, but it could be a problem if we used CE, which doesn't require flushing KCACHE) - add an explicit VS_PARTIAL_FLUSH flag Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	18a30c9778	radeonsi: use TC L2 for CP DMA operations with shader resources on CIK So that TC L2 doesn't need to be flushed. The only problem is with index buffers, which don't use TC. A simple solution is added that flushes TC L2 before a draw call (TC_L2_dirty). Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	11b76369f5	radeonsi: use TC L2 for updating descriptors on CIK This allows not flushing TC L2 on CIK later. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	02ba7334d3	radeonsi: don't use TC L2 for updating descriptors on SI It's causing problems, because we mix uncached CP DMA with cached WRITE_DATA when updating the same memory. The solution for SI is to use uncached access here, because CP DMA doesn't support cached access. CIK will be handled in the next patch. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	edf18da85d	radeonsi: only flush the right set of caches for CP DMA operations That's either framebuffer caches or caches for shader resources. The motivation is that framebuffer caches need to be flushed very rarely here. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	73c2b0d18c	radeonsi: implement separate ICACHE and KCACHE flush for SI Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	0aecf9e2d1	radeonsi: add a combined flag for flushing a framebuffer Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	2bfe9d4538	radeonsi: rename flush flags, split the TC flag into L1 and L2 Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	d217819e78	r600g,radeonsi: separate cache flush flags I will rename them for radeonsi. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	d14f2ab4ad	r600g: move r6xx-specific streamout flush flagging into r600g Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	0543630d0b	radeonsi: only set BC_OPTIMIZE_DISABLE when necessary SPI_PS_IN_CONTROL is moved into the SPI mapping state. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	5d8e838dae	radeonsi: do not define FACE as an ordinary PS input Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	15a7fff69a	radeonsi: remove flatshade from the shader key Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	13de9475fc	radeonsi: remove special handling of TGSI_INTERPOLATE_COLOR in shader codegen It doesn't do anything useful. And colors are floating-point, so we can use fs.interp, remove "flatshade" from the shader key, and rely on the FLAT_SHADE state only (in the next patch). Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	e3d4bdd6a8	radeonsi: implement VERTEXID_NOBASE and BASEVERTEX system values Only done for completeness. Not used by anything yet. Tested by advertising PIPE_CAP_VERTEXID_NOBASE. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	d7c6f397f4	radeonsi: fix VertexID for OpenGL This fixes all failing piglit VertexID tests. Cc: 10.4 <mesa-stable@lists.freedesktop.org> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	368b0a7340	radeonsi: clarify a hw bug in shader exports Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	d1d2af2398	radeonsi: use ordered compares for SSG and face selection Ordered compares are what you have in C. Unordered compares are the result of negating ordered compares (they return true if either argument is NaN). That special NaN behavior is completely useless here, and unordered compares produce horrible code with all stable LLVM versions. (I think that has been fixed in LLVM git) Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	a38e8de643	radeonsi: remove unused and not useful variables Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	638fa8016a	radeonsi: remove init config from states It really doesn't do anything there. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	9141d88555	radeonsi: reduce the size of si_pm4_state - the relocs array is unused, remove it - ndw is at most 115 (init), set 140 as the maximum - compute needs 4 buffers per state, graphics only needs 1; set 4 as the maximum Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
Marek Olšák	1b82eb677d	tgsi: add uses_centroid into tgsi_shader_info	2015-01-07 12:06:43 +01:00
Marek Olšák	eaae92a349	st/mesa: fix GL_PRIMITIVE_RESTART_FIXED_INDEX Cc: 10.2 10.3 10.4 <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-07 12:06:43 +01:00
Marek Olšák	8f5d309521	vbo: ignore primitive restart if FixedIndex is enabled in DrawArrays From GL 4.4 Core profile: If both PRIMITIVE_RESTART and PRIMITIVE_RESTART_FIXED_INDEX are enabled, the index value determined by PRIMITIVE_RESTART_FIXED_INDEX is used. If PRIMITIVE_RESTART_FIXED_INDEX is enabled, primitive restart is not performed for array elements transferred by any drawing command not taking a type parameter, including all of the Draw commands other than DrawEle- ments. Cc: 10.2 10.3 10.4 <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-01-07 12:06:42 +01:00
Eric Anholt	426fd535d9	vc4: Fix scaling W projection of the Z coordinate when there's a Z offset. Fixes piglit glsl-fs-fragcoord-zw-perspective, es3conform gl_FragCoord_z_frag, and the rest of the piglit glsl 1.10 interpolation tests.	2015-01-06 17:22:13 -08:00
Eric Anholt	49b5c901e8	vc4: Fix deletion from the program cache. They key is, oddly enough, in the key field, not in the data field (which is the vc4_compiled_shader *). Fixes regular failures in fp-long-alu.	2015-01-06 15:41:36 -08:00
Eric Anholt	b295403971	vc4: Skip storing the Z/S contents when it's invalidated. Improves framerate of 5 seconds of es2gears by 1.57473% +/- 0.669409% (n=67). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-01-06 15:40:41 -08:00
Eric Anholt	239db93888	gallium: Plumb the swap INVALIDATE_ANCILLARY flag through more layers. v2: Instead of telling the driver that the window system ancillaries have been invalidated (when the driver doesn't know which of its buffers are the window system's!), introduce a method for invalidating specific surfaces. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-01-06 15:40:41 -08:00
Eric Anholt	70e8ccc459	egl: Inform the client API when ancillary buffers may become undefined. This is part of the EGL spec, and is useful for a tiled renderer to avoid the memory bandwidth cost of storing the depth/stencil buffers. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-01-06 15:40:40 -08:00
Vinson Lee	5ae1305124	ax_prog_flex.m4: Merge upstream OpenBSD fixes. Merge the following upstream autoconf-archive patches. ax_prog_flex: change grep syntax to accept e.g. "flex.real" in case a wrapper or symlink is used. AX_PROG_FLEX: avoid use of grep empty string escape extension (fix for OpenBSD) AX_PROG_FLEX: Also accept gflex. Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jonathan Gray <jsg@openbsd.org>	2015-01-06 15:06:54 -08:00
Tom Stellard	a8ef880a1b	radeon/llvm: Use amdgcn triple for SI+ on LLVM >= 3.6	2015-01-06 12:53:21 -08:00
Tom Stellard	761e36b4ca	radeonsi: Cache LLVMTargetMachine object in si_screen Rather than building a new one every compile. This should reduce some of the overhead of compiling shaders. One consequence of this change is that we lose the MachineInstrs dumps when dumping the shaders via R600_DEBUG. The LLVM IR and assembly is still dumped, and if you still want to see the MachineInstr dump, you can run the dumped LLVM IR through llc.	2015-01-06 12:53:21 -08:00

1 2 3 4 5 ...

67219 commits