fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-06-01 02:48:17 +02:00

Author	SHA1	Message	Date
Tom Stellard	dfdaf3eb7e	radeon: Teach radeon_elf_read() how to parse reloc information v3 v2: - Use strdup for copying reloc names. - Free reloc memory. v3: - Add free_relocs parameter to radeon_shader_binary_free_members()	2015-01-20 09:55:43 -05:00
Tom Stellard	5667aa58c4	radeon: Add a helper function for freeing members of radeon_shader_binary	2015-01-20 09:55:43 -05:00
Marek Olšák	ccc5b60b06	winsys/radeon: increase the size of buffer cache This should fix this performance regression: https://bugs.freedesktop.org/show_bug.cgi?id=88227 Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-19 20:15:27 +01:00
Eric Anholt	84ef2d4156	vc4: Add some dumping for STORE_TILE_BUFFER_GENERAL.	2015-01-15 22:21:29 +13:00
Eric Anholt	1b241c59e8	vc4: Add dumping for the TILE_RENDERING_MODE_CONFIG packet. I wanted to read it, so I wrote parsing.	2015-01-15 22:19:25 +13:00
Eric Anholt	d0d6d24723	vc4: Fix CL dumping trying to dump too far. Execution will end at the cl->next, because that's what ct0ea/ct1ea get programmed to.	2015-01-15 22:19:25 +13:00
Eric Anholt	0471f72755	vc4: Fix texture type masking. Everything from ETC1 to RGBA64 was getting its top bit dropped, but we didn't use any of those formats.	2015-01-15 22:19:25 +13:00
Eric Anholt	6313a2c8f0	vc4: Colormask should apply after all other fragment ops (like logic op). Theoretically it should apply after dithering as well, but dithering for 565 happens in fixed function in the TLB store.	2015-01-15 22:19:25 +13:00
Eric Anholt	0289a26201	vc4: No turning unpack arguments into small immediates. Since unpack only happens on things read from the A register file, we have to leave them as something that can be allocated to A (temp or uniform).	2015-01-15 22:19:25 +13:00
Eric Anholt	772c47aefe	vc4: Move the tests for src needing to be an A register to vc4_qir.c. I want it from another location.	2015-01-15 22:19:25 +13:00
Eric Anholt	8f2fb68026	vc4: Don't swap the raddr on instructions doing unpacks. It would mean different unpacking behavior, since only the A file does unpack (with PM==0).	2015-01-15 22:19:25 +13:00
Eric Anholt	5d5707707f	vc4: Don't let pairing happen with badly mismatched unpack flags. No difference on shader-db, but prevents definite regressions in the blending changes.	2015-01-15 22:19:25 +13:00
Eric Anholt	3820866e40	vc4: Don't let pairing happen with badly mismatched pack flags. No difference on shader-db, but will become more important as I introduce more use of pack flags with the blending changes.	2015-01-15 22:19:25 +13:00
Eric Anholt	d1f2fc834d	vc4: Fix early Z behavior on hardware. It turns out the simulator was not treating this bit the same as the RPi, and I'd forgotten to remove it when turning on early Z. The result was that you'd get big chunks of your rendering missing.	2015-01-15 22:19:25 +13:00
Michel Dänzer	82b7ee62fc	Revert "radeonsi: only set BC_OPTIMIZE_DISABLE when necessary" This reverts commit `0543630d0b`. It caused flickering artifacts in Steam games such as Team Fortress 2 or Left 4 Dead 2. We could probably only enable this optimization by also making sure the shader code only uses either SI_PARAM_LINEAR_CENTROID or SI_PARAM_LINEAR_CENTER, not both. This would probably require a shader variant. Sorry I didn't remember this when reviewing the reverted change. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-01-15 15:09:48 +09:00
Michel Dänzer	a6a75f1286	st/clover: Adapt to TargetLibraryInfo.h move in LLVM SVN r226078 Trivial.	2015-01-15 12:57:05 +09:00
Rob Clark	876550ff97	freedreno/ir3: handle "holes" in inputs If, for example, only the x/y/w components of in.xyzw are actually used, we still need to have a group of four registers and assign all four components. The hardware can't write in.xy and in.w to discontiguous registers. To handle this, pad with a dummy NOP instruction, to keep the neighbor chain contiguous. This fixes a problem noticed with firefox OMTC. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-13 08:17:18 -05:00
Marek Olšák	bed6f20f28	r600g: fix build failure when building the driver without LLVM	2015-01-12 23:20:26 +01:00
Eric Anholt	ff1948a1be	vc4: Clamp the inputs to the blend equation to [0, 1]. Fixes the remaining ARB_color_buffer_float rendering tests.	2015-01-11 17:17:20 +13:00
Eric Anholt	1519a1928a	vc4: Add a little helper for clamping to [0,1].	2015-01-11 17:17:20 +13:00
Eric Anholt	1a328120d3	vc4: Fix up statechange management for uncompiled/compiled FS/VS. No need to recheck the FS compile when the VS source has changed, but there is a need to recheck the VS compile when the compiled VS has changed (since the live inputs may change). Fixes es3conform's blend test.	2015-01-11 17:17:20 +13:00
Eric Anholt	c122662984	vc4: Fix clear color setup for RGB565. The util_pack_color() thing only sets up the low bits of the union, so only return them, too. Fixes intermittent failure on fbo-alphatest-formats and es3conform's framebuffer-objects test under simulation.	2015-01-11 17:17:19 +13:00
Eric Anholt	355156d2f7	vc4: Avoid the save/restore of r3 for raddr conflicts, just use ra31. Turns out this was harmful in code quality: total instructions in shared programs: 39487 -> 38845 (-1.63%) instructions in affected programs: 22522 -> 21880 (-2.85%) This costs us yet another register, which is painful since it means more programs might fail to compile. However, the alternative was causing us trouble where we'd save/restore r3 while it contained a MIN-ed direct texture offset, causing the kernel to fail to validate our shaders (such as in GLB2.7).	2015-01-11 08:57:24 +13:00
Eric Anholt	a8e14c293b	vc4: Allow dead code elimination of VPM reads. This gets a bunch of dead reads out of the CSes, which don't read most attributes generally. total instructions in shared programs: 39753 -> 39487 (-0.67%) instructions in affected programs: 4721 -> 4455 (-5.63%)	2015-01-10 20:55:37 +13:00
Eric Anholt	b920ecf793	vc4: Cook up the draw-time VPM setup info during shader compile. This will give the compiler the chance to dead-code eliminate unused VPM reads. This is particularly a big deal in the CS where a bunch of vattrs are just not going to be used.	2015-01-10 15:24:56 +13:00
Eric Anholt	c772c92153	vc4: Split two notions of instructions having side effects. Some ops can't be DCEd, while some of the ops that are just important due to the args they have can be.	2015-01-10 15:24:46 +13:00
Eric Anholt	a58ae83882	vc4: Redo VPM reads as a read file. This will let us do copy propagation of the VPM reads.	2015-01-10 14:35:24 +13:00
Eric Anholt	06b6a72a3e	vc4: Fix miscalculation of the VPM space. We pass in a byte offset, not dword. I'm rather scared that this actually managed to pass piglit, but it does fix gears.	2015-01-10 14:35:06 +13:00
Eric Anholt	92a0b0bd70	vc4: Pack VPM attr contents according to just the size of the attribute. total instructions in shared programs: 40960 -> 39753 (-2.95%) instructions in affected programs: 20871 -> 19664 (-5.78%)	2015-01-10 13:54:12 +13:00
Eric Anholt	72cb6619cb	vc4: Restructure color packing as a series of channel replacements. I'm using this in some WIP commits for doing blending in 8888 instead of vec4. But it also gives us these results immediately, thanks to allowing more uniforms/immediates in the arguments: total instructions in shared programs: 41027 -> 40960 (-0.16%) instructions in affected programs: 4381 -> 4314 (-1.53%)	2015-01-10 13:54:12 +13:00
Eric Anholt	3093bfacf0	vc4: Fix the no-copy-propagating-from-TLB_COLOR_READ check. Our MOV's dst obviously won't be the TLB_COLOR_READ's def, because we're ssa.	2015-01-10 13:54:12 +13:00
Eric Anholt	1d04432677	vc4: Move global seqno short-circuiting to vc4_wait_seqno(). Any other caller would want it, too.	2015-01-10 13:54:12 +13:00
José Fonseca	6c9b695a9c	st/wgl: Ignore ulVersion in DrvValidateVersion. We never used ulVersion for proper version checks. Most 3rd party drivers use version 1, but recently NVIDIA OpenGL driver started using a different version number, so the handy trick of renaming Mesa's ICDs as nvoglv32.dll on Windows machines with NVIDIA hardware for quick testing of Mesa software renderers stopped working. Reviewed-by: Brian Paul <brianp@vmware.com>	2015-01-08 18:57:04 +00:00
Rob Clark	e7026ac486	freedreno/ir3: fix pos_regid > max_reg We can't (or don't know how to) turn this off. But it can end up being stored to a higher reg # than what the shader uses, leading to corruption. Also we currently aren't clever enough to turn off frag_coord/frag_face if the input is dead-code, so just fixup max_reg/max_half_reg. Re-org this a bit so both vp and fp reg footprint fixup are called by a common fxn used also by ir3_cmdline. Also add a few more output lines for ir3_cmdline to make it easier to see what is going on. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	1e5c207dba	freedreno/ir3: start on indirect gpr reads Handle TEMP[ADDR[]] src registers by generating a fanin to group array elements, similarly to how texture fetch instructions work. NOTE: For all the scalar instructions generated for a single tgsi vector operation which uses an array src (or possibly even uses the same array as multiple srcs), re-use the same fanin node. Since a vector operation operates on all components at the same time, it should never see more than one version of the same array. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	63e5b72da8	freedreno/ir3: make reg array dynamic To use fanin's to group registers in an array, we can potentially have a much larger array of registers. Rather than continuing to bump up the array size, just make it dynamically allocated when the instruction is created. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	9a9f2a893b	freedreno/ir3: simplify RA Group inputs/outputs, in addition to fanin/fanout, as they must also exist in sequential scalar registers. This lets us simplify RA by working in terms of neighbor groups. NOTE: has the slight problem that it can't optimize out mov's for things like: MOV OUT[n], IN[m] To avoid this, instead of trying to figure out what mov's we can eliminate, we first remove all mov's prior to grouping, and then re-insert mov's as needed while grouping inputs/outputs/fanins. Eventually we'd prefer the frontend to not insert extra mov's in the first place (so we don't have to bother removing them). This is the plan for an eventual NIR based frontend, so separate out the instr grouping (which will still be needed for NIR frontend) from the mov elimination (which won't). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	dddfe6c21e	freedreno/ir3: regmask support for relative addr For temp arrays, a 32bit mask won't be sufficient.. but otoh we don't need to support an arbitrary mask. So for this case use a simple size field rather than a bitmask. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	9bb865b3cf	freedreno/ir3: split up ssa_src Slight bit of refactoring that will be needed for indirect gpr addressing (TEMP[ADDR[]]). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	d15db9e7c0	freedreno/ir3: drop instr_clone() stuff Unnecessary and overly complicated. And gets in the way for temp arrays (TEMP[ADDR[]]). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	212b909643	freedreno/ir3: runtime enable RA debug for DEBUG builds Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	8c3952051e	freedreno/ir3: handle relative addr in ir3_dump Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	56370b9feb	freedreno/ir3: legalize vs unused sam dst components We probably could be more clever elsewhere and mask out components that are not used. But either way, legalize should realize that there is also a write-after-write hazard with texture sample instructions. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	063e2ef76a	freedreno/ir3: hack for old compiler Old compiler doesn't have ir3_block's.. so we need a special path. This hack can be dropped when ir3_compiler_old is retired. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	18899d1b80	tgsi: track max array per file NOTE IN[] and OUT[] don't need (have?) ArrayID's.. and TEMP[] can optionally have them. So we implicitly assume that ArrayID==0 always exists for each file. This is why array_max[file] is never less than zero. You can tell from indirect_files(_read/written) if the legacy array-id zero was actually used. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-01-07 19:37:28 -05:00
Rob Clark	49b4a6331f	tgsi: keep track of read vs written indirects At least temporarily, I need to fallback to old compiler still for relative dest (for freedreno), but I can do relative src temp. Only a temporary situation, but seems easy/reasonable for tgsi-scan to track this. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2015-01-07 19:37:28 -05:00
Marek Olšák	d7cd9bfc7f	Revert "radeonsi: reduce the size of si_pm4_state" This reverts commit `9141d88555`. It broke OpenCL.	2015-01-08 00:10:36 +01:00
Tom Stellard	e28f9d0e60	radeonsi: Fix crash when destroying si_screen We were invalidating si_screen:tm by calling r600_destroy_common_screen() which frees the si_screen object. This caused the driver to crash in LLVMDisposeTargetMachine() since we were passing it an invalid pointer. https://bugs.freedesktop.org/show_bug.cgi?id=88170	2015-01-07 16:28:40 -05:00
Marek Olšák	1829f9c928	radeonsi: enable LLVM optimizations that assume no NaNs for non-compute shaders v2: complete rewrite Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-01-07 18:27:54 +01:00
Marek Olšák	d8185aa9a8	radeonsi: emit SURFACE_SYNC last This fixes a case where a transform feedback buffer is fed back as an index buffer, because SURFACE_SYNC must be after VS_PARTIAL_FLUSH. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-01-07 12:06:43 +01:00
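
Several of the vc4 commits above report instruction-count changes in the form "before -> after (delta%)". As a minimal sketch of how those percentages follow from the two counts, the Python below recomputes them. The helper names `percent_change` and `format_delta` are hypothetical and are not part of Mesa or shader-db; the input numbers are copied from the commit messages.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100.0


def format_delta(before: int, after: int) -> str:
    """Render a pair of counts in the 'before -> after (delta%)' style."""
    return f"{before} -> {after} ({percent_change(before, after):.2f}%)"


# Instruction counts quoted in the commit messages above.
reported = [
    (39487, 38845),  # 355156d2f7, shared programs
    (22522, 21880),  # 355156d2f7, affected programs
    (39753, 39487),  # a8e14c293b, shared programs
    (40960, 39753),  # 92a0b0bd70, shared programs
]

for before, after in reported:
    print(format_delta(before, after))
```

The "affected programs" percentage is always the larger of the two in a commit because the same absolute saving is divided by a smaller baseline.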

22789 commits