fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-26 20:50:23 +01:00

Author	SHA1	Message	Date
Caio Marcelo de Oliveira Filho	a3cba3c771	intel/fs: Only stall after sending all memory fence messages In Gen11+, when emitting a fence for both L3 and SLM, the generated code would look like SEND, MOV (for stall), SEND, MOV (for stall) This commit change that so two SENDs are emitted before the MOVs for stall. This is similar to the approach used in Ivy Bridge for the render fence. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>	2020-04-29 07:17:27 +00:00
Caio Marcelo de Oliveira Filho	f858fa26b4	intel/fs,vec4: Pull stall logic for memory fences up into the IR Instead of emitting the stall MOV "inside" the SHADER_OPCODE_MEMORY_FENCE generation, use the scheduling fences when creating the IR. For IvyBridge, every (data cache) fence is accompained by a render cache fence, that now is explicit in the IR, two SHADER_OPCODE_MEMORY_FENCEs are emitted (with different SFIDs). Because Begin and End interlock intrinsics are effectively memory barriers, move its handling alongside the other memory barrier intrinsics. The SHADER_OPCODE_INTERLOCK is still used to distinguish if we are going to use a SENDC (for Begin) or regular SEND (for End). This change is a preparation to allow emitting both SENDs in Gen11+ before we can stall on them. Shader-db results for IVB (i965): total instructions in shared programs: 11971190 -> 11971200 (<.01%) instructions in affected programs: 11482 -> 11492 (0.09%) helped: 0 HURT: 8 HURT stats (abs) min: 1 max: 3 x̄: 1.25 x̃: 1 HURT stats (rel) min: 0.03% max: 0.50% x̄: 0.14% x̃: 0.10% 95% mean confidence interval for instructions value: 0.66 1.84 95% mean confidence interval for instructions %-change: 0.01% 0.27% Instructions are HURT. Unlike the previous code, that used the `mov g1 g2` trick to force both `g1` and `g2` to stall, the scheduling fence will generate `mov null g1` and `mov null g2`. During review it was decided it was not worth keeping the special codepath for the small effect will have. Shader-db results for HSW (i965), BDW and SKL don't have a change on instruction count, but do report changes in cycles count, showing SKL results below total cycles in shared programs: 341738444 -> 341710570 (<.01%) cycles in affected programs: 7240002 -> 7212128 (-0.38%) helped: 46 HURT: 5 helped stats (abs) min: 14 max: 1940 x̄: 676.22 x̃: 154 helped stats (rel) min: <.01% max: 2.62% x̄: 1.28% x̃: 0.95% HURT stats (abs) min: 2 max: 1768 x̄: 646.40 x̃: 362 HURT stats (rel) min: <.01% max: 0.83% x̄: 0.28% x̃: 0.08% 95% mean confidence interval for cycles value: -777.71 -315.38 95% mean confidence interval for cycles %-change: -1.42% -0.83% Cycles are helped. This seems to be the effect of allocating two registers separatedly instead of a single one with size 2, which causes different register allocation, affecting the cycle estimates. while ICL also has not change on instruction count but report changes negative changes in cycles total cycles in shared programs: 352665369 -> 352707484 (0.01%) cycles in affected programs: 9608288 -> 9650403 (0.44%) helped: 4 HURT: 104 helped stats (abs) min: 24 max: 128 x̄: 88.50 x̃: 101 helped stats (rel) min: <.01% max: 0.85% x̄: 0.46% x̃: 0.49% HURT stats (abs) min: 2 max: 2016 x̄: 408.36 x̃: 48 HURT stats (rel) min: <.01% max: 3.31% x̄: 0.88% x̃: 0.45% 95% mean confidence interval for cycles value: 256.67 523.24 95% mean confidence interval for cycles %-change: 0.63% 1.03% Cycles are HURT. AFAICT this is the result of the case above. Shader-db results for TGL have similar cycles result as ICL, but also affect instructions total instructions in shared programs: 17690586 -> 17690597 (<.01%) instructions in affected programs: 64617 -> 64628 (0.02%) helped: 55 HURT: 32 helped stats (abs) min: 1 max: 16 x̄: 4.13 x̃: 3 helped stats (rel) min: 0.05% max: 2.78% x̄: 0.86% x̃: 0.74% HURT stats (abs) min: 1 max: 65 x̄: 7.44 x̃: 2 HURT stats (rel) min: 0.05% max: 4.58% x̄: 1.13% x̃: 0.69% 95% mean confidence interval for instructions value: -2.03 2.28 95% mean confidence interval for instructions %-change: -0.41% 0.15% Inconclusive result (value mean confidence interval includes 0). Now that more is done in the IR, more dependencies are visible and more SWSB annotations are emitted. Mixed with different register allocation decisions like above, some shaders will see more `sync nops` while others able to avoid them. Most of the new `sync nops` are also redundant and could be dropped, which will be fixed in a separate change. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>	2020-04-29 07:17:27 +00:00
Caio Marcelo de Oliveira Filho	0e96b0d6dd	intel/fs: Allow FS_OPCODE_SCHEDULING_FENCE stall on registers It will generate the MOVs (or SYNC_NOP in Gen12+) needed for stall. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>	2020-04-29 07:17:27 +00:00
Bas Nieuwenhuizen	9248b04528	radv: Expose 4G element texel buffers. Old value seems to be copied from anv. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4787>	2020-04-29 07:05:04 +00:00
Kenneth Graunke	506414e837	iris: Fix downcast of bound_vertex_buffers from uint64_t to int This is the wrong data type, the original field - and the values we're adding in - are both 64-bit unsigned. Keep the original data type. Thanks to Dave Airlie for finding this while reading the code. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4802>	2020-04-29 06:50:54 +00:00
Francisco Jerez	5e2a7e11b4	intel/ir: Remove scheduling-based cycle count estimates. The cycle count estimation logic part of the scheduler is now redundant with the shader performance modeling pass, and the estimates can be consolidated into the brw::performance analysis result object instead of being part of the CFG, which guarantees that the estimates cannot be accessed without previously calling the performance_analysis::require() method, which makes sure that the right analysis pass is executed at the right time if we don't already have up-to-date cached results. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	486f3b04a5	intel/ir: Pass block cycle count information explicitly to disassembler. So we can eventually remove the cycle count estimates from the CFG data structure and consolidate performance information in the brw::performance object. It would be cleaner to pass the brw::performance object directly to the disassembler but that isn't straightforward since the disassembler is built as a plain C file unlike the rest of the compiler back-end. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	6579f562c3	intel/ir: Use brw::performance object instead of CFG cycle counts for codegen stats. These should be more accurate than the current cycle counts, since among other things they consider the effect of post-scheduling passes like the software scoreboard on TGL. In addition it will enable us to clean up some of the now redundant cycle-count estimation functionality in the instruction scheduler. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	65342be3ae	intel/fs: Add INTEL_DEBUG=no32 debugging flag. This is useful in order to identify codegen issues caused by SIMD32. It doesn't currently have any effect on compute shaders since SIMD32 dispatch is only enabled for CS when it's strictly necessary to do so in order to support the workgroup size requested for the shader -- That might change in the future though when we hook up the SIMD32 heuristic to CS compilation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	14f0a5cf64	intel/fs: Implement performance analysis-based SIMD32 heuristic for fragment shaders. The heuristic enables the SIMD32 fragment shader based on whether the IR performance modeling pass predicts it to have greater throughput than the SIMD16 and SIMD8 variants of the same shader. It would be straightforward to do the same thing in order to control whether SIMD16 dispatch is enabled, but it's pending additional performance evaluation. The INTEL_DEBUG=do32 option is left around in order to force the SIMD32 shader to be used regardless of the result of the heuristic, since it's useful as a debugging aid e.g. in order to identify SIMD32-specific codegen issues which may be masked by the SIMD32 heuristic, or cases where the heuristic is incorrectly disabling SIMD32 shaders that offer a performance advantage. Currently this is only enabled on Gen6+, since SIMD32 codegen support is incomplete on earlier platforms. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	d6aa0c261f	intel/fs: Heap-allocate fs_visitors in brw_compile_fs(). This makes brw_compile_fs() look a bit more similar to brw_compile_cs(). It saves us three v*_shader_stats local variables, and will save us additional triplicated declarations as we start tracking IR performance analysis results. The triplicated cfg pointers are left around because they're set to NULL to mark specific dispatch modes as disabled (e.g. in order to enforce hardware restrictions). Doing the same thing with the visitor pointers would cause data leaks. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	188a3659ae	intel/ir: Import shader performance analysis pass. This introduces an analysis pass intended to estimate several performance statistics of the shader, including cycle count latency and throughput values, based on static modeling. It has instruction performance information more comprehensive than the current scheduling pass for all platforms between Gen4-11, and works on both the FS and VEC4 back-end. The most immediate purpose of this pass is to implement a heuristic meant to determine whether using SIMD32 dispatch for a fragment shader can be expected to help more than it hurts. In addition this will allow the effect of passes run after scheduling (e.g. the TGL software scoreboard pass and the VEC4 dependency control pass) to be visible in shader-db statistics. But that isn't the end of the story, other potential applications of this pass (not part of this MR) I've been playing around with are: - Implement a similar SIMD16 heuristic allowing the identification of inefficient SIMD16 fragment shaders. - Implement similar SIMD16 and SIMD32 heuristics for the compute shader stage -- Currently compute shader builds always use the SIMD16 shader if available and never use the SIMD32 shader unless strictly necessary, which is suboptimal under certain conditions. - Hook up to the instruction scheduler in order to improve the accuracy of its timing information. - Use as heuristic in order to drive the selection of scheduling modes (Matt was experimenting with that). - Plug to the TGL software scoreboard pass in order to implement a more effective SBID token allocation algorithm, since in general the optimal token allocation depends on the timings of all instructions in the program. - Use its bottleneck detection functionality in order to implement a heuristic computing a more optimal bound for the number of fragment shader threads executed in parallel (by adjusting the MaximumNumberofThreadsPerPSD control of 3DSTATE_PS). As a follow-up I'm planning to submit updated timing information for Gen12 platforms -- Everything else required to support Gen12 like SWSB handling is already included in this patch, but there were some IP concerns regarding the TGL timing parameters since they cannot currently be obtained with the documentation and hardware which is publicly available. The timing parameters for any previous Gen7-11 platforms can be obtained by anyone by sampling the timestamp register using e.g. shader_time, though I have some more convenient instrumentation coming up. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:03 -07:00
Francisco Jerez	c8ce1cfc9c	intel/vec4: Fix constness of vec4_instruction::reads_flag() and ::writes_flag(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Francisco Jerez	bda1d72dd9	intel/fs: Replace fs_visitor::bank_conflict_cycles() with stand-alone function. This will be re-usable by the IR performance analysis pass. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Francisco Jerez	d2ed740795	intel/fs: Fix constness of argument of fs_instruction_scheduler::is_compressed(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Francisco Jerez	6310a05f68	intel/fs: Rename half() helpers to quarter(), allow index up to 3. Makes more sense considering SIMD32. Relaxing the assertion in brw_ir_fs.h will be required in order to avoid assertion failures on SNB with SIMD32 fragment shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Francisco Jerez	bdad7f429a	intel/ir: Add missing initialization of backend_reg::offset during construction. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Francisco Jerez	e549e4f6c0	intel/fs/gen12: Fix Render Target Read header setup for new thread payload layout. In Gen12 the Poly 0 Info DWORD containing the Viewport Index and Render Target Index fields were moved from r0.0 to r1.1 in order to make room for dual-polygon dispatch. The render target message format was updated to expect that information in the same location, so we didn't need to make any changes for framebuffer fetch to work with SIMD8 and SIMD16 dispatch. Unfortunately that won't work with SIMD32, since the render target message header is assembled from r0 and r2 instead of r1, and the r2 thread payload wasn't updated with an additional copy of the same information. We need to fix things up manually instead. This avoids a handful of EXT_shader_framebuffer_fetch regressions in combination with SIMD32 fragment shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:29 -07:00
Francisco Jerez	72324035fb	intel/fs/gen12: Work around dual-source blending hangs in combination with SIMD32. This applies the same work-around I commited as `b84fa0b31e` "intel/fs/gen11: Work around dual-source blending hangs in combination with SIMD32." to Gen12, which seems to suffer from the same hardware bug found empirically. The failure mode seems to be identical. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:28 -07:00
Francisco Jerez	d6ae079771	intel/fs/gen12: Fix hangs with per-sample SIMD32 fragment shader dispatch. The Gen12 docs are rather contradictory regarding the dispatch configurations supported by the fragment shader -- The same table present in previous generations seems to imply that only one dispatch mode can be enabled when doing per-sample shading, but a restriction documented in the 3DSTATE_PS_BODY page implies the opposite: That SIMD32 can only be used in combination with some other dispatch mode. The latter seems to match the behavior of real hardware as I could tell from my testing: A bunch of multisample test-cases that do per-sample shading hang if we only provide a SIMD32 shader. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:00:28 -07:00
Dylan Baker	35ee6b3d36	mesa: Follow OpenGL conversion rules for values that exceed storage size Section 2.2.2 (Data Conversions For State Query Commands) of the OpenGL 4.5 spec says: Following these steps, if a value is so large in magnitude that it cannot be represented by the returned data type, then the nearest value representable using that type is returned. The current code doesn't do the correct thing, because it truncates a long (potentially a 64bit values) to an int. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2828 Fixes: `53c36dfcfe` ("replace IROUND with util functions") Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4673>	2020-04-29 04:26:41 +00:00
Alyssa Rosenzweig	76c5688018	pan/bit: Add BITWISE test Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>	2020-04-29 00:30:05 +00:00
Alyssa Rosenzweig	844c3f94b5	pan/bit: Interpret BI_BITWISE No shifting yet. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>	2020-04-29 00:30:05 +00:00
Alyssa Rosenzweig	a077da6273	pan/bi: Handle iand/ior/ixor in NIR->BIR Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>	2020-04-29 00:30:05 +00:00
Alyssa Rosenzweig	ef9582738e	pan/bi: Pack BI_BITWISE Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>	2020-04-29 00:30:05 +00:00
Alyssa Rosenzweig	9b415bf6a0	pan/bi: Add bitwise modifiers Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>	2020-04-29 00:30:05 +00:00
Rob Clark	6de01faac5	freedreno/a6xx: invalidate tex state cache entries on rebind When a resource's backing bo changes, its seqno will be incremented. Which would result in a new tex state cache key, and nothing to clean up the old tex state until the sampler view/state is destroyed. But in some games, that may never happen, or at least not happen before we run out of memory. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2830 Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>	2020-04-29 00:08:57 +00:00
Rob Clark	ca05e6b04d	freedreno: rebind_resource() before bo changes This will matter in the next patch, where we need the original rsc->seqno. It means slight shuffling of where we call rebind_resource() in the `fd_try_shadow_resource()` path. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>	2020-04-29 00:08:57 +00:00
Rob Clark	d9e56d8a69	freedreno: rebind resource in all contexts If the resource is rebound, we need to invalidate in all contexts. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>	2020-04-29 00:08:57 +00:00
Rob Clark	f12188ff52	freedreno: optimize rebind_resource() Track how resources are used, ie. which state they may potentially dirty if the backing bo is changed/reallocated, to optimize rebind_resource(). This will be more important in a later patch when we hook up eviction of entries in a6xx tex state cache. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>	2020-04-29 00:08:57 +00:00
Rob Clark	1e18c58047	freedreno: mark more state dirty when rebinding resources Plus a bonus typo fix. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>	2020-04-29 00:08:57 +00:00
Rob Clark	bf97cc9221	freedreno: don't realloc idle bo's The `DISCARD_WHOLE_RESOURCE` is just a hint. And `rebind_resource()` is a bunch of faffing about (and going to get worse in a later patch), so let's not bother when the bo is already idle. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>	2020-04-29 00:08:57 +00:00
Rob Clark	938b6ed645	freedreno: small whitespace fix Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>	2020-04-29 00:08:57 +00:00
Jan Zielinski	a93b728bc6	gallium/swr: Fix crashes and failures in vertex fetch This commit fixes two problems: - In some cases SWR does not correctly report to Gallium which formats are supported. - Incorrect LLVM instructions are used in vertex fetch in some situations Reviewed-by: Krzysztof Raszkowski <krzysztof.raszkowski@intel.com> Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4788>	2020-04-28 23:53:08 +00:00
Rob Clark	de0d3d1726	freedreno/log-parser: support to read gzip'd logs ~50MB gzip'd log files are nicer than ~300MB uncompressed Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>	2020-04-28 23:31:58 +00:00
Rob Clark	f561e516c8	freedreno/a6xx: pre-calculate expected vsc stream sizes We should only rely on overflow detection for indirect draws, where we have no other option. This doesn't use quite the worst-possible-case sizes, which in practice seem to be ~20x larger than what is required. But instead uses roughly half of that. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>	2020-04-28 23:31:58 +00:00
Rob Clark	99d802ccc7	freedreno: add helper to estimate # of bins per pipe For vsc size calculation, we need to know the # of bins per pipe. Or at least the worst-case # of bins, assuming we don't eliminate an unused depth/ stencil buffer. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>	2020-04-28 23:31:58 +00:00
Rob Clark	a9c255d70c	freedreno/a6xx+tu: rename VSC_DATA/VSC_DATA2 These are the draw-stream and primitive-stream, so lets give them more descriptive names. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>	2020-04-28 23:31:58 +00:00
Rhys Perry	3ee3ad561a	aco: fix vgpr nir_op_vecn with sgpr operands Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>	2020-04-28 23:16:55 +00:00
Rhys Perry	c5eda3c746	aco: improve clamped integer addition disassembly workaround Make it work with 16-bit and GFX10. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>	2020-04-28 23:16:55 +00:00
Rhys Perry	4ed83e2f94	aco: add various GFX10 int16 opcodes Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>	2020-04-28 23:16:55 +00:00
Rhys Perry	43f2ba39ef	aco: fix sub-dword overwrite check in RA validator Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>	2020-04-28 23:16:55 +00:00
Rhys Perry	cca8d6ce06	aco: fix sub-dword out-of-bounds check in RA validator Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>	2020-04-28 23:16:55 +00:00
Rhys Perry	307aca83a2	aco: add missing adjust_max_used_regs() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>	2020-04-28 23:16:55 +00:00
Rhys Perry	99ca96fbf5	aco: improve RA for uneven p_split_vector Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>	2020-04-28 23:16:55 +00:00
Rhys Perry	24116a8a56	aco: don't recurse in sub-dword get_reg_simple() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>	2020-04-28 23:16:55 +00:00
Rhys Perry	09c584caeb	aco: split self-intersecting copies instead of swapping Example situation: v1 = {v0.hi, v1.lo} v0.hi = v1.hi The 4-byte copy's definition is completely used, but swapping it makes no sense. We have to split it to generate correct code: swap(v0.hi, v1.lo) swap(v0.hi, v1.hi) Found in dEQP-VK.spirv_assembly.type.vec3.i16.constant_composite_vert Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>	2020-04-28 23:16:55 +00:00
Rhys Perry	be4a34966c	aco: fix neighboring register check in get_reg_simple() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>	2020-04-28 23:16:55 +00:00
Rhys Perry	fb59ed6bb9	aco: check alignment of non-subdword registers in get_reg_specified() When splitting a v6b vector into v1 and v2b components, we should ensure the v1 definition doesn't start at the upper half. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>	2020-04-28 23:16:55 +00:00
Rhys Perry	916cc3e231	aco: make RegisterFile::block() take a regclass Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>	2020-04-28 23:16:55 +00:00

... 95 96 97 98 99 ...

127836 commits