fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-17 20:00:20 +01:00

Author	SHA1	Message	Date
Eric Anholt	ee69cfd11d	vc4: Convert vc4_opt_dead_code to work in the presence of control flow. With control flow, we can't be sure that we'll see the uses of a variable before its def as we walk backwards. Given that NIR is eliminating our long chains of dead code, a simple solution for now seems fine. This slightly changes the order of some optimizations, and so an opt_vpm happens before opt_dce, causing 3 dead MOVs to be turned into dead FMAXes in Minecraft: instructions in affected programs: 52 -> 54 (3.85%)	2016-07-13 23:54:15 -07:00
Eric Anholt	4e797bd98f	vc4: Update copy propagation for control flow. Previously, we could assume that a MOV from a temp was always an available copy, because all temps were SSA in NIR, and their non-SSA state in QIR was just due to the fact that they were from a bcsel or pack_unorm_4x8, so we could use the current value of the temp after that series of QIR instructions to define it. However, this is no longer the case with control flow. Instead, we track a new array of MOVs defined within the block that haven't had their source or dest killed yet, and use that primarily. We fall back to looking through the QIR defs array to handle across-block MOVs, but now require that copies from the SSA defs have an SSA src as well.	2016-07-13 23:54:15 -07:00
Samuel Iglesias Gonsálvez	94135e8736	i965/fs: emit DIM instruction to load 64-bit immediates in HSW v2 (Matt): - Use brw_imm_df() as source argument of DIM instruction. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-07-14 08:11:50 +02:00
Samuel Iglesias Gonsálvez	0534863c47	i965/eu: set DF imm value to the source of DIM According to HSW's PRM, vol02b, the DIM instruction has the following restriction: "Restriction : src0 must be immediate. src0 must specify the :f (F, Float) type encoding but is an immediate 64-bit DF (Double Float) value. dst must have type DF." This commit allows to upload the immediate 64-bit DF value to the source of a DIM instruction even when it is of float type encoding. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-07-14 08:06:01 +02:00
Samuel Iglesias Gonsálvez	6e28976d35	i965: enable the emission of the DIM instruction v2 (Matt): - Take a DF source argument for the DIM instruction emission in the visitors. - Indentation. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-07-14 08:06:01 +02:00
Jason Ekstrand	b9e99282a6	anv: Add a stub for CmdCopyQueryPoolResults on Ivy Bridge Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Cc: "12.0" <mesa-stable@lists.freedesktop.org>	2016-07-13 20:31:27 -07:00
Timothy Arceri	a738732abf	i965: fix compiler warnings for 32bit build Reviewed-by: Matt Turner <mattst88@gmail.com>	2016-07-14 12:03:59 +10:00
Tim Rowley	29f53d7937	Revert "gallium: Force blend color to 16-byte alignment" This reverts commit `d8d6091a84`. Heap allocations may be only 8-byte aligned on 32-bit system, and so having members with 16-byte alignment (such as in the case where pipe_blend_color is embedded in radeonsi's si_context) is undefined behavior which indeed causes crashes when compiled with gcc -O3. Cc: <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96835 Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com> Acked-by: Chuck Atkins <chuck.atkins@kitware.com>	2016-07-13 13:55:33 -05:00
Jason Ekstrand	48ed8b6f26	isl/state: Add support for handling auxiliary surfaces Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	76e2dcc131	isl: Add an auxiliary surface usage enum Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	3ab3d97ac9	isl: Add support for color control surfaces Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	219024b9a7	isl: Add support for multisample compression surfaces Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	33dc8549fb	isl: Add support for HiZ surfaces Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	fc3650a0a9	isl: Kill off isl_format_layout::bs Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	1f0433f075	isl: Take bpb rather than bs in tiling_get_info Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	01855d7331	isl: Use bpb in a few places where it's more natural than bs Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	8c76b9bdce	isl: Use bpb for determining YUV image padding When we initially dropped bpb in favor of bs, we accidentally didn't change this one line properly. This brings it back to what it should be. Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	cf9ff082b4	isl: Bring back isl_format_layout::bpb A while ago we got rid of the bits-per-block because we thought we didn't need it. We're about to introduce some very useful 1 and 2-bit formats so we really should be able to handle them again. Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	0bd3a7e931	isl: Change the physical size of a W-tile to 128x32 Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	4b62c19c32	isl: Rework the way we define tile sizes. This is based on a very long set of discussions between Chad and myself about how we should properly represent HiZ and CCS buffers. The end result of that discussion was that a tiling actually has two different sizes, a logical size in elements, and a physical size in bytes and rows. This commit reworks ISL's pitch and size calculations to work in terms of these two sizes. Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	7270bd0607	isl: Rework the way we handle surface padding Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	a52f26d6e8	isl: Use ARRAY_PITCH_SPAN_FULL for depth/stencil surfaces on gen7 We helpfully inserted a PRM quotation about how we need to use ARRAY_PITCH_SPAN_FULL and then set it to COMPACT. Oops... Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	0d48ac627a	isl: Stop multiplying height by block size The row pitch already specifies the size of a row of elements. Multiplying by the block height simply causes us to allocate as much as 12 times more memory than needed for compressed textures. Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	58c1b1088b	isl: Get rid of tiling_get_extent It was unused Reviewed-by: Chad Versace <chad.versace@intel.com>	2016-07-13 11:47:37 -07:00
Jason Ekstrand	49476576dd	nir/spirv: Don't multiply the push constant block size by 4 I have no idea why we were multiplying by 4 before. The offsets we get from SPIR-V are in bytes and so is nir->num_uniforms so there's no need to do any adjustment whatsoever. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Cc: "12.0" <mesa-stable@lists.freedesktop.org>	2016-07-13 11:35:29 -07:00
Jason Ekstrand	1eed753ee8	anv/pipeline: Assert that the number of uniforms from NIR fits	2016-07-13 11:35:24 -07:00
Marek Olšák	0f7a6ea5e7	radeonsi: report accurate SGPR and VGPR spills Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-13 19:46:16 +02:00
Marek Olšák	d227dbe272	radeonsi: add a workaround for a compute VGPR-usage LLVM bug v2: use abort(), describe which LLVM version is affected Cc: 12.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-13 19:46:16 +02:00
Marek Olšák	f4d1de7f86	radeonsi: use LLVMGetTypeKind to tell if an input is an array of descriptors just a cleanup Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-13 19:46:16 +02:00
Marek Olšák	785073ed0b	radeonsi: replace !tbaa with !invariant.load no change in generated code thanks to dereferenceable(n) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-13 19:46:16 +02:00
Marek Olšák	348b9a5b1c	radeonsi: set dereferenceable attribute on descriptor arrays This allows moving the loads arbitrarily in the Sinking pass.

26002 shaders in 14643 tests
Totals:
SGPRS: 2080160 -> 2080160 (0.00 %)
VGPRS: 798875 -> 797826 (-0.13 %)
Spilled SGPRs: 108485 -> 79165 (-27.03 %)
Spilled VGPRs: 327 -> 327 (0.00 %)
Scratch VGPRs: 1656 -> 1652 (-0.24 %) dwords per thread
Code Size: 36127192 -> 35559780 (-1.57 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 212464 -> 212672 (0.10 %)
Wait states: 0 -> 0 (0.00 %)

PERCENTAGES /
App                     Shaders      SGPRs      VGPRs  SpillSGPR  SpillVGPR    Scratch   CodeSize   MaxWaves      Waits
(unknown)                     4          .          .          .          .          .          .          .          .
0ad                           6          .          .          .          .          .          .          .          .
alien_isolation            2938          .     0.04 %    -8.53 %          .          .    -0.71 %    -0.06 %          .
anholt                       10          .          .          .          .          .          .          .          .
batman_arkham_origins       589          .    -0.58 %   -79.54 %          .          .    -6.72 %     0.57 %          .
bioshock-infinite          1769          .    -0.65 %   -89.32 %          .          .    -4.73 %     0.48 %          .
borderlands2               3968          .    -0.31 %   -51.21 %          .          .    -4.09 %     0.22 %          .
brutal-legend               338          .    -0.03 %    -2.95 %          .          .    -0.06 %          .          .
civilization_beyond..       116          .          .   -14.17 %          .          .    -0.88 %          .          .
counter_strike_glob..      1142          .          .          .          .          .          .          .          .
dirt-showdown               541          .    -0.56 %   -40.14 %          .    -3.45 %    -1.82 %     0.35 %          .
dolphin                      22          .          .          .          .          .     0.16 %          .          .
dota2                      1747          .          .          .          .          .     0.01 %          .          .
europa_universalis_4         76          .    -0.23 %   -42.11 %          .          .    -0.96 %          .          .
f1-2015                     774          .    -0.09 %   -28.89 %          .          .    -2.60 %     0.09 %          .
furmark-0.7.0                 4          .          .          .          .          .          .          .          .
gimark-0.7.0                 10          .          .          .          .          .          .          .          .
glamor                       16          .          .          .          .          .          .          .          .
humus-celshading              4          .          .          .          .          .          .          .          .
humus-domino                  6          .          .          .          .          .          .          .          .
humus-dynamicbranching       24          .     0.71 %          .          .          .     0.29 %    -0.45 %          .
humus-hdr                    10          .          .          .          .          .          .          .          .
humus-portals                 2          .          .          .          .          .          .          .          .
humus-volumetricfog..         6          .          .          .          .          .          .          .          .
left_4_dead_2              1762          .          .          .          .          .          .          .          .
metro_2033_redux           2670          .    -0.10 %    -7.15 %          .          .    -0.03 %          .          .
nexuiz                       80          .          .          .          .          .          .          .          .
pixmark-julia-fp32            2          .          .          .          .          .          .          .          .
pixmark-julia-fp64            2          .          .          .          .          .          .          .          .
pixmark-piano-0.7.0           2          .          .          .          .          .          .          .          .
pixmark-volplosion-..         2          .          .          .          .          .          .          .          .
plot3d-0.7.0                  8          .          .          .          .          .          .          .          .
portal                      474          .          .          .          .          .          .          .          .
sauerbraten                   7          .          .          .          .          .          .          .          .
serious_sam_3_bfe           392          .          .   -13.20 %          .          .    -1.81 %          .          .
supertuxkart                  4          .          .          .          .          .          .          .          .
talos_principle             324          .    -0.21 %   -18.39 %          .          .    -2.73 %     0.14 %          .
team_fortress_2             808          .          .          .          .          .          .          .          .
tesseract                   430          .     0.08 %   -68.57 %          .          .    -0.45 %          .          .
tessmark-0.7.0                6          .          .          .          .          .          .          .          .
thea                        172          .          .          .          .          .     0.03 %          .          .
ue4_effects_cave            299          .    -0.04 %   -10.15 %          .          .    -0.25 %     0.04 %          .
ue4_elemental               586          .    -0.02 %   -13.93 %          .          .    -0.13 %     0.02 %          .
ue4_lightroom_inter..        74          .    -0.17 %   -70.00 %          .          .    -1.27 %          .          .
ue4_realistic_rende..        92          .          .   -32.58 %          .          .    -0.35 %          .          .
unigine_heaven              322          .     0.12 %   -54.17 %          .          .    -1.42 %    -0.12 %          .
unigine_sanctuary           264          .          .          .          .          .          .          .          .
unigine_tropics             210          .          .          .          .          .          .          .          .
unigine_valley              278          .    -0.15 %   -40.74 %          .          .    -2.00 %     0.09 %          .
unity                        72          .          .          .          .          .     0.03 %          .          .
warsow                      176          .          .          .          .          .          .          .          .
warzone2100                   4          .          .          .          .          .     0.13 %          .          .
witcher2                   1040          .    -0.03 %   -86.28 %          .          .    -0.28 %     0.01 %          .
xcom_enemy_within          1236          .    -0.24 %   -63.54 %          .          .    -0.93 %     0.18 %          .
yofrankie                    82          .    -0.61 %  -100.00 %          .          .    -0.83 %     0.41 %          .
-----------------------------------------------------------------------------------------------------------------------
Total                     26002          .    -0.13 %   -27.03 %          .    -0.24 %    -1.57 %     0.10 %          .

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-13 19:46:16 +02:00
Marek Olšák	6596ecf8c5	gallivm: add helper lp_add_attr_dereferenceable Not sure if this is the right way to do it, but it seems to work. v2: make it a no-op on LLVM <= 3.5 Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-13 19:46:16 +02:00
Marek Olšák	bccf9de4df	radeonsi: clean up shader value metadata code No change in behavior. BTW, tbaa_md_kind == 1, which was the magic number in the code. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-13 19:46:16 +02:00
Marek Olšák	d7d7e6adbe	radeonsi: remove LLVMNoUnwindAttribute uses always set by gallivm Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-13 19:46:16 +02:00
Marek Olšák	c4807505c0	radeonsi: fix a typo in SI_PARAM_LINEAR_* handling introduced in `476e9cee1d` Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-13 19:46:16 +02:00
Marek Olšák	f2f573e777	gallium/radeon: normalize the code style no change in behavior Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-13 19:46:16 +02:00
Marek Olšák	ed3912d0da	radeonsi: just save buffer sizes instead of buffers while recording IBs whole buffer objects are not needed Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-13 19:46:16 +02:00
Jon Turney	fc8139b146	Add c99_alloca.h include to fix compilation on Cygwin Fix compilation on Cygwin, since `50b22354`, by adding c99_alloca.h include, which should know how to portably make the alloca() prototype available. Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-07-13 16:11:36 +01:00
Topi Pohjolainen	7d29fee4a8	i965/blorp: Cleanup leftovers from push constant disabling Setup for pixel shader push constants is the same as for other stages. Note that on gen8+ the if-else branches were identical and the generation check for packet size redundant. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-07-13 12:10:03 +03:00
Topi Pohjolainen	26778da571	i965/blorp/gen7+: Bring back push constant setup This is partial revert of commit `cc2d0e64`. It looks that even though blorp disables a stage the corresponding 3DSTATE_CONSTANT_XS packet is needed to be programmed. Hardware seems to try to fetch the constants even for disabled stages. Therefore care needs to be taken that the constant buffer is set up properly. Blorp will continue to trash it into non-existing such as before. It is possible that this could be omitted on SKL where the constant buffer is considered when the corresponding binding table settings are changed. Bspec: "The 3DSTATE_CONSTANT_* command is not committed to the shader unit until the corresponding (same shader) 3DSTATE_BINDING_TABLE_POINTER_* command is parsed." However, as CONSTANT_XS packet itself does not seem to stall on its own, it is safer to emit the packets for SKL also. Possible alternative to blorp trashing could have been to setup defaults in the beginning of each batch buffer. However, hardware doesn't seem to tolerate these packets being programmed multiple times per primitive. Bspec for IVB: "It is invalid to execute this command more than once between 3D_PRIMITIVE commands." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96878 Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-07-13 12:09:35 +03:00
Nicolai Hähnle	65d48fcf8c	radeonsi: silence Coverity warning Coverity's analysis is too weak to understand that r600_init_flushed_depth(_, _, NULL) only returns true when flushed_depth_texture was assigned a non-NULL value. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-07-13 09:52:39 +02:00
Samuel Iglesias Gonsálvez	a2bd7334ed	i965/fs: do d2x lowering before simd splitting So that we can have gen7 split large writes produced by this lowering pass. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-07-13 07:09:41 +02:00
Iago Toral Quiroga	376d7ee587	i965/fs: do pack lowering before simd splitting So that we can have gen7 split large writes produced by the pack lowering. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-07-13 07:09:41 +02:00
Samuel Iglesias Gonsálvez	9979a3f2ac	i965/fs: do not require force_writemask_all with exec_size 4 So far we only used instructions with this size in situations where we did not operate per-channel and we wanted to ignore the execution mask, but gen7 fp64 will need to emit code with a width of 4 that needs normal execution masking. v2: - Modify the assert instead of deleting it (Curro) Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-07-13 07:09:41 +02:00
Iago Toral Quiroga	aa4796ae81	i965/fs/gen7: split instructions that run into exec masking bugs In fp64 we can produce code like this: mov(16) vgrf2<2>:UD, vgrf3<2>:UD That our simd lowering pass would typically split into instructions with a width of 8, writing to two consecutive registers each. Unfortunately, gen7 hardware has a bug affecting execution masking and as a result, the second GRF register write won't work properly. Curro verified this: "The problem is that pre-Gen8 EUs are hardwired to use the QtrCtrl+1 (where QtrCtrl is the 8-bit quarter of the execution mask signals specified in the instruction control fields) for the second compressed half of any single-precision instruction (for double-precision instructions it's hardwired to use NibCtrl+1, at least on HSW), which means that the EU will apply the wrong execution controls for the second sequential GRF write if the number of channels per GRF is not exactly eight in single-precision mode (or four in double-float mode)." In practice, this means that we cannot write more than one consecutive GRF in a single instruction if the number of channels per GRF is not exactly eight in single-precision mode (or four in double-float mode). This patch makes our SIMD lowering pass split this kind of instruction so that the split versions only write to a single register. In the example above this means that we split the write in 4 instructions, each one writing 4 UD elements (width = 4) to a single register. v2 (Curro): - Make explicit that the thing about hardwiring NibCtrl+1 for the second compressed half is known to happen in Haswell and the issue with IVB might not be exactly the same. - Assign max_width instead of returning early so that we can handle multiple restrictions affecting the same instruction. - Avoid division by 0 if the instruction does not write any registers. - Ignore instructions that have WE_all set. - Use the instruction execution type size instead of the dst type size. v3 (Curro): - Move the implementation down so it is not placed in the middle of another workaround. - Declare channels_per_grf as const. - Don't break the loop early if we find a BAD_FILE source. - Fix the number of channels that the hardware shifts for the second half of a compressed instruction to be 8 in single precision and 4 in double precision. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-07-13 07:09:41 +02:00
Iago Toral Quiroga	87a13f598b	i965/fs: use the new helper function to create double immediates Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-07-13 07:09:41 +02:00
Iago Toral Quiroga	9e196e907e	i965/fs: add a helper function to create double immediates Gen7 hardware does not support double immediates so these need to be moved in 32-bit chunks to a regular vgrf instead. Instead of doing this every time we need to create a DF immediate, create a helper function that does the right thing depending on the hardware generation. v2: - Define setup_imm_df() as an independent function (Curro) - Create a specific builder to get rid of some instruction field assignments (Curro). v3: - Get devinfo from builder (Kenneth) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-07-13 07:09:41 +02:00
Eric Anholt	93794145dd	vc4: Validate QPU uniform pointer updates.	2016-07-12 17:42:42 -07:00
Eric Anholt	420845acb2	vc4: Add support for NIR loops and break/continue.	2016-07-12 17:42:42 -07:00
Eric Anholt	0adf2ec0ee	vc4: Add support for emitting NIR IF nodes.	2016-07-12 17:42:42 -07:00


85652 commits