fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 00:30:13 +01:00

Author	SHA1	Message	Date
Ian Romanick	90430d0488	intel/fs: Eliminate dead code first This simplifies the later patch "i965/fs: Generate better code for fsign multiplied by a value". shader-db results: Broadwell and Skylake had similar results. (Skylake shown) total cycles in shared programs: 372808735 -> 372809551 (<.01%) cycles in affected programs: 1519520 -> 1520336 (0.05%) helped: 243 HURT: 277 helped stats (abs) min: 1 max: 226 x̄: 34.05 x̃: 5 helped stats (rel) min: 0.01% max: 13.88% x̄: 1.46% x̃: 0.27% HURT stats (abs) min: 1 max: 1810 x̄: 32.82 x̃: 5 HURT stats (rel) min: 0.01% max: 16.03% x̄: 1.56% x̃: 0.29% 95% mean confidence interval for cycles value: -7.18 10.32 95% mean confidence interval for cycles %-change: -0.17% 0.46% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge, Haswell and Ivy Bridge had similar results. (Sandy Bridge shown) total cycles in shared programs: 155091458 -> 155091109 (<.01%) cycles in affected programs: 370797 -> 370448 (-0.09%) helped: 24 HURT: 36 helped stats (abs) min: 1 max: 331 x̄: 103.17 x̃: 41 helped stats (rel) min: 0.02% max: 7.70% x̄: 2.07% x̃: 0.56% HURT stats (abs) min: 1 max: 291 x̄: 59.08 x̃: 10 HURT stats (rel) min: 0.02% max: 5.29% x̄: 1.02% x̃: 0.15% 95% mean confidence interval for cycles value: -37.92 26.28 95% mean confidence interval for cycles %-change: -0.88% 0.45% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. 
(GM45 shown) total cycles in shared programs: 129133970 -> 129133978 (<.01%) cycles in affected programs: 111966 -> 111974 (<.01%) helped: 3 HURT: 1 helped stats (abs) min: 2 max: 4 x̄: 2.67 x̃: 2 helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% HURT stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16 HURT stats (rel) min: 0.07% max: 0.07% x̄: 0.07% x̃: 0.07% 95% mean confidence interval for cycles value: -12.93 16.93 95% mean confidence interval for cycles %-change: -0.05% 0.08% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Jason Ekstrand	c6463f8ac2	nir: Add a nir_src_as_intrinsic() helper Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-18 17:12:44 +00:00
Jason Ekstrand	85c35885b3	nir: Rework nir_src_as_alu_instr to not take a pointer Other nir_src_as_* functions just take a nir_src. It's not that much more memory copying and the constness preserving really isn't worth the cognitive dissonance. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-18 17:12:44 +00:00
Iago Toral Quiroga	8ed6d74c92	intel/compiler: validate region restrictions for mixed float mode v2: - Adapted unit tests to make them consistent with the changes done to the validation of half-float conversions. v3 (Curro): - Check all the accumulators - Constify declarations - Do not check src1 type in single-source instructions. - Check for all instructions that read accumulator (either implicitly or explicitly) - Check restrictions in src1 too. - Merge conditional block - Add invalid test case. v4 (Curro): - Assert on 3-src instructions, as they are not validated. - Get rid of types_are_mixed_float(), as we know instruction is mixed float at that point. - Remove conditions from not verified case. - Fix brackets on conditional. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-04-18 13:22:46 +02:00
Iago Toral Quiroga	58d6417e59	intel/compiler: validate conversions between 64-bit and 8-bit types v2: - Add some tests with UB type too (Jason) v3: - consider implicit conversions from 2src instructions too (Curro). v4: - Do not check src1 type in single-source instructions (Curro). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	7376d57a9c	intel/compiler: validate region restrictions for half-float conversions v2: - Consider implicit conversions in 2-src instructions too (Curro) - For restrictions that involve destination stride requirements only validate them for Align1, since Align16 always requires packed data. - Skip general rule for the dst/execution type size ratio for mixed float instructions on CHV and SKL+, these have their own set of rules that will be validated separately. v3 (Curro): - Do not check src1 type in single-source instructions. - Check restriction on src1. - Remove invalid test. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	6ff52f0628	intel/compiler: also set F execution type for mixed float mode in BDW The section 'Execution Data Types' of 3D Media GPGPU volume, which describes execution types, is exactly the same in BDW and SKL+. Also, this section states that there is a single execution type, so it makes sense that this is the wider of the two floating point types involved in mixed float mode, which is what we do for SKL+ and CHV. v2: - Make sure we also account for the destination type in mixed mode (Curro). Acked-by: Francisco Jerez <currojerez@riseup.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	100debc3c9	intel/compiler: implement SIMD16 restrictions for mixed-float instructions v2: f32to16/f16to32 can use a :W destination (Curro) v3: check destination is packed (Curro). Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	6d87c651c9	intel/compiler: skip MAD algebraic optimization for half-float or mixed mode It is very likely that this optimization is never useful and we'll probably just end up removing it, so let's not bother adding more cases to it for now. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	64b93292ac	intel/compiler: remove inexact algebraic optimizations from the backend NIR already has these and correctly considers exact/inexact qualification, whereas the backend doesn't and can apply the optimizations where it shouldn't. This happened to be the case in a handful of Tomb Raider shaders, where NIR would skip the optimizations because of a precise qualification but the backend would then (incorrectly) apply them anyway. Besides this, considering that we are not emitting much math in the backend these days it is unlikely that these optimizations are useful in general. A shader-db run confirms that MAD and LRP optimizations, for example, were only being triggered in cases where NIR would skip them due to precise requirements, so in the near future we might want to remove more of these, but for now we just remove the ones that are not completely correct. Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	ddd1706ab3	intel/compiler: fix cmod propagation for non 32-bit types v2: - Do not propagate if the bit-size changes Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	66002eeebe	intel/compiler: add a brw_reg_type_is_integer helper v2: - Fixed typo: meant BRW_REGISTER_TYPE_UB instead BRW_REGISTER_TYPE_UV Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	44e1affaec	intel/compiler: implement is_zero, is_one, is_negative_one for 8-bit/16-bit There are no 8-bit immediates, so assert in that case. 16-bit immediates are replicated in each word of a 32-bit immediate, so we only need to check the lower 16-bits. v2: - Fix is_zero with half-float to consider -0 as well (Jason). - Fix is_negative_one for word type. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	e64be391dd	intel/compiler: generalize the combine constants pass At the very least we need it to handle HF too, since we are doing constant propagation for MAD and LRP, which relies on this pass to promote the immediates to GRF in the end, but ideally we want it to support even more types so we can take advantage of it to improve register pressure in some scenarios. v2 (Jason): - Support 64-bit types too. - Check if we need to set the half-float flag if the immediate already existed. - Multiply the size of the immediate by the width of the copy Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	fb990bd76e	intel/eu: force stride of 2 on NULL register for Byte instructions The hardware only allows a stride of 1 on a Byte destination for raw byte MOV instructions. This is required even when the destination is the NULL register. Rather than making sure that we emit a proper NULL:B destination every time we need one, just fix it at emission time. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	ce68a061de	intel/compiler: ask for an integer type if requesting an 8-bit type v2: - Assign BRW_REGISTER_TYPE_B directly for 8-bit (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	092b147774	intel/compiler: rework conversion opcodes Now that we have the regioning lowering pass we can just put all of these opcodes together in a single block and we can just assert on the few cases of conversion instructions that are not supported in hardware and that should be lowered in brw_nir_lower_conversions. The only cases that we still handle separately are the conversions from float to half-float since the rounding variants would need to fallthrough and we are already doing this for boolean opcodes (since they need to negate), plus there is also a large comment about these opcodes that we probably want to keep so it is just easier to keep these separate. Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	472244b374	intel/compiler: activate 16-bit bit-size lowerings also for 8-bit Particularly, we need the same lowerings we use for 16-bit integers. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	40b3abb4d1	intel/compiler: split is_partial_write() into two variants This function is used in two different scenarios that for 32-bit instructions are the same, but for 16-bit instructions are not. One scenario is that in which we are working at a SIMD8 register level and we need to know if a register is fully defined or written. This is useful, for example, in the context of liveness analysis or register allocation, where we work with units of registers. The other scenario is that in which we want to know if an instruction is writing a full scalar component or just some subset of it. This is useful, for example, in the context of some optimization passes like copy propagation. For 32-bit instructions (or larger), a SIMD8 dispatch will always write at least a full SIMD8 register (32B) if the write is not partial. The function is_partial_write() checks this to determine if we have a partial write. However, when we deal with 16-bit instructions, that logic disables some optimizations that should be safe. For example, a SIMD8 16-bit MOV will only update half of a SIMD register, but it is still a complete write of the variable for a SIMD8 dispatch, so we should not prevent copy propagation in this scenario because we don't write all 32 bytes in the SIMD register or because the write starts at offset 16B (where we pack components Y or W of 16-bit vectors). This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit instructions, which lose a number of optimizations because of this, most important of which is copy-propagation. This patch splits is_partial_write() into is_partial_reg_write(), which represents the current is_partial_write(), useful for things like liveness analysis, and is_partial_var_write(), which considers the dispatch size to check if we are writing a full variable (rather than a full register) to decide if the write is partial or not, which is what we really want in many optimization passes. 
Then the patch goes on and rewrites all uses of is_partial_write() to use one or the other version. Specifically, we use is_partial_var_write() in the following places: copy propagation, cmod propagation, common subexpression elimination, saturate propagation and sel peephole. Notice that the semantics of is_partial_var_write() exactly match the current implementation of is_partial_write() for anything that is 32-bit or larger, so no changes are expected for 32-bit instructions. Tested against ~5000 tests involving 16-bit instructions in CTS produced the following changes in instruction counts: Patched \| Master \| % \| ================================================ SIMD8 \| 621,900 \| 706,721 \| -12.00% \| ================================================ SIMD16 \| 93,252 \| 93,252 \| 0.00% \| ================================================ As expected, the change only affects SIMD8 dispatches. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	0986199b31	intel/compiler: workaround for SIMD8 half-float MAD in gen8 Empirical testing shows that gen8 has a bug where MAD instructions with a half-float source starting at a non-zero offset fail to execute properly. This scenario usually happened in SIMD8 executions, where we used to pack vector components Y and W in the second half of SIMD registers (therefore, with a 16B offset). It looks like we are not currently doing this any more but this would handle the situation properly if we ever happen to produce code like this again. v2 (Jason): - Move this workaround to the lower_regioning pass as an additional case to has_invalid_src_region() - Do not apply the workaround if the stride of the source operand is 0, testing suggests the problem doesn't exist in that case. v3 (Jason): - We want offset % REG_SIZE > 0, not just offset > 0 - Use a helper to compute the offset Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	aaae24179f	intel/compiler: fix ddy for half-float in Broadwell Broadwell has restrictions that apply to Align16 half-float that make the Align16 implementation of this invalid for this platform. Use the gen11 path for this instead, which uses Align1 mode. The restriction is not present in cherryview, gen9 or gen10, where the Align16 implementation seems to work just fine. v2: - Rework the comment in the code, move the PRM citation from the commit message to the comment in the code (Matt) - Cherryview isn't affected, only Broadwell (Matt) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	60c7c6d3ba	intel/compiler: fix ddx and ddy for 16-bit float We were assuming 32-bit elements. Also, in SIMD8 we pack 2 vector components in a single SIMD register, so for example, component Y of a 16-bit vec2 starts at byte offset 16B. This means that when we compute the offset of the elements to be differentiated we should not stomp whatever base offset we have, but instead add to it. v2 - Use byte_offset() helper (Jason) - Merge the fix for SIMD8: using byte_offset() fixes that too. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	8f40d392b9	intel/compiler: set correct precision fields for 3-source float instructions Source0 and Destination extract the floating-point precision automatically from the SrcType and DstType instruction fields respectively when they are set to types :F or :HF. For Source1 and Source2 operands, we use the new 1-bit fields Src1Type and Src2Type, where 0 means normal precision and 1 means half-precision. Since we always use the type of the destination for all operands when we emit 3-source instructions, we only need to set Src1Type and Src2Type to 1 when we are emitting a half-precision instruction. v2: - Set the bit separately for each source based on its type so we can do mixed floating-point mode in the future (Topi). v3: - Use regular citation style for the comment referencing the PRM (Matt). - Decided not to add asserts in the emission code to check that only mixed HF/F types are used since such checks would break negative tests for brw_eu_validate.c (Matt) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	e6b7410187	intel/compiler: allow half-float on 3-source instructions since gen8 Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	ee049f6b71	intel/compiler: don't compact 3-src instructions with Src1Type or Src2Type bits We are now using these bits, so don't assert that they are not set. In gen8, if these bits are set compaction is not possible. On gen9 and CHV platforms set_3src_control_index() checks these bits (and others) against a table to validate if the particular bit combination is eligible for compaction or not. v2 - Add more detail in the commit message explaining the situation for SKL+ and CHV (Jason) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	120c970619	intel/compiler: add new half-float register type for 3-src instructions This is available since gen8. v2: restore previously existing assertion. v3: don't use separate tables for gen7 and gen8, just assert that we don't use half-float before gen8 (Matt) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	4ab2b97a8f	intel/compiler: add instruction setters for Src1Type and Src2Type. The original SrcType is a 3-bit field that takes a subset of the types supported for the hardware for 3-source instructions. Since gen8, when the half-float type was added, 3-source floating point operations can use mixed precision mode, where not all the operands have the same floating-point precision. While the precision for the first operand is taken from the type in SrcType, the bits in Src1Type (bit 36) and Src2Type (bit 35) define the precision for the other operands (0: normal precision, 1: half precision). Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	a8d8b1a139	intel/compiler: drop unnecessary temporary from 32-bit fsign implementation Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	19cd2f5deb	intel/compiler: implement 16-bit fsign v2: - make 16-bit be its own separate case (Jason) v3: - Drop the result_int temporary (Jason) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	4588f4a604	intel/compiler: handle extended math restrictions for half-float Extended math with half-float operands is only supported since gen9, but it is limited to SIMD8. In gen8 we lower it to 32-bit. v2: quashed together the following patches (Jason): - intel/compiler: allow extended math functions with HF operands - intel/compiler: lower 16-bit extended math to 32-bit prior to gen9 - intel/compiler: extended Math is limited to SIMD8 on half-float Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (allow extended math functions with HF operands, extended Math is limited to SIMD8 on half-float)	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	114f4e6c29	intel/compiler: lower some 16-bit float operations to 32-bit The hardware doesn't support half-float for these. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	b6a454791b	intel/compiler: assert restrictions on conversions to half-float There are some hardware restrictions that brw_nir_lower_conversions should have taken care of before we get here. v2: - rebased on top of regioning lowering pass Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	66806405af	intel/compiler: handle b2i/b2f with other integer conversion opcodes Since we handle booleans as integers this makes more sense. v2: - rebased to incorporate new boolean conversion opcodes v3: - rebased on top regioning lowering pass Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v2)	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	92f4761198	intel/compiler: split float to 64-bit opcodes from int to 64-bit Going forward having these split is a bit more convenient since these two groups have different restrictions. v2: - Rebased on top of new regioning lowering pass. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	3e377c68f8	intel/compiler: add a NIR pass to lower conversions Some conversions are not directly supported in hardware and need to be split in two conversion instructions going through an intermediary type. Doing this at the NIR level simplifies a bit the complexity in the backend. v2: - Consider fp16 rounding conversion opcodes - Properly handle swizzles on conversion sources. v3 - Run the pass earlier, right after nir_opt_algebraic_late (Jason) - NIR alu output types already have the bit-size (Jason) - Use 'is_conversion' to identify conversion operations (Jason) v4: - Be careful about the intermediate types we use so we don't lose range and avoid incorrect rounding semantics (Jason) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Topi Pohjolainen	ea42ba36b9	intel/compiler/icl: Use tcs barrier id bits 24:30 instead of 24:27 Similarly to `1cc17fb731` Fixes gpu hangs with dEQP-VK.tessellation.shader_input_output.barrier Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-04-17 14:55:49 +03:00
Kenneth Graunke	fad7801afd	i965: Move program key debugging to the compiler. The i965 driver has a bunch of code to compare two sets of program keys and print out the differences. This can be useful for debugging why a shader needed to be recompiled on the fly due to non-orthogonal state dependencies. anv doesn't do recompiles, so we didn't need to share this in the past - but I'd like to use it in iris. This moves the bulk of the code to the compiler where it can be reused. To make that possible, we need to decouple it from i965 - we can't get at the brw program cache directly, nor use brw_context to print things. Instead, we use compiler->shader_perf_log(), and simply pass in keys. We put all of this debugging code in brw_debug_recompile.c, and only export a single function, for simplicity. I also tidied the code a bit while moving it, now that it all lives in one file. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-04-16 09:01:15 -07:00
Danylo Piliaiev	04508f57d1	intel/compiler: Do not reswizzle dst if instruction writes to flag register If we write to the flag register changing the swizzle would change what channels are written to the flag register. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110201 Fixes: `4cd1a0be` Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: <ian.d.romanick@intel.com>	2019-04-16 09:42:08 +00:00
Karol Herbst	14531d676b	nir: make nir_const_value scalar v2: remove & operator in a couple of memsets add some memsets v3: fixup lima Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2019-04-14 22:25:56 +02:00
Jason Ekstrand	9b1e4bab6b	nir/builder: Add a nir_imm_zero helper v2: replace nir_zero_vec with nir_imm_zero (Karol Herbst) Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-14 22:25:56 +02:00
Karol Herbst	bbf2ecaf35	intel/nir: use nir_src_is_const and nir_src_as_uint Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-14 22:25:56 +02:00
Jason Ekstrand	6b1c398bcb	intel/nir: Take a nir_tex_instr and src index in brw_texture_offset This makes things a bit simpler and it's also more robust because it no longer has a hard dependency on the offset being a 32-bit value.	2019-04-14 22:25:56 +02:00
Sagar Ghuge	066d2aebc0	intel/fs: Remove unused condition from opt_algebraic case We will never hit a condition where we have src1 and src2 as immediate operands. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-12 13:47:57 -07:00
Timothy Arceri	035759b61b	nir/i965/freedreno/vc4: add a bindless bool to type size functions This is required to calculate sizes correctly when we have bindless samplers/images. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Karol Herbst	3b2a9ffd60	nir: move brw_nir_rewrite_image_intrinsic into common code Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Mark Janes	2393cc7f00	intel/common: move gen_debug to intel/dev libintel_common depends on libintel_compiler, but it contains debug functionality that is needed by libintel_compiler. Break the circular dependency by moving gen_debug files to libintel_dev. Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-10 13:15:33 -07:00
Caio Marcelo de Oliveira Filho	94abc53030	intel/fs: Use NIR_PASS_V when lowering CS intrinsics This will make that step visible in NIR_PRINT=1. v2: Also use the macro for the cleanup passes. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	0425b34b79	intel/fs: Don't loop when lowering CS intrinsics This was needed when certain intrinsics were lowered to other ones that were defined by the same pass. After `060817b2` "intel,nir: Move gl_LocalInvocationID lowering to nir_lower_system_values" we don't need the loop anymore. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	3ee3024804	intel/fs: Add support for CS to group invocations in quads When using quads, instead of mapping the elements to the next 4 local invocation indices, we map the two next in the "current" row and two next in the "next row". A side effect is that a thread will execute the indices in a different order. We now perform the lowering of both local invocation ID and index together -- and don't rely anymore on lowering done by nir_lower_system_values. That is convenient when doing the math for quads, because we need X and Y to get the right invocation index. When the pass progresses, fold the constants and clean up to reduce the noise from the indexing math. This implements the derivative_group_quadsNV semantics from NV_compute_shader_derivatives. v2: Take subgroup_id into account, otherwise only values in the first subgroup would be used. (Jason) v3: Calculate invocation index and ID together, to avoid duplicating some math in the quads case when both index and ID are used. (Jason) v4: Don't call cleanup passes as part of the lowering, leave that to the call site. (Jason) Change calculation to use fewer instructions. (Jason) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v3) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	ef0339d5ea	intel/fs: Use TEX_LOGICAL whenever implicit lod is supported Make sure we include compute shaders that have a derivative group defined. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00

1355 commits