fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-01-02 22:30:11 +01:00

Author	SHA1	Message	Date
Dave Airlie	52e426fd8b	intel/compiler: add support for compiling fixed function gs This is ported from i965, but the interface is cleaned up Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9721>	2021-05-04 03:39:45 +00:00
Anuj Phogat	61e8636557	intel: Rename gen_device prefix to intel_device export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "gen_device" -rIl $SEARCH_PATH \| xargs sed -ie "s/gen_device/intel_device/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>	2021-04-20 20:06:33 +00:00
Anuj Phogat	cd39d3b1ad	intel: Rename gen_device prefix in filenames export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" find $SEARCH_PATH -type f -name "gen_device" -exec sh -c 'f="{}"; mv -- "$f" "${f/gen_device/intel_device}"' \; grep -E "gen_device_info\.[cph]" -rIl $SEARCH_PATH \| xargs sed -ie "s/gen_device_info$.\.[cph]$/intel_device_info\1/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>	2021-04-20 20:06:33 +00:00
Francisco Jerez	12479abded	intel/fs: Implement representation of SWSB cross-pipeline synchronization annotations. The execution units of XeHP platforms have multiple asynchronous ALU pipelines instead of (as far as software is concerned) the single in-order pipeline that handled most ALU instructions except for extended math in the original Xe. It's now the compiler's responsibility to identify cross-pipeline dependencies and insert synchronization annotations whenever necessary, which are encoded as some additional bits of the SWSB instruction field. This commit represents the cross-pipeline synchronization annotations as part of the existing tgl_swsb structure used for codegen. The existing tgl_swsb_*() helpers used by hand-crafted assembly are extended to default to TGL_PIPE_ALL big-hammer synchronization in order to ensure backwards compatibility with the existing assembly. The following commits will extend the software scoreboard lowering pass in order to keep track of cross-pipeline dependencies across IR instructions, and insert more specific pipeline annotations in the SWSB field. The disassembler is also extended here to print out any existing pipeline sync annotations. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>	2021-04-16 08:27:34 +00:00
Anuj Phogat	e7e55af4d6	intel: Rename GENx keyword to GFXx Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "GEN[[:digit:]]+" -rIl $SEARCH_PATH \| xargs sed -ie "s/GEN$[[:digit:]]\+$/GFX\1/g" Exclude the changes to modifiers: grep -E "I915_.GFX" -rIl $SEARCH_PATH \| xargs sed -ie "s/$I915_.$GFX/\1GEN/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Anuj Phogat	1d296484b4	intel: Rename Genx keyword to Gfxx Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "Gen[[:digit:]]+" -rIl $SEARCH_PATH \| xargs sed -ie "s/Gen$[[:digit:]]\+$/Gfx\1/g" Exclude changes in src/intel/perf/oa-.xml: find src/intel/perf -type f $ -name ".xml" $ \| xargs sed -ie "s/Gfx/Gen/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Anuj Phogat	b75f095bc7	intel: Rename genx keyword to gfxx in source files Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "gen[[:digit:]]+" -rIl $SEARCH_PATH \| xargs sed -ie "s/gen$[[:digit:]]\+$/gfx\1/g" Exclude pack.h and xml changes in this patch: grep -E "gfx[[:digit:]]+_pack\.h" -rIl $SEARCH_PATH \| xargs sed -ie "s/gfx$[[:digit:]]\+_pack\.h$/gen\1/g" grep -E "gfx[[:digit:]]+\.xml" -rIl $SEARCH_PATH \| xargs sed -ie "s/gfx$[[:digit:]]\+\.xml$/gen\1/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Anuj Phogat	c1f3a778de	intel: Rename GENx prefix in macros to GFXx in source files Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "GEN" -rIl src/intel/genxml \| grep -E ".py" \| xargs sed -ie "s/GEN$[%{]$/GFX\1/g" grep -E "[^_]GEN[[:digit:]]+" -rIl $SEARCH_PATH \| grep -E ".(\.c\|\.h\|\.y\|\.l)" \| xargs sed -ie "s/$[^_]$GEN$[[:digit:]]\+$/\1GFX\2/g" Leave out renaming GFX12_CCS_E macros. They fall under renaming pattern like "_GEN[[:digit:]]+": grep -E "GFX12_CCS_E" -rIl $SEARCH_PATH \| xargs sed -ie "s/GFX12_CCS_E/GEN12_CCS_E/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Jason Ekstrand	91192696e6	intel/fs: Add support for 16-bit A64 float and integer atomics The messages for those 16-bit operations still use 32-bit sources and destinations, so expand them accordingly when building the payload. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8750>	2021-03-18 00:13:40 +00:00
Jason Ekstrand	369eab9420	intel/fs: Emit code for Gen12-HP indirect compute data Reworks: * Jordan: Apply to gen > 12 * Jordan: Adjust comment about loading constants Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8342>	2021-01-13 13:10:28 -08:00
Jason Ekstrand	f9d549b2bf	intel/fs: Use BRW_OPCODE_HALT for discards We're about to start using it to implement nir_jump_halt which has nothing inherently to do with fragment shaders or discards. May as well name it for the HW instruction it generates. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>	2020-12-01 16:19:08 -06:00
Jason Ekstrand	e76e359007	intel/fs: Rename PLACEHOLDER_HALT to HALT_TARGET It's a bit more explicit and will play more nicely with what we're about to do. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>	2020-12-01 16:18:50 -06:00
Jason Ekstrand	75209d5bd1	intel/fs: Add and implement intel-specific ray-tracing intrinsics Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>	2020-11-25 05:37:10 +00:00
Jason Ekstrand	7280b0911d	intel/compiler: Add support for bindless shaders The Intel bindless thread dispatch model is very simple. When a compute shader is to be used for bindless dispatch, it can request a set of stack IDs. These are allocated per-dual-subslice by the hardware and recycled automatically when the stack ID is returned. Passed to the bindless dispatch are a global argument address, a stack ID, and an address of the BINDLESS_SHADER_RECORD to invoke. When the bindless shader is dispatched, it is passed its stack ID as well as the global and local argument pointers. The local argument pointer is the address of the BINDLESS_SHADER_RECORD plus some offset which is specified as part of the BINDLESS_SHADER_RECORD. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>	2020-11-25 05:37:09 +00:00
Caio Marcelo de Oliveira Filho	d372abe397	intel/fs: Add surface OWORD BLOCK opcodes Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7448>	2020-11-04 20:24:48 +00:00
Caio Marcelo de Oliveira Filho	d3d2b73fa3	intel/fs: Add A64 OWORD BLOCK opcodes Based on a patch for OWORD BLOCK READ from Jason Ekstrand. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7448>	2020-11-04 20:24:48 +00:00
Jason Ekstrand	06ebf23283	intel/fs: Add a SCRATCH_HEADER opcode This opcode is responsible for setting up the buffer base address and per-thread scratch space fields of a scratch message header. For the most part, it's a copy of g0 but some messages need us to zero out g0.2 and the bottom bits of g0.5. This may actually fix a bug when nir_load/store_scratch is used. The docs say that the DWORD scattered messages respect the per-thread scratch size specified in gN.3[3:0] in the message header but we've been leaving it zero. This may mean that we've been ignoring any scratch reads/writes from a load/store_scratch intrinsic above the 1KB mark. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>	2020-10-13 21:59:27 +00:00
Ian Romanick	1d71b1a311	intel/vec4: Remove everything related to VS_OPCODE_SET_SIMD4X2_HEADER_GEN9 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6826>	2020-09-28 11:43:10 -07:00
Danylo Piliaiev	77486db867	intel/fs: Disable sample mask predication for scratch stores Scratch stores are being lowered to the instructions with side-effects, however they should be enabled in fs helper invocations, since they are produced from operations which don't imply side-effects. To fix this - we move the decision of whether the sample mask predication is enable to the point where logical brw instructions are created. GLSL example of the issue: int tmp[1024]; ... do { // changes to tmp } while (some_condition(tmp)) If `tmp` is lowered to scrach memory, `some_condition` would be undefined if scratch write is predicated on sample mask, making possible for the while loop to become infinite and hang the GPU. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3256 Fixes: `53bfcdeecf` Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6056>	2020-09-25 09:48:06 +00:00
Jason Ekstrand	91becd84ae	intel/fs: Add support for a new load_reloc_const intrinsic Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>	2020-09-02 19:48:44 +00:00
Jason Ekstrand	4985e380dd	intel/eu: Use non-coherent mode (BTI=253) for stateless A64 messages We don't care about full IA coherency since we always have the opportunity in GL or Vulkan to flush the data cache. Using IA-coherent mode is likely just making A64 access slower than it needs to be. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4819>	2020-04-30 14:45:50 +00:00
Caio Marcelo de Oliveira Filho	f858fa26b4	intel/fs,vec4: Pull stall logic for memory fences up into the IR Instead of emitting the stall MOV "inside" the SHADER_OPCODE_MEMORY_FENCE generation, use the scheduling fences when creating the IR. For IvyBridge, every (data cache) fence is accompained by a render cache fence, that now is explicit in the IR, two SHADER_OPCODE_MEMORY_FENCEs are emitted (with different SFIDs). Because Begin and End interlock intrinsics are effectively memory barriers, move its handling alongside the other memory barrier intrinsics. The SHADER_OPCODE_INTERLOCK is still used to distinguish if we are going to use a SENDC (for Begin) or regular SEND (for End). This change is a preparation to allow emitting both SENDs in Gen11+ before we can stall on them. Shader-db results for IVB (i965): total instructions in shared programs: 11971190 -> 11971200 (<.01%) instructions in affected programs: 11482 -> 11492 (0.09%) helped: 0 HURT: 8 HURT stats (abs) min: 1 max: 3 x̄: 1.25 x̃: 1 HURT stats (rel) min: 0.03% max: 0.50% x̄: 0.14% x̃: 0.10% 95% mean confidence interval for instructions value: 0.66 1.84 95% mean confidence interval for instructions %-change: 0.01% 0.27% Instructions are HURT. Unlike the previous code, that used the `mov g1 g2` trick to force both `g1` and `g2` to stall, the scheduling fence will generate `mov null g1` and `mov null g2`. During review it was decided it was not worth keeping the special codepath for the small effect will have. Shader-db results for HSW (i965), BDW and SKL don't have a change on instruction count, but do report changes in cycles count, showing SKL results below total cycles in shared programs: 341738444 -> 341710570 (<.01%) cycles in affected programs: 7240002 -> 7212128 (-0.38%) helped: 46 HURT: 5 helped stats (abs) min: 14 max: 1940 x̄: 676.22 x̃: 154 helped stats (rel) min: <.01% max: 2.62% x̄: 1.28% x̃: 0.95% HURT stats (abs) min: 2 max: 1768 x̄: 646.40 x̃: 362 HURT stats (rel) min: <.01% max: 0.83% x̄: 0.28% x̃: 0.08% 95% mean confidence interval for cycles value: -777.71 -315.38 95% mean confidence interval for cycles %-change: -1.42% -0.83% Cycles are helped. This seems to be the effect of allocating two registers separatedly instead of a single one with size 2, which causes different register allocation, affecting the cycle estimates. while ICL also has not change on instruction count but report changes negative changes in cycles total cycles in shared programs: 352665369 -> 352707484 (0.01%) cycles in affected programs: 9608288 -> 9650403 (0.44%) helped: 4 HURT: 104 helped stats (abs) min: 24 max: 128 x̄: 88.50 x̃: 101 helped stats (rel) min: <.01% max: 0.85% x̄: 0.46% x̃: 0.49% HURT stats (abs) min: 2 max: 2016 x̄: 408.36 x̃: 48 HURT stats (rel) min: <.01% max: 3.31% x̄: 0.88% x̃: 0.45% 95% mean confidence interval for cycles value: 256.67 523.24 95% mean confidence interval for cycles %-change: 0.63% 1.03% Cycles are HURT. AFAICT this is the result of the case above. Shader-db results for TGL have similar cycles result as ICL, but also affect instructions total instructions in shared programs: 17690586 -> 17690597 (<.01%) instructions in affected programs: 64617 -> 64628 (0.02%) helped: 55 HURT: 32 helped stats (abs) min: 1 max: 16 x̄: 4.13 x̃: 3 helped stats (rel) min: 0.05% max: 2.78% x̄: 0.86% x̃: 0.74% HURT stats (abs) min: 1 max: 65 x̄: 7.44 x̃: 2 HURT stats (rel) min: 0.05% max: 4.58% x̄: 1.13% x̃: 0.69% 95% mean confidence interval for instructions value: -2.03 2.28 95% mean confidence interval for instructions %-change: -0.41% 0.15% Inconclusive result (value mean confidence interval includes 0). Now that more is done in the IR, more dependencies are visible and more SWSB annotations are emitted. Mixed with different register allocation decisions like above, some shaders will see more `sync nops` while others able to avoid them. Most of the new `sync nops` are also redundant and could be dropped, which will be fixed in a separate change. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>	2020-04-29 07:17:27 +00:00
Caio Marcelo de Oliveira Filho	0e96b0d6dd	intel/fs: Allow FS_OPCODE_SCHEDULING_FENCE stall on registers It will generate the MOVs (or SYNC_NOP in Gen12+) needed for stall. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>	2020-04-29 07:17:27 +00:00
Dylan Baker	8e3696137f	remove final imports.h and imports.c bits This moves the fi_types to a new mesa_private.h and removes the imports.c file. The vast majority of this patch is just removing pound includes of imports.h and fixing up the recursive includes. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>	2020-04-21 11:09:04 -07:00
Caio Marcelo de Oliveira Filho	79788b8f7f	intel/gen12: Take into account opcode when decoding SWSB The interpretation of the fields is different depending whether the instruction is a SEND/MATH or not. This fixes the disassembly output for non-SEND/MATH instructions that have both in-order and out-of-order dependencies. Their dependencies were wrongly represented as `@A $B` when the correct would be `@A $B.dst`. Fixes: `6154cdf924` ("intel/eu/gen12: Add auxiliary type to represent SWSB information during codegen.") Fixes: `83612c0127` ("intel/disasm/gen12: Disassemble software scoreboard information.") Acked-by: Francisco Jerez <currojerez@riseup.net> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3660> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3660>	2020-02-18 09:17:51 -08:00
Francisco Jerez	008f95a043	intel/fs: Add virtual instruction to load mask of live channels into flag register. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Cc: 20.0 <mesa-stable@lists.freedesktop.org>	2020-02-14 14:31:48 -08:00
Ian Romanick	58907568ec	intel/fs: Add SHADER_OPCODE_[IU]SUB_SAT pseudo-ops v2: Add a big comment explaining the [IU]SUB_SAT lowering. Suggested by Caio. v3: Use get_fpu_lowered_simd_width in get_lowered_simd_width. Suggested by Ken on IRC. v4: Fix a typo in a comment. Noticed by Caio. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>	2020-01-23 00:18:57 +00:00
Caio Marcelo de Oliveira Filho	18e72ee210	intel/fs: Add FS_OPCODE_SCHEDULING_FENCE Like a SHADER_OPCODE_MEMORY_FENCE but doesn't doesn't generate any assembly code. Will be used when the compiler shouldn't reorder certain instructions but there's no need to generate code for the HW to do it -- as the ordering will be guaranteed by other means. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>	2020-01-21 23:41:35 +00:00
Jason Ekstrand	a0999bc049	intel/fs: Add DWord scattered read/write opcodes Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-11-11 17:17:02 +00:00
Francisco Jerez	6154cdf924	intel/eu/gen12: Add auxiliary type to represent SWSB information during codegen. v2: Introduce extra tgl_swsb_sbid() constructor (Caio). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-10-11 12:24:16 -07:00
Francisco Jerez	c22db5e188	intel/fs/gen12: Add codegen support for the SYNC instruction. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-11 12:24:16 -07:00
Francisco Jerez	0e57dbc55c	intel/ir/gen12: Add SYNC hardware instruction. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-11 12:24:16 -07:00
Francisco Jerez	6e1daba3b4	intel/eu/gen12: Codegen three-source instruction source and destination regions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-11 12:24:16 -07:00
Francisco Jerez	1b570456ca	intel/ir: Drop hard-coded correspondence between IR and HW opcodes. Having the IR opcodes locked to their hardware representation is risky because it causes opcodes as different as BRC and IFF to compare equal at the IR level (luckily the back-end only ever uses one opcode from each group, right now), and it prevents us from supporting instructions that change their hardware representation across generations, which will become a problem on Gen12+ platforms. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-11 12:24:16 -07:00
Francisco Jerez	25dd67099d	intel/eu: Rework opcode description tables to allow efficient look-up by either HW or IR opcode. This rewrites the current opcode description tables as a more compact flat data structure. The purpose is to allow efficient constant-time look-up by either HW or IR opcode, which will allow us to drop the hard-coded correspondence between HW and IR opcodes -- See the next commits for the rationale. brw_eu.c is now built as C++ source so we can take advantage of pointers to member in order to make the look-up function work regardless of the opcode_desc member used as look-up key. v2: Optimize devinfo struct comparison (Caio) Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-11 12:24:16 -07:00
Samuel Iglesias Gonsálvez	8a6507b6fe	i965/fs/generator: add new opcode to set float controls modes in control register Before this commit, we had only FPRoundingMode decoration (the per instruction one) that is applied during the SPIR-V handling. In vtn_alu we find out the rounding mode, and generate the code accordingly that later will be used to look for the respective nir_op_f2f16_{rtz,rtne}. Per-instruction gets prioritized because we make them explicit conversions (with RTZ or RTNE nir opcodes) and they will override the default execution mode defined with float controls. However, we need to come back to the mode defined by float controls after the execution of the FP Rounding instruction. Therefore, the new SHADER_OPCODE_FLOAT_CONTROL_MODE opcode will be used to set the default rounding mode and denorms treatment in the whole shader while the pre-existent SHADER_OPCODE_RND_MODE, will be used as prioritized rounding mode in a per-instruction basis. v2: - Fix bug in defining BRW_CR0_FP_MODE_MASK. v3: - Update comment (Caio). v4: - Split the patch into the helper and the new opcode (this one) (Caio). v5: - Add an explanation on the actual purpose and priority of the newly introduced opcode in the commit log (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-17 23:39:19 +03:00
Caio Marcelo de Oliveira Filho	b390ff3517	intel/fs: Add support for SLM fence in Gen11 Gen11 SLM is not on L3 anymore, so now the hardware has two separate fences. Add a way to control which fence types to use. At this time, we don't have enough information in NIR to control the visibility of the memory being fenced, so for now be conservative and assume that fences will need a stall. With more information later we'll be able to reduce those. Fixes Vulkan CTS tests in ICL: dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.workgroup.guard_local.buffer.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.buffer.guard_nonlocal.workgroup.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.image.guard_nonlocal.workgroup.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.buffer.guard_nonlocal.workgroup.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.image.guard_nonlocal.workgroup.comp The whole set of supported tests in dEQP-VK.memory_model.* group should be passing in ICL now. v2: Pass BTI around instead of having an enum. (Jason) Emit two SHADER_OPCODE_MEMORY_FENCE instead of one that gets transformed into two. (Jason) List tests fixed. (Lionel) v3: For clarity, split the decision of which fences to emit from the emission code. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-11 08:29:32 -07:00
Sagar Ghuge	83fdec0f0d	intel/compiler: Enable the emission of ROR/ROL instructions v2: 1) Drop changes for vec4 backend as on Gen11+ we don't support align16 mode (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Jason Ekstrand	f4ef34f207	intel/fs: Add an UNDEF instruction to avoid excess live ranges With 8 and 16-bit types and anything where we have to use non-trivial strides registersto deal with restrictions, we end up with things that look like partial writes even though we don't care about any values in the register except those written by that instruction. This is particularly important when dealing with loops because liveness sees is_partial_write and the fact that an old version from a previous loop iteration may be valid at that point and extends all purely partially written values to the entire loop. This commit adds a new UNDEF instruction which does nothing (the generator doesn't emit anything) but which does a fake write to the register. This informs liveness that we don't care about any values before that point so it won't consider those registers to be falsely live. We can safely emit UNDEF instructions for all SSA values that come in from NIR and nearly all temporaries generated by various stages of the compiler. In particular, we need to insert UNDEF instructions when we handle region restrictions because the newly allocated registers are almost guaranteed to be partially written. No shader-db changes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110432 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-04 14:27:30 -05:00
Jason Ekstrand	83af92e593	intel/fs: Add support for bindless image load/store/atomic Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	843286d324	intel/fs: Add support for bindless texture ops We add two new texture sources for bindless surface and sampler handles. Bindless surface handles are expected to be pre-shifted so that the 20-bit surface state table index is in the top 20 bits of the 32-bit handle. This lets us avoid any extra shifts in the shader. Bindless sampler handles are 32-byte aligned byte offsets from general state base address. We use 32-byte aligned instead of 16-byte aligned to avoid having to use more indirect messages than needed. It means we can't tightly pack samplers but that's probably not a big deal. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	bd56ce8ce5	anv: Implement VK_KHR_shader_atomic_int64 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	e8f863e718	intel/compiler: Re-prefix non-logical surface opcodes with VEC4 The scalar back-end uses SHADER_OPCODE_SEND for all surface messages so we no longer need the non-logical opcodes there. Prefix them VEC4 so it's clear that they're only used by the vec4 back-end. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-28 16:58:20 -06:00
Jason Ekstrand	aeaba24fcb	intel/compiler: Drop unused surface opcodes The unused typed surface read/write support in the vec4 back-end has been dropped and the fs back-end now uses SHADER_OPCODE_SEND for all image and buffer ops. There's no reason to keep these opcodes around anymore. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-28 16:58:20 -06:00
Jason Ekstrand	a04c737215	intel/fs: Get rid of the IMAGE_SIZE opcode Since switching to SHADER_OPCODE_SEND for image operations, we no longer need the non-logical opcode. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-28 16:58:20 -06:00
Jason Ekstrand	494a0543e6	intel/fs: Re-order logical surface arguments It makes more sense to start at the surface then move on to the address and then the data. Also, this is a really good test of whether or not we got all the places that use the sources by explicit integer number. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-28 16:58:20 -06:00
Jason Ekstrand	94f8fd9a0c	intel/fs: Add an enum type for logical sampler inst sources Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-28 16:58:20 -06:00
Jason Ekstrand	e644ed468f	intel/fs: Implement nir_intrinsic_global_atomic_* eviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-01 16:11:00 -06:00
Jason Ekstrand	1c25bf4373	intel/fs: Implement load/store_global with A64 untyped messages eviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-01 16:11:00 -06:00
Jason Ekstrand	b284d222db	intel/fs: Use SHADER_OPCODE_SEND for varying UBO pulls on gen7+ Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00

1 2

79 commits