fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-21 15:50:11 +01:00

Author	SHA1	Message	Date
Francisco Jerez	4dc4284342	intel/fs: Implement Wa_14013745556 on TGL+. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>	2021-06-23 07:34:22 +00:00
Francisco Jerez	c19cfa9dc2	intel/fs: Fix synchronization of accumulator-clearing W/A move on TGL+. Right now the accumulator-clearing move emitted by the generator for Wa_14010017096 inherits the SWSB field from the previous instruction. This can lead to redundant synchronization, or possibly more serious issues if the previous instruction had a TGL_SBID_SET SWSB synchronization mode. Take the SWSB synchronization information from the IR. Fixes: `a27542c5dd` ("intel/compiler: Clear accumulator register before EOT") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>	2021-06-23 07:34:22 +00:00
Jason Ekstrand	705395344d	intel/fs: Add support for compiling bindless shaders with resume shaders Instead of depending on the driver to compile each resume shader separately, we compile them all in one go in the back-end and build an SBT as part of the shader program. Shader relocs are used to make the entries in the SBT point point to the correct resume shader. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>	2021-06-22 21:09:25 +00:00
Marcin Ślusarz	3340d5ee02	intel: simplify is_haswell checks, part 1 Generated with: files=`git grep is_haswell \| cut -d: -f1 \| sort \| uniq` for file in $files; do cat $file \| \ sed "s/devinfo->ver <= 7 && !devinfo->is_haswell/devinfo->verx10 <= 70/g" \| \ sed "s/devinfo->ver >= 8 \|\| devinfo->is_haswell/devinfo->verx10 >= 75/g" \| \ sed "s/devinfo->is_haswell \|\| devinfo->ver >= 8/devinfo->verx10 >= 75/g" \| \ sed "s/devinfo.is_haswell \|\| devinfo.ver >= 8/devinfo.verx10 >= 75/g" \| \ sed "s/devinfo->ver > 7 \|\| devinfo->is_haswell/devinfo->verx10 >= 75/g" \| \ sed "s/devinfo->ver == 7 && !devinfo->is_haswell/devinfo->verx10 == 70/g" \| \ sed "s/devinfo.ver == 7 && !devinfo.is_haswell/devinfo.verx10 == 70/g" \| \ sed "s/devinfo->ver < 8 && !devinfo->is_haswell/devinfo->verx10 <= 70/g" \| \ sed "s/device->info.ver == 7 && !device->info.is_haswell/device->info.verx10 == 70/g" \ > tmpXXX mv tmpXXX $file done Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10810>	2021-05-17 09:46:45 +00:00
Jason Ekstrand	34c560ae95	intel/fs: Stop using brw_dp_read/write_desc in Gen7+ only code Those helpers exist primarily to sort out some of the weirdness around Gen4-6 dataport access. On Gen5 and earlier, everything was called "dataport" and, instead of the SFID we have today there was a "target cache" parameter in the descriptor. There are also some bits that moved around on various gens depending on read vs. write. Starting with Gen6, most things which target one of the data cache SFIDs should use brw_dp_desc() instead. v2: Drop backward comment (Ken) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>	2021-05-02 20:20:06 +00:00
Lionel Landwerlin	6d4070f3dd	intel/compiler: add support for fragment coordinate with coarse pixels v2: Drop new internal opcodes (Jason) Simplify code (Jason) v3: Add Z computation for coarse pixels v4: Document things a little Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>	2021-05-02 20:20:06 +00:00
Lionel Landwerlin	b6332fc4a8	intel/compiler: handle coarse pixel in render target writes descriptors v2: Use the new inst->ex_desc field (Jason) v3: Drop CPS LoD compensation from sampler messages (Lionel) v4: Drop useless uses_rate_shading (Ken) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>	2021-05-02 20:20:06 +00:00
Anuj Phogat	61e8636557	intel: Rename gen_device prefix to intel_device export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "gen_device" -rIl $SEARCH_PATH \| xargs sed -ie "s/gen_device/intel_device/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>	2021-04-20 20:06:33 +00:00
Jordan Justen	515ee73b4e	intel/fs: End computer shader with message gateway on XeHP. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>	2021-04-16 08:27:35 +00:00
Francisco Jerez	262b647b25	intel/compiler: Lower integer division on XeHP. It has been removed from the hardware. [jordan.l.justen@intel.com: Move to brw_postprocess_nir] v2: Switch to nir_lower_idiv_precise (Rhys). v3: Fix for interface changes of nir_lower_idiv. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>	2021-04-16 08:27:35 +00:00
Francisco Jerez	05cce1f97d	intel/fs: Use CHV/BXT implementation of 64-bit MOV_INDIRECT on XeHP+. According to the hardware spec "Vx1 and VxH indirect addressing for Float, Half-Float, Double-Float and Quad-Word data must not be used." Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>	2021-04-16 08:27:35 +00:00
Michel Dänzer	2928c21eb7	Convert most remaining free-form fall-through comments to FALLTHROUGH One exception is src/amd/addrlib/, for which -Wimplicit-fallthrough is explicitly disabled. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>	2021-04-15 16:01:22 +00:00
Anuj Phogat	f96c3b8b63	intel: Rename GEN:BUG:### to Wa_### Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "GEN:BUG:" -rIl $SEARCH_PATH \| xargs sed -ie "s/GEN$:BUG:$/Wa_/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Anuj Phogat	e7e55af4d6	intel: Rename GENx keyword to GFXx Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "GEN[[:digit:]]+" -rIl $SEARCH_PATH \| xargs sed -ie "s/GEN$[[:digit:]]\+$/GFX\1/g" Exclude the changes to modifiers: grep -E "I915_.GFX" -rIl $SEARCH_PATH \| xargs sed -ie "s/$I915_.$GFX/\1GEN/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Anuj Phogat	1d296484b4	intel: Rename Genx keyword to Gfxx Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "Gen[[:digit:]]+" -rIl $SEARCH_PATH \| xargs sed -ie "s/Gen$[[:digit:]]\+$/Gfx\1/g" Exclude changes in src/intel/perf/oa-.xml: find src/intel/perf -type f $ -name ".xml" $ \| xargs sed -ie "s/Gfx/Gen/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Anuj Phogat	b75f095bc7	intel: Rename genx keyword to gfxx in source files Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "gen[[:digit:]]+" -rIl $SEARCH_PATH \| xargs sed -ie "s/gen$[[:digit:]]\+$/gfx\1/g" Exclude pack.h and xml changes in this patch: grep -E "gfx[[:digit:]]+_pack\.h" -rIl $SEARCH_PATH \| xargs sed -ie "s/gfx$[[:digit:]]\+_pack\.h$/gen\1/g" grep -E "gfx[[:digit:]]+\.xml" -rIl $SEARCH_PATH \| xargs sed -ie "s/gfx$[[:digit:]]\+\.xml$/gen\1/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Anuj Phogat	c1f3a778de	intel: Rename GENx prefix in macros to GFXx in source files Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "GEN" -rIl src/intel/genxml \| grep -E ".py" \| xargs sed -ie "s/GEN$[%{]$/GFX\1/g" grep -E "[^_]GEN[[:digit:]]+" -rIl $SEARCH_PATH \| grep -E ".(\.c\|\.h\|\.y\|\.l)" \| xargs sed -ie "s/$[^_]$GEN$[[:digit:]]\+$/\1GFX\2/g" Leave out renaming GFX12_CCS_E macros. They fall under renaming pattern like "_GEN[[:digit:]]+": grep -E "GFX12_CCS_E" -rIl $SEARCH_PATH \| xargs sed -ie "s/GFX12_CCS_E/GEN12_CCS_E/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Anuj Phogat	abe9a71a09	intel: Rename gen field in gen_device_info struct to ver Commands used to do the changes: export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965" grep -E "info\)(.\|->)gen" -rIl $SEARCH_PATH \| xargs sed -ie "s/info$)$$\.\\|->$gen/info\1\2ver/g" Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>	2021-04-02 18:33:07 +00:00
Ian Romanick	6c8e2e9317	intel/compiler: Enable the ability to emit CMPN instructions v2: Move checks to the EU validator. Suggested by Jason. Fixes: `2f2c00c727` ("i965: Lower min/max after optimization on Gen4/5.") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9027>	2021-02-17 19:52:24 +00:00
Ian Romanick	b0d7434c71	intel/eu/validate: Add some checks for CMP and CMPN These checks were originally assertions elsewhere either in the existing code or later in this MR. Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9027>	2021-02-17 19:52:24 +00:00
Jason Ekstrand	3ce6ca7214	intel/fs: Shuffle can't handle source modifiers On Gen7, we have to split shuffles into two MOVs for 64-bit types so we can't handle source modifiers. On Gen12.5, we have to use integer types all the time so we can't use them there either. Fixing that will be a different commit but it interacts with this one. Fixes: `90c9f29518` "i965/fs: Add support for nir_intrinsic_shuffle" Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9068>	2021-02-17 03:59:25 +00:00
Jason Ekstrand	f3a43e36e0	intel/fs: Add an ex_desc field to fs_inst for SHADER_OPCODE_SEND I meant to do this years ago when I first added SHADER_OPCODE_SEND. At the time, the only use for the extended descriptor was bindless handles which were always one thing and never non-constant. However, it doesn't actually require any extra instructions because we have to OR in ex_mlen anyway. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8748>	2021-01-28 17:57:48 +00:00
Jason Ekstrand	c80db6611a	intel/fs: Support 64-bit CLUSTER_BROADCAST on Gen11+ Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>	2021-01-22 18:38:38 +00:00
Jason Ekstrand	b90921ec0c	intel/fs: Support 64-bit SHUFFLE on Gen11+ Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>	2021-01-22 18:38:38 +00:00
Jason Ekstrand	cdedc82329	intel/fs: Support 64-bit SEL_EXEC on Gen11+ Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>	2021-01-22 18:38:37 +00:00
Jason Ekstrand	f9d549b2bf	intel/fs: Use BRW_OPCODE_HALT for discards We're about to start using it to implement nir_jump_halt which has nothing inherently to do with fragment shaders or discards. May as well name it for the HW instruction it generates. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>	2020-12-01 16:19:08 -06:00
Jason Ekstrand	e76e359007	intel/fs: Rename PLACEHOLDER_HALT to HALT_TARGET It's a bit more explicit and will play more nicely with what we're about to do. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>	2020-12-01 16:18:50 -06:00
Jason Ekstrand	7280b0911d	intel/compiler: Add support for bindless shaders The Intel bindless thread dispatch model is very simple. When a compute shader is to be used for bindless dispatch, it can request a set of stack IDs. These are allocated per-dual-subslice by the hardware and recycled automatically when the stack ID is returned. Passed to the bindless dispatch are a global argument address, a stack ID, and an address of the BINDLESS_SHADER_RECORD to invoke. When the bindless shader is dispatched, it is passed its stack ID as well as the global and local argument pointers. The local argument pointer is the address of the BINDLESS_SHADER_RECORD plus some offset which is specified as part of the BINDLESS_SHADER_RECORD. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>	2020-11-25 05:37:09 +00:00
Ian Romanick	262ca98b3a	intel/compiler: Remove Gen10-specific code Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6899>	2020-10-15 09:29:53 -07:00
Jason Ekstrand	06ebf23283	intel/fs: Add a SCRATCH_HEADER opcode This opcode is responsible for setting up the buffer base address and per-thread scratch space fields of a scratch message header. For the most part, it's a copy of g0 but some messages need us to zero out g0.2 and the bottom bits of g0.5. This may actually fix a bug when nir_load/store_scratch is used. The docs say that the DWORD scattered messages respect the per-thread scratch size specified in gN.3[3:0] in the message header but we've been leaving it zero. This may mean that we've been ignoring any scratch reads/writes from a load/store_scratch intrinsic above the 1KB mark. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>	2020-10-13 21:59:27 +00:00
Jason Ekstrand	8427e56067	intel/fs: Don't use NoDDClk/NoDDClr for split SHUFFLEs When I copied and pasted the code from MOV_INDIRECT for handling the dependency controls, I missed a subtle difference between MOV_INDIRECT and SHUFFLE. Specifically, MOV_INDIRECT gets lowered to a narrow instruction on Gen7 by the SIMD width lowering whereas SHUFFLE has to split it in the generator. Therefore, the check safety check for whether or not we can use dependency control has to be based on the lowered width rather than the width of the original instruction. Fixes: `a8ac61b0ee` "intel/fs: NoMask initialize the address..." Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3593 Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6989>	2020-10-02 19:53:56 +00:00
Jason Ekstrand	a8ac61b0ee	intel/fs: NoMask initialize the address register for shuffles Cc: mesa-stable@lists.freedesktop.org Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2979 Tested-by: Iván Briano <ivan.briano@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6825>	2020-10-02 00:42:56 +00:00
Marcin Ślusarz	5ea0b6a9c6	intel/compiler: initialize remaining fields of various classes These variables seem to be initialized before being used, so this patch is not fixing any bug, but leaving them unitialized may become a bug after some refactoring. These classes were affected: fs_reg_alloc, fs_visitor, fs_generator, instruction_scheduler. Found by Coverity. Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6667>	2020-09-10 12:16:58 +00:00
Marcin Ślusarz	663c4d5377	intel/fs: add hint how to get more info when shader validation fails Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6559>	2020-09-04 12:09:22 +00:00
Jason Ekstrand	91becd84ae	intel/fs: Add support for a new load_reloc_const intrinsic Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>	2020-09-02 19:48:44 +00:00
Jason Ekstrand	90b6745bc8	intel/fs,vec4: Stuff the constant data from NIR in the end of the program Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>	2020-09-02 19:48:44 +00:00
Jason Ekstrand	91348d125d	intel/eu: Add some new helpers Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>	2020-09-02 19:48:44 +00:00
Danylo Piliaiev	bc4a127d6e	intel/disasm: Label support in shader disassembly for UIP/JIP Shader instructions which use UIP/JIP now get formatted with a label in addition with immediate value, labels have "LABEL%d" format. v2: - Consider brw_jump_scale when calculating label's offset From: "Lonnberg, Toni" <toni.lonnberg@intel.com> Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4245>	2020-09-02 10:33:29 +00:00
Jason Ekstrand	cccb497d3c	intel/fs: Fix MOV_INDIRECT and BROADCAST of Q types on Gen11+ The immediate case is pretty uncommon to see but it can happen, in theory. BROADCAST is typically used to uniformize values and those are usually 32-bit. However, it does come up in some subgroup ops. Fixes: `49c21802cb` "intel/compiler: Split has_64bit_types into float/int" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6211>	2020-09-01 13:25:20 -05:00
Jason Ekstrand	c48f42e178	intel/fs: Emit HALT for discard on Gen4-5 Using HALT to immediately jump to the end of the shader is required to implement GL_EXT_gpu_shader4 and OpenGL 3.0. However, vanilla OpenGL 1.2 doesn't forbid it and it likely makes something somewhere faster. We should be consistent and implement the same discard behavior on all hardware if we can. The rules for HALT on Gen4-5 are a bit different from Gen6+. On the older hardware, there is no stack for HALT; instead it's up to software to save and restore mask registers. However, there's no real saving needed since we only use HALT to jump to the end of the program where we're about about to do our FB writes. All we need to do is reset AMask to DMask, the value it was initialized to at the start of the thread. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5244>	2020-05-30 06:21:15 +00:00
Caio Marcelo de Oliveira Filho	f858fa26b4	intel/fs,vec4: Pull stall logic for memory fences up into the IR Instead of emitting the stall MOV "inside" the SHADER_OPCODE_MEMORY_FENCE generation, use the scheduling fences when creating the IR. For IvyBridge, every (data cache) fence is accompained by a render cache fence, that now is explicit in the IR, two SHADER_OPCODE_MEMORY_FENCEs are emitted (with different SFIDs). Because Begin and End interlock intrinsics are effectively memory barriers, move its handling alongside the other memory barrier intrinsics. The SHADER_OPCODE_INTERLOCK is still used to distinguish if we are going to use a SENDC (for Begin) or regular SEND (for End). This change is a preparation to allow emitting both SENDs in Gen11+ before we can stall on them. Shader-db results for IVB (i965): total instructions in shared programs: 11971190 -> 11971200 (<.01%) instructions in affected programs: 11482 -> 11492 (0.09%) helped: 0 HURT: 8 HURT stats (abs) min: 1 max: 3 x̄: 1.25 x̃: 1 HURT stats (rel) min: 0.03% max: 0.50% x̄: 0.14% x̃: 0.10% 95% mean confidence interval for instructions value: 0.66 1.84 95% mean confidence interval for instructions %-change: 0.01% 0.27% Instructions are HURT. Unlike the previous code, that used the `mov g1 g2` trick to force both `g1` and `g2` to stall, the scheduling fence will generate `mov null g1` and `mov null g2`. During review it was decided it was not worth keeping the special codepath for the small effect will have. Shader-db results for HSW (i965), BDW and SKL don't have a change on instruction count, but do report changes in cycles count, showing SKL results below total cycles in shared programs: 341738444 -> 341710570 (<.01%) cycles in affected programs: 7240002 -> 7212128 (-0.38%) helped: 46 HURT: 5 helped stats (abs) min: 14 max: 1940 x̄: 676.22 x̃: 154 helped stats (rel) min: <.01% max: 2.62% x̄: 1.28% x̃: 0.95% HURT stats (abs) min: 2 max: 1768 x̄: 646.40 x̃: 362 HURT stats (rel) min: <.01% max: 0.83% x̄: 0.28% x̃: 0.08% 95% mean confidence interval for cycles value: -777.71 -315.38 95% mean confidence interval for cycles %-change: -1.42% -0.83% Cycles are helped. This seems to be the effect of allocating two registers separatedly instead of a single one with size 2, which causes different register allocation, affecting the cycle estimates. while ICL also has not change on instruction count but report changes negative changes in cycles total cycles in shared programs: 352665369 -> 352707484 (0.01%) cycles in affected programs: 9608288 -> 9650403 (0.44%) helped: 4 HURT: 104 helped stats (abs) min: 24 max: 128 x̄: 88.50 x̃: 101 helped stats (rel) min: <.01% max: 0.85% x̄: 0.46% x̃: 0.49% HURT stats (abs) min: 2 max: 2016 x̄: 408.36 x̃: 48 HURT stats (rel) min: <.01% max: 3.31% x̄: 0.88% x̃: 0.45% 95% mean confidence interval for cycles value: 256.67 523.24 95% mean confidence interval for cycles %-change: 0.63% 1.03% Cycles are HURT. AFAICT this is the result of the case above. Shader-db results for TGL have similar cycles result as ICL, but also affect instructions total instructions in shared programs: 17690586 -> 17690597 (<.01%) instructions in affected programs: 64617 -> 64628 (0.02%) helped: 55 HURT: 32 helped stats (abs) min: 1 max: 16 x̄: 4.13 x̃: 3 helped stats (rel) min: 0.05% max: 2.78% x̄: 0.86% x̃: 0.74% HURT stats (abs) min: 1 max: 65 x̄: 7.44 x̃: 2 HURT stats (rel) min: 0.05% max: 4.58% x̄: 1.13% x̃: 0.69% 95% mean confidence interval for instructions value: -2.03 2.28 95% mean confidence interval for instructions %-change: -0.41% 0.15% Inconclusive result (value mean confidence interval includes 0). Now that more is done in the IR, more dependencies are visible and more SWSB annotations are emitted. Mixed with different register allocation decisions like above, some shaders will see more `sync nops` while others able to avoid them. Most of the new `sync nops` are also redundant and could be dropped, which will be fixed in a separate change. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>	2020-04-29 07:17:27 +00:00
Caio Marcelo de Oliveira Filho	0e96b0d6dd	intel/fs: Allow FS_OPCODE_SCHEDULING_FENCE stall on registers It will generate the MOVs (or SYNC_NOP in Gen12+) needed for stall. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>	2020-04-29 07:17:27 +00:00
Francisco Jerez	486f3b04a5	intel/ir: Pass block cycle count information explicitly to disassembler. So we can eventually remove the cycle count estimates from the CFG data structure and consolidate performance information in the brw::performance object. It would be cleaner to pass the brw::performance object directly to the disassembler but that isn't straightforward since the disassembler is built as a plain C file unlike the rest of the compiler back-end. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Francisco Jerez	6579f562c3	intel/ir: Use brw::performance object instead of CFG cycle counts for codegen stats. These should be more accurate than the current cycle counts, since among other things they consider the effect of post-scheduling passes like the software scoreboard on TGL. In addition it will enable us to clean up some of the now redundant cycle-count estimation functionality in the instruction scheduler. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-04-28 23:01:27 -07:00
Dylan Baker	f8e4542bad	replace _mesa_logbase2 with util_logbase2 Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>	2020-04-21 11:09:03 -07:00
Caio Marcelo de Oliveira Filho	c76f2292b5	intel/fs,vec4: Properly account SENDs in IVB memory fence Change brw_memory_fence to return the number of messages emitted, and use that to update the send_count statistic in code generation. This will fix the book-keeping for IVB since the memory fences will result in two SEND messages. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4646>	2020-04-20 09:29:09 -07:00
Jason Ekstrand	b2e4157143	anv: Advertise SEND count through VK_EXT_pipeline_executable_properties Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4578>	2020-04-15 21:51:55 +00:00
Francisco Jerez	45d4665dc7	intel/fs: Fix workaround for VxH indirect addressing bug under control flow. The current workaround for this hardware bug involved marking the ADD instruction used to initialize the address register as NoMask on Gen12, which was based on the assumption that the problem was caused by a hardware bug affecting the application of the execution mask to the address register write. However that doesn't seem to be the case: The address register write was working correctly, the real problem leading to hangs on TGL is that the indirect addressing logic is unable to deal with garbage values in the address register (e.g. misaligned offsets), even for channels which are currently inactive due to non-uniform control flow. The current workaround isn't able to avoid that situation in general, since the result of the NoMask ADD instruction for a dead channel is calculated based on the corresponding (dead) component of the indirect_byte_offset source, which would still be undefined in the likely case that the source was initialized under control flow itself. This would lead to hangs whenever MOV_INDIRECT was used under non-uniform control flow in some scenarios like a tessellation shader from GFXBench5/gl_4 (AKA Car Chase) on TGL. In addition I've managed to reproduce the same issue on earlier platforms by initializing the whole address register with garbage before the ADD instruction, so this seems to be a long-standing issue we have avoided mostly by luck. This patch fixes the problem and applies the workaround to all platforms, since even when the hardware is able to deal with garbage address values without hanging there might be a significant performance cost from reading random GRF registers due to the useless extra EU cycles spent fetching registers for dead channels and due to the potential for unintended serialization with respect to other random instructions that could be executed in parallel, which may have had a cost of the order of hundreds of cycles in the worst case scenario. Fixes: `f93dfb509c` "intel/fs: Write the address register with NoMask for MOV_INDIRECT" Tested-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2020-03-10 00:42:50 +00:00
Matt Turner	e924181ea8	intel/compiler: Discount NOPs from instruction counts Scheduler changes can cause changes in the number of instructions due to this workaround, so just don't include NOPs in the instruction counts to prevent shader-db noise. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093>	2020-03-09 04:44:12 +00:00
Matt Turner	bb3e7b0fe3	intel/compiler: Pass shader_stats for each SIMD mode Passing shader_stats to the fs_generator constructor means that the SIMD8 shader stats from the visitor (such as the scheduler mode) will be reported out for the SIMD16/SIMD32 versions as well. As you can see, we are now passing 'shader_stats' and 'stats' to generate_code(), which is obviously odd looking. Ian rebased and committed an old patch of mine which added the shader_stats struct on July 30 in commit `dabb5d4bee` (i965/fs: Add a shader_stats struct.) and shortly after on August 12 Jason added the brw_compile_stats struct in commit `134607760a` (intel/compiler: Fill a compiler statistics struct). I'd like to combine the two, but I'm not sure how. shader_stats is an input to generate_code() while brw_compile_stats is an output and is only used by the Vulkan driver. Leave it as is for now... Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093>	2020-03-09 04:44:12 +00:00

1 2 3 4

171 commits