fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 02:40:11 +01:00

Author	SHA1	Message	Date
Kenneth Graunke	589b03d02f	intel/fs: Opportunistically split SEND message payloads While we've taken advantage of split-sends in select situations, there are many other cases (such as sampler messages, framebuffer writes, and URB writes) that have never received that treatment, and continued to use monolithic send payloads. This commit introduces a new optimization pass which detects SEND messages with a single payload, finds an adjacent LOAD_PAYLOAD that produces that payload, splits it in two, and updates the SEND to use both of the new smaller payloads. In places where we manually used split SENDs, we rely on underlying knowledge of the message to determine a natural split point. For example, header and data, or address and value. In this pass, we instead infer a natural split point by looking at the source registers. Oftentimes, consecutive LOAD_PAYLOAD sources may already be grouped together in a contiguous block, such as a texture coordinate. Then, there is another bit of data, such as a LOD, that may come from elsewhere. We look for the point where the source list switches VGRFs, and split it there. (If there is a message header, we choose to split there, as it will naturally come from elsewhere.) This not only reduces the payload sizes, alleviating register pressure, but it means that we may be able to eliminate some payload construction altogether, if we have a contiguous block already and some extra data being tacked on to one side or the other. 
shader-db results for Icelake are: total instructions in shared programs: 19602513 -> 19369255 (-1.19%) instructions in affected programs: 6085404 -> 5852146 (-3.83%) helped: 23650 / HURT: 15 helped stats (abs) min: 1 max: 1344 x̄: 9.87 x̃: 3 helped stats (rel) min: 0.03% max: 35.71% x̄: 3.78% x̃: 2.15% HURT stats (abs) min: 1 max: 44 x̄: 7.20 x̃: 2 HURT stats (rel) min: 1.04% max: 20.00% x̄: 4.13% x̃: 2.00% 95% mean confidence interval for instructions value: -10.16 -9.55 95% mean confidence interval for instructions %-change: -3.84% -3.72% Instructions are helped. total cycles in shared programs: 848180368 -> 842208063 (-0.70%) cycles in affected programs: 599931746 -> 593959441 (-1.00%) helped: 22114 / HURT: 13053 helped stats (abs) min: 1 max: 482486 x̄: 580.94 x̃: 22 helped stats (rel) min: <.01% max: 78.92% x̄: 4.76% x̃: 0.75% HURT stats (abs) min: 1 max: 94022 x̄: 526.67 x̃: 22 HURT stats (rel) min: <.01% max: 188.99% x̄: 4.52% x̃: 0.61% 95% mean confidence interval for cycles value: -222.87 -116.79 95% mean confidence interval for cycles %-change: -1.44% -1.20% Cycles are helped. total spills in shared programs: 8387 -> 6569 (-21.68%) spills in affected programs: 5110 -> 3292 (-35.58%) helped: 359 / HURT: 3 total fills in shared programs: 11833 -> 8218 (-30.55%) fills in affected programs: 8635 -> 5020 (-41.86%) helped: 358 / HURT: 3 LOST: 1 SIMD16 shader, 659 SIMD32 shaders GAINED: 65 SIMD16 shaders, 959 SIMD32 shaders Total CPU time (seconds): 1505.48 -> 1474.08 (-2.09%) Examining these results: the few shaders where spills/fills increased were already spilling significantly, and were only slightly hurt. The applications affected were also helped in countless other shaders, and other shaders stopped spilling altogether or had 50% reductions. Many SIMD16 shaders were gained, and overall we gain more SIMD32, though many close to the register pressure line go back and forth. 
Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17018>	2022-07-01 02:05:45 +00:00
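The split-point heuristic described in the commit above can be sketched roughly as follows. This is a simplified illustration, not the pass itself: `struct payload_src`, `find_split_point`, and the `header_size` parameter are hypothetical names, and a source is reduced to just its VGRF number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one LOAD_PAYLOAD source:
 * only the virtual register (VGRF) number it reads from is tracked. */
struct payload_src {
   unsigned vgrf;
};

/* Returns the index of the first source that would go into the second
 * payload, or 0 if no natural split point is found.
 *
 * - If the payload has a header, split right after it.
 * - Otherwise, split where the source list first switches VGRFs. */
size_t find_split_point(const struct payload_src *srcs, size_t count,
                        size_t header_size)
{
   assert(srcs != NULL || count == 0);

   if (header_size > 0 && header_size < count)
      return header_size;

   for (size_t i = 1; i < count; i++) {
      if (srcs[i].vgrf != srcs[i - 1].vgrf)
         return i;
   }

   return 0; /* all sources come from one VGRF: nothing to split */
}
```

For a payload whose first three sources come from one VGRF (say, a texture coordinate) and whose fourth comes from another (say, an LOD), the sketch returns 3, so the coordinate block could be used as-is as the first payload.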
Kenneth Graunke	a8b93e628a	intel/compiler: Handle split-sends in EOT high-register pinning case SEND messages with EOT need to use g112-g127 for their sources so that the hardware is able to launch new threads while old ones are finishing without worrying about register overlap when pushing payloads. For the newer split-send messages, this applies to both source registers. Our special case for this in the register allocator was only considering the first source. This wasn't a problem because we hadn't ever tried to use split-sends with EOT before. However, my new optimization pass is going to introduce some shortly, so we'll need to handle them properly. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17018>	2022-07-01 02:05:45 +00:00
Kenneth Graunke	dd76196cea	intel/compiler: Convert brw_eu.cpp back to brw_eu.c Now that we've removed the thread_local lookup tables using pointer-to-member C++ features, this can go back to being a standard C file, like it was in the past. We just need to annotate a couple of things with "struct". Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>	2022-06-30 23:46:35 +00:00
Kenneth Graunke	ea72ec98bf	intel/compiler: Remove use of thread_local for opcode tables We had been using thread_local index -> opcode_desc tables to avoid plumbing through a storage location throughout all the code. But now we have done so with the new brw_isa_info structure. So we can just store the tables there, and initialize it with the compiler. This fixes crashes in gtk4-demo on iris, and should help with some programs on zink as well. Something was going wrong with the thread_local variables not being set up correctly. While we might be able to work around that issue, there's really no advantage to storing these lookup tables in TLS (beyond it being simpler to do originally). So let's simply stop doing so. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6728 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6229 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>	2022-06-30 23:46:35 +00:00
Kenneth Graunke	72e9843991	intel/compiler: Introduce a new brw_isa_info structure This structure will contain the opcode mapping tables in the next commit. For now, this is the mechanical change to plumb it into all the necessary places, and it continues simply holding devinfo. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>	2022-06-30 23:46:35 +00:00
Kenneth Graunke	342471e93d	intel/compiler: Move opcode_desc handling to a separate header This patch creates a new header file, brw_isa_info.h, which will contain all the functions related to opcode encoding on various generations. Opcode numbers may have different meanings on different hardware, so we remap them between an enum we can easily work with and the hardware encoding. We move the brw_inst setters and getters to brw_inst.h. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>	2022-06-30 23:46:35 +00:00
Kenneth Graunke	fdae90aa85	intel/compiler: Split 3DPRIM_* defines out to a separate header. These clash with genxml and will become a problem shortly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>	2022-06-30 23:46:35 +00:00
Kenneth Graunke	9f8784232a	intel/compiler: Fix brw_gfx_ver_enum.h to be a proper header file This header file didn't include normal guards against being included multiple times. It also defined a function in a header file without marking it static inline. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>	2022-06-30 23:46:35 +00:00
Kenneth Graunke	a141a351de	intel/compiler: Stop including src/mesa/main/config.h src/mesa/main includes are for Mesa's OpenGL implementation, and the compiler is used in Vulkan drivers and other tools. We really only needed one #define, which is that we offer 32 samplers. It probably makes more sense to have our own defined limit for that rather than importing a project-wide value which theoretically could be adjusted, so swap MAX_SAMPLERS for a new BRW_MAX_SAMPLERS and call it a day. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>	2022-06-30 23:46:35 +00:00
Konstantin Seurer	85da294bfe	intel: Use nir_test_mask instead of i2b(iand) Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17242>	2022-06-30 18:00:32 +00:00
Lionel Landwerlin	9d7d1c0637	intel/clc: enable fp16 & subgroups for GRL Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17253>	2022-06-27 15:31:49 +00:00
Marcin Ślusarz	42b551fe7f	intel/compiler: adjust task payload offsets as late as possible Otherwise passes which expect offsets to be in bytes (like brw_nir_lower_mem_access_bit_sizes, called from brw_postprocess_nir) may produce incorrect results. Fixes 64-bit load/stores in task/mesh shaders. Fixes: `c36ae42e4c` ("intel/compiler: Use nir_var_mem_task_payload") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16196>	2022-06-27 14:14:41 +00:00
Marcin Ślusarz	f4386b81e6	intel: fix typos found by codespell Acked-by: David Heidelberg <david.heidelberg@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17191>	2022-06-27 10:20:55 +00:00
Marcin Ślusarz	f871aa10a1	intel/compiler: assert that base is 0 for [load|store]_shared intrins Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17143>	2022-06-22 10:32:13 +00:00
Marcin Ślusarz	008163f382	intel/compiler: vectorize task payload loads/stores Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17000>	2022-06-20 17:38:20 +00:00
Ian Romanick	676acfe956	intel/fs: Add missing synchronization for WaW dependency v2: Do the synchronization in the correct place. Noticed by Curro. Fixes: `b5fa43952a` ("intel/fs: Better handle constant sources of FS_OPCODE_PACK_HALF_2x16_SPLIT") Reviewed-by: Francisco Jerez <currojerez@riseup.net> Tested-by: Felix DeGrood <felix.j.degrood@intel.com> [v1] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17037>	2022-06-17 17:05:43 +00:00
Lionel Landwerlin	03e543a422	intel/validator: validate dst/src types against devinfo support v2: deal with src3_a1/src3_a16 instruction types (Curro) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16985>	2022-06-17 15:43:05 +00:00
Yonggang Luo	0f3064ee44	intel: using C++11 keyword thread_local Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15087>	2022-06-15 17:37:16 +00:00
Jason Ekstrand	844a70f439	intel/compiler: Use NIR_PASS(_, ...) I don't know when this was added but it's really neat and we should use it instead of NIR_PASS_V since NIR_DEBUG=print and a few validation things will work better. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17014>	2022-06-13 22:31:25 +00:00
Francisco Jerez	96e7e92f0d	intel/fs/xehp+: Emit scheduling fence for all NIR barriers on platforms with LSC. Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15743>	2022-06-12 12:56:47 +03:00
Tapani Pälli	47773a5d7c	intel/fs: setup SEND message descriptor from nir scope This fixes many tests in following groups on DG2: dEQP-VK.memory_model.* dEQP-VK.fragment_shader_interlock.* v2: use memory scope and setup descriptor also for barriers without defined scope (Curro), use local scope and flush type none with NIR_SCOPE_NONE scope, cleanups (Lionel) v3: use LSC_FENCE_THREADGROUP for NIR_SCOPE_WORKGROUP, remove default case (Curro), use eviction if scope was not defined, use LSC_FENCE_GPU scope for vertex stage v4: use LSC_FENCE_TILE independent of stage for device scope (Curro) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15743>	2022-06-12 12:29:47 +03:00
Kenneth Graunke	a8e718c7e5	intel/compiler: Fix A64 header construction with a uniform address fs_visitor::assign_curb_setup() maps UNIFORM registers to HW regs, and contains the following assert: assert(inst->src[i].stride == 0); emit_a64_oword_block_header's striding tricks run afoul of this restriction, by producing stride 1 values on a 64-bit UNIFORM source. Work around this by copying the UNIFORM value to a VGRF first. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16938>	2022-06-10 02:14:57 +00:00
Emma Anholt	464b32c030	glsl: Drop the div-to-mul-rcp lowering for floats. NIR has fdiv, and all the NIR backends have to have lower_fdiv set appropriately already since various passes (format conversions, tgsi_to_nir, nir_fast_normalize(), etc.) might generate one. This causes softpipe and llvmpipe to now do actual divides, since lower_fdiv is not set there. Note that llvmpipe's rcp implementation is a divide of 1.0 by x, so now we're going to be just doing div(x, y) instead of mul(x, div(1.0, y)). Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16823>	2022-06-07 02:38:42 +00:00
Erik Faye-Lund	2a134347cb	intel/compiler: use macro for power-of-two check This will allow the use of static_assert here instead of our compiler-specific implementation. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16670>	2022-06-03 07:14:43 +00:00
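As an illustration of why a macro helps here: a power-of-two test written as a constant expression can be used inside `static_assert`, which a function call cannot. A minimal sketch, assuming C11; the macro name `EXAMPLE_IS_POT` and the helper `example_is_pot_runtime` are hypothetical and not taken from the commit.

```c
#include <assert.h>

/* Hypothetical macro: nonzero when v is a nonzero power of two.
 * A power of two has exactly one bit set, so v & (v - 1) clears it. */
#define EXAMPLE_IS_POT(v) ((v) != 0 && (((v) & ((v) - 1)) == 0))

/* Because the macro expands to a constant expression for constant
 * arguments, it can be checked at compile time. */
static_assert(EXAMPLE_IS_POT(32), "32 is a power of two");
static_assert(!EXAMPLE_IS_POT(48), "48 is not a power of two");

/* The same macro still works on runtime values. */
int example_is_pot_runtime(unsigned v)
{
   return EXAMPLE_IS_POT(v);
}
```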
Paulo Zanoni	72a7d7d7a8	intel/compiler: call ordered_unit() only once at update_inst_scoreboard() Call it once instead of calling the very same function for each source and destination. This should make those ternary operators a little easier to read, IMHO. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15835>	2022-06-02 23:04:39 +00:00
Paulo Zanoni	2256314b08	intel/compiler: split handling of 64 bit floats and ints In opt_algebraic(), handle TYPE_DF in a different check than TYPE_Q. We have a separate flag for each type, use separate checks so platforms where one is true and the other is not can work properly. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15835>	2022-06-02 23:04:39 +00:00
Paulo Zanoni	8f02e6cb19	intel/compiler: compute int64_options based on devinfo->has_64bit_int Don't compute it based on devinfo->has_64bit_float. Otherwise we may end up emitting 64bit-int (Q) instructions on platforms with 64bit floats but not 64bit integers. Right now, the only platforms where has_64bit_int is different from has_64bit_float are the platforms that use GFX7_FEATURES. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15835>	2022-06-02 23:04:39 +00:00
Kenneth Graunke	26bb81f3f6	intel/compiler: Fix uncompaction of signed word immediates on Tigerlake This expression accidentally performs a 32-bit sign-extension when processing the second half of the expression (the low 16 bits). Consider -7W, which is represented as 0xfff9fff9 in our encoding (the 16-bit word is replicated to both halves of the 32-bit dword). Tigerlake's compaction stores the low 11-bits of an immediate as-is, and replicates the 12th bit. So here, compacted_imm will be 0xff9. ((int)(0xff9 << 20) >> 4) | ((short)(0xff9 << 4) >> 4) = 0xfff90000 | (0xff90 >> 4) = 0xfff90000 | 0xfffffff9 = 0xfffffff9 ...oops... By casting the second line of the expression to unsigned short, we prevent the sign-extension when it combines both parts, so we get: 0xfff90000 | 0x0000fff9 = 0xfff9fff9 Fixes: `12d3b11908` ("intel/compiler: Add instruction compaction support on Gen12") Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16833>	2022-06-02 13:59:38 -07:00
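The sign-extension problem in the commit above can be reproduced in isolation. The sketch below follows the expression quoted in the message; the function names are hypothetical, and it assumes a compiler where right-shifting a negative signed integer is an arithmetic shift (true for GCC and Clang, implementation-defined in ISO C).

```c
#include <assert.h>
#include <stdint.h>

/* Buggy form: the low half is computed as a signed 16-bit value, which
 * is promoted to a 32-bit int (sign-extended) before the OR, so its
 * upper 16 bits overwrite the high half. */
uint32_t uncompact_word_imm_buggy(uint32_t compacted)
{
   assert(compacted <= 0xfff); /* 12-bit compacted immediate */
   return (uint32_t)(((int32_t)(compacted << 20) >> 4) |
                     ((int16_t)(compacted << 4) >> 4));
}

/* Fixed form: casting the low half to an unsigned 16-bit value keeps
 * only its low 16 bits, so the OR combines two disjoint halves. */
uint32_t uncompact_word_imm_fixed(uint32_t compacted)
{
   assert(compacted <= 0xfff);
   return (uint32_t)(((int32_t)(compacted << 20) >> 4) |
                     (uint16_t)((int16_t)(compacted << 4) >> 4));
}
```

With the compacted value 0xff9 from the message (-7W), the buggy form yields 0xfffffff9 and the fixed form yields 0xfff9fff9. Non-negative words such as 7W (compacted 0x007) are unaffected, which is why only signed word immediates hit the bug.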
Jason Ekstrand	dfedeccc13	intel: Only set VectorMaskEnable when needed For cases with lots of very small primitives, this may improve performance because we're not executing those dead channels all the time. Shader-db reports no instruction or cycle-count changes. However, by hacking up the driver to report when this optimization triggers, it appears to affect about 10% of shader-db. v2 (Kenneth Graunke): Always enable VMask prior to XeHP for now, because using VMask on those platforms allows us to perform the eliminate_find_live_channel() optimization. However, XeHP doesn't seem to have packed fragment shader dispatch, so we lose that optimization regardless, and there's no reason not to avoid vmask. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1054>	2022-05-27 21:52:48 +00:00
Jason Ekstrand	1b9248e761	intel/fs: Copy color_outputs_valid into wm_prog_data Fixes: `36ee2fd61c` ("anv: Implement the basic form of VK_EXT_transform_feedback") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>	2022-05-27 14:33:53 +00:00
Jason Ekstrand	8379993223	intel/fs: Drop fs_visitor::emit_alpha_to_coverage_workaround() It no longer exists. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>	2022-05-27 14:33:53 +00:00
Lionel Landwerlin	e666089082	intel/disasm: add missing handling of <1;1,0> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `7cd9adeb41` ("intel/compiler: In XeHP prefer <1;1,0> regions before compacting") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16704>	2022-05-26 06:42:16 +00:00
Kenneth Graunke	9886615958	intel/compiler: Move spill/fill tracking to the register allocator Originally, we had virtual opcodes for scratch access, and let the generator count spills/fills separately from other sends. Later, we started using the generic SHADER_OPCODE_SEND for spills/fills on some generations of hardware, and simply detected stateless messages there. But then we started using stateless messages for other things: - anv uses stateless messages for the buffer device address feature. - nir_opt_large_constants generates stateless messages. - XeHP curbe setup can generate stateless messages. So counting stateless messages is not accurate. Instead, we move the spill/fill accounting to the register allocator, as it generates such things, as well as the load/store_scratch intrinsic handling, as those are basically spill/fills, just at a higher level. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16691>	2022-05-25 06:56:01 +00:00
Kenneth Graunke	59bfc9c6cb	intel: Fix analysis invalidation in eliminate_find_live_channel If we saw a HALT instruction, we would forget to invalidate our analysis pass information before returning progress. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16677>	2022-05-24 22:36:39 +00:00
Timothy Arceri	d7a071a28f	gallium/drivers: set force_indirect_unrolling_sampler for all required drivers This is set to true for all drivers that have a GLSL level of support lower than 4.00. This matches the rule for setting the GLSL IR option EmitNoIndirectSampler. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16543>	2022-05-17 02:12:21 +00:00
Marcin Ślusarz	9acb30c8c4	intel/compiler: implement primitive shading rate for mesh Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16030>	2022-05-13 13:05:51 +00:00
Marcin Ślusarz	29a778fa6b	intel/compiler: print name of the unhandled intrinsic Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16493>	2022-05-13 09:43:02 +00:00
Marcin Ślusarz	65ff6932dc	intel/compiler: handle gl_Viewport and gl_Layer in FS URB setup Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16493>	2022-05-13 09:43:02 +00:00
Marcin Ślusarz	040062df41	intel/compiler: handle VARYING_SLOT_CULL_PRIMITIVE in mesh It's needed for gl_MeshPerPrimitiveNV[].gl_ViewportMask Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16493>	2022-05-13 09:43:02 +00:00
Vadym Shovkoplias	55c71217ec	driconf: Add a limit_trig_input_range option With this option enabled, the range of input values for fsin and fcos is limited to [-2pi : 2pi] by calculating the remainder after 2*pi modulo division. This helps to improve calculation precision for large input arguments on Intel. -v2: Add limit_trig_input_range option to prog_key to update shader cache (Lionel) Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16388>	2022-05-13 06:47:53 +00:00
Jason Ekstrand	352e32e5ba	nir/builder: Add a nir_trim_vector helper This pattern pops up a bunch and the semantics of nir_channels() aren't very convenient much of the time. Let's add a nir_trim_vector() which matches nir_pad_vector(). Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16309>	2022-05-11 14:47:33 +00:00
Karol Herbst	9c5fd100cc	nir: add a nir_remove_non_entrypoints helper This code just got duplicated a lot. There is still more, but the remaining instances do a bit more than just removing other functions. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16348>	2022-05-10 03:37:44 +00:00
Emma Anholt	3a42e92a4f	glsl: Drop the dead MOD_TO_FLOOR path. It's now called lower_fmod in NIR. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8044>	2022-05-05 22:25:03 +00:00
Caio Oliveira	7cd9adeb41	intel/compiler: In XeHP prefer <1;1,0> regions before compacting Ken performed some tests with shader-db to evaluate the effects ``` Across all 145,848 shaders generated, the results were: Total bytes compacted before: 3,326,224 Total bytes compacted after: 60,963,280 ``` Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15399>	2022-05-02 18:03:01 +00:00
Emma Anholt	536c8ee96d	nir/lower_tex: Make the adding a 0 LOD to nir_op_tex in the VS optional. This controls the whole lowering of "make tex ops with implicit derivatives on non-implicit-derivative stages be tex ops with an explicit lod of 0 instead", but it's really hard to describe that in a git commit summary. All existing callers get it added except: - nir_to_tgsi which didn't want it. - nouveau, which didn't want it (fixes regressions in shadowcube and shadow2darray with NIR, since the shading languages don't expose txl of those sampler types and thus it's not supported in HW) - optional lowering passes in mesa/st (lower_rect, YUV lowering, etc) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>	2022-04-28 21:26:08 +00:00
Lionel Landwerlin	8ef8e72aac	intel/fs: tidy up lower of ray queries We already expect a single function. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15946>	2022-04-19 12:56:06 +00:00
Marcin Ślusarz	5dace41c10	intel/compiler: invalidate metadata in brw_nir_initialize_mue New "if" blocks may have been inserted. Fixes: `bc4f8c073a` ("intel/compiler: inject MUE initialization") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15924>	2022-04-19 11:43:55 +00:00
Marcin Ślusarz	4fddef33d5	intel/compiler: invalidate all metadata in brw_nir_lower_intersection_shader New "if" blocks were inserted. Fixes: `303378e1dd` ("intel/rt: Add lowering for combined intersection/any-hit shaders") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15924>	2022-04-19 11:43:55 +00:00
Alexey Bozhenko	2d7d907ad1	intel/compiler: fix singleton pointer coverity warning fix brw_kernel::stats member that was declared as a variable but used as a pointer to array of 3 elements CID: 1503279 Signed-off-by: Bozhenko Alexey <oleksii.bozhenko@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15975>	2022-04-19 12:36:10 +03:00
Lionel Landwerlin	04bd007757	intel/fs: require memory fence commit bit on Gfx9 Fixes a hang on Gfx9 GT1 : dEQP-VK.compute.zero_initialize_workgroup_memory.max_workgroup_memory.128 Tested-by: Mark Janes <markjanes@swizzler.org> Acked-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15596>	2022-04-17 21:24:17 +00:00


2110 commits