fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 02:40:11 +01:00

Author	SHA1	Message	Date
Alejandro Piñeiro	4e1f8d82c2	i965/fs: include multisamplers on image_intrinsic_coord_components This is the second patch needed to fix the following piglit tests: tests/spec/arb_gl_spirv/linker/uniform/multisampler.shader_test tests/spec/arb_gl_spirv/linker/uniform/multisampler-array.shader_test Although in this case it doesn't affect so many borrowed tests, as there aren't too many tests using multisamplers on Intel. It is worth to note that this patch is also needed when those tests are run on GLSL mode (using the --glsl option). Although most Intel drivers would not be able to run/execute tests using multisamplers, as GL_MAX_IMAGE_SAMPLES is zero, technically those tests are expected to link correctly, so linking tests should pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-05 17:02:28 +02:00
Alejandro Piñeiro	2a6182fe06	intel/compiler: rename brw_nir_lower_glsl_images To brw_nir_lower_gl_images, as it will be also used on the ARB_gl_spirv codepath, that doesn't involves GLSL at all. So the lowering is about images following the OpenGL semantics. In any case "brw_nir_lower_opengl_images" seemed too long to me, so I just used gl. That shortening is already used on other parts of the code. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-05 17:02:28 +02:00
Jason Ekstrand	67571ae796	intel/compiler: Remove redundant nir_remove_dead_variables call As of `07a2098a70`, brw_nir_optimize calls nir_remove_dead_variables as the last optimization. Doing it again is just pointless. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-09-04 09:03:16 -05:00
Lionel Landwerlin	07a2098a70	intel: compiler: remove dead local variables at optimization pass We're hitting an assert in gfxbench because one of the local variable is a sampler (according to Jason this isn't valid) : testfw_app: ../src/compiler/nir_types.cpp:551: void glsl_get_natural_size_align_bytes(const glsl_type, unsigned int, unsigned int*): Assertion `!"type does not have a natural size"' failed. Since this particular variable isn't used, it can be eliminated by removing unused local variables at the end of the optimization loop. This makes sense also for valid local variables. v2: Move additional local variable removal out of optimization loop, but before large constant removal (Jason/Lionel) v3: Move the removal at the end of brw_nir_optimize() Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107806 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-03 17:24:19 +01:00
Ian Romanick	82530ce1b5	i965/vec4: Clamp indirect tes input array reads with 0x0fffffff Page 190 of "Volume 7: 3D Media GPGPU Engine (Haswell)" says the valid range of the offset is [0, 0FFFFFFFh]. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable@lists.freedesktop.org	2018-09-01 00:23:45 -07:00
Ian Romanick	75666605c9	i965/vec4: Correctly handle uniform sources in generate_tes_add_indirect_urb_offset Fixes failure in the new piglit test tes-patch-input-array-vec2-index-invalid-rd.shader_test. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable@lists.freedesktop.org	2018-09-01 00:23:43 -07:00
Rodrigo Vivi	e8c42ed4ab	intel: Introducing Amber Lake platform Amber Lake uses the same gen graphics as Kaby Lake, including a id that were previously marked as reserved on Kaby Lake, but that now is moved to AML page. This follows the ids and approach used on kernel's commit e364672477a1 ("drm/i915/aml: Introducing Amber Lake platform") Reported-by: Timo Aaltonen <timo.aaltonen@canonical.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Anuj Phogat <anuj.phogat@gmail.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-31 13:57:52 -07:00
Jason Ekstrand	a0f18f2142	intel/nir: Lowering image loads and stores trashes all metadata This fixes the GL_ARB_fragment_shader_interlock piglit test on gen8 platforms where the lack of metadata dirtying was causing another pass to accidentally delete a much needed loop. https://bugs.freedesktop.org/show_bug.cgi?id=107745 Fixes: `37f7983bcc` "intel/compiler: Do image load/store lowering..." Jason Ekstrand <jason@jlekstrand.net> writes: Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-30 14:06:31 -05:00
Jason Ekstrand	d8033d4083	intel/compiler: Remove surface_idx from brw_image_param Now that the drivers are lowering to surface indices themselves, we no longer need to push the surface index into the shader. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:03 -05:00
Jason Ekstrand	3cbc02e469	intel: Use TXS for image_size when we have a typed surface Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:03 -05:00
Jason Ekstrand	09f1de97a7	anv,i965: Lower away image derefs in the driver Previously, the back-end compiler turn image access into magic uniform reads and there was a complex contract between back-end compiler and driver about setting up and filling out those params. As of this commit, both drivers now lower image_deref_load_param_intel intrinsics to load_uniform intrinsics controlled by the driver and lower the other image_deref_* intrinsics to image_* intrinsics which take an actual binding table index. There are still "magic" uniforms but they are now added and controlled entirely by the driver and that contract no longer spans components. This also has the side-effect of making most image use compile-time binding table indices. Previously, all image access pulled the binding table index from a uniform. Part of the reason for this was that the magic uniforms made it difficult to decouple binding table indices from the uniforms and, since they are indexed completely differently (especially in Vulkan), it was hard to pull them apart. Now that the driver is handling both, it's trivial to decouple the two and provide actual binding table indices. Shader-db results on Kaby Lake: total instructions in shared programs: 15166872 -> 15164293 (-0.02%) instructions in affected programs: 115834 -> 113255 (-2.23%) helped: 191 HURT: 0 total cycles in shared programs: 571311495 -> 571196465 (-0.02%) cycles in affected programs: 4757115 -> 4642085 (-2.42%) helped: 73 HURT: 67 total spills in shared programs: 10951 -> 10926 (-0.23%) spills in affected programs: 742 -> 717 (-3.37%) helped: 7 HURT: 0 total fills in shared programs: 22226 -> 22201 (-0.11%) fills in affected programs: 1146 -> 1121 (-2.18%) helped: 7 HURT: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:03 -05:00
Jason Ekstrand	3942943819	nir: Use a bitfield for image access qualifiers This commit expands the current memory access enum to contain the extra two bits provided for images. We choose to follow the SPIR-V convention of NonReadable and NonWriteable because readonly implies that you can read so readonly + writeonly doesn't make as much sense as NonReadable + NonWriteable. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	4289143899	intel/compiler: Use two components for 1D array image sizes Having the array length component stored in .z was a small convenience for the ISL image param filling code and an annoyance in the NIR lowering code. The only convenience of treating 1D arrays like 2D arrays in the lowering code is in the address calculation code so let's put all the complexity there as well. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	37f7983bcc	intel/compiler: Do image load/store lowering to NIR This commit moves our storage image format conversion codegen into NIR instead of doing it in the back-end. This has the advantage of letting us run it through NIR's optimizer which is pretty effective at shrinking things down. In the common case of rgba8, the number of instructions emitted after NIR is done with it is half of what it was with the lowering happening in the back-end. On the downside, the back-end's lowering is able to directly use predicates and the NIR lowering has to use IFs. Shader-db results on Kaby Lake: total instructions in shared programs: 15166910 -> 15166872 (<.01%) instructions in affected programs: 5895 -> 5857 (-0.64%) helped: 15 HURT: 0 Clearly, we don't have that much image_load_store happening in the shaders in shader-db.... Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Ian Romanick	c836326a29	i965/vec4: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1 No shader-db changes on any Intel platform. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-28 15:35:50 -07:00
Ian Romanick	c856403868	i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for imageAtomicAdd of +1 or -1 v2: Refactor selection of atomic opcode to a separate function. Suggested by Jason. No changes on any other Intel platforms. Skylake total instructions in shared programs: 14304261 -> 14304241 (<.01%) instructions in affected programs: 1625 -> 1605 (-1.23%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5 helped stats (rel) min: 1.01% max: 14.29% x̄: 5.86% x̃: 4.07% 95% mean confidence interval for instructions value: -10.66 0.66 95% mean confidence interval for instructions %-change: -15.91% 4.19% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 527531226 -> 527531194 (<.01%) cycles in affected programs: 92204 -> 92172 (-0.03%) helped: 2 HURT: 0 Haswell and Broadwell had similar results. (Broadwell shown) total instructions in shared programs: 14615730 -> 14615710 (<.01%) instructions in affected programs: 1838 -> 1818 (-1.09%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5 helped stats (rel) min: 0.89% max: 13.04% x̄: 5.37% x̃: 3.78% 95% mean confidence interval for instructions value: -10.66 0.66 95% mean confidence interval for instructions %-change: -14.59% 3.85% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-28 15:35:46 -07:00
Ian Romanick	b6e247cf0e	i965/fs: Refactor image atomics to be a bit more like other atomics This greatly simplifies the next patch. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-28 15:35:46 -07:00
Ian Romanick	fabe3ead57	i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1 Funny story... a single shader was hurt for instructions, spills, fills. That same shader was also the most helped for cycles. #GPUsAreWeird No changes on any other Intel platform. v2: Refactor selection of atomic opcode to a separate function. Suggested by Jason. Haswell, Broadwell, and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14304116 -> 14304261 (<.01%) instructions in affected programs: 12776 -> 12921 (1.13%) helped: 19 HURT: 1 helped stats (abs) min: 1 max: 16 x̄: 2.32 x̃: 1 helped stats (rel) min: 0.05% max: 7.27% x̄: 0.92% x̃: 0.55% HURT stats (abs) min: 189 max: 189 x̄: 189.00 x̃: 189 HURT stats (rel) min: 4.87% max: 4.87% x̄: 4.87% x̃: 4.87% 95% mean confidence interval for instructions value: -12.83 27.33 95% mean confidence interval for instructions %-change: -1.57% 0.31% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 527552861 -> 527531226 (<.01%) cycles in affected programs: 1459195 -> 1437560 (-1.48%) helped: 16 HURT: 2 helped stats (abs) min: 2 max: 21328 x̄: 1353.69 x̃: 6 helped stats (rel) min: 0.01% max: 5.29% x̄: 0.36% x̃: 0.03% HURT stats (abs) min: 12 max: 12 x̄: 12.00 x̃: 12 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -3699.81 1295.92 95% mean confidence interval for cycles %-change: -0.94% 0.30% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 8025 -> 8033 (0.10%) spills in affected programs: 208 -> 216 (3.85%) helped: 1 HURT: 1 total fills in shared programs: 10989 -> 11040 (0.46%) fills in affected programs: 444 -> 495 (11.49%) helped: 1 HURT: 1 Ivy Bridge total instructions in shared programs: 11709181 -> 11709153 (<.01%) instructions in affected programs: 3505 -> 3477 (-0.80%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 23 x̄: 9.33 x̃: 4 helped stats (rel) min: 0.11% max: 1.16% x̄: 0.63% x̃: 0.61% total cycles in shared programs: 254741126 -> 254738801 (<.01%) cycles in affected programs: 919067 -> 916742 (-0.25%) helped: 3 HURT: 0 helped stats (abs) min: 21 max: 2144 x̄: 775.00 x̃: 160 helped stats (rel) min: 0.03% max: 0.90% x̄: 0.32% x̃: 0.03% total spills in shared programs: 4536 -> 4533 (-0.07%) spills in affected programs: 40 -> 37 (-7.50%) helped: 1 HURT: 0 total fills in shared programs: 4819 -> 4813 (-0.12%) fills in affected programs: 94 -> 88 (-6.38%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-28 15:35:38 -07:00
Ian Romanick	41399f4bc7	intel/compiler: Silence unused parameter warnings in brw_eu.h All of the other brw__desc functions take a devinfo parameter, and all of the others at least have an assert that uses it. Keep the parameter, but mark it as unused. Silences 37 warnings like: In file included from src/intel/common/gen_disasm.c:27:0: src/intel/compiler/brw_eu.h: In function ‘brw_pixel_interp_desc’: src/intel/compiler/brw_eu.h:377:53: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_pixel_interp_desc(const struct gen_device_info devinfo, ^~~~~~~ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-28 15:35:38 -07:00
Kevin Rogovin	03ecec9ed2	i965: Add INTEL_fragment_shader_ordering support. Adds suppport for INTEL_fragment_shader_ordering. We achieve the fragment ordering by using the same instruction as for beginInvocationInterlockARB() which is by issuing a memory fence via sendc. Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>	2018-08-28 17:15:10 +03:00
Sagar Ghuge	a1e3305f75	intel/eu: print bytes instead of 32 bit hex value INTEL_DEBUG=hex prints 32 bit hex value and due to endianness of CPU byte order is reversed. In order to disassemble binary files, print each byte instead of 32 bit hex value. v2: Print blank spaces in order to vertically align output of compacted instructions hex value with uncompacted instructions hex value. (Matt Turner) v3: Fix line wrap at correct length Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-08-27 11:07:39 -07:00
Jason Ekstrand	8d8222461f	intel/nir: Enable nir_opt_find_array_copies We have to be a bit careful with this one because we want it to run in the optimization loop but only in the first brw_nir_optimize call. Later calls assume that we've lowered away copy_deref instructions and we don't want to introduce any more. Shader-db results on Kaby Lake: total instructions in shared programs: 15176942 -> 15176942 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 In spite of the lack of any shader-db improvement, this patch completely eliminates spilling in the Batman: Arkham City tessellation shaders. This is because we are now able to detect that the temporary array created by DXVK for storing TCS inputs is a copy of the input arrays and use indirect URB reads instead of making a copy of 4.5 KiB of input data and then indirecting on it with if-ladders. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 21:47:51 -05:00
Jason Ekstrand	a4a9c07549	intel/nir: Use nir_shrink_vec_array_vars Shader-db results on Kaby Lake: total instructions in shared programs: 15177605 -> 15176765 (<.01%) instructions in affected programs: 4259 -> 3419 (-19.72%) helped: 1 HURT: 0 total spills in shared programs: 10954 -> 10855 (-0.90%) spills in affected programs: 295 -> 196 (-33.56%) helped: 1 HURT: 0 total fills in shared programs: 22222 -> 22117 (-0.47%) fills in affected programs: 417 -> 312 (-25.18%) helped: 1 HURT: 0 The helped shader is from the OglCSDof synmark test. On my Kaby Lake laptop, the actual framerate of the benchmark didn't appear to improve beyond the noise. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 21:46:56 -05:00
Jason Ekstrand	02a5442dd7	intel/nir: Use the new structure and array splitting passes We call structure splitting once because it is guaranteed to split all the structures in the entire shader in one go. We call array splitting in the loop in case future optimizations turn indirects into direct dereferences and we can split more arrays. Shader-db results on Kaby Lake: total instructions in shared programs: 15177605 -> 15177605 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 This is unsurprising because nir_lower_vars_to_ssa already effectively does structure and array splitting internally. It doesn't actually split the variables but it's ability to reason about aliasing in the presence of arrays and structures and pick out scalars or vectors to be lowered to SSA values is fairly advanced. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 21:44:14 -05:00
Ian Romanick	d515c75463	intel/compiler: Implement untyped atomic float min, max, and compare-swap dataport messages v2: Split changes to the message type field to another patch. Suggested by Caio. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	f347348f8a	intel/compiler: Expand untyped atomic message type field by a bit This is necessary for a new Gen9 message type that will be added in the next patch. There are also Gen8 message types that need the extra bit (mostly for bindless). v2: Split off from the next patch. Suggested by Caio. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	d628642a34	intel/compiler: Silence unused parameter warnings src/intel/compiler/brw_disasm_info.c: In function ‘nir_print_instr’: src/intel/compiler/brw_disasm_info.c:30:61: warning: unused parameter ‘instr’ [-Wunused-parameter] __attribute__((weak)) void nir_print_instr(const nir_instr instr, FILE fp) {} ^~~~~ src/intel/compiler/brw_disasm_info.c:30:74: warning: unused parameter ‘fp’ [-Wunused-parameter] __attribute__((weak)) void nir_print_instr(const nir_instr instr, FILE fp) {} ^~ src/intel/compiler/brw_disasm.c: In function ‘src_ia1’: src/intel/compiler/brw_disasm.c:850:18: warning: unused parameter ‘_reg_file’ [-Wunused-parameter] unsigned _reg_file, ^~~~~~~~~ src/intel/compiler/brw_fs_surface_builder.cpp: In function ‘void brw::surface_access::emit_byte_scattered_write(const brw::fs_builder&, const fs_reg&, const fs_reg&, const fs_reg&, unsigned int, unsigned int, unsigned int, brw_predicate)’: src/intel/compiler/brw_fs_surface_builder.cpp:193:57: warning: unused parameter ‘size’ [-Wunused-parameter] unsigned dims, unsigned size, ^~~~ v2: Update commit message. brw_fs_generator.cpp warnings were already fixed by another patch. Noticed by Caio. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Jason Ekstrand	10f44da775	Revert "intel/nir: Call nir_lower_io_to_scalar_early" Commit `4434591bf5` caused substantially more URB messages in geometry and tessellation shaders. Before we can really enable this sort of optimization, We either need some way of combining them back together into vectors or we need to do cross-stage vector element elimination without splitting everything into scalars. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510 Fixes: `4434591bf5` "intel/nir: Call nir_lower_io_to_scalar_early" Acked-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Mark Janes <mark.a.janes@intel.com>	2018-08-15 17:56:50 -05:00
Mathieu Bridon	2ee1c86d71	meson: Build with Python 3 Now that all the build scripts are compatible with both Python 2 and 3, we can flip the switch and tell Meson to use the latter. Since Meson already depends on Python 3 anyway, this means we don't need two different Python stacks to build Mesa. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-10 15:15:09 -07:00
Kenneth Graunke	08a5c395ab	intel: Fix SIMD16 unaligned payload GRF reads on Gen4-5. When the SIMD16 Gen4-5 fragment shader payload contains source depth (g2-3), destination stencil (g4), and destination depth (g5-6), the single register of stencil makes the destination depth unaligned. We were generating this instruction in the RT write payload setup: mov(16) m14<1>F g5<8,8,1>F { align1 compr }; which is illegal, instructions with a source region spanning more than one register need to be aligned to even registers. This is because the hardware implicitly does (nr \| 1) instead of (nr + 1) when splitting the compressed instruction into two mov(8)'s. I believe this would cause the hardware to load g5 twice, replicating subspan 0-1's destination depth to subspan 2-3. This showed up as 2x2 artifact blocks in both TIS-100 and Reicast. Normally, we rely on the register allocator to even-align our virtual GRFs. But we don't control the payload, so we need to lower SIMD widths to make it work. To fix this, we teach lower_simd_width about the restriction, and then call it again after lower_load_payload (which is what generates the offending MOV). Fixes: `8aee87fe4c` (i965: Use SIMD16 instead of SIMD8 on Gen4 when possible.) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107212 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=13728 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Diego Viola <diego.viola@gmail.com>	2018-08-09 12:33:41 -07:00
Jordan Justen	8fcdb71d8c	intel/compiler: Add brw_get_compiler_config_value for disk cache During code review, Jason pointed out that: `2b3064c073` "i965, anv: Use INTEL_DEBUG for disk_cache driver flags" Didn't account for INTEL_SCALER_* environment variables. To fix this, let the compiler return the disk_cache driver flags. Another possible fix would be to pull the INTEL_SCALER_* into INTEL_DEBUG bits, but as we are currently using 41 of 64 bits, I didn't think it was a good use of 4 more of these bits. (5 since INTEL_PRECISE_TRIG needs to be accounted for as well.) Cc: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-01 23:49:16 -07:00
Jason Ekstrand	b2e0b0dad6	anv/pipeline: More aggressively optimize away color attachments Instead of just looking at the number of color attachments, look at which ones are actually used by the subpass. This lets us potentially throw away chunks of the fragment shader. In DXVK, for example, all subpasses have 8 attachments and most are VK_ATTACHMENT_UNUSED so this is very helpful in that case. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-01 18:02:28 -07:00
Jason Ekstrand	4434591bf5	intel/nir: Call nir_lower_io_to_scalar_early Shader-db results on Kaby Lake: total instructions in shared programs: 15166953 -> 15073611 (-0.62%) instructions in affected programs: 2390284 -> 2296942 (-3.91%) helped: 16469 HURT: 505 total loops in shared programs: 4954 -> 4951 (-0.06%) loops in affected programs: 3 -> 0 helped: 3 HURT: 0 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-01 18:02:28 -07:00
Jason Ekstrand	b0bb547f78	intel/nir: Split IO arrays into elements The NIR nir_lower_io_arrays_to_elements pass attempts to split I/O variables which are arrays or matrices into a sequence of separate variables. This can help link-time optimization by allowing us to remove varyings at a more granular level. Shader-db results on Kaby Lake: total instructions in shared programs: 15177645 -> 15168494 (-0.06%) instructions in affected programs: 79857 -> 70706 (-11.46%) helped: 392 HURT: 0 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-01 18:02:28 -07:00
Jason Ekstrand	57804efa88	i965/fs: Flag all slots of a flat input as flat Otherwise, only the first vec4 of a matrix or other complex type will get marked as flat and we'll interpolate the others. This was caught by a dEQP test which started failing because it did a SSO vs. non-SSO comparison. Previously, we did the interpolation wrong consistently in both versions. However, with one of Tim Arceri's NIR linkingpatches, we started splitting the matrix input into vectors at link time in the non-SSO version and it started getting correctly interpolated which didn't match the broken SSO version. As of this commit, they both get correctly interpolated. Fixes: `e61cc87c75` "i965/fs: Add a flat_inputs field to prog_data" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-01 18:02:28 -07:00
Jason Ekstrand	4e060385e9	intel/nir: Use the correct scalar stage for consumers when linking Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-01 18:02:28 -07:00
Iago Toral Quiroga	471bce5689	intel/compiler: implement 8-bit constant load Fixes VK-GL-CTS CL#2567 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-01 08:08:15 +02:00
Iago Toral Quiroga	7e6c8b0cb7	intel/compiler: add setup_imm_(u)b helpers The hardware doesn't support byte immediates, so similar to setup_imm_df() for doubles, these helpers work by loading the constant value into a VGRF. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-01 08:08:15 +02:00
Iago Toral Quiroga	615aaedb93	intel/compiler: fix lower conversions to account for predication The pass can create a temporary result for the instruction and then moves from it to the original destination, however, if the original instruction was predicated, the mov has to be predicated as well. Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>	2018-07-27 14:48:29 +02:00
Kenneth Graunke	488972222c	i965: Combine both gl_PatchVerticesIn lowering passes. Until now, we had separate passes for lowering gl_PatchVerticesIn to a statically known constant (for TES inputs when linked against a TCS), and a uniform in the other cases. Annoyingly, one had to be run before nir_lower_system_values, and the other afterward. This simplified the passes, but made life painful for the callers. This patch combines both into a single pass. If you give it a non-zero static count, it uses that. If you give it Mesa state slots, it turns it back into a built-in uniform. Otherwise, it does nothing. This also moves the i965 uniform lowering out to shared code. v2: Make token arrays const. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-07-26 21:51:36 -07:00
Kenneth Graunke	8794fe3e30	intel/compiler: Delete dead VS intrinsic handling. These are lowered by brw_nir_lower_vs_inputs(). If they weren't, we would have already hit the unreachable() in emit_system_values_block(). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-07-26 11:45:34 -07:00
Karol Herbst	7f95564a22	nir: rename f2f16_undef to f2f16 we need rounding modes on other conversions involving floats and it is easier to rename f2f16_undef than renaming all the other ones. v2: rebased on master Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Karol Herbst <kherbst@redhat.com>	2018-07-24 20:40:05 +02:00
Jason Ekstrand	820d5e51b7	intel/compiler: Account for built-in uniforms in analyze_ubo_ranges The original pass only looked for load_uniform intrinsics but there are a number of other places that could end up loading a push constant. One obvious omission was images which always implicitly use a push constant. Legacy VS clip planes also get pushed into the shader. This fixes some new Vulkan CTS tests that test random combinations of bindings and, in particular, test lots of UBOs and images together. Cc: mesa-stable@lists.freedesktop.org Cc: Kenneth Graunke <kenneth@whitecape.org>	2018-07-23 15:28:17 -07:00
Caio Marcelo de Oliveira Filho	4a29ee1861	intel/compiler: fix -Wsign-compare warning Explicitly convert to signed integer. Conversion is valid since is the same (implicitly) used to initialize the loop. Avoids the warning: ../../src/intel/compiler/brw_fs.cpp: In member function ‘bool fs_visitor::lower_simd_width()’: ../../src/intel/compiler/brw_fs.cpp:5761:45: warning: comparison of integer expressions of different signedness: ‘int’ and ‘unsigned int’ [-Wsign-compare] split_inst.eot = inst->eot && i == n - 1; ~~^~~~~~~~ Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2018-07-18 08:29:51 -07:00
Caio Marcelo de Oliveira Filho	7df5f62768	intel/compiler: silence -Wclass-memaccess warnings Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2018-07-18 08:29:51 -07:00
Jose Maria Casanova Crespo	62f37ee53d	i965/fs: unspills shoudn't use grf127 as dest since Gen8+ At `232ed89802` "i965/fs: Register allocator shoudn't use grf127 for sends dest" we didn't take into account the case of SEND instructions that are not send_from_grf. But since Gen7+ although the backend still uses MRFs internally for sends they are finally assigned to a GRFs. In the case of unspills the backend assigns directly as source its destination because it is suppose to be available. So we always have a source-destination overlap. If the reg_allocator assigns registers that include the grf127 we fail the validation rule that affects Gen8+ "r127 must not be used for return address when there is a src and dest overlap in send instruction." So this patch activates the grf127_send_hack_node for Gen8+ and if we have any register spilled we add interferences to the destination of the unspill operations. We also need to avoid that opt_bank_conflicts() optimization, that runs after the register allocation, doesn't move things around, causing the grf127 to be used in the condition we were avoiding. Fixes piglit test tests/spec/arb_compute_shader/linker/bug-93840.shader_test and some shader-db crashed because of the grf127 validation rule.. v2: make sure that opt_bank_conflicts() optimization doesn't change the use of grf127. (Caio) Found by Caio Marcelo de Oliveira Filho Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107193 Fixes: `232ed89802` "i965/fs: Register allocator shoudn't use grf127 for sends dest" Cc: 18.1 <mesa-stable@lists.freedesktop.org> Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-07-12 18:02:26 +02:00
Francisco Jerez	18c086a9e6	intel/ir: Uncomment definition of several unused hardware opcodes. There are a number of opcode_desc table entries for many of these unused opcodes. A symbolic opcode enum will be required in a future commit in order to keep them in the opcode description tables. The alternative would be to remove the unused opcodes from the opcode description tables. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-09 23:46:58 -07:00
Francisco Jerez	48d6fc5eb6	intel/fs: Initialize mlen for gen7 varying pull constant load messages. This makes the message length available at the IR level, which should save some guesswork in a future commit. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-09 23:46:58 -07:00
Francisco Jerez	6643143f6e	intel/eu: Assert that the instruction is send-like in brw_set_desc_ex(). Constructing a descriptor in-place as part of the immediate of an ALU instruction is no longer supported. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-09 23:46:58 -07:00
Francisco Jerez	6f81e2b994	intel/eu: Get rid of the return value of brw_send_indirect_message(). The return value is not used anymore. This allows simplifying the code slightly, and in addition it should frustrate anybody's attempts to continue using the obsolete piecemeal approach to construct a message descriptor in combination with brw_send_indirect_message(). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-09 23:46:58 -07:00

... 14 15 16 17 18 ...

1355 commits