fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-16 14:08:07 +02:00

Author	SHA1	Message	Date
Lionel Landwerlin	da2d67fc3b	anv: gem-stubs: return a valid fd got anv_gem_userptr() Fixes invalid close(-1) in the unit tests. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-09-25 22:02:51 +03:00
Andres Gomez	5e87f48f1d	i965/fs: set rounding mode when emitting the flrp instruction flrp was forgotten when already adding the rounding mode for other instructions. Fixes: `ba1e25e1aa` ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions") Suggested-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2019-09-24 12:06:59 +03:00
Andres Gomez	6f1468c371	i965/fs: add a comment about how the rounding mode in fmul is set After `1711bf6cf2` ("intel/fs: Generate better code for fsign multiplied by a value"), the conflicts resolution for setting the rounding mode after the fused fmul and fsign optimization is non obvious. Basically, the optimization doesn't really result in a MUL, or any other operation which would need to have the rounding mode set. Hence, we set it just before the actual MUL in the treatment of fmul. Fixes: `ba1e25e1aa` ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions") Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2019-09-24 11:24:15 +03:00
Kenneth Graunke	b9e93db208	intel: Increase Gen11 compute shader scratch IDs to 64. From the MEDIA_VFE_STATE docs: "Starting with this configuration, the Maximum Number of Threads must be set to (#EU * 8) for GPGPU dispatches. Although there are only 7 threads per EU in the configuration, the FFTID is calculated as if there are 8 threads per EU, which in turn requires a larger amount of Scratch Space to be allocated by the driver." It's pretty clear that we need to increase this for scratch address calculations, because the FFTID has a certain bit-pattern. The quote above seems to indicate that we should increase the actual thread count programmed in MEDIA_VFE_STATE as well, but we think the intention is to only bump the scratch space. Fixes GPU hangs in Bioshock Infinite and Synmark's CSDof on Icelake 8x8. Fixes: `5ac804bd9a` ("intel: Add a preliminary device for Ice Lake") Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-09-23 16:59:40 -07:00
Kenneth Graunke	50c0dd8621	Revert "intel/gen11+: Enable Hardware filtering of Semi-Pipelined State in WM" This reverts commit `729de1488f`. It turns out that, although the register is in the logical context, it isn't whitelisted, so we can't actually write it from userspace batch buffers. The write just becomes a noop, which is why we saw no performance changes. I manually whitelisted it, and still observed no performance gains, but it did regress KHR-GL46.texture_cube_map_array.color_depth_attachments on the iris driver. So we might need to fix something before enabling this. To prevent it randomly getting turned on should the kernel ever whitelist this register, we revert the patch for now.	2019-09-23 16:31:23 -07:00
Kenneth Graunke	8489206e9d	intel/genxml: Stop manually scrubbing 'α' -> "alpha" 'α' has never appeared in any genxml files, so there's no need to replace it with the word "alpha". Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-09-23 20:24:54 +00:00
Kenneth Graunke	aa7ac32976	isl: Drop WaDisableSamplerL2BypassForTextureCompressedFormats on Gen11 Gen11 doesn't require us to bypass the L2 cache for BC* images anymore. The documentation is a bit hard to follow on this point, but the Windows driver clearly only applies this workaround on Gen9, and their commit history indicates that this was an intentional change to drop the workaround for Gen11+. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-09-20 15:35:17 -07:00
Jason Ekstrand	7d861ab812	anv: Advertise VK_KHR_shader_subgroup_extended_types Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-20 18:02:15 +00:00
Jason Ekstrand	03255da225	intel/fs: Do 8-bit subgroup scan operations in 16 bits Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-20 18:02:15 +00:00
Jason Ekstrand	651725f7a1	intel/fs: Allow CLUSTER_BROADCAST to do type conversion We can't really handle it in the little-core 64-bit case but it's not really needed there. Where we really want this is for when we need to do 16 -> 8-bit conversions. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-20 18:02:15 +00:00
Jason Ekstrand	3515c0e9cf	intel/fs: Allow UB, B, and HF types in brw_nir_reduction_op_identity Because byte immediates aren't a thing on GEN hardware, we return a signed or unsigned word immediate in the byte case. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-20 18:02:15 +00:00
Paulo Zanoni	10532c6831	intel/fs: don't forget the stride at generate_shuffle During generate_shuffle(), when we use byte sized registers we end up with a destination stride of 2. We don't take the stride into consideration when selecting the group offset for the last MOV operation, which means we end up moving things to the wrong place, leaving the last few channels untouched. Take the destination stride in consideration so we don't miss the last channels. v2: Assert this is not necessary for the IVB special case (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-20 10:57:05 -07:00
Jason Ekstrand	dae33052db	util/rb_tree: Reverse the order of comparison functions The new order matches that of the comparison functions accepted by the C standard library qsort() functions. Being consistent with qsort will hopefully help avoid developer confusion. The only current user of the red-black tree is aub_mem.c which is pretty easy to fix up. Reviewed-by: Lionel Landwerlin <lionel.g.lndwerlin@intel.com>	2019-09-20 17:37:25 +00:00
Eric Engestrom	3c1a24de07	anv: implement ICD interface v4 Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-09-20 08:31:58 +00:00
Eric Engestrom	19db95e78e	anv: split instance dispatch table This effectively breaks the instance dispatch table in 2 with entry points using a physical device as first argument getting their own dispatch table. As a result we now have to check instance & physical device dispatch table instead of just the instance dispatch table before. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-09-20 08:31:58 +00:00
Jason Ekstrand	0c4e89ad5b	Move blob from compiler/ to util/ There's nothing whatsoever compiler-specific about it other than that's currently where it's used. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-09-19 19:56:22 +00:00
Caio Marcelo de Oliveira Filho	fa080f03d3	intel/fs: Add Fall-through comment Reviewed-by: Andres Gomez <agomez@igalia.com>	2019-09-19 10:02:16 -07:00
Arcady Goldmints-Orlov	5ec5fecc26	anv: fix descriptor limits on gen8 Later generations support bindless for samplers, images, and buffers and thus per-stage descriptors are not limited by the binding table size. However, gen8 doesn't support bindless images and thus needs to report a lower per-stage limit so that all combinations of descriptors that fit within the advertised limits are reported as supported by vkGetDescriptorSetLayoutSupport. Fixes test dEQP-VK.api.maintenance3_check.descriptor_set Fixes: `79fb0d27f3` ("anv: Implement SSBOs bindings with GPU addresses in the descriptor BO") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-09-19 09:10:40 -05:00
Paulo Zanoni	8e614c7a29	intel/fs: fix SHADER_OPCODE_CLUSTER_BROADCAST for SIMD32 The current code can create functions with a width of 32, which is not supported by our hardware. Add some code to simplify how we express what we want and prevent such cases. For some unknown reason, all the tests I could run seem to work even with these unsupported MOVs. Fixes: `b0858c1cc6` "intel/fs: Add a couple of simple helper opcodes" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-19 02:48:27 +00:00
Paulo Zanoni	c99df52873	intel/fs: the maximum supported stride width is 16 There are cases where we try to generate registers with a stride of 32, while the hardware maximum is just 16. This happens, for example, when using 8 bit integers on SIMD32. This results in a crash because the variable 'width' has a value of 32: ../../src/intel/compiler/brw_reg.h:550: brw_reg brw_vecn_reg(unsigned int, brw_reg_file, unsigned int, unsigned int): Assertion `!"Invalid register width"' failed. This change prevents the crash and makes the tests pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-19 02:48:27 +00:00
Paulo Zanoni	cebf447d16	intel/fs: roll the loop with the <0,1,0> additions in emit_scan() IMHO the code is easier to understand this way, being explicit that we're doing exactly the same thing every time. No functional changes. v2: Adjust the loop breaking condition (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-19 02:47:17 +00:00
Paulo Zanoni	d9ddf5076d	intel/fs: make scan/reduce work with SIMD32 when it fits 2 registers When dealing with uint16_t and uint8_t on SIMD32 we can do all the operations using just 2 registers, so we don't hit the recursion at the beginning of emit_scan(). Because of that, we need to actually compute scan/reduce for channels 31:16. v2: Still missed instructions (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-09-19 02:47:17 +00:00
Kenneth Graunke	0e4a75f917	intel/compiler: Record whether any pull constant loads occur I would like for iris to be able to avoid setting up SURFACE_STATE for UBOs in the common case where all constants are pushed. Unfortunately, we don't know up front whether everything will be pushed: the backend is allowed to demote pushed UBOs to pull loads fairly late in the process. This is probably desirable though, as we'd like the backend to be able to re-pull pushed data to break up long live ranges in response to register pressure. Here we simply add a "are there any pull loads at all" boolean to prog_data, which is a bit crude but at least allows us to skip work in the common "everything pushed" case. We could skip more work by tracking exactly which UBO surfaces are pulled in a bitmask, but I wanted to avoid bringing back the old mark_surface_used() mechanism. Finer-grained tracking could allow us to skip a bit more work when multiple UBOs are in use and /some/ are 100% pushed, but others are accessed via pulls. However, I'm not sure how common this is and it would save at most 4 pull descriptors, so we defer that for now. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-18 15:44:22 -07:00
Kenneth Graunke	f76a724e06	intel/compiler: Set "Null Render Target" ex_desc bit on Gen11 When there are no color regions (i.e. a depth only pass), we can set the "Null Render Target" bit in the Gen11 RT write extended message descriptor to indicate that it should behave as if it's writing to a null render target, without the need for a binding table entry. This lets drivers avoid setting up that null RT binding table entry, but more importantly means the HW doesn't actually have to bother looking up the surface state. Together with the next patch, this improves performance in Car Chase on an Icelake 8x8 (locked to 700Mhz) by 0.0445526% +/- 0.0132736% (n=832). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-17 14:27:51 -07:00
Samuel Iglesias Gonsálvez	f5dd6dfe01	anv: enable VK_KHR_shader_float_controls and SPV_KHR_float_controls This adds support for VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT_CONTROLS_PROPERTIES_KHR and enables de Vulkan and SPIR-V extensions. Also, notice that this includes the updates applied to the VkPhysicalDeviceFloatControlsPropertiesKHR structure in the extension VK_KHR_shader_float_controls v4 and Vulkan 1.1.116. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-17 23:39:19 +03:00
Samuel Iglesias Gonsálvez	9b07020a4f	i965/fs: add support for shader float control to remove_extra_rounding_modes() The remove_extra_rounding_modes() optimization will remove duplicated rounding mode changes. v2: - Fix bug in the rounding mode change (Alejandro). v3: - Fix rounding modes. v4: - Updated to renamed shader info member and enum values (Andres). v5: - Simplify flags logic operations (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-17 23:39:19 +03:00
Samuel Iglesias Gonsálvez	9bd88d10d8	i965/fs: set rounding mode when emitting nir_op_f2f32 or nir_op_f2f16 v2: - Consider nir_op_f2f16 case too (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-17 23:39:19 +03:00
Samuel Iglesias Gonsálvez	ba1e25e1aa	i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions v2: - Updated to renamed shader info member (Andres). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-17 23:39:19 +03:00
Samuel Iglesias Gonsálvez	9da56ffc52	i965/fs: add emit_shader_float_controls_execution_mode() and aux functions We need this function to emit code that setups the control register later with the defined execution mode for the shader. Therefore, we emit it as the first instruction. v2: - Fix bug in setting the default mode mask in brw_rnd_mode_from_nir(). - Fix support for rounding modes in brw_rnd_mode_from_nir(). v3: - Updated to renamed shader info member and enum values (Andres). v4: - Add actual emission as first instruction of emit_nir_code (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-17 23:39:19 +03:00
Samuel Iglesias Gonsálvez	8a6507b6fe	i965/fs/generator: add new opcode to set float controls modes in control register Before this commit, we had only FPRoundingMode decoration (the per instruction one) that is applied during the SPIR-V handling. In vtn_alu we find out the rounding mode, and generate the code accordingly that later will be used to look for the respective nir_op_f2f16_{rtz,rtne}. Per-instruction gets prioritized because we make them explicit conversions (with RTZ or RTNE nir opcodes) and they will override the default execution mode defined with float controls. However, we need to come back to the mode defined by float controls after the execution of the FP Rounding instruction. Therefore, the new SHADER_OPCODE_FLOAT_CONTROL_MODE opcode will be used to set the default rounding mode and denorms treatment in the whole shader while the pre-existent SHADER_OPCODE_RND_MODE, will be used as prioritized rounding mode in a per-instruction basis. v2: - Fix bug in defining BRW_CR0_FP_MODE_MASK. v3: - Update comment (Caio). v4: - Split the patch into the helper and the new opcode (this one) (Caio). v5: - Add an explanation on the actual purpose and priority of the newly introduced opcode in the commit log (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-17 23:39:19 +03:00
Samuel Iglesias Gonsálvez	28da9558f5	i965/fs/generator: refactor rounding mode helper in preparation for float controls v2: - Fix bug in defining BRW_CR0_FP_MODE_MASK. v3: - Update comment (Caio). v4: - Split the patch into the helper (this one) and the new opcode (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-17 23:39:19 +03:00
Samuel Iglesias Gonsálvez	cdace5b0c6	i965/fs/nir: add nir_op_unpack_half_2x16_split_*_flush_to_zero The denorm mode is set in the control register, no need to do something else. v2: - Add an assert to make sure that we realize if this assumption is broken in the future (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-17 23:39:18 +03:00
Samuel Iglesias Gonsálvez	3c474f8513	intel/nir: do not apply the fsin and fcos trig workarounds for consts If we have fsin or fcos trigonometric operations with constant values as inputs, we will multiply the result by 0.99997 in brw_nir_apply_trig_workarounds, making the result wrong. Adjusting the rules so they do not apply to const values we let a later constant fold to deal with it. v2: - Do not early constant fold but only apply the trig workaround for non constants (Caio). - Add fixes tag to commit log (Caio). Fixes: `bfd17c76c1` "i965: Port INTEL_PRECISE_TRIG=1 to NIR." Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-09-17 23:39:18 +03:00
Sergii Romantsov	2bfcf04345	nir/large_constants: pass after lowering copy_deref v2: by J.Ekstrand suggestion moved lowering of large constants after lowering of copy_deref is done. CC: Jason Ekstrand <jason@jlekstrand.net> CC: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111450 Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>	2019-09-16 11:23:48 +00:00
Lionel Landwerlin	0616b7ac90	vulkan: add vk_x11_strict_image_count option This option strictly allocate the minImageCount given by the application at swapchain creation. This works around application that do not deal with the fact that the implementation allocates more images than the minimum specified. v2: Add values in default drirc (Bas) v3: specify engine name/version (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111522 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Cc: 19.2 <mesa-stable@lists.freedesktop.org>	2019-09-15 15:37:02 +03:00
Lionel Landwerlin	04dc6074cf	driconfig: add a new engine name/version parameter Vulkan applications can register with the following structure : typedef struct VkApplicationInfo { VkStructureType sType; const void* pNext; const char* pApplicationName; uint32_t applicationVersion; const char* pEngineName; uint32_t engineVersion; uint32_t apiVersion; } VkApplicationInfo; This enables the Vulkan implementations to apply workarounds based off matching this description. Here we add a new parameter for matching the driconfig options with the following : <device driver="anv"> <application engine_name_match="MyOwnEngine.*" engine_versions="10:12,40:42"> <option name="blaaah" value="true" /> </application> </device> v2: switch engine name match to use regexps v3: Verify that the regexec returns REG_NOMATCH for match failure (Eric) v4: Add missing bit that went to the following commit (Eric) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: 19.2 <mesa-stable@lists.freedesktop.org>	2019-09-15 15:37:02 +03:00
Jason Ekstrand	acfa2340e6	intel/fs: Handle UNDEF in split_virtual_grfs When the UNDEF instruction was added, we didn't do anything special in split_virtual_grfs. This mean that anything with an UNDEF wasn't getting split which causes problems for the compiler. Among other things, it makes RA harder because things are in bigger chunks. It also meant that dvec4s weren't getting split which means that they are larger than the maximum register size. Shader-db results on Kaby Lake: total instructions in shared programs: 14959202 -> 14960035 (<.01%) instructions in affected programs: 96197 -> 97030 (0.87%) helped: 140 HURT: 128 helped stats (abs) min: 1 max: 17 x̄: 1.62 x̃: 1 helped stats (rel) min: 0.09% max: 6.15% x̄: 0.65% x̃: 0.45% HURT stats (abs) min: 1 max: 825 x̄: 8.28 x̃: 1 HURT stats (rel) min: 0.13% max: 139.83% x̄: 1.70% x̃: 0.50% 95% mean confidence interval for instructions value: -2.96 9.18 95% mean confidence interval for instructions %-change: -0.56% 1.51% Inconclusive result (value mean confidence interval includes 0). total loops in shared programs: 4372 -> 4372 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 352646771 -> 352840997 (0.06%) cycles in affected programs: 218600800 -> 218795026 (0.09%) helped: 21167 HURT: 21411 helped stats (abs) min: 1 max: 2924 x̄: 36.89 x̃: 10 helped stats (rel) min: <.01% max: 41.90% x̄: 2.97% x̃: 0.98% HURT stats (abs) min: 1 max: 26027 x̄: 45.54 x̃: 10 HURT stats (rel) min: <.01% max: 324.46% x̄: 3.88% x̃: 1.06% 95% mean confidence interval for cycles value: 2.87 6.26 95% mean confidence interval for cycles %-change: 0.40% 0.55% Cycles are HURT. total spills in shared programs: 8840 -> 8953 (1.28%) spills in affected programs: 126 -> 239 (89.68%) helped: 1 HURT: 2 total fills in shared programs: 21782 -> 21914 (0.61%) fills in affected programs: 431 -> 563 (30.63%) helped: 1 HURT: 3 LOST: 0 GAINED: 5 Shader-db results on Haswell: total instructions in shared programs: 13320918 -> 13320769 (<.01%) instructions in affected programs: 40998 -> 40849 (-0.36%) helped: 146 HURT: 56 helped stats (abs) min: 1 max: 8 x̄: 2.73 x̃: 2 helped stats (rel) min: 0.16% max: 8.60% x̄: 2.52% x̃: 2.22% HURT stats (abs) min: 2 max: 23 x̄: 4.45 x̃: 4 HURT stats (rel) min: 0.21% max: 10.26% x̄: 6.83% x̃: 10.26% 95% mean confidence interval for instructions value: -1.26 -0.21 95% mean confidence interval for instructions %-change: -0.62% 0.77% Inconclusive result (%-change mean confidence interval includes 0). total loops in shared programs: 4373 -> 4373 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 374518258 -> 374384193 (-0.04%) cycles in affected programs: 231101954 -> 230967889 (-0.06%) helped: 21427 HURT: 19438 helped stats (abs) min: 1 max: 2035 x̄: 31.09 x̃: 8 helped stats (rel) min: <.01% max: 40.95% x̄: 2.42% x̃: 0.86% HURT stats (abs) min: 1 max: 20875 x̄: 27.38 x̃: 8 HURT stats (rel) min: <.01% max: 59.09% x̄: 2.49% x̃: 0.80% 95% mean confidence interval for cycles value: -4.49 -2.07 95% mean confidence interval for cycles %-change: -0.14% -0.04% Cycles are helped. total spills in shared programs: 23406 -> 23411 (0.02%) spills in affected programs: 3 -> 8 (166.67%) helped: 0 HURT: 2 total fills in shared programs: 34845 -> 34850 (0.01%) fills in affected programs: 3 -> 8 (166.67%) helped: 0 HURT: 2 LOST: 0 GAINED: 0 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111566 Fixes: `f4ef34f207` "intel/fs: Add an UNDEF instruction to avoid..." Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-09-13 04:12:24 +00:00
Anuj Phogat	729de1488f	intel/gen11+: Enable Hardware filtering of Semi-Pipelined State in WM Initial benchmarking didn't show any performance benefits. But it might eventually. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-09-11 11:29:37 -07:00
Anuj Phogat	ee2bde5232	genxml/gen11+: Add COMMON_SLICE_CHICKEN4 register Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-09-11 11:29:37 -07:00
Mauro Rossi	ae5ac26dfa	android: anv: libmesa_vulkan_common: add libmesa_util static dependency Change needed to fix the following building error: In file included from external/mesa/src/intel/vulkan/anv_device.c:43: external/mesa/src/util/xmlpool.h:115:10: fatal error: 'xmlpool/options.h' file not found ^~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: `4dcb1ff` ("anv: add support for driconf") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-09-08 20:07:56 +02:00
Jason Ekstrand	34541be7b0	intel/blorp: Use wide formats for nicely aligned stencil clears In the case where the stencil clear is nicely aligned, we can clear stencil much more efficiently by mapping it as a wide format (say RGBA32_UINT) and blasting out the stencil clear value with a repclear. On Unigine Heaven, this makes one stencil clear go from non-trivial to unnoticeable when looking at per-draw timings. In order for this change to work properly, ANV needs to do a bit more flushing around depth and stencil clears. i965 and iris already have the cache tracking logic to handle this so no changes are required there. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-09-06 23:35:09 +00:00
Jason Ekstrand	d62ca48c31	intel/blorp: Expose surf_fake_interleaved_msaa internally	2019-09-06 23:35:09 +00:00
Jason Ekstrand	caa786e029	intel/blorp: Expose surf_retile_w_to_y internally Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-09-06 23:35:09 +00:00
Jason Ekstrand	a90b1cbe73	blorp: Memset surface info to zero when initializing it This isn't known to fix any current bugs but it does prevent a regression in a subsequent commit. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-09-06 23:35:09 +00:00
Jason Ekstrand	c15b197d74	intel/tools: Decode PS kernels on SNB Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-09-06 23:35:09 +00:00
Jason Ekstrand	7f5cb5fd6d	intel/tools: Decode 3DSTATE_BINDING_TABLE_POINTERS on SNB Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-09-06 23:35:09 +00:00
Eric Engestrom	037b5b567f	anv: add support for vk_x11_override_min_image_count Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-09-06 23:16:05 +01:00
Eric Engestrom	4dcb1fff19	anv: add support for driconf No option is supported yet, this is just the boilerplate. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-09-06 23:16:05 +01:00
Jordan Justen	9790cfcefa	anv,iris: L3ALLOC register replaces L3CNTLREG for gen12 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-09-06 13:11:25 -07:00
Anuj Phogat	414cae0fd6	intel/gen12: Add L3 configurations Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-09-06 13:11:22 -07:00

1 2 3 4 5 ...

4610 commits