fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-02-03 17:20:26 +01:00

Author	SHA1	Message	Date
Rafael Antognolli	7728720f07	anv: Make blorp update the clear color. Instead of updating the clear color in anv before a resolve, just let blorp handle that for us during fast clears. v5: Update comment about HiZ clear color (Jordan). Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-04-05 07:42:45 -07:00
Rafael Antognolli	e8cadb673d	anv: Use clear address for HiZ fast clears too. Store the default clear address for HiZ fast clears on a global bo, and point to it when needed. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-04-05 07:42:45 -07:00
Rafael Antognolli	021e1885d0	anv: Emit the fast clear color address, instead of value. On Gen10+, instead of copying the clear color from the state buffer to the surface state, just use the address of the state buffer in the surface state directly. This way we can avoid the copy from state buffer to surface state. v4: - Remove use_clear_address from anv code. (Jason) - Use the helper to extract clear color from attachment (Jason) Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-04-05 07:42:45 -07:00
Rafael Antognolli	3f96b459f4	anv: Add a helper to extract clear color from the attachment. Extract the code from color_attachment_compute_aux_usage, so we can later reuse it to update the clear color state buffer. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-04-05 07:42:45 -07:00
Rafael Antognolli	14260e7c60	intel/blorp: Update clear color state buffer during fast clears. We always want to update the fast clear color during a fast clear on i965. On anv, we are doing that before a resolve, but by adding support to blorp, we can do a similar thing and update it during a fast clear instead. The goal is to remove some code from anv that does such update, and centralize everything in blorp, hopefully removing a lot of code duplication. It also allows us to have a similar behavior on gen < 9 and gen >= 10. v5: s/we/we are/ (Jordan) Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-04-05 07:42:45 -07:00
Rafael Antognolli	92eb5bbc68	intel/blorp: Only copy clear color when doing a resolve. We only need to copy the clear color from the state buffer to the inlined surface state when doing a resolve. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-04-05 07:42:45 -07:00
Rafael Antognolli	188a473b9a	intel/blorp: Add support for fast clear address. On gen10+, if surface->clear_color_addr is present, use it directly intead of copying it to the surface state. v4: Remove redundant #if clause for GEN <= 10 (Jason) v5: Move flush after the reloc, and keep lower bits (Topi). Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-04-05 07:42:45 -07:00
Rafael Antognolli	b8f45cf967	intel/isl: Add support to emit clear value address. gen10 can emit the clear color by setting it on a buffer somewhere, and then adding only the address to the surface state. This commit add support for that on isl_surf_fill_state, and if that is requested, skip setting the clear value itself. v2: Add assert to make sure we are at least on gen10. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-04-05 07:42:45 -07:00
Rafael Antognolli	94675edcfd	intel: Use Clear Color struct size. The size of the clear color struct (expected by the hardware) is 8 dwords (isl_dev.ss.clear_value_state_size here). But we still need to track the size of the clear color, used when memcopying it to/from the state buffer. For that we keep isl_dev.ss.clear_value_size. v4: - Add struct to gen11 too (Jason, Jordan) - Add field for Converted Clear Color to gen11 (Jason) - Add clear_color_state_offset to differentiate from clear_value_offset. - Fix all the places where clear_value_size was used. v5 (Jason): - Split genxml changes to another commit. - Remove unnecessary gen checks. - Bring back missing offset increment to init_fast_clear_color(). v6 (Jason): - On init_fast_clear_color, change: addr.offset += 4 => sdi.Address.offset += i * 4 - Use GEN_GEN instead of GEN_VERSIONx10. [jordan.l.justen@intel.com: isl_device_init changes] Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-04-05 07:42:45 -07:00
Rafael Antognolli	f77789a3f0	intel/genxml: Add Clear Color struct to gen10+. v5: Split genxml changes into its own commit (Jason). Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-04-05 07:42:45 -07:00
Rafael Antognolli	7e616ae201	intel/genxml: Use a single field for clear color address on gen10. genxml does not support having two address fields with different names but same position in the state struct. Both "Clear Color Address" and "Clear Depth Address Low" mean the same thing, only for different surface types. To workaround this genxml limitation, rename "Clear Color Address" to "Clear Value Address" and use it for both color and depth. Do the same for the high bits. TODO: add support for multiple addresses at the same position in the xml. v2: Combine high and low order bits into a single address field. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-04-05 07:42:45 -07:00
Rafael Antognolli	8e1f2e1d2d	genxml: Preserve fields that share dword space with addresses. Some instructions contain fields that are either an address or a value of some type based on the content of other fields, such as clear color values vs address. That works fine if these fields are in the less significant dword, the lower 32 bits of the address, because they get OR'ed with the address. But if they are in the higher 32 bits, they get discarded. On Gen10 we have fields that share space with the higher 16 bits of the address too. This commit makes sure those fields don't get discarded. v5: Remove spurious whitespace (Jason). Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-04-05 07:42:45 -07:00
Rafael Antognolli	f421a31637	anv/image: Do not override lower bits of dword. The lower bits seem to have extra fields in every platform but gen8 (even though we don't use them in gen9). So just go ahead and avoid using them for the address. v4: Use Jason's suggestion for comment explaining the change. v5: Fix aux_address comment in anv_private.h (Jason) Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2018-04-05 07:42:45 -07:00
Lionel Landwerlin	1beb80cb56	intel: compiler: silence compiler warning ../src/intel/compiler/brw_reg.h: In function ‘bool brw_regs_negative_equal(const brw_reg, const brw_reg)’: ../src/intel/compiler/brw_reg.h:305:1: warning: control reaches end of non-void function [-Wreturn-type] Introduced by `8f83eea71e` ("i965: Add negative_equals methods"). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-04-04 11:57:39 +01:00
Kevin Strasser	5bbde9b80f	anv: Fix close(fd) before import issue in vkCreateDmaBufImageINTEL If we close the fd before calling DRM_IOCTL_PRIME_FD_TO_HANDLE the kernel will hit a -EBADF error. Move the close(fd) call to the end of anv_CreateDmaBufImageINTEL(). Signed-off-by: Kevin Strasser <kevin.strasser@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-04-03 18:33:17 -07:00
Lionel Landwerlin	78c18d99dc	intel: gen-decoder: print all dword a field belongs to Prior to printing a decoded field, print out all dwords that field belongs to. In particular with address fields spanning multiple dwords, we want to have all the dwords presented before the field is decoded to make it easier to read. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2018-04-03 16:55:53 +01:00
Lionel Landwerlin	4d59127213	intel: genxml: decode variable length MI_LRI MI_LOAD_REGISTER_IMM can load multiple (register, value) tuples in one command. In our drivers we only use one tuple at a time, but the kernel might load more than one at a time. Instead of making all the tuple part of a group, we leave out the first tuple (the one we use in the generated packing structures). This is particularly useful for looking at error stats generated by the kernel. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2018-04-03 16:55:53 +01:00
Lionel Landwerlin	2841af6238	intel: gen-decoder: don't decode fields beyond a dword length For example, a PIPE_CONTROL with DWordLength = 2 should look like this : 0xffffe374: 0x7a000002: PIPE_CONTROL 0xffffe374: 0x7a000002 : Dword 0 DWord Length: 2 0xffffe378: 0x00800000 : Dword 1 Depth Cache Flush Enable: false Stall At Pixel Scoreboard: false State Cache Invalidation Enable: false Constant Cache Invalidation Enable: false VF Cache Invalidation Enable: false DC Flush Enable: false Pipe Control Flush Enable: false Notify Enable: false Indirect State Pointers Disable: false Texture Cache Invalidation Enable: false Instruction Cache Invalidate Enable: false Render Target Cache Flush Enable: false Depth Stall Enable: false Post Sync Operation: 0 (No Write) Generic Media State Clear: false TLB Invalidate: false Global Snapshot Count Reset: false Command Streamer Stall Enable: false Store Data Index: 0 LRI Post Sync Operation: 1 (MMIO Write Immediate Data) Destination Address Type: 0 (PPGTT) Flush LLC: false 0xffffe37c: 0x00000000 : Dword 2 Address: 0x00000000 0xffffe384: 0x05000000: MI_BATCH_BUFFER_END Prior to this change, fields beyond the length of the command would be decoded (notice the MI_BATCH_BUFFER_END decoded as part of the previous PIPE_CONTROL) : 0xffffe374: 0x7a000002: PIPE_CONTROL 0xffffe374: 0x7a000002 : Dword 0 DWord Length: 2 0xffffe378: 0x00800000 : Dword 1 Depth Cache Flush Enable: false Stall At Pixel Scoreboard: false State Cache Invalidation Enable: false Constant Cache Invalidation Enable: false VF Cache Invalidation Enable: false DC Flush Enable: false Pipe Control Flush Enable: false Notify Enable: false Indirect State Pointers Disable: false Texture Cache Invalidation Enable: false Instruction Cache Invalidate Enable: false Render Target Cache Flush Enable: false Depth Stall Enable: false Post Sync Operation: 0 (No Write) Generic Media State Clear: false TLB Invalidate: false Global Snapshot Count Reset: false Command Streamer Stall Enable: false Store Data Index: 0 LRI Post Sync Operation: 1 (MMIO Write Immediate Data) Destination Address Type: 0 (PPGTT) Flush LLC: false 0xffffe37c: 0x00000000 : Dword 2 Address: 0x00000000 0xffffe380: 0x00000000 : Dword 3 0xffffe384: 0x05000000 : Dword 4 Immediate Data: 83886080 0xffffe384: 0x05000000: MI_BATCH_BUFFER_END Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2018-04-03 16:55:53 +01:00
Lionel Landwerlin	81375516b2	intel: error_decode: add an option to decode all buffers The kernel reports workaround batch buffers, but we're not presenting them currently. Also they might not be useful for debugging purely userspace driver issues, when problems arise because of interactions between kernel & userspace drivers, it's nice to be able to decode them. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2018-04-03 16:55:53 +01:00
Lionel Landwerlin	b3aa18dfd6	intel: genxml: add preemption control instructions Helpful to debug kernel workaround batchbuffers. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2018-04-03 16:55:53 +01:00
Rob Clark	51888bf07d	nir+drivers: add helpers to get # of src/dest components Add helpers to get the number of src/dest components for an intrinsic, and update spots that were open-coding this logic to use the helpers instead. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-04-03 06:08:56 -04:00
Iago Toral Quiroga	31881079af	anv/cmd_buffer: honor pending clear views for depth/stencil attachments v2: rebased on top of subpass rework. v3: rebased v4: - rebased - reset pending clear views in one go rather one bit at a time (Caio) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-04-02 09:53:24 +02:00
Iago Toral Quiroga	f60c5fc17e	anv/cmd_buffer: consider multiview masks for tracking pending clear aspects When multiview is active a subpass clear may only clear a subset of the attachment layers. Other subpasses in the same render pass may also clear too and we want to honor those clears as well, however, we need to ensure that we only clear a layer once, on the first subpass that uses a particular layer (view) of a given attachment. This means that when we check if a subpass attachment needs to be cleared we need to check if all the layers used by that subpass (as indicated by its view_mask) have already been cleared in previous subpasses or not, in which case, we must clear any pending layers used by the subpass, and only those pending. v2: - track pending clear views in the attachment state (Jason) - rebased on top of fast-clear rework. v3: - rebased on top of subpass rework. v4: rebased. v5 (Caio): - Rebased. - Initialize pending clear views to only have bits set for layers that exist. - Reset pending clear views in one go rather one bit at a time. - Put "last subpass for this attachment" condition in a separate function to simplify the conditional that resets pending_clear_aspects. Fixes: dEQP-VK.multiview.readback_implicit_clear.* Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-04-02 09:53:15 +02:00
Jason Ekstrand	2b977989f3	intel/vec4: Set channel_sizes for MOV_INDIRECT sources Otherwise, any indirect push constant access results in an assertion failure when we start digging through the channel_sizes array. This fixes dEQP-VK.pipeline.push_constant.graphics_pipeline.dynamic_index_vert on Haswell. It should be a harmless no-op for GL since indirect push constants aren't used there. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Fixes: `e69e5c7006` "i965/vec4: load dvec3/4 uniforms first in the..."	2018-03-30 17:20:27 -07:00
Ian Romanick	22fbb5c594	util: Add and use util_is_power_of_two_nonzero Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2018-03-29 14:09:28 -07:00
Ian Romanick	d76c204d05	util: Move util_is_power_of_two to bitscan.h and rename to util_is_power_of_two_or_zero The new name make the zero-input behavior more obvious. The next patch adds a new function with different zero-input behavior. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2018-03-29 14:09:23 -07:00
Dylan Baker	2cfc68d984	autotools: Include intel/dev/meson.build in tarball Fixes: `272bef0601` ("intel: Split gen_device_info out into libintel_dev") Signed-off-by: Dylan Baker <dylan.c.baker@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>	2018-03-28 10:19:05 -07:00
Jason Ekstrand	7e38f49a8f	intel/fs: Don't emit a des copy for image ops with has_dest == false This was causing us to walk dest_components times over a thing with no destination. This happened to work because all of the image intrinsics without a destination also happened to have dest_components == 0. We shouldn't be reading dest_components if has_dest == false. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-03-27 18:18:21 -07:00
Rafael Antognolli	27581d18bc	intel/aubinator_error_decode: Decode more registers. Decode SC_INSTDONE, ROW_INSTDONE and SAMPLER_INSTDONE. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-03-26 09:25:57 -07:00
Rafael Antognolli	70d7c70e8d	intel/genxml: Add SAMPLER_INSTDONE register. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-03-26 09:25:57 -07:00
Rafael Antognolli	227edf05f3	intel/genxml: Add ROW_INSTDONE register. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-03-26 09:25:57 -07:00
Rafael Antognolli	4c0ae36143	intel/genxml: Add SC_INSTDONE register. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-03-26 09:25:57 -07:00
Ian Romanick	91225cb33f	i965/vec4: Fix null destination register in 3-source instructions A recent commit (see below) triggered some cases where conditional modifier propagation and dead code elimination would cause a MAD instruction like the following to be generated: mad.l.f0 null, ... Matt pointed out that fs_visitor::fixup_3src_null_dest() fixes cases like this in the scalar backend. This commit basically ports that code to the vec4 backend. NOTE: I have sent a couple tests to the piglit list that reproduce this bug without the commit mentioned below. This commit fixes those tests. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Tested-by: Tapani Pälli <tapani.palli@intel.com> Cc: mesa-stable@lists.freedesktop.org Fixes: `ee63933a7` ("nir: Distribute binary operations with constants into bcsel") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105704	2018-03-26 08:50:44 -07:00
Ian Romanick	cd635d149b	i965/vec4: Propagate conditional modifiers from compares to adds No changes on Broadwell or later as those platforms do not use the vec4 backend. Ivy Bridge and Haswell had similar results. (Ivy Bridge shown) total instructions in shared programs: 11682119 -> 11681056 (<.01%) instructions in affected programs: 150403 -> 149340 (-0.71%) helped: 950 HURT: 0 helped stats (abs) min: 1 max: 16 x̄: 1.12 x̃: 1 helped stats (rel) min: 0.23% max: 2.78% x̄: 0.82% x̃: 0.71% 95% mean confidence interval for instructions value: -1.19 -1.04 95% mean confidence interval for instructions %-change: -0.84% -0.79% Instructions are helped. total cycles in shared programs: 257495842 -> 257495238 (<.01%) cycles in affected programs: 270302 -> 269698 (-0.22%) helped: 271 HURT: 13 helped stats (abs) min: 2 max: 14 x̄: 2.42 x̃: 2 helped stats (rel) min: 0.06% max: 1.13% x̄: 0.32% x̃: 0.28% HURT stats (abs) min: 2 max: 12 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.15% max: 1.18% x̄: 0.30% x̃: 0.26% 95% mean confidence interval for cycles value: -2.41 -1.84 95% mean confidence interval for cycles %-change: -0.31% -0.26% Cycles are helped. Sandy Bridge total instructions in shared programs: 10430493 -> 10429727 (<.01%) instructions in affected programs: 120860 -> 120094 (-0.63%) helped: 766 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.30% max: 2.70% x̄: 0.78% x̃: 0.73% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.80% -0.75% Instructions are helped. total cycles in shared programs: 146138718 -> 146138446 (<.01%) cycles in affected programs: 244114 -> 243842 (-0.11%) helped: 132 HURT: 0 helped stats (abs) min: 2 max: 4 x̄: 2.06 x̃: 2 helped stats (rel) min: 0.03% max: 0.43% x̄: 0.16% x̃: 0.19% 95% mean confidence interval for cycles value: -2.12 -2.00 95% mean confidence interval for cycles %-change: -0.18% -0.15% Cycles are helped. GM45 and Iron Lake had identical results. (Iron Lake shown) total instructions in shared programs: 7780251 -> 7780248 (<.01%) instructions in affected programs: 175 -> 172 (-1.71%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.49% max: 2.44% x̄: 1.81% x̃: 1.49% total cycles in shared programs: 177851584 -> 177851578 (<.01%) cycles in affected programs: 9796 -> 9790 (-0.06%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.05% max: 0.08% x̄: 0.06% x̃: 0.05% Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-03-26 08:50:43 -07:00
Ian Romanick	780f307ba8	i965/vec4: Allow cmod propagation when src0 is a uniform or shader input No shader-db changes. This source must have been written by a previous instruction, so it cannot be a uniform or a shader input. However, this change allows the next commit to help more shaders. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-03-26 08:50:43 -07:00
Ian Romanick	020b0055e7	i965/fs: Propagate conditional modifiers from compares to adds The math inside the add and the cmp in this instruction sequence is the same. We can utilize this to eliminate the compare. add(8) g5<1>F g2<8,8,1>F g64.5<0,1,0>F { align1 1Q compacted }; cmp.z.f0(8) null<1>F g2<8,8,1>F -g64.5<0,1,0>F { align1 1Q switch }; (-f0) sel(8) g8<1>F (abs)g5<8,8,1>F 3e-37F { align1 1Q }; This is reduced to: add.z.f0(8) g5<1>F g2<8,8,1>F g64.5<0,1,0>F { align1 1Q compacted }; (-f0) sel(8) g8<1>F (abs)g5<8,8,1>F 3e-37F { align1 1Q }; This optimization pass could do even better. The nature of converting vectorized code from the GLSL front end to scalar code in NIR results in sequences like: add(8) g7<1>F g4<8,8,1>F g64.5<0,1,0>F { align1 1Q compacted }; add(8) g6<1>F g3<8,8,1>F g64.5<0,1,0>F { align1 1Q compacted }; add(8) g5<1>F g2<8,8,1>F g64.5<0,1,0>F { align1 1Q compacted }; cmp.z.f0(8) null<1>F g2<8,8,1>F -g64.5<0,1,0>F { align1 1Q switch }; (-f0) sel(8) g8<1>F (abs)g5<8,8,1>F 3e-37F { align1 1Q }; cmp.z.f0(8) null<1>F g3<8,8,1>F -g64.5<0,1,0>F { align1 1Q switch }; (-f0) sel(8) g10<1>F (abs)g6<8,8,1>F 3e-37F { align1 1Q }; cmp.z.f0(8) null<1>F g4<8,8,1>F -g64.5<0,1,0>F { align1 1Q switch }; (-f0) sel(8) g12<1>F (abs)g7<8,8,1>F 3e-37F { align1 1Q }; In this sequence, only the first cmp.z is removed. With different scheduling, all 3 could get removed. Skylake total instructions in shared programs: 14407009 -> 14400173 (-0.05%) instructions in affected programs: 1307274 -> 1300438 (-0.52%) helped: 4880 HURT: 0 helped stats (abs) min: 1 max: 33 x̄: 1.40 x̃: 1 helped stats (rel) min: 0.03% max: 8.70% x̄: 0.70% x̃: 0.52% 95% mean confidence interval for instructions value: -1.45 -1.35 95% mean confidence interval for instructions %-change: -0.72% -0.69% Instructions are helped. total cycles in shared programs: 532943169 -> 532923528 (<.01%) cycles in affected programs: 14065798 -> 14046157 (-0.14%) helped: 2703 HURT: 339 helped stats (abs) min: 1 max: 1062 x̄: 12.27 x̃: 2 helped stats (rel) min: <.01% max: 28.72% x̄: 0.38% x̃: 0.21% HURT stats (abs) min: 1 max: 739 x̄: 39.86 x̃: 12 HURT stats (rel) min: 0.02% max: 27.69% x̄: 1.38% x̃: 0.41% 95% mean confidence interval for cycles value: -8.66 -4.26 95% mean confidence interval for cycles %-change: -0.24% -0.14% Cycles are helped. LOST: 0 GAINED: 1 Broadwell total instructions in shared programs: 14719636 -> 14712949 (-0.05%) instructions in affected programs: 1288188 -> 1281501 (-0.52%) helped: 4845 HURT: 0 helped stats (abs) min: 1 max: 33 x̄: 1.38 x̃: 1 helped stats (rel) min: 0.03% max: 8.00% x̄: 0.70% x̃: 0.52% 95% mean confidence interval for instructions value: -1.43 -1.33 95% mean confidence interval for instructions %-change: -0.72% -0.68% Instructions are helped. total cycles in shared programs: 559599253 -> 559581699 (<.01%) cycles in affected programs: 13315565 -> 13298011 (-0.13%) helped: 2600 HURT: 269 helped stats (abs) min: 1 max: 2128 x̄: 12.24 x̃: 2 helped stats (rel) min: <.01% max: 23.95% x̄: 0.41% x̃: 0.20% HURT stats (abs) min: 1 max: 790 x̄: 53.07 x̃: 20 HURT stats (rel) min: 0.02% max: 15.96% x̄: 1.55% x̃: 0.75% 95% mean confidence interval for cycles value: -8.47 -3.77 95% mean confidence interval for cycles %-change: -0.27% -0.18% Cycles are helped. LOST: 0 GAINED: 8 Haswell total instructions in shared programs: 12978609 -> 12973483 (-0.04%) instructions in affected programs: 932921 -> 927795 (-0.55%) helped: 3480 HURT: 0 helped stats (abs) min: 1 max: 33 x̄: 1.47 x̃: 1 helped stats (rel) min: 0.03% max: 7.84% x̄: 0.78% x̃: 0.58% 95% mean confidence interval for instructions value: -1.53 -1.42 95% mean confidence interval for instructions %-change: -0.80% -0.75% Instructions are helped. total cycles in shared programs: 410270788 -> 410250531 (<.01%) cycles in affected programs: 10986161 -> 10965904 (-0.18%) helped: 2087 HURT: 254 helped stats (abs) min: 1 max: 2672 x̄: 14.63 x̃: 4 helped stats (rel) min: <.01% max: 39.61% x̄: 0.42% x̃: 0.21% HURT stats (abs) min: 1 max: 519 x̄: 40.49 x̃: 16 HURT stats (rel) min: 0.01% max: 12.83% x̄: 1.20% x̃: 0.47% 95% mean confidence interval for cycles value: -12.82 -4.49 95% mean confidence interval for cycles %-change: -0.31% -0.18% Cycles are helped. LOST: 0 GAINED: 5 Ivy Bridge total instructions in shared programs: 11686082 -> 11681548 (-0.04%) instructions in affected programs: 937696 -> 933162 (-0.48%) helped: 3150 HURT: 0 helped stats (abs) min: 1 max: 33 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.03% max: 7.84% x̄: 0.69% x̃: 0.49% 95% mean confidence interval for instructions value: -1.49 -1.38 95% mean confidence interval for instructions %-change: -0.71% -0.67% Instructions are helped. total cycles in shared programs: 257514962 -> 257492471 (<.01%) cycles in affected programs: 11524149 -> 11501658 (-0.20%) helped: 1970 HURT: 239 helped stats (abs) min: 1 max: 3525 x̄: 17.48 x̃: 3 helped stats (rel) min: <.01% max: 49.60% x̄: 0.46% x̃: 0.17% HURT stats (abs) min: 1 max: 1358 x̄: 50.00 x̃: 15 HURT stats (rel) min: 0.02% max: 59.88% x̄: 1.84% x̃: 0.65% 95% mean confidence interval for cycles value: -17.01 -3.35 95% mean confidence interval for cycles %-change: -0.33% -0.08% Cycles are helped. LOST: 9 GAINED: 1 Sandy Bridge total instructions in shared programs: 10432841 -> 10429893 (-0.03%) instructions in affected programs: 685071 -> 682123 (-0.43%) helped: 2453 HURT: 0 helped stats (abs) min: 1 max: 9 x̄: 1.20 x̃: 1 helped stats (rel) min: 0.02% max: 7.55% x̄: 0.64% x̃: 0.46% 95% mean confidence interval for instructions value: -1.23 -1.17 95% mean confidence interval for instructions %-change: -0.67% -0.62% Instructions are helped. total cycles in shared programs: 146133660 -> 146134195 (<.01%) cycles in affected programs: 3991634 -> 3992169 (0.01%) helped: 1237 HURT: 153 helped stats (abs) min: 1 max: 2853 x̄: 6.93 x̃: 2 helped stats (rel) min: <.01% max: 29.00% x̄: 0.24% x̃: 0.14% HURT stats (abs) min: 1 max: 1740 x̄: 59.56 x̃: 12 HURT stats (rel) min: 0.03% max: 78.98% x̄: 1.96% x̃: 0.42% 95% mean confidence interval for cycles value: -5.13 5.90 95% mean confidence interval for cycles %-change: -0.17% 0.16% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 1 GM45 and Iron Lake had similar results (GM45 shown): total instructions in shared programs: 4800332 -> 4798380 (-0.04%) instructions in affected programs: 565995 -> 564043 (-0.34%) helped: 1451 HURT: 0 helped stats (abs) min: 1 max: 20 x̄: 1.35 x̃: 1 helped stats (rel) min: 0.05% max: 5.26% x̄: 0.47% x̃: 0.31% 95% mean confidence interval for instructions value: -1.40 -1.29 95% mean confidence interval for instructions %-change: -0.50% -0.45% Instructions are helped. total cycles in shared programs: 122032318 -> 122027798 (<.01%) cycles in affected programs: 8334868 -> 8330348 (-0.05%) helped: 1029 HURT: 1 helped stats (abs) min: 2 max: 40 x̄: 4.43 x̃: 2 helped stats (rel) min: <.01% max: 1.83% x̄: 0.09% x̃: 0.04% HURT stats (abs) min: 38 max: 38 x̄: 38.00 x̃: 38 HURT stats (rel) min: 0.25% max: 0.25% x̄: 0.25% x̃: 0.25% 95% mean confidence interval for cycles value: -4.70 -4.08 95% mean confidence interval for cycles %-change: -0.09% -0.08% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-03-26 08:50:43 -07:00
Ian Romanick	5bbb3d60d3	i965/fs: Allow cmod propagation when src0 is a uniform or shader input No shader-db changes. This source must have been written by a previous instruction, so it cannot be a uniform or a shader input. However, this change allows the next commit to help about 900 more shaders. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-03-26 08:50:43 -07:00
Ian Romanick	8f83eea71e	i965: Add negative_equals methods This method is similar to the existing ::equals methods. Instead of testing that two src_regs are equal to each other, it tests that one is the negation of the other. v2: Simplify various checks based on suggestions from Matt. Use src_reg::type instead of fixed_hw_reg.type in a check. Also suggested by Matt. v3: Rebase on 3 years. Fix some problems with negative_equals with VF constants. Add fs_reg::negative_equals. v4: Replace the existing default case with BRW_REGISTER_TYPE_UB, BRW_REGISTER_TYPE_B, and BRW_REGISTER_TYPE_NF. Suggested by Matt. Expand the FINISHME comment to better explain why it isn't already finished. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3] Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-03-26 08:50:43 -07:00
Jordan Justen	d60eaf7b1f	anv: Set genX_table for gen11 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-03-23 17:23:59 -07:00
Jordan Justen	af8535d02f	anv: Add gen11 to anv_genX_call Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-03-23 17:23:59 -07:00
Kenneth Graunke	90f556f0b1	android: Use local i915_drm.h rather than the system one. Fixes: `2d26c99933` (intel: devinfo: meson: include drm uapi) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Clayton Craft <clayton.a.craft@intel.com>	2018-03-23 10:05:02 -07:00
Jason Ekstrand	884d27bcf6	nir: Rename image intrinsics to image_var Generated with git grep -l nir_intrinsic_image \| xargs \ sed -i 's/nir_intrinsic_image/nir_intrinsic_image_var/g' and some manual fixing in nir_intrinsics.h Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-03-23 13:48:11 +11:00
Lionel Landwerlin	57a11550bc	i965: perf: query topology With the introduction of asymmetric slices in CNL, we cannot rely on the previous SUBSLICE_MASK getparam to tell userspace what subslices are available. We introduce a new uAPI in the kernel driver to report exactly what part of the GPU are fused and require this to be available on Gen10+. Prior generations can continue to rely on GETPARAM on older kernels. This patch is quite a lot of code because we have to support lots of different kernel versions, ranging from not providing any information (for Haswell on 4.13 through 4.17), to being able to query through GETPARAM (for gen8/9 on 4.13 through 4.17), to finally requiring 4.17 for Gen10+. This change stores topology information in a unified way on brw_context.topology from the various kernel APIs. And then generates the appropriate values for the equations from that unified topology. v2: Move slice/subslice masks fields to gen_device_info (Rafael) v3: Add a gen_device_info_subslice_available() helper (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-22 20:14:22 +00:00
Lionel Landwerlin	c1900f5b0f	intel: devinfo: add helper functions to fill fusing masks values There are a couple of ways we can get the fusing information from the kernel : - Through DRM_I915_GETPARAM with the SLICE_MASK/SUBSLICE_MASK parameters - Through the new DRM_IOCTL_I915_QUERY by requesting the DRM_I915_QUERY_TOPOLOGY_INFO The second method is more accurate and also gives us the EUs fusing masks. It's also a requirement for CNL as this platform has asymetric subslices and the first method SUBSLICE_MASK value is assumed uniform across slices. v2: Change gen_device_info_update_from_masks() to generate topology and call into gen_device_info_update_from_topology (Lionel/Ken) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-22 20:14:22 +00:00
Lionel Landwerlin	2d26c99933	intel: devinfo: meson: include drm uapi Already available with the autotools build. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-22 20:14:22 +00:00
Lionel Landwerlin	c471716574	intel: devinfo: store slice/subslice/eu masks We want to store values coming from the kernel but as a first step, we can generate mask values out the numbers already stored in the gen_device_info masks. v2: Add a helper to set EU masks (Lionel/Ken) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-22 20:14:22 +00:00
Lionel Landwerlin	7e2c6147da	intel: devinfo: store number of EUs per subslice This will be reused to store values reported by the kernel. The main use case will be for use as the input values of the metric sets equations for the INTEL_performance_queries extension. By storing this information in the gen_device_info we make this non GL specific so this can be reused by Vulkan if we ever have an equivalent extension. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-22 20:14:22 +00:00
Juan A. Suarez Romero	13459c637a	anv/radv: autotools: include vulkan_*.h headers Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-03-22 18:25:39 +01:00
Matt Turner	724586a266	intel/compiler: Readd ICL to test_eu_validate.cpp Now that the PCI IDs are upstream, this can be readded.	2018-03-22 09:56:09 -07:00
Matt Turner	65b060d9cb	intel/compiler: Skip 64-bit type tests when types not available Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-03-22 09:56:09 -07:00

... 245 246 247 248 249 ...

15202 commits