Commit graph

2868 commits

Author SHA1 Message Date
Ian Romanick
780f307ba8 i965/vec4: Allow cmod propagation when src0 is a uniform or shader input
No shader-db changes.  This source must have been written by a previous
instruction, so it cannot be a uniform or a shader input.  However, this
change allows the next commit to help more shaders.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-26 08:50:43 -07:00
Ian Romanick
020b0055e7 i965/fs: Propagate conditional modifiers from compares to adds
The math inside the add and the cmp in this instruction sequence is the
same.  We can utilize this to eliminate the compare.

add(8)          g5<1>F          g2<8,8,1>F      g64.5<0,1,0>F   { align1 1Q compacted };
cmp.z.f0(8)     null<1>F        g2<8,8,1>F      -g64.5<0,1,0>F  { align1 1Q switch };
(-f0) sel(8)    g8<1>F          (abs)g5<8,8,1>F 3e-37F          { align1 1Q };

This is reduced to:

add.z.f0(8)     g5<1>F          g2<8,8,1>F      g64.5<0,1,0>F   { align1 1Q compacted };
(-f0) sel(8)    g8<1>F          (abs)g5<8,8,1>F 3e-37F          { align1 1Q };

This optimization pass could do even better.  The nature of converting
vectorized code from the GLSL front end to scalar code in NIR results in
sequences like:

add(8)          g7<1>F          g4<8,8,1>F      g64.5<0,1,0>F   { align1 1Q compacted };
add(8)          g6<1>F          g3<8,8,1>F      g64.5<0,1,0>F   { align1 1Q compacted };
add(8)          g5<1>F          g2<8,8,1>F      g64.5<0,1,0>F   { align1 1Q compacted };
cmp.z.f0(8)     null<1>F        g2<8,8,1>F      -g64.5<0,1,0>F  { align1 1Q switch };
(-f0) sel(8)    g8<1>F          (abs)g5<8,8,1>F 3e-37F          { align1 1Q };
cmp.z.f0(8)     null<1>F        g3<8,8,1>F      -g64.5<0,1,0>F  { align1 1Q switch };
(-f0) sel(8)    g10<1>F         (abs)g6<8,8,1>F 3e-37F          { align1 1Q };
cmp.z.f0(8)     null<1>F        g4<8,8,1>F      -g64.5<0,1,0>F  { align1 1Q switch };
(-f0) sel(8)    g12<1>F         (abs)g7<8,8,1>F 3e-37F          { align1 1Q };

In this sequence, only the first cmp.z is removed.  With different
scheduling, all 3 could get removed.

Skylake
total instructions in shared programs: 14407009 -> 14400173 (-0.05%)
instructions in affected programs: 1307274 -> 1300438 (-0.52%)
helped: 4880
HURT: 0
helped stats (abs) min: 1 max: 33 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.03% max: 8.70% x̄: 0.70% x̃: 0.52%
95% mean confidence interval for instructions value: -1.45 -1.35
95% mean confidence interval for instructions %-change: -0.72% -0.69%
Instructions are helped.

total cycles in shared programs: 532943169 -> 532923528 (<.01%)
cycles in affected programs: 14065798 -> 14046157 (-0.14%)
helped: 2703
HURT: 339
helped stats (abs) min: 1 max: 1062 x̄: 12.27 x̃: 2
helped stats (rel) min: <.01% max: 28.72% x̄: 0.38% x̃: 0.21%
HURT stats (abs)   min: 1 max: 739 x̄: 39.86 x̃: 12
HURT stats (rel)   min: 0.02% max: 27.69% x̄: 1.38% x̃: 0.41%
95% mean confidence interval for cycles value: -8.66 -4.26
95% mean confidence interval for cycles %-change: -0.24% -0.14%
Cycles are helped.

LOST:   0
GAINED: 1

Broadwell
total instructions in shared programs: 14719636 -> 14712949 (-0.05%)
instructions in affected programs: 1288188 -> 1281501 (-0.52%)
helped: 4845
HURT: 0
helped stats (abs) min: 1 max: 33 x̄: 1.38 x̃: 1
helped stats (rel) min: 0.03% max: 8.00% x̄: 0.70% x̃: 0.52%
95% mean confidence interval for instructions value: -1.43 -1.33
95% mean confidence interval for instructions %-change: -0.72% -0.68%
Instructions are helped.

total cycles in shared programs: 559599253 -> 559581699 (<.01%)
cycles in affected programs: 13315565 -> 13298011 (-0.13%)
helped: 2600
HURT: 269
helped stats (abs) min: 1 max: 2128 x̄: 12.24 x̃: 2
helped stats (rel) min: <.01% max: 23.95% x̄: 0.41% x̃: 0.20%
HURT stats (abs)   min: 1 max: 790 x̄: 53.07 x̃: 20
HURT stats (rel)   min: 0.02% max: 15.96% x̄: 1.55% x̃: 0.75%
95% mean confidence interval for cycles value: -8.47 -3.77
95% mean confidence interval for cycles %-change: -0.27% -0.18%
Cycles are helped.

LOST:   0
GAINED: 8

Haswell
total instructions in shared programs: 12978609 -> 12973483 (-0.04%)
instructions in affected programs: 932921 -> 927795 (-0.55%)
helped: 3480
HURT: 0
helped stats (abs) min: 1 max: 33 x̄: 1.47 x̃: 1
helped stats (rel) min: 0.03% max: 7.84% x̄: 0.78% x̃: 0.58%
95% mean confidence interval for instructions value: -1.53 -1.42
95% mean confidence interval for instructions %-change: -0.80% -0.75%
Instructions are helped.

total cycles in shared programs: 410270788 -> 410250531 (<.01%)
cycles in affected programs: 10986161 -> 10965904 (-0.18%)
helped: 2087
HURT: 254
helped stats (abs) min: 1 max: 2672 x̄: 14.63 x̃: 4
helped stats (rel) min: <.01% max: 39.61% x̄: 0.42% x̃: 0.21%
HURT stats (abs)   min: 1 max: 519 x̄: 40.49 x̃: 16
HURT stats (rel)   min: 0.01% max: 12.83% x̄: 1.20% x̃: 0.47%
95% mean confidence interval for cycles value: -12.82 -4.49
95% mean confidence interval for cycles %-change: -0.31% -0.18%
Cycles are helped.

LOST:   0
GAINED: 5

Ivy Bridge
total instructions in shared programs: 11686082 -> 11681548 (-0.04%)
instructions in affected programs: 937696 -> 933162 (-0.48%)
helped: 3150
HURT: 0
helped stats (abs) min: 1 max: 33 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.03% max: 7.84% x̄: 0.69% x̃: 0.49%
95% mean confidence interval for instructions value: -1.49 -1.38
95% mean confidence interval for instructions %-change: -0.71% -0.67%
Instructions are helped.

total cycles in shared programs: 257514962 -> 257492471 (<.01%)
cycles in affected programs: 11524149 -> 11501658 (-0.20%)
helped: 1970
HURT: 239
helped stats (abs) min: 1 max: 3525 x̄: 17.48 x̃: 3
helped stats (rel) min: <.01% max: 49.60% x̄: 0.46% x̃: 0.17%
HURT stats (abs)   min: 1 max: 1358 x̄: 50.00 x̃: 15
HURT stats (rel)   min: 0.02% max: 59.88% x̄: 1.84% x̃: 0.65%
95% mean confidence interval for cycles value: -17.01 -3.35
95% mean confidence interval for cycles %-change: -0.33% -0.08%
Cycles are helped.

LOST:   9
GAINED: 1

Sandy Bridge
total instructions in shared programs: 10432841 -> 10429893 (-0.03%)
instructions in affected programs: 685071 -> 682123 (-0.43%)
helped: 2453
HURT: 0
helped stats (abs) min: 1 max: 9 x̄: 1.20 x̃: 1
helped stats (rel) min: 0.02% max: 7.55% x̄: 0.64% x̃: 0.46%
95% mean confidence interval for instructions value: -1.23 -1.17
95% mean confidence interval for instructions %-change: -0.67% -0.62%
Instructions are helped.

total cycles in shared programs: 146133660 -> 146134195 (<.01%)
cycles in affected programs: 3991634 -> 3992169 (0.01%)
helped: 1237
HURT: 153
helped stats (abs) min: 1 max: 2853 x̄: 6.93 x̃: 2
helped stats (rel) min: <.01% max: 29.00% x̄: 0.24% x̃: 0.14%
HURT stats (abs)   min: 1 max: 1740 x̄: 59.56 x̃: 12
HURT stats (rel)   min: 0.03% max: 78.98% x̄: 1.96% x̃: 0.42%
95% mean confidence interval for cycles value: -5.13 5.90
95% mean confidence interval for cycles %-change: -0.17% 0.16%
Inconclusive result (value mean confidence interval includes 0).

LOST:   0
GAINED: 1

GM45 and Iron Lake had similar results (GM45 shown):
total instructions in shared programs: 4800332 -> 4798380 (-0.04%)
instructions in affected programs: 565995 -> 564043 (-0.34%)
helped: 1451
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 1.35 x̃: 1
helped stats (rel) min: 0.05% max: 5.26% x̄: 0.47% x̃: 0.31%
95% mean confidence interval for instructions value: -1.40 -1.29
95% mean confidence interval for instructions %-change: -0.50% -0.45%
Instructions are helped.

total cycles in shared programs: 122032318 -> 122027798 (<.01%)
cycles in affected programs: 8334868 -> 8330348 (-0.05%)
helped: 1029
HURT: 1
helped stats (abs) min: 2 max: 40 x̄: 4.43 x̃: 2
helped stats (rel) min: <.01% max: 1.83% x̄: 0.09% x̃: 0.04%
HURT stats (abs)   min: 38 max: 38 x̄: 38.00 x̃: 38
HURT stats (rel)   min: 0.25% max: 0.25% x̄: 0.25% x̃: 0.25%
95% mean confidence interval for cycles value: -4.70 -4.08
95% mean confidence interval for cycles %-change: -0.09% -0.08%
Cycles are helped.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-26 08:50:43 -07:00
Ian Romanick
5bbb3d60d3 i965/fs: Allow cmod propagation when src0 is a uniform or shader input
No shader-db changes.  This source must have been written by a previous
instruction, so it cannot be a uniform or a shader input.  However, this
change allows the next commit to help about 900 more shaders.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-26 08:50:43 -07:00
Ian Romanick
8f83eea71e i965: Add negative_equals methods
This method is similar to the existing ::equals methods.  Instead of
testing that two src_regs are equal to each other, it tests that one is
the negation of the other.

v2: Simplify various checks based on suggestions from Matt.  Use
src_reg::type instead of fixed_hw_reg.type in a check.  Also suggested
by Matt.

v3: Rebase on 3 years.  Fix some problems with negative_equals with VF
constants.  Add fs_reg::negative_equals.

v4: Replace the existing default case with BRW_REGISTER_TYPE_UB,
BRW_REGISTER_TYPE_B, and BRW_REGISTER_TYPE_NF.  Suggested by Matt.
Expand the FINISHME comment to better explain why it isn't already
finished.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-26 08:50:43 -07:00
Jordan Justen
d60eaf7b1f anv: Set genX_table for gen11
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-03-23 17:23:59 -07:00
Jordan Justen
af8535d02f anv: Add gen11 to anv_genX_call
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-03-23 17:23:59 -07:00
Kenneth Graunke
90f556f0b1 android: Use local i915_drm.h rather than the system one.
Fixes: 2d26c99933 (intel: devinfo: meson: include drm uapi)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
2018-03-23 10:05:02 -07:00
Jason Ekstrand
884d27bcf6 nir: Rename image intrinsics to image_var
Generated with

git grep -l nir_intrinsic_image | xargs \
sed -i 's/nir_intrinsic_image/nir_intrinsic_image_var/g'

and some manual fixing in nir_intrinsics.h

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-03-23 13:48:11 +11:00
Lionel Landwerlin
57a11550bc i965: perf: query topology
With the introduction of asymmetric slices in CNL, we cannot rely on
the previous SUBSLICE_MASK getparam to tell userspace what subslices
are available.

We introduce a new uAPI in the kernel driver to report exactly what
part of the GPU are fused and require this to be available on Gen10+.

Prior generations can continue to rely on GETPARAM on older kernels.

This patch is quite a lot of code because we have to support lots of
different kernel versions, ranging from not providing any information
(for Haswell on 4.13 through 4.17), to being able to query through
GETPARAM (for gen8/9 on 4.13 through 4.17), to finally requiring 4.17
for Gen10+.

This change stores topology information in a unified way on
brw_context.topology from the various kernel APIs. And then generates
the appropriate values for the equations from that unified topology.

v2: Move slice/subslice masks fields to gen_device_info (Rafael)

v3: Add a gen_device_info_subslice_available() helper (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-22 20:14:22 +00:00
Lionel Landwerlin
c1900f5b0f intel: devinfo: add helper functions to fill fusing masks values
There are a couple of ways we can get the fusing information from the
kernel :

  - Through DRM_I915_GETPARAM with the SLICE_MASK/SUBSLICE_MASK
    parameters

  - Through the new DRM_IOCTL_I915_QUERY by requesting the
    DRM_I915_QUERY_TOPOLOGY_INFO

The second method is more accurate and also gives us the EUs fusing
masks. It's also a requirement for CNL as this platform has asymetric
subslices and the first method SUBSLICE_MASK value is assumed uniform
across slices.

v2: Change gen_device_info_update_from_masks() to generate topology
    and call into gen_device_info_update_from_topology (Lionel/Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-22 20:14:22 +00:00
Lionel Landwerlin
2d26c99933 intel: devinfo: meson: include drm uapi
Already available with the autotools build.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-22 20:14:22 +00:00
Lionel Landwerlin
c471716574 intel: devinfo: store slice/subslice/eu masks
We want to store values coming from the kernel but as a first step, we
can generate mask values out the numbers already stored in the
gen_device_info masks.

v2: Add a helper to set EU masks (Lionel/Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-22 20:14:22 +00:00
Lionel Landwerlin
7e2c6147da intel: devinfo: store number of EUs per subslice
This will be reused to store values reported by the kernel. The main
use case will be for use as the input values of the metric sets
equations for the INTEL_performance_queries extension. By storing this
information in the gen_device_info we make this non GL specific so
this can be reused by Vulkan if we ever have an equivalent extension.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-22 20:14:22 +00:00
Juan A. Suarez Romero
13459c637a anv/radv: autotools: include vulkan_*.h headers
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-03-22 18:25:39 +01:00
Matt Turner
724586a266 intel/compiler: Readd ICL to test_eu_validate.cpp
Now that the PCI IDs are upstream, this can be readded.
2018-03-22 09:56:09 -07:00
Matt Turner
65b060d9cb intel/compiler: Skip 64-bit type tests when types not available
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-22 09:56:09 -07:00
Jason Ekstrand
d2eecf0b0b intel/compiler/icl: Clear "null render target" bit in extended message descriptor
Otherwise all our render target writes go no where.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-22 09:56:09 -07:00
Anuj Phogat
1484876ef7 intel/compiler/icl: Update the assert in brw_stage_has_packed_dispatch()
Rafael ran piglit with the test code enabled and saw no additional GPU
hangs.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-22 09:56:09 -07:00
Anuj Phogat
f05e0d9c2a intel/common/icl: Disable hiz surface sampling
On gen11+ AUX_HIZ is not a supported value for surfaces being
sampled by the 3D sampler.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-22 09:56:09 -07:00
Anuj Phogat
370af9dcc0 intel/common/icl: Add L3 config
ICL uses the same L3 configs as CNL, just leaving the SLM configs out.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-22 09:56:09 -07:00
Matt Turner
f56693af4b intel/tools/aubinator: Drop platform list from print_help()
We all know the platform names, and I don't want to update this list
continually.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-22 09:56:09 -07:00
Caio Marcelo de Oliveira Filho
12c22b897a anv/pipeline: don't pass constant view index in multiview
If view mask has only one bit set, view index is effectively a
constant, so doesn't need to be passed to the next stages, just always
set it.

Part of this was in the original patch that added
anv_nir_lower_multiview.c but disabled.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-03-21 14:49:50 -07:00
Caio Marcelo de Oliveira Filho
5e7c1d05d4 anv/pipeline: use less instructions for multiview
The view_index is encoded in the remainder of dividing instance id by
the number of views in the view mask (n). In the general case (handled
by the else clause), there is a need to map from 0..n-1 into the
number of the view being masked. For that a map is encoded.

In the case only the first n bits in the mask are set, the mapping is
trivial, 0..n-1 already represent what view is being referred to.

That case was in the original patch that added
anv_nir_lower_multiview.c but disabled.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-03-21 14:49:50 -07:00
Rafael Antognolli
5297a17571 aubinator_error_decode: Compare only the class_name of the ring.
ring_name is "<class_name> + <instance_id>" (e.g. rcs0). So we need to
first compare the class name only, then get the instance id.

Without this, INSTDONE is not being decoded.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2018-03-21 11:35:15 -07:00
Scott D Phillips
cab8df1e3e intel/tools: aubinator: Catch gen11 "enhanced execlist" submission
Different registers are used for execlist submission in gen11, so
also watch those. This code only watches element zero of the
submit queue, which is all aubdump currently writes.

Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-03-21 11:07:15 -07:00
Lionel Landwerlin
7f977d51b3 intel: genxml: add INSTPM/CS_DEBUG_MODE2 registers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-20 16:58:30 +00:00
Scott D Phillips
d849d36c6c anv: off-by-one in GetDescriptorSetLayoutSupport
Loop was accessing one more than bindingCount elements from
pBindings, accessing uninitialized memory.

Fixes: ddc4069122 ("anv: Implement VK_KHR_maintenance3")
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-03-20 07:58:10 -07:00
Caio Marcelo de Oliveira Filho
f6338c3b85 anv/pipeline: set active_stages early
Since the intermediate states of active_stages are not used,
i.e. active_stages is read only after all stages were set into it,
just set its value before compiling the shaders.

This will allow to conditionally run certain passes based on what
other shaders are being used, e.g. a certain pass might only be
applicable to the vertex shader if there's no geometry or tessellation
shader being used.

v2: Use vk_to_mesa_shader_stage. (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-03-19 18:00:49 +00:00
Caio Marcelo de Oliveira Filho
318073ce66 anv/pipeline: fail if TCS/TES compile fail
v2: Add Fixes tag. (Lionel)

Fixes: e50d4807a3 ("anv: Compile TCS/TES shaders.")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-03-19 18:00:49 +00:00
Eric Anholt
7db1c09d12 anv: Silence warning about heap_size.
We only get VK_SUCCESS if it was initialized, but apparently my compiler
doesn't track that far.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-03-16 15:10:05 -07:00
Eric Anholt
d25640c3a3 i965: Silence compiler warning about promoted_constants.
We only have a cfg != NULL if we went through one of the paths that set
it, but my compiler doesn't figure that out.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6411defdcd ("intel/cs: Re-run final NIR optimizations for each SIMD size")
2018-03-16 15:09:55 -07:00
Eric Anholt
9f89452ea3 anv: Silence compiler warnings about uninitialized bind_offset.
This is a legitimate warning: if anv's blorp_alloc_binding_table() throws
an error from anv_cmd_buffer_alloc_blorp_binding_table(), we silently
continue to use this undefined value.  The rest of this code doesn't seem
very allocation-error-proof, though, either.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-03-16 15:09:47 -07:00
Matt Turner
f3833f1ca7 intel/compiler: Use gen_get_device_info() in test_eu_validate
Previously the unit test filled out a minimal devinfo struct. A previous
patch caused the test to begin assert failing because the devinfo was
not complete. Avoid this by using the real mechanism to create devinfo.

Note that we have to drop icl from the table, since we now rely on the
name -> PCI ID translation done by gen_device_name_to_pci_device_id(),
and ICL's PCI IDs are not upstream yet.

Fixes: f89e735719 ("intel/compiler: Check for unsupported register sizes.")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-03-16 13:20:21 -07:00
Matt Turner
54db78b196 intel: Add cfl to gen_device_name_to_pci_device_id()
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-03-16 13:20:21 -07:00
Rafael Antognolli
f89e735719 intel/compiler: Check for unsupported register sizes.
Make sure we don't emit 64 bit types if the hardware doesn't support
them.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-16 09:27:16 -07:00
Lionel Landwerlin
51783f3e7d anv: silence unused variable warning
Fixes: 59b0ea0c74 ("anv: Stop returning VK_ERROR_INCOMPATIBLE_DRIVER")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-03-15 18:56:26 +00:00
Lionel Landwerlin
0f544a3c51 anv: silence unused function warning on gen11
[84/227] Compiling C object 'src/intel/vulkan/libanv_gen110@sta/genX_blorp_exec.c.o'.
../src/intel/vulkan/genX_blorp_exec.c:68:1: warning: ‘blorp_get_surface_base_address’ defined but not used [-Wunused-function]
 blorp_get_surface_base_address(struct blorp_batch *batch)
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-03-15 18:55:42 +00:00
Karol Herbst
b617bfcccf compiler: int8/uint8 support
OpenCL kernels also have int8/uint8.

v2: remove changes in nir_search as Jason posted a patch for that

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
2018-03-14 10:08:42 -04:00
Iago Toral Quiroga
1a0aba7216 anv/entrypoints: VkGetDeviceProcAddr returns NULL for core instance commands
af5f2322d0 addressed this for extension commands, but the spec mandates
this behavior also for core API commands. From the Vulkan spec,
Table 2. vkGetDeviceProcAddr behavior:

device     pname                            return
----------------------------------------------------------
(..)
device     core device-level command        fp
(...)

See that it specifically states "device-level".

Since the vk.xml file doesn't state if core commands are instance or
device level, we identify device level commands as the ones that take a
VkDevice, VkQueue or VkCommandBuffer as their first parameter.

Fixes test failures in new work-in-progress CTS tests.

Also see the public issue:
https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers/issues/2323

v2:
  - Include reference to github issue (Emil)
  - Rebased on top of Vulkan 1.1 changes.

v3:
  - Remove the not in the condition and switch the then/else cases (Jason)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-03-14 08:09:15 +01:00
Iago Toral Quiroga
a631575ff4 anv/entrypoints: dispatches to VkQueue are device-level
v2:
  - Add trampoline functions (Jason)
  - Add an assertion for unhandled trampoline cases

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-03-14 08:09:15 +01:00
Jordan Justen
24b415270f intel/vulkan: Hard code CS scratch_ids_per_subslice for Cherryview
Ken suggested that we might be underallocating scratch space on HD
400. Allocating scratch space as though there was actually 8 EUs
seems to help with a GPU hang seen on synmark CSDof.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-09 16:15:58 -08:00
Ian Romanick
1583f49eaa i965/vec4: Allow CSE on subset VF constant loads
v2: Rewrite the code that generates the VF mask.  Suggested by Ken.

No changes on other platforms.

Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13059891 -> 13059884 (<.01%)
instructions in affected programs: 431 -> 424 (-1.62%)
helped: 7
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.19% max: 5.26% x̄: 2.05% x̃: 1.49%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -3.39% -0.71%
Instructions are helped.

total cycles in shared programs: 409260032 -> 409260018 (<.01%)
cycles in affected programs: 4228 -> 4214 (-0.33%)
helped: 7
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.28% max: 2.04% x̄: 0.54% x̃: 0.28%
95% mean confidence interval for cycles value: -2.00 -2.00
95% mean confidence interval for cycles %-change: -1.15% 0.07%

Inconclusive result (%-change mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-08 15:26:26 -08:00
Ian Romanick
360899d457 i965/vec4: Relax writemask condition in CSE
If the previously seen instruction generates more fields than the new
instruction, still allow CSE to happen.  This doesn't do much, but it
also enables a couple more shaders in the next patch.  It helped quite a
bit in another change series that I have (at least for now) abandoned.

v2: Add some extra comentary about the parameters to instructions_match.
Suggested by Ken.

No changes on Skylake, Broadwell, Iron Lake or GM45.

Ivy Bridge and Haswell had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11780295 -> 11780294 (<.01%)
instructions in affected programs: 302 -> 301 (-0.33%)
helped: 1
HURT: 0

total cycles in shared programs: 257308315 -> 257308313 (<.01%)
cycles in affected programs: 2074 -> 2072 (-0.10%)
helped: 1
HURT: 0

Sandy Bridge
total instructions in shared programs: 10506687 -> 10506686 (<.01%)
instructions in affected programs: 335 -> 334 (-0.30%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-03-08 15:26:26 -08:00
Ian Romanick
52c7df1643 i965/fs: Merge CMP and SEL into CSEL on Gen8+
v2: Fix several problems handling inverted predicates.  Add a much
bigger comment around the BRW_CONDITIONAL_NZ case.

v3: Allow uniforms and shader inputs as sources for the original SEL and
CMP instructions.  This enables a LOT more shaders to receive CSEL
merging (5816 vs 8564 on SKL).

v4: Report progress.

Broadwell and Skylake had similar results. (Broadwell shown)
helped: 8527
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 1
helped stats (rel) min: 0.03% max: 17.80% x̄: 1.12% x̃: 0.70%
95% mean confidence interval for instructions value: -2.51 -2.36
95% mean confidence interval for instructions %-change: -1.15% -1.10%
Instructions are helped.

total cycles in shared programs: 559442317 -> 558288357 (-0.21%)
cycles in affected programs: 372699860 -> 371545900 (-0.31%)
helped: 6748
HURT: 1450
helped stats (abs) min: 1 max: 32000 x̄: 182.41 x̃: 12
helped stats (rel) min: <.01% max: 66.08% x̄: 3.42% x̃: 0.70%
HURT stats (abs)   min: 1 max: 2538 x̄: 53.08 x̃: 14
HURT stats (rel)   min: <.01% max: 96.72% x̄: 3.32% x̃: 0.90%
95% mean confidence interval for cycles value: -179.01 -102.51
95% mean confidence interval for cycles %-change: -2.37% -2.08%
Cycles are helped.

LOST:   0
GAINED: 6

No changes on earlier platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-08 15:26:26 -08:00
Kenneth Graunke
70de61594d i965/fs: Add infrastructure for generating CSEL instructions.
v2 (idr): Don't allow CSEL with a non-float src2.

v3 (idr): Add CSEL to fs_inst::flags_written.  Suggested by Matt.

v4 (idr): Only set BRW_ALIGN_16 on Gen < 10 (suggested by Matt).  Don't
reset the access mode afterwards (suggested by Samuel and Matt).  Add
support for CSEL not modifying the flags to more places (requested by
Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-08 15:26:26 -08:00
Jason Ekstrand
c217607b65 anv: Support version overrides
While always sketchy to do, this is useful for debugging.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-03-07 12:13:47 -08:00
Jason Ekstrand
d6b65222df anv: Enable Vulkan 1.1
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-03-07 12:13:47 -08:00
Jason Ekstrand
03c07ac548 anv: Add support for SPIR-V 1.3 subgroup operations
This requires us to bump the subgroup size to 32 for all shader stages
because Vulkan requires that to be a physical device query.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-03-07 12:13:47 -08:00
Jason Ekstrand
8b4a5e641b intel/fs: Add support for subgroup quad operations
NIR has code to lower these away for us but we can do significantly
better in many cases with register regioning and SIMD4x2.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-03-07 12:13:47 -08:00
Jason Ekstrand
2292b20b29 intel/fs: Implement reduce and scan opeprations
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-03-07 12:13:47 -08:00