Commit graph

5520 commits

Author SHA1 Message Date
Arcady Goldmints-Orlov
95fd950d35 intel/compiler: fix alignment assert in nir_emit_intrinsic
Fixes: c643979228 (intel/fs: Choose memory message type based on bit size)
Fixes: dEQP-VK.subgroups.ballot_broadcast.compute.subgroupbroadcast_i8vec2

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5000>
2020-05-12 22:14:31 +00:00
Jason Ekstrand
51c6bc13ce anv,vulkan: Implement VK_EXT_private_data
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4882>
2020-05-12 18:01:48 +00:00
Kenneth Graunke
ab16bff97d intel: Delete hardcoded devinfo->urb.size values for Gen7+ (sans DG1).
On all Gen7+ platforms except DG1, the URB is a subsection of the
configurable L3 cache, and so the size can vary.  The size listed
in the documentation on those platforms is an "example size", picked
by calculating it based on an arbitrarily chosen L3 config.

Hardcoding a value for those platforms provides no value and only
confuses people trying to fill out these tables when doing hardware
enabling.  anv and iris never use this field.  i965 uses it to
initialize brw->urb.size, but then updates that in update_urb_size()
to be the correct value, so the initial value doesn't matter.

Delete the values for Gen7+ and update the comment accordingly.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4969>
2020-05-11 09:40:56 -07:00
Samuel Pitoiset
04718a9cd6 nir: do not vectorize load/store if offset can overflow and robustness enabled
This prevents vectorization for loads/stores that can overflow if
the low offset is negative and the range greater or equal than 0.

The caller can pass the list of variable modes that matter for
robust access.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4881>
2020-05-11 07:25:15 +00:00
Lionel Landwerlin
9e790fea7c genxml: pack: deal with default field not being simple integers
Storing integers into enums doesn't seem to cause issues in C, but
with our builder tests written in C++ this causes warnings/errors.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4938>
2020-05-09 07:20:48 +00:00
Lionel Landwerlin
942d4538a4 genxml: factor out utility functions
v2: Use the regexp version (Jordan)
    Also fix regexp that missed the ' character replacement (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4938>
2020-05-09 07:20:48 +00:00
Lionel Landwerlin
d07f69413e genxml: fix invalid end value for video fields
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4938>
2020-05-09 07:20:48 +00:00
Lionel Landwerlin
af17e392b2 genxml: run sorting script
Helps running diff/meld between generations :)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4938>
2020-05-09 07:20:48 +00:00
Jordan Justen
45c33313e6 intel/dev: Add device info for RKL
Cc: 20.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by : Lionel Landwerlin <lionel.g.landwerlin@intel.com>

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4955>
2020-05-09 01:39:43 +00:00
Jordan Justen
54996ad492 intel/dev: Split .num_subslices out of GEN12_FEATURES macro
The .num_subslices field makes it problematic to reuse the
GEN12_FEATURES macro in other macros.

This also fixes the number of L3 banks for tgl gt1, except that this
was already fixed by Jason (dynamically) in:

86f67952d3 ("intel/devinfo: Compute the correct L3$ size for Gen12")

Cc: 20.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by : Lionel Landwerlin <lionel.g.landwerlin@intel.com>

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4955>
2020-05-09 01:39:43 +00:00
D Scott Phillips
6c998c7adf intel/dump_gpu: Fix name of LD_PRELOAD in env append logic
Checking for the wrong environment variable name to be set causes
us to stomp any pre-existing LD_PRELOAD.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4970>
2020-05-08 14:49:07 -07:00
Jason Ekstrand
d11e4738a8 anv/allocator: Add a start_offset to anv_state_pool
This allows a pool's allocations to start somewhere other than the base
address.  Our first real use of this will be to use a negative offset
for the binding table pool to make it so that the offset is baked into
the pool and the code in anv_batch_chain.c doesn't have to understand
pool offsetting.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4897>
2020-05-08 16:54:17 +00:00
Lionel Landwerlin
8bcfce2fcd anv: fix alignments for uniform buffers
We were not consistent with minimums reported in the physical device
properties.

Fixes a few CTS tests :
   dEQP-VK.memory.requirements.dedicated_allocation.buffer.regular
   dEQP-VK.memory.requirements.extended.buffer.regular
   dEQP-VK.memory.requirements.core.buffer.regular

v2: Use define for the limit

v3: Rename define

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a0de2e0090 ("anv: increase minUniformBufferOffsetAlignment to 64")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4940>
2020-05-08 08:59:02 +00:00
Tapani Pälli
ee2aef3ea5 anv: call base finish only if pass given in DestroyRenderPass
Fixes: 682c81bdfb ("vulkan,anv: Add a base object struct type")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4936>
2020-05-08 08:36:45 +03:00
Ian Romanick
f46eabf84e nir/algebraic: Split ibfe and ubfe with two constant sources
I also tried splitting ubfe instructions with one or zero constants,
and zero shaders in shader-db were affected.

The "lost" shader is a compute shader that was promoted from SIMD8 to
SIMD16, so is also counted as the gained shader.

v2: Further restrict bfe splitting.  bfe with multiple constants is
better on at least some Radeon GPUs.  Use -x instead of 32-x in shift
counts.

v3: Fix the outer shift count for ibfe lowering.  Add c=0 optimizations
to prevent bad lowering.  Both suggested by Rhys.  Add shift by -32
optimizations.

Tiger Lake
total instructions in shared programs: 17608764 -> 17596316 (-0.07%)
instructions in affected programs: 303765 -> 291317 (-4.10%)
helped: 113
HURT: 46
helped stats (abs) min: 1 max: 458 x̄: 120.67 x̃: 21
helped stats (rel) min: 0.09% max: 11.23% x̄: 3.47% x̃: 1.39%
HURT stats (abs)   min: 1 max: 201 x̄: 25.83 x̃: 6
HURT stats (rel)   min: 0.23% max: 5.18% x̄: 1.53% x̃: 1.11%
95% mean confidence interval for instructions value: -101.13 -55.45
95% mean confidence interval for instructions %-change: -2.61% -1.44%
Instructions are helped.

total cycles in shared programs: 338390770 -> 333530868 (-1.44%)
cycles in affected programs: 79438330 -> 74578428 (-6.12%)
helped: 112
HURT: 64
helped stats (abs) min: 2 max: 268955 x̄: 44261.93 x̃: 1452
helped stats (rel) min: <.01% max: 29.51% x̄: 4.72% x̃: 2.23%
HURT stats (abs)   min: 2 max: 17618 x̄: 1522.41 x̃: 84
HURT stats (rel)   min: <.01% max: 7.34% x̄: 1.35% x̃: 0.34%
95% mean confidence interval for cycles value: -37232.47 -17993.69
95% mean confidence interval for cycles %-change: -3.37% -1.65%
Cycles are helped.

total spills in shared programs: 8944 -> 8138 (-9.01%)
spills in affected programs: 3240 -> 2434 (-24.88%)
helped: 67
HURT: 0

total fills in shared programs: 9373 -> 7842 (-16.33%)
fills in affected programs: 4736 -> 3205 (-32.33%)
helped: 67
HURT: 0

LOST:   1
GAINED: 2

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 16123288 -> 16116876 (-0.04%)
instructions in affected programs: 241155 -> 234743 (-2.66%)
helped: 126
HURT: 2
helped stats (abs) min: 1 max: 209 x̄: 50.90 x̃: 7
helped stats (rel) min: 0.07% max: 5.94% x̄: 1.76% x̃: 0.65%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.05% max: 0.24% x̄: 0.15% x̃: 0.15%
95% mean confidence interval for instructions value: -61.29 -38.89
95% mean confidence interval for instructions %-change: -2.05% -1.42%
Instructions are helped.

total cycles in shared programs: 335419163 -> 330438819 (-1.48%)
cycles in affected programs: 77515502 -> 72535158 (-6.42%)
helped: 139
HURT: 37
helped stats (abs) min: 2 max: 269140 x̄: 36374.19 x̃: 597
helped stats (rel) min: <.01% max: 28.60% x̄: 3.67% x̃: 1.31%
HURT stats (abs)   min: 4 max: 17618 x̄: 2045.08 x̃: 174
HURT stats (rel)   min: 0.02% max: 8.32% x̄: 2.61% x̃: 0.62%
95% mean confidence interval for cycles value: -37799.30 -18795.51
95% mean confidence interval for cycles %-change: -3.13% -1.57%
Cycles are helped.

total spills in shared programs: 8065 -> 7306 (-9.41%)
spills in affected programs: 3153 -> 2394 (-24.07%)
helped: 67
HURT: 0

total fills in shared programs: 8710 -> 7412 (-14.90%)
fills in affected programs: 4466 -> 3168 (-29.06%)
helped: 67
HURT: 0

LOST:   1
GAINED: 1

Broadwell
total instructions in shared programs: 14970538 -> 14965967 (-0.03%)
instructions in affected programs: 227040 -> 222469 (-2.01%)
helped: 126
HURT: 2
helped stats (abs) min: 1 max: 136 x̄: 36.29 x̃: 8
helped stats (rel) min: 0.07% max: 6.02% x̄: 1.47% x̃: 0.89%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.05% max: 0.24% x̄: 0.14% x̃: 0.14%
95% mean confidence interval for instructions value: -43.05 -28.37
95% mean confidence interval for instructions %-change: -1.69% -1.19%
Instructions are helped.

total cycles in shared programs: 336237662 -> 333035960 (-0.95%)
cycles in affected programs: 72066394 -> 68864692 (-4.44%)
helped: 134
HURT: 42
helped stats (abs) min: 4 max: 122640 x̄: 24344.54 x̃: 1833
helped stats (rel) min: <.01% max: 26.93% x̄: 4.02% x̃: 2.38%
HURT stats (abs)   min: 1 max: 17205 x̄: 1439.69 x̃: 92
HURT stats (rel)   min: <.01% max: 7.12% x̄: 1.34% x̃: 0.62%
95% mean confidence interval for cycles value: -23753.58 -12629.40
95% mean confidence interval for cycles %-change: -3.50% -1.98%
Cycles are helped.

total spills in shared programs: 21122 -> 20204 (-4.35%)
spills in affected programs: 3644 -> 2726 (-25.19%)
helped: 67
HURT: 0

total fills in shared programs: 24879 -> 23460 (-5.70%)
fills in affected programs: 4883 -> 3464 (-29.06%)
helped: 67
HURT: 0

Haswell
total instructions in shared programs: 13148269 -> 13145444 (-0.02%)
instructions in affected programs: 137046 -> 134221 (-2.06%)
helped: 97
HURT: 3
helped stats (abs) min: 1 max: 137 x̄: 30.58 x̃: 3
helped stats (rel) min: 0.14% max: 4.38% x̄: 1.38% x̃: 0.44%
HURT stats (abs)   min: 1 max: 70 x̄: 47.00 x̃: 70
HURT stats (rel)   min: 0.05% max: 5.82% x̄: 3.90% x̃: 5.82%
95% mean confidence interval for instructions value: -37.15 -19.35
95% mean confidence interval for instructions %-change: -1.56% -0.89%
Instructions are helped.

total cycles in shared programs: 321221834 -> 318333159 (-0.90%)
cycles in affected programs: 54932349 -> 52043674 (-5.26%)
helped: 95
HURT: 53
helped stats (abs) min: 4 max: 123390 x̄: 30648.39 x̃: 702
helped stats (rel) min: <.01% max: 28.87% x̄: 4.27% x̃: 2.87%
HURT stats (abs)   min: 4 max: 2357 x̄: 432.49 x̃: 113
HURT stats (rel)   min: <.01% max: 3.44% x̄: 1.03% x̃: 0.54%
95% mean confidence interval for cycles value: -26154.16 -12881.99
95% mean confidence interval for cycles %-change: -3.20% -1.55%
Cycles are helped.

total spills in shared programs: 19878 -> 19293 (-2.94%)
spills in affected programs: 3020 -> 2435 (-19.37%)
helped: 41
HURT: 2

total fills in shared programs: 20918 -> 19875 (-4.99%)
fills in affected programs: 3968 -> 2925 (-26.29%)
helped: 41
HURT: 2

LOST:   0
GAINED: 1

Ivy Bridge
total instructions in shared programs: 11875585 -> 11873641 (-0.02%)
instructions in affected programs: 78065 -> 76121 (-2.49%)
helped: 27
HURT: 0
helped stats (abs) min: 8 max: 134 x̄: 72.00 x̃: 72
helped stats (rel) min: 0.36% max: 4.23% x̄: 2.42% x̃: 2.42%
95% mean confidence interval for instructions value: -83.68 -60.32
95% mean confidence interval for instructions %-change: -2.78% -2.07%
Instructions are helped.

total cycles in shared programs: 178232734 -> 175769085 (-1.38%)
cycles in affected programs: 50018707 -> 47555058 (-4.93%)
helped: 27
HURT: 0
helped stats (abs) min: 82035 max: 99953 x̄: 91246.26 x̃: 92278
helped stats (rel) min: 4.40% max: 5.69% x̄: 4.93% x̃: 4.95%
95% mean confidence interval for cycles value: -93674.20 -88818.32
95% mean confidence interval for cycles %-change: -5.09% -4.78%
Cycles are helped.

total spills in shared programs: 4182 -> 3739 (-10.59%)
spills in affected programs: 1089 -> 646 (-40.68%)
helped: 27
HURT: 0

total fills in shared programs: 5216 -> 4345 (-16.70%)
fills in affected programs: 1874 -> 1003 (-46.48%)
helped: 27
HURT: 0

No changes on any earlier Intel platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4156>
2020-05-07 10:55:50 -07:00
Lionel Landwerlin
4f17e9eef6 anv: don't expose VK_INTEL_performance_query without kernel support
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2b5f30b1d9 ("anv: implement VK_INTEL_performance_query")
Acked-by: Timothy Strelchun <timothy.strelchun@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4937>
2020-05-07 16:42:44 +00:00
Arcady Goldmints-Orlov
a0de2e0090 anv: increase minUniformBufferOffsetAlignment to 64
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4904>
2020-05-06 19:45:01 -05:00
Joshua Ashton
c4d11ea3c4 anv: Remove RANGE_SIZE usage
These were removed from the latest Vulkan headers
https://github.com/KhronosGroup/Vulkan-Docs/issues/1230

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4878>
2020-05-05 00:27:59 +00:00
Jason Ekstrand
7628585dd7 anv: Refactor setting descriptors with immutable sampler
Don't call anv_sampler_from_handle if the handle may be invalid.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>
2020-05-04 14:06:27 +00:00
Jason Ekstrand
73fb7cdbe1 vulkan,anv: Move the DEFINE_HANDLE_CASTS macros to vk_object.h
We've already got these duplicated a bunch of places.  They should
really probably live in common code.  The new versions take two more
arguments:

 1. The struct member which gets you from __driver_type to the
    vk_object_base.  This requires drivers which use this to also use
    vk_object_base.

 2. The VkObjectType enum which represents that object type.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>
2020-05-04 14:06:27 +00:00
Jason Ekstrand
682c81bdfb vulkan,anv: Add a base object struct type
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>
2020-05-04 14:06:27 +00:00
Jason Ekstrand
369703774c anv: Allocate CPU-side memory for events
As discrete graphics looms, we really need to stop storing CPU data
structures in GPU memory.  One of the most egregious instances of this
was VkEvent where we had a CPU data structure living inside a dynamic
state pool allocation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>
2020-05-04 14:06:27 +00:00
Jason Ekstrand
4ac4e8e11f anv: Stop clflushing events
They're allocated out of the dynamic state pool which is snooped.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>
2020-05-04 14:06:27 +00:00
Jason Ekstrand
a9158f7951 vulkan,anv: Add a common base object type for VkDevice
We should keep this very minimal; I don't know that we need to go all
struct gl_context on it.  However, this gives us at least a tiny base on
which we can start building some common functionality.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>
2020-05-04 14:06:27 +00:00
Dave Airlie
b2164320a0 i965: add support for gen 5 pipelined pointers to dump
I wanted to see inside these, so added support to the dumper.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4846>
2020-05-03 05:47:16 +10:00
Nataraj Deshpande
49cc9e9526 anv: Disable extensions based on Android versions
This extends commit 2243f0cd for anv with additional
extensions for Pie and Q versions.

Fixes tests with 9_R11 CTS:
dEQP-VK.api.info.android#no_unknown_extensions
dEQP-VK.api.info.device#extensions.

v2: Use snake_case function name (Jason Ekstrand)
    Drop Change-Id in commit (Kristian H. Kristensen)

v3: Resolve meson-clang error for ANDROID_API_LEVEL.

Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4827>
2020-05-01 21:04:26 +00:00
Nataraj Deshpande
a77cf797f1 anv: Limit vulkan version to 1.1 for Android
Current Android dessert versions such as Pie, Q reject
vulkan version > 1.1. Clamp the vulkan versions to 1.1
for platforms running these Android desserts.

Fixes android.graphics.cts.VulkanFeaturesTest and
dEQP-VK.api.info.device#properties.

v2: Limit version with '!ANDROID' (Eric Engestrom and Tapani Pälli)

Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4781>
2020-05-01 20:50:54 +00:00
Caio Marcelo de Oliveira Filho
e645bc6939 intel: Let drivers call brw_nir_lower_cs_intrinsics()
The motivating factor is: this lowering may cause
nir_intrinsic_load_local_group_size intrinsics to be added to the
shader, and by moving this around we make possible for the drivers to
lower that intrinsic by themselves.

Iris will do just that in a later patch for implementing variable
group size.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
2020-05-01 12:50:37 -07:00
Caio Marcelo de Oliveira Filho
2663759af0 intel/fs: Add and use a new load_simd_width_intel intrinsic
Intrinsic to get the SIMD width, which not always the same as subgroup
size.  Starting with a small scope (Intel), but we can rename it later
to generalize if this turns out useful for other drivers.

Change brw_nir_lower_cs_intrinsics() to use this intrinsic instead of
a width will be passed as argument.  The pass also used to optimized
load_subgroup_id for the case that the workgroup fitted into a single
thread (it will be constant zero).  This optimization moved together
with lowering of the SIMD.

This is a preparation for letting the drivers call it before the
brw_compile_cs() step.

No shader-db changes in BDW, SKL, ICL and TGL.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
2020-05-01 12:50:37 -07:00
Caio Marcelo de Oliveira Filho
4b000b491a intel/fs: Add an option to lower variable group size in backend
Adding this since Iris will handle variable group size parameters by
itself.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
2020-05-01 12:50:28 -07:00
Caio Marcelo de Oliveira Filho
0edb58a84e intel/fs: Clean up variable group size handling in backend
Just use the information from NIR shader_info.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
2020-05-01 12:50:28 -07:00
Kenneth Graunke
615270502c intel: Move anv_gem_supports_syncobj_wait to common code.
This will let me use this in iris.

We leave the existing anv function for anv_gem_stubs.c faking, but
move the contents to a helper in a new src/intel/common/gen_gem.c file.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
2020-05-01 19:00:02 +00:00
Kenneth Graunke
812cf5f522 anv: Include linux/sync_file.h instead of cut and pasting contents
Linux 4.7 has been out for a long time, this is probably safe to
depend on at this point, rather than cut and pasting the contents.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
2020-05-01 19:00:02 +00:00
Caio Marcelo de Oliveira Filho
2a05ba5414 intel/dev: Bail when INTEL_DEVID_OVERRIDE is not valid
Avoids surprises where you set an OVERRIDE but it gets ignored and the
system PCI ID is used.

Also fixes the bug that the error of invalid platform name being
printed too early, even when the passed platform was a PCI ID (which
is also supported).

For the case where euid != uid, a warning was added but the behavior
wasn't changed: it is still going to fallback to system PCI ID.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4841>
2020-05-01 10:12:01 -07:00
D Scott Phillips
65b05ebdda anv,iris: Fix input vertex max for tcs on gen12
gen12 does away with the single patch dispatch mode for tcs, and
increases some limits so that 8_patch mode can always work. Make the
necessary changes so we don't try to fall back to single patch mode.

Fixes KHR-GL46.tessellation_shader.single.max_patch_vertices and others

Fixes: 44754279ac ("intel/fs/gen12: Use TCS 8_PATCH mode.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4843>
2020-05-01 16:49:11 +00:00
D Scott Phillips
7bd15135a6 intel/fs: Update location of Render Target Array Index for gen12
Render Target Array Index has moved from R0.0[26:16] to
R1.1[26:16] on gen12.

Fixes dEQP-VK.multiview.input_attachments.*

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4836>
2020-05-01 08:48:22 -07:00
Jason Ekstrand
3fac55ce0d Revert "anv/gen12: Temporarily disable VK_KHR_buffer_device_address (and EXT)"
This reverts commit c61ad77cd2.  We now no
longer have a problem with these.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4819>
2020-04-30 14:45:50 +00:00
Jason Ekstrand
4985e380dd intel/eu: Use non-coherent mode (BTI=253) for stateless A64 messages
We don't care about full IA coherency since we always have the
opportunity in GL or Vulkan to flush the data cache.  Using IA-coherent
mode is likely just making A64 access slower than it needs to be.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4819>
2020-04-30 14:45:50 +00:00
Lionel Landwerlin
0f4f1d70bf intel: add stub_gpu tool
Run shaderdb like this :

   intel_stub_gpu -p bxt ./run ./shaders/*

List of platform names is available from
gen_device_name_to_pci_device_id() (src/intel/dev/gen_device_info.c).

v2: Add missing getparam support
    Raise max soft limit of file descriptors

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4594>
2020-04-30 11:32:54 +03:00
Lionel Landwerlin
8c3c1d8a99 intel/dev: print out error when platform is not found by name
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4594>
2020-04-30 11:32:54 +03:00
Francisco Jerez
0842758ec0 intel/ir: Update performance analysis parameters for memory fence codegen changes.
The SFID field of the SHADER_OPCODE_MEMORY_FENCE and
SHADER_OPCODE_INTERLOCK instructions now indicates the target function
of the memory fence.  Account the cycle-count cost to the right shared
unit.

Fixes: f858fa26b4 ("intel/fs,vec4: Pull stall logic for memory fences up into the IR")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4817>
2020-04-29 23:40:36 +00:00
Jason Ekstrand
e581ddeeee intel/fs: Don't delete coalesced MOVs if they have a cmod
Shader-db results on ICL:

    total instructions in shared programs: 17133088 -> 17133287 (<.01%)
    instructions in affected programs: 61300 -> 61499 (0.32%)
    helped: 0
    HURT: 199

This means it's likely fixing 199 bugs. :-)  All the changed shaders are
in Mad Max.  It's surprisingly difficult to get the back-end compiler to
generate a pattern that hits this we don't tend to emit a lot coalescable
MOVs.  The pattern in Mad Max that's able to hit is fsign(fsat(x)) under
the right conditions.

Closes: #2820
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4773>
2020-04-29 16:45:51 +00:00
Tapani Pälli
1a33358b27 anv: remove assert from GetImageMemoryRequirements[2]
This assert is actually correct but due to how android hardware buffer
support is implemented we should remove it, otherwise debug build of
mesa hits the assert with Android CTS tests.

Test creates VkImage with non-external format and sets up
VkExternalMemoryImageCreateInfo to indicate that image *may* be used
with Android hardwarebuffer handle. Then test attempts to get image
memory requirements. Problem with this is that we setup all android
supporting images as having external format and thus hit the assert as
the size has not been set yet. This is not a problem in practice since
android will bind ahw memory with the image later on.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2807
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4762>
2020-04-29 08:30:42 +00:00
Caio Marcelo de Oliveira Filho
a3cba3c771 intel/fs: Only stall after sending all memory fence messages
In Gen11+, when emitting a fence for both L3 and SLM, the generated
code would look like

    SEND, MOV (for stall), SEND, MOV (for stall)

This commit change that so two SENDs are emitted before the MOVs for
stall.  This is similar to the approach used in Ivy Bridge for the
render fence.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>
2020-04-29 07:17:27 +00:00
Caio Marcelo de Oliveira Filho
f858fa26b4 intel/fs,vec4: Pull stall logic for memory fences up into the IR
Instead of emitting the stall MOV "inside" the
SHADER_OPCODE_MEMORY_FENCE generation, use the scheduling fences when
creating the IR.

For IvyBridge, every (data cache) fence is accompained by a render
cache fence, that now is explicit in the IR, two
SHADER_OPCODE_MEMORY_FENCEs are emitted (with different SFIDs).

Because Begin and End interlock intrinsics are effectively memory
barriers, move its handling alongside the other memory barrier
intrinsics.  The SHADER_OPCODE_INTERLOCK is still used to distinguish
if we are going to use a SENDC (for Begin) or regular SEND (for End).

This change is a preparation to allow emitting both SENDs in Gen11+
before we can stall on them.

Shader-db results for IVB (i965):

    total instructions in shared programs: 11971190 -> 11971200 (<.01%)
    instructions in affected programs: 11482 -> 11492 (0.09%)
    helped: 0
    HURT: 8
    HURT stats (abs)   min: 1 max: 3 x̄: 1.25 x̃: 1
    HURT stats (rel)   min: 0.03% max: 0.50% x̄: 0.14% x̃: 0.10%
    95% mean confidence interval for instructions value: 0.66 1.84
    95% mean confidence interval for instructions %-change: 0.01% 0.27%
    Instructions are HURT.

  Unlike the previous code, that used the `mov g1 g2` trick to force
  both `g1` and `g2` to stall, the scheduling fence will generate `mov
  null g1` and `mov null g2`.  During review it was decided it was not
  worth keeping the special codepath for the small effect will have.

Shader-db results for HSW (i965), BDW and SKL don't have a change
on instruction count, but do report changes in cycles count, showing
SKL results below

    total cycles in shared programs: 341738444 -> 341710570 (<.01%)
    cycles in affected programs: 7240002 -> 7212128 (-0.38%)
    helped: 46
    HURT: 5
    helped stats (abs) min: 14 max: 1940 x̄: 676.22 x̃: 154
    helped stats (rel) min: <.01% max: 2.62% x̄: 1.28% x̃: 0.95%
    HURT stats (abs)   min: 2 max: 1768 x̄: 646.40 x̃: 362
    HURT stats (rel)   min: <.01% max: 0.83% x̄: 0.28% x̃: 0.08%
    95% mean confidence interval for cycles value: -777.71 -315.38
    95% mean confidence interval for cycles %-change: -1.42% -0.83%
    Cycles are helped.

  This seems to be the effect of allocating two registers separatedly
  instead of a single one with size 2, which causes different register
  allocation, affecting the cycle estimates.

while ICL also has not change on instruction count but report changes
negative changes in cycles

    total cycles in shared programs: 352665369 -> 352707484 (0.01%)
    cycles in affected programs: 9608288 -> 9650403 (0.44%)
    helped: 4
    HURT: 104
    helped stats (abs) min: 24 max: 128 x̄: 88.50 x̃: 101
    helped stats (rel) min: <.01% max: 0.85% x̄: 0.46% x̃: 0.49%
    HURT stats (abs)   min: 2 max: 2016 x̄: 408.36 x̃: 48
    HURT stats (rel)   min: <.01% max: 3.31% x̄: 0.88% x̃: 0.45%
    95% mean confidence interval for cycles value: 256.67 523.24
    95% mean confidence interval for cycles %-change: 0.63% 1.03%
    Cycles are HURT.

  AFAICT this is the result of the case above.

Shader-db results for TGL have similar cycles result as ICL, but also
affect instructions

    total instructions in shared programs: 17690586 -> 17690597 (<.01%)
    instructions in affected programs: 64617 -> 64628 (0.02%)
    helped: 55
    HURT: 32
    helped stats (abs) min: 1 max: 16 x̄: 4.13 x̃: 3
    helped stats (rel) min: 0.05% max: 2.78% x̄: 0.86% x̃: 0.74%
    HURT stats (abs)   min: 1 max: 65 x̄: 7.44 x̃: 2
    HURT stats (rel)   min: 0.05% max: 4.58% x̄: 1.13% x̃: 0.69%
    95% mean confidence interval for instructions value: -2.03 2.28
    95% mean confidence interval for instructions %-change: -0.41% 0.15%
    Inconclusive result (value mean confidence interval includes 0).

  Now that more is done in the IR, more dependencies are visible and
  more SWSB annotations are emitted.  Mixed with different register
  allocation decisions like above, some shaders will see more `sync
  nops` while others able to avoid them.

  Most of the new `sync nops` are also redundant and could be dropped,
  which will be fixed in a separate change.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>
2020-04-29 07:17:27 +00:00
Caio Marcelo de Oliveira Filho
0e96b0d6dd intel/fs: Allow FS_OPCODE_SCHEDULING_FENCE stall on registers
It will generate the MOVs (or SYNC_NOP in Gen12+) needed for stall.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>
2020-04-29 07:17:27 +00:00
Francisco Jerez
5e2a7e11b4 intel/ir: Remove scheduling-based cycle count estimates.
The cycle count estimation logic part of the scheduler is now
redundant with the shader performance modeling pass, and the estimates
can be consolidated into the brw::performance analysis result object
instead of being part of the CFG, which guarantees that the estimates
cannot be accessed without previously calling the
performance_analysis::require() method, which makes sure that the
right analysis pass is executed at the right time if we don't already
have up-to-date cached results.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-04-28 23:01:27 -07:00
Francisco Jerez
486f3b04a5 intel/ir: Pass block cycle count information explicitly to disassembler.
So we can eventually remove the cycle count estimates from the CFG
data structure and consolidate performance information in the
brw::performance object.

It would be cleaner to pass the brw::performance object directly to
the disassembler but that isn't straightforward since the disassembler
is built as a plain C file unlike the rest of the compiler back-end.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-04-28 23:01:27 -07:00
Francisco Jerez
6579f562c3 intel/ir: Use brw::performance object instead of CFG cycle counts for codegen stats.
These should be more accurate than the current cycle counts, since
among other things they consider the effect of post-scheduling passes
like the software scoreboard on TGL.  In addition it will enable us to
clean up some of the now redundant cycle-count estimation
functionality in the instruction scheduler.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-04-28 23:01:27 -07:00
Francisco Jerez
65342be3ae intel/fs: Add INTEL_DEBUG=no32 debugging flag.
This is useful in order to identify codegen issues caused by SIMD32.
It doesn't currently have any effect on compute shaders since SIMD32
dispatch is only enabled for CS when it's strictly necessary to do so
in order to support the workgroup size requested for the shader --
That might change in the future though when we hook up the SIMD32
heuristic to CS compilation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-04-28 23:01:27 -07:00