4254 commits

Author SHA1 Message Date
Caio Marcelo de Oliveira Filho
e45bf01940 spirv: Change spirv_to_nir() to return a nir_shader
spirv_to_nir() returned the nir_function corresponding to the
entrypoint, as a way to identify it.  There's now a bool is_entrypoint
in nir_function and also a helper function to get the entry_point from
a nir_shader.

The return type reflects better what the function name suggests.  It
also helps drivers avoid the mistake of reusing internal shader
references after running NIR_PASS on it.  When using NIR_TEST_CLONE or
NIR_TEST_SERIALIZE, those would be invalidated right in the first pass
executed.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 10:34:35 -07:00
Kenneth Graunke
bc273dece2 intel/decoder: Use get_state_size() over guessed counts in more cases
This makes the following packets use actual driver provided sizes rather
than guessing an arbitrary number:

  - CC_VIEWPORT
  - SF_CLIP_VIEWPORT
  - BLEND_STATE
  - COLOR_CALC_STATE
  - SCISSOR_RECT

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-05-28 13:44:16 -07:00
Kenneth Graunke
6a9e39d44b iris: Ask st to vectorize our IO.
(Technically this is common code, but it doesn't affect i965 or anv.)

Improves performance of GFXBench5/gl_tess_off on Skylake GT4e at 1080p
by 9.3933% +/- 0.0305157% by eliminating all spilling in the GS.

Improves performance of GFXBench5/gl_4_off (Car Chase) on Skylake GT4e
at 1080p by 0.325208% +/- 0.0842233% (n=18).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-28 01:06:48 -07:00
Lionel Landwerlin
2042f22e28 anv: fix apply_pipeline_layout pass for arrays of YCbCr descriptors
When using the binding tables to access arrays of YCbCr descriptors we
did not consider the offset of the accessed element. We can't do a
simple multiply because the binding table entries are tightly packed.

For example element 0 of the array could use 2 entries/planes and
element 1 could use 2 entries/planes.
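
The packing problem described above can be sketched as a prefix sum: the first binding-table slot of element i is the total number of planes used by all earlier elements. This is only an illustration in Python, not the anv code; `binding_table_offset` and `plane_counts` are hypothetical names, and the plane counts in the usage note are made up.

```python
from itertools import accumulate


def binding_table_offset(plane_counts, index):
    """Return the first binding-table slot used by array element `index`.

    `plane_counts[i]` is the number of tightly packed entries (planes)
    that element i occupies. Hypothetical helper, for illustration only.
    """
    if not 0 <= index < len(plane_counts):
        raise IndexError("descriptor array index out of range")
    # Exclusive prefix sum: slots consumed by every element before `index`.
    offsets = [0] + list(accumulate(plane_counts))
    return offsets[index]
```

With made-up plane counts `[2, 3, 1]`, element 2 starts at slot 5, whereas multiplying the index by any single per-element size would give a different slot.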

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3bb8768b9d ("anv: toggle on support for VK_EXT_ycbcr_image_arrays")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-05-27 22:47:53 +01:00
Chenglei Ren
13b38ca1e4 anv/android: fix missing dependencies issue during parallel build
The libmesa_anv_gen* modules require anv_extensions.h; this patch makes
sure it gets generated as a dependency before building them.

Signed-off-by: Chenglei Ren <chenglei.ren@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
2019-05-27 10:13:17 +03:00
Jason Ekstrand
f2dc0f2872 nir: Drop imov/fmov in favor of one mov instruction
The difference between imov and fmov has been a constant source of
confusion in NIR for years.  No one really knows why we have two or when
to use one vs. the other.  The real reason is that they do different
things in the presence of source and destination modifiers.  However,
without modifiers (which many back-ends don't have), they are identical.
Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
instruction in place rather than replacing them with imov or fmov
instructions, we don't need two different instructions at all anymore.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Rob Clark <robdclark@chromium.org>
2019-05-24 08:38:11 -05:00
Jason Ekstrand
8ffbb54405 intel: Implement abs, neg, and sat in the back-end
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 08:38:11 -05:00
Jason Ekstrand
4fde459563 intel/nir: Call alu_to_scalar one last time before out-of-ssa
A few of our very late passes can end up generating vectors accidentally
so we need to get rid of them.  The only known case of this is the ffma
peephole which generates fneg and fabs as vectors.  Currently, they're
not a problem because they get turned into fmov which the back-end
compiler knows how to handle as a vector.  That's about to change.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 08:38:11 -05:00
Jason Ekstrand
ddd08e1888 nir/builder: Remove the use_fmov parameter from nir_swizzle
This flag has caused more confusion than good in most cases.  You can
validly use imov for floats or fmov for integers because, without source
modifiers, neither modifies its input in any way.  Using imov for floats
is more reliable so we go that direction.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-24 08:38:11 -05:00
Danylo Piliaiev
c82dcf89ae anv: Do not emulate texture swizzle for INPUT_ATTACHMENT, STORAGE_IMAGE
If descriptorType is VK_DESCRIPTOR_TYPE_STORAGE_IMAGE
or VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT, the imageView member of each
element of pImageInfo must have been created with the identity swizzle.

Fixes: d2aa65eb

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-24 09:20:38 +00:00
Lionel Landwerlin
cb7c9b2a93 vulkan: fix build dependency issue with generated files
On machines with many cores, you can run into this issue:

../mesa-9999/src/vulkan/overlay-layer/overlay.cpp:42:10: fatal error: vk_enum_to_str.h: No such file or directory

v2: Move declare_dependency around (Eric)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Jan Ziak
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-22 14:07:14 +00:00
Kenneth Graunke
419d9b21e1 intel: Move brw_prog_key_set_id from i965 to the compiler.
I want to use it in iris.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-05-21 15:05:38 -07:00
Caio Marcelo de Oliveira Filho
cf05ffbfd6 anv: Don't re-use entry_point pointer from spirv_to_nir
When running with NIR_TEST_CLONE=1, the pointer will not be valid, as
the whole shader is going to be recreated every pass.  Prefer using
is_entrypoint (to query when looping) and nir_shader_get_entrypoint()
instead.
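
A toy model of the failure mode, assuming nothing about the real NIR data structures: `Function`, `Shader`, and `get_entrypoint` below are hypothetical Python stand-ins, and `copy.deepcopy` stands in for a pass that recreates the whole shader. A reference cached before the copy no longer belongs to the shader, while a lookup by the `is_entrypoint` flag still works.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    is_entrypoint: bool = False


@dataclass
class Shader:
    functions: list = field(default_factory=list)


def get_entrypoint(shader):
    """Find the entry point by flag instead of relying on a cached reference."""
    for function in shader.functions:
        if function.is_entrypoint:
            return function
    raise LookupError("shader has no entry point")


shader = Shader([Function("helper"), Function("main", is_entrypoint=True)])
stale = get_entrypoint(shader)    # reference cached before the "pass"
shader = copy.deepcopy(shader)    # stand-in for a pass that recreates the shader
fresh = get_entrypoint(shader)    # looked up again afterwards
```

After the copy, `stale` still points at an object from the old shader, so only `fresh` is safe to use.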

Fixes the Vulkan Piglit tests
- vulkan/glsl450/frexp-double
- vulkan/glsl450/isinf-double
- vulkan/shaders/fs-multiple-large-local-array

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-20 16:47:39 -07:00
Caio Marcelo de Oliveira Filho
31a7476335 spirv, radv, anv: Replace ptr_type with addr_format
Instead of setting the glsl types of the pointers for each resource,
set the nir_address_format, from which we can derive the glsl_type,
and in the future the bit pattern representing a NULL pointer.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-20 10:53:38 -07:00
Jason Ekstrand
1c92358bd8 anv: Only consider minSampleShading when sampleShadingEnable is set
From the Vulkan 1.1.107 spec:

    Sample shading is enabled for a graphics pipeline:

      - If the interface of the fragment shader entry point of the
        graphics pipeline includes an input variable decorated with
        SampleId or SamplePosition. In this case minSampleShadingFactor
        takes the value 1.0.

      - Else if the sampleShadingEnable member of the
        VkPipelineMultisampleStateCreateInfo structure specified when
        creating the graphics pipeline is set to VK_TRUE. In this case
        minSampleShadingFactor takes the value of
        VkPipelineMultisampleStateCreateInfo::minSampleShading.

    Otherwise, sample shading is considered disabled.

In other words, if sampleShadingEnable is set to VK_FALSE, we should
ignore minSampleShading.
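
The rule quoted from the spec can be written as a small decision function. This is a Python sketch of the quoted text only, not the driver code; the function and parameter names are hypothetical.

```python
def min_sample_shading_factor(uses_sample_id_or_position,
                              sample_shading_enable,
                              min_sample_shading):
    """Return the effective factor, or None when sample shading is disabled."""
    if uses_sample_id_or_position:
        # A SampleId/SamplePosition input forces a factor of 1.0.
        return 1.0
    if sample_shading_enable:
        # Only now does the application-provided value matter.
        return min_sample_shading
    # sampleShadingEnable is VK_FALSE: minSampleShading is ignored.
    return None
```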

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-17 20:33:57 +00:00
Jason Ekstrand
8413fd136c anv: Stop forcing bindless for images
This was an unintended artifact of my testing of bindless images.  We
should be choosing bindless or not dynamically.

Fixes: c0d9926df7 "anv: Use bindless handles for images"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-17 19:58:51 +00:00
Jason Ekstrand
d2aa65eb18 anv: Emulate texture swizzle in the shader when needed
Now that we have the descriptor buffer mechanism, emulated texture
swizzle can be implemented in a very non-invasive way.  Previous
attempts all tried to extend the push constant based image param
mechanism which was gross.  This could, in theory, be done much faster
with a magic back-end instruction which does indirect MOVs but Vulkan on
IVB is already so slow this isn't going to matter much.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104355
Cc: "19.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-17 12:25:58 -05:00
Nanley Chery
629806b55b anv: Fix some depth buffer sampling cases on ICL+
Don't attempt sampling with HiZ if the sampler lacks support for it. On
ICL, the HW docs state that sampling with HiZ is not supported and that
instances of AUX_HIZ in the RENDER_SURFACE_STATE object will be
interpreted as AUX_NONE.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2019-05-16 20:54:53 +00:00
Jason Ekstrand
fce0214e94 intel/fs/live_variables: Do compute_start_end in BITSET_WORD chunks
For a block with a contiguous chunk of 32 vars that don't need updating,
this lets us skip 32 vars at a time. Also, by using bitscan, we only
iterate for each set bit rather than testing them all one at a time.
Looking at perf (with -O0 which is unfortunately necessary to get
reasonable back-traces), this seems to cut about 50-60% of the time
spent in compute_start_end() which is itself about 4-6% of the
run-time. In the real world, with a release driver build, this cuts
1.34% off a full shader-db run. (I ran shader-db 5 times in each
configuration).
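
The word-at-a-time idea can be sketched as follows. This is a Python illustration of the general technique, not the code in compute_start_end(); the 32-bit word width and the name `set_bit_indices` are assumptions made for the example.

```python
WORD_BITS = 32  # assumed width of one bitset word


def set_bit_indices(words):
    """Yield the indices of set bits in a bitset stored as 32-bit words."""
    for word_index, word in enumerate(words):
        if word == 0:
            # An all-zero word skips 32 variables in a single step.
            continue
        while word:
            lowest = word & -word           # isolate the lowest set bit
            bit = lowest.bit_length() - 1   # its position (the "bitscan")
            yield word_index * WORD_BITS + bit
            word &= word - 1                # clear that bit and continue
```

The inner loop runs once per set bit, so a sparse word costs far fewer iterations than testing all 32 positions.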

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-16 02:14:40 +00:00
Jason Ekstrand
b2d274c677 intel/fs/ra: Choose a spill reg before throwing away the graph
Otherwise, we get an effectively random spill reg because we no longer
have the information from RA to guide us.  Also, a completely clean
graph has undefined data in in_stack which is used for choosing the
spill reg so it really is non-deterministic.

Fixes: e99081e76d "intel/fs/ra: Spill without destroying the..."
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-16 02:13:09 +00:00
Jason Ekstrand
c19acf321c intel/fs/ra: Add spill costs to the graph on-demand
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-16 02:13:09 +00:00
Jason Ekstrand
2c14e2b5bf intel/fs/ra: Add a helper for discarding the interference graph
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-16 02:13:09 +00:00
Dave Airlie
4efd04ab18 intel/compiler: use bitset instead of opencoding a 32-bit bitset. (v2)
In the future I want to expand this to 128-bits, for vec16 support, so
let's just put the code in place to use bitset ranges now.

v2: just declare the bitset to be the max of what we should ever see
and change assert to reflect it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-15 07:10:34 +10:00
Dave Airlie
3b2c433167 intel/compiler: remove repeated bit_size / 8 in brw mem lowering pass.
Just use a variable already.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-15 07:10:30 +10:00
Kenneth Graunke
646924cfa1 intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8
Our tessellation control shaders can be dispatched in several modes.

- SINGLE_PATCH (Gen7+) processes a single patch per thread, with each
  channel corresponding to a different patch vertex.  PATCHLIST_N will
  launch (N / 8) threads.  If N is less than 8, some channels will be
  disabled, leaving some untapped hardware capabilities.  Conditionals
  based on gl_InvocationID are non-uniform, which means that they'll
  often have to execute both paths.  However, if there are fewer than
  8 vertices, all invocations will happen within a single thread, so
  barriers can become no-ops, which is nice.  We also burn a maximum
  of 4 registers for ICP handles, so we can compile without regard for
  the value of N.  It also works in all cases.

- DUAL_PATCH mode processes up to two patches at a time, where the first
  four channels come from patch 1, and the second group of four come
  from patch 2.  This tries to provide better EU utilization for small
  patches (N <= 4).  It cannot be used in all cases.

- 8_PATCH mode processes 8 patches at a time, with a thread launched per
  vertex in the patch.  Each channel corresponds to the same vertex, but
  in each of the 8 patches.  This utilizes all channels even for small
  patches.  It also makes conditions on gl_InvocationID uniform, leading
  to proper jumps.  Barriers, unfortunately, become real.  Worse, for
  PATCHLIST_N, the thread payload burns N registers for ICP handles.
  This can burn up to 32 registers, or 1/4 of our register file, for
  URB handles.  For Vulkan (and DX), we know the number of vertices at
  compile time, so we can limit the amount of waste.  In GL, the patch
  dimension is dynamic state, so we either would have to waste all 32
  (not reasonable) or guess (badly) and recompile.  This is unfortunate.
  Because we can only spawn 16 thread instances, we can only use this
  mode for PATCHLIST_16 and smaller.  The rest must use SINGLE_PATCH.
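
The thread and register arithmetic for two of the modes above can be sketched like this. It is a rough Python model of the description only, not the compiler code: the function names are hypothetical, and rounding N / 8 up is an assumption, since the text only says "(N / 8)".

```python
import math

MAX_8_PATCH_VERTICES = 16  # from the text: only 16 thread instances can be spawned


def single_patch_threads(n):
    """Threads launched for PATCHLIST_n in SINGLE_PATCH mode (8 vertices each).

    Rounding up is an assumption made for this sketch.
    """
    return math.ceil(n / 8)


def eight_patch_cost(n):
    """Return (threads, icp_handle_registers) for PATCHLIST_n in 8_PATCH mode,
    or None when the patch is too large for this mode."""
    if n > MAX_8_PATCH_VERTICES:
        return None
    # One thread per vertex, and the payload burns n registers for ICP handles.
    return n, n
```

For a 3-vertex patch this gives one SINGLE_PATCH thread with 5 of its 8 channels idle, versus three 8_PATCH threads that each cover 8 patches.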

This patch implements the new 8_PATCH TCS mode, but leaves us using
SINGLE_PATCH by default.  A new INTEL_DEBUG=tcs8 flag will switch to
using 8_PATCH mode for testing and benchmarking purposes.  We may
want to consider using 8_PATCH mode in Vulkan in some cases.

The data I've seen shows that 8_PATCH mode can be more efficient in
some cases, but SINGLE_PATCH mode (the one we use today) is faster
in other cases.  Ultimately, the TES matters much more than the TCS
for performance, so the decision may not matter much.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:30 -07:00
Kenneth Graunke
076159b40b intel/compiler: Move ICP handle fetching into a helper function.
This will be significantly different in 8_PATCH mode.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:28 -07:00
Kenneth Graunke
3d84fd29e8 intel/compiler: Don't repeat dispatch max fixing condition
Having a single flag will keep both places in sync if the condition
gets more complicated.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:27 -07:00
Kenneth Graunke
f0d52cf2b0 intel/compiler: Rename invocation_id_mask to instance_id_mask
The payload field is actually "instance" (thread number), which is used
to calculate the invocation ID.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:25 -07:00
Kenneth Graunke
d86260719e intel/compiler: Refactor TCS invocation ID setup into a helper
When we add 8_PATCH mode, this will get a bit more complex, so we may
as well start by putting it in a helper function.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:24 -07:00
Ian Romanick
45c7ff95fc intel/compiler: Repeat nir_opt_algebraic_late
A tiny bit of help seems to come from nir_copy_prop.  Future patches
will benefit from this change.

Doing more copy propagation on the vec4 backend led to a disaster in
hurt cycles.

v2: Fix typo in comment.  Noticed by Matt.

All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224634 -> 17224623 (<.01%)
instructions in affected programs: 4586 -> 4575 (-0.24%)
helped: 11
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 0.53% x̄: 0.27% x̃: 0.23%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.36% -0.19%
Instructions are helped.

total cycles in shared programs: 360828542 -> 360828714 (<.01%)
cycles in affected programs: 151159 -> 151331 (0.11%)
helped: 49
HURT: 28
helped stats (abs) min: 1 max: 254 x̄: 26.41 x̃: 6
helped stats (rel) min: 0.06% max: 12.02% x̄: 1.34% x̃: 0.42%
HURT stats (abs)   min: 1 max: 196 x̄: 52.36 x̃: 15
HURT stats (rel)   min: 0.05% max: 10.74% x̄: 2.55% x̃: 0.88%
95% mean confidence interval for cycles value: -13.48 17.95
95% mean confidence interval for cycles %-change: -0.69% 0.84%
Inconclusive result (value mean confidence interval includes 0).

Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13529544 -> 13529542 (<.01%)
instructions in affected programs: 358 -> 356 (-0.56%)
helped: 2
HURT: 0

total cycles in shared programs: 357290311 -> 357289678 (<.01%)
cycles in affected programs: 178324 -> 177691 (-0.35%)
helped: 48
HURT: 40
helped stats (abs) min: 1 max: 201 x̄: 31.52 x̃: 13
helped stats (rel) min: 0.06% max: 10.92% x̄: 1.71% x̃: 0.66%
HURT stats (abs)   min: 1 max: 224 x̄: 22.00 x̃: 6
HURT stats (rel)   min: 0.05% max: 15.84% x̄: 1.29% x̃: 0.31%
95% mean confidence interval for cycles value: -18.28 3.89
95% mean confidence interval for cycles %-change: -1.01% 0.32%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8159110 -> 8158980 (<.01%)
instructions in affected programs: 22719 -> 22589 (-0.57%)
helped: 65
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.07% max: 1.05% x̄: 0.73% x̃: 0.74%
95% mean confidence interval for instructions value: -2.06 -1.94
95% mean confidence interval for instructions %-change: -0.78% -0.68%
Instructions are helped.

total cycles in shared programs: 188609448 -> 188609214 (<.01%)
cycles in affected programs: 1875852 -> 1875618 (-0.01%)
helped: 109
HURT: 104
helped stats (abs) min: 2 max: 46 x̄: 5.30 x̃: 4
helped stats (rel) min: 0.02% max: 0.90% x̄: 0.09% x̃: 0.07%
HURT stats (abs)   min: 2 max: 20 x̄: 3.31 x̃: 2
HURT stats (rel)   min: 0.01% max: 0.26% x̄: 0.04% x̃: 0.02%
95% mean confidence interval for cycles value: -1.95 -0.25
95% mean confidence interval for cycles %-change: -0.04% -0.01%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-14 11:38:22 -07:00
Ian Romanick
a79570099b intel/fs: Allow cmod propagation to instructions with saturate modifier
v2: Add unit tests.  Suggested by Matt.

All Intel GPUs had similar results. (Ice Lake shown)
total instructions in shared programs: 17229441 -> 17228658 (<.01%)
instructions in affected programs: 159574 -> 158791 (-0.49%)
helped: 489
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.60 x̃: 1
helped stats (rel) min: 0.07% max: 2.70% x̄: 0.61% x̃: 0.59%
95% mean confidence interval for instructions value: -1.72 -1.48
95% mean confidence interval for instructions %-change: -0.64% -0.58%
Instructions are helped.

total cycles in shared programs: 360944149 -> 360937144 (<.01%)
cycles in affected programs: 1072195 -> 1065190 (-0.65%)
helped: 254
HURT: 27
helped stats (abs) min: 2 max: 234 x̄: 30.51 x̃: 9
helped stats (rel) min: 0.04% max: 8.99% x̄: 0.75% x̃: 0.24%
HURT stats (abs)   min: 2 max: 83 x̄: 27.56 x̃: 24
HURT stats (rel)   min: 0.09% max: 3.79% x̄: 1.28% x̃: 1.16%
95% mean confidence interval for cycles value: -30.11 -19.75
95% mean confidence interval for cycles %-change: -0.70% -0.41%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
2019-05-14 11:38:21 -07:00
Jason Ekstrand
e99081e76d intel/fs/ra: Spill without destroying the interference graph
Instead of re-building the interference graph every time we spill, we
modify it in place so we can avoid recalculating liveness and the whole
O(n^2) interference graph building process.  We make a simplifying
assumption in order to do so which is that all spill/fill temporary
registers live for the entire duration of the instruction around which
we're spilling.  This isn't quite true because a spill into the source
of an instruction doesn't need to interfere with its destination, for
instance.  Not re-calculating liveness also means that we aren't
adjusting spill costs based on the new liveness.  The combination of
these things results in a bit of churn in spilling.  It takes a large
cut out of the run-time of shader-db on my laptop.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311224 -> 15311360 (<.01%)
    instructions in affected programs: 77027 -> 77163 (0.18%)
    helped: 11
    HURT: 18

    total cycles in shared programs: 355544739 -> 355830749 (0.08%)
    cycles in affected programs: 203273745 -> 203559755 (0.14%)
    helped: 234
    HURT: 190

    total spills in shared programs: 12049 -> 12042 (-0.06%)
    spills in affected programs: 2465 -> 2458 (-0.28%)
    helped: 9
    HURT: 16

    total fills in shared programs: 25112 -> 25165 (0.21%)
    fills in affected programs: 6819 -> 6872 (0.78%)
    helped: 11
    HURT: 16

    Total CPU time (seconds): 2469.68 -> 2360.22 (-4.43%)
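
The percentages in these reports are plain relative changes. A short Python check, using only numbers quoted above and a hypothetical helper name:

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


cpu_time = round(percent_change(2469.68, 2360.22), 2)   # total CPU time
spills = round(percent_change(12049, 12042), 2)         # total spills
fills = round(percent_change(25112, 25165), 2)          # total fills
```

Negative values mean the metric went down.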

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
147665d0a2 intel/fs/ra: Put the VGRFs at the end of the nodes
This is slightly less convenient in some places but it will make it much
easier when we want to start adding nodes dynamically.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
e7b7d572b3 intel/fs/ra: Re-arrange interference setup
The old code was arranged by the type of interference being added.  It
would set up payload registers and then add payload interference for all
VGRFs.  It would set up MRFs and add MRF interference for all VGRFs.
This commit re-arranges things to be organized differently.  It first
creates and sets up all RA nodes and then groups interference into two
new categories:  live range and instruction interference.  Once all the
RA nodes have been set up, it walks the list of VGRFs and sets up their
live range interference and then walks the list of instructions and sets
up instruction interference.  This new arrangement will be advantageous
for a future patch but, at the moment, it cuts 2% off the run-time of
shader-db on my laptop.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311224 -> 15311224 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 355544739 -> 355544739 (0.00%)
    cycles in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    Total CPU time (seconds): 2523.45 -> 2469.68 (-2.13%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
0fd60e95fb intel/fs/ra: Do the spill loop inside RA
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
47b1dcdcab intel/fs/ra: Only add MRF hack interference if we're spilling
The only use of the MRF hack these days is for spilling and there we
don't need the precise MRF usage information.  If we're spilling then we
know pretty well how many MRFs are going to be used.  It is possible if
the only things that are spilled have fewer SIMD channels than the
dispatch width of the shader that this may be more MRFs than needed.
That's a risk we're willing to take.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311100 -> 15311224 (<.01%)
    instructions in affected programs: 16664 -> 16788 (0.74%)
    helped: 1
    HURT: 5

    total cycles in shared programs: 355543197 -> 355544739 (<.01%)
    cycles in affected programs: 731864 -> 733406 (0.21%)
    helped: 3
    HURT: 6

The hurt shaders are all SIMD32 compute shaders where we reserve enough
space for a 32-wide spill/fill but don't need it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
69878a9bb0 intel/fs/ra: Pull the guts of RA into its own class
This accomplishes two things.  First, it makes interfaces which are
really private to RA private to RA.  Second, it gives us a place to
store some common stuff as we go through the algorithm.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
9e00a251be intel/fs/ra: Move assign_regs further down in the file
It's the main function from which all the other functions are called.
It belongs at the bottom.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
5d9ac57c8c intel/fs/ra: Split building the interference graph into a helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
472ef2f98d intel/fs/ra: Initialize grf_used with first_non_payload_grf
There's no reason why we need to use the calculated payload_node_count
value which is just first_non_payload_grf aligned up.  The grf_used
value will be aligned up to 16 anyway (which is a much bigger alignment)
before being handed off to hardware.
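
Why the smaller alignment is redundant can be checked with a short sketch. This is Python for illustration only: `align_up` is a hypothetical helper, and the intermediate alignment of 2 is an arbitrary stand-in, since the text does not state the alignment used for payload_node_count.

```python
def align_up(value, alignment):
    """Round `value` up to the next multiple of `alignment` (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


# Aligning to a small power of two first never changes the result of a later,
# larger power-of-two alignment.
redundant = all(
    align_up(align_up(grf, 2), 16) == align_up(grf, 16) for grf in range(128)
)
```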

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
096ad8a809 intel/fs/ra: Stop adding RA interference to too many SENDS nodes
We only have one node per VGRF so this was adding way too much
interference.  No idea how we didn't catch this before.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311100 -> 15311100 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 355468050 -> 355543197 (0.02%)
    cycles in affected programs: 2472492 -> 2547639 (3.04%)
    helped: 17
    HURT: 20

Fixes: 014edff0d2 "intel/fs: Add interference between SENDS sources"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
88cac12230 intel/fs/ra: Only add dest interference to sources that exist
Fixes: 83dedb6354 "i965: Add src/dst interference for certain"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
6212326941 intel/fs: Stop doing extra RA calls
In the last phase of the schedule and RA loop, the RA call is redundant
if we spill.  Immediately afterwards, we're going to see that we
couldn't allocate without spilling and call back into RA and tell it to
go ahead and spill.  We've known about it for a while but we've always
brushed over it on the theory that, if you're going to spill, you'll be
calling RA a bunch anyway and what does one extra RA hurt?  As it turns
out, it hurts more than you'd expect.  Because the RA interference graph
gets sparser with each spill and the RA algorithm is more efficient on
sparser graphs, the RA call that we're duplicating is actually the most
expensive call in the RA-and-spill loop.

There's another extra RA call we do that's a bit harder to see which
this also removes.  If we try to compile a shader that isn't the minimum
dispatch width and it fails to allocate without spilling we call fail()
to set an error but then go ahead and do the first spilling RA pass and
only after that's complete do we detect the fail and bail out.  By
making minimum dispatch widths part of the spill condition, we side-step
this problem.

Getting rid of these extra spills takes the compile time of a nasty
Aztec Ruins shader from about 28 seconds to about 26 seconds on my
laptop.  It also makes shader-db 1.5% faster.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311100 -> 15311100 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 355468050 -> 355468050 (0.00%)
    cycles in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    Total CPU time (seconds): 2524.31 -> 2486.63 (-1.49%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Nanley Chery
29a13eb71d isl: Add restrictions to isl_surf_get_hiz_surf()
Import some restrictions from intel_tiling_supports_hiz() and
intel_miptree_supports_hiz().

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-05-14 16:23:12 +00:00
Nanley Chery
d57242190e isl: Add restriction and comments to isl_surf_get_ccs_surf()
Import some restrictions and comments from intel_miptree_supports_ccs().

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-05-14 16:23:12 +00:00
Nanley Chery
1de089797c isl: Modify restrictions in isl_surf_get_mcs_surf()
Import some restrictions from intel_miptree_supports_mcs() and don't
assume that the caller knows which device generations are supported.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-05-14 16:23:12 +00:00
Jason Ekstrand
0745d4bd96 anv: Implement VK_KHR_uniform_buffer_standard_layout
There's no real work to do here since we already support scalar block
layout which is a direct superset of what this extension allows.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-13 17:20:33 -05:00
Vinson Lee
20b42fad9b intel/tools: Fix build with glibc < 2.27.
glibc < 2.27 defines OVERFLOW in /usr/include/math.h.

This patch fixes this build error.

In file included from ../include/c99_math.h:37:0,
                 from ../src/util/u_math.h:44,
                 from ../src/mesa/main/macros.h:35,
                 from ../src/intel/compiler/brw_reg.h:47,
                 from ../src/intel/tools/i965_asm.h:32,
                 from ../src/intel/tools/i965_gram.y:29:
src/intel/tools/i965_gram.tab.c:562:5: error: expected identifier before numeric constant
     OVERFLOW = 412,
     ^

Fixes: 70308a5a8a ("intel/tools: New i965 instruction assembler tool")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110656
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Engestrom <eric@engestrom.ch>
2019-05-13 11:05:48 -07:00
Mike Blumenkrantz
7b2468bf6e intel: drop misleading driver name from gen_get_device_info()
2019-05-11 04:14:06 +00:00
Caio Marcelo de Oliveira Filho
3610081daa anv: Fix limits when VK_EXT_descriptor_indexing is used
Update various limits in
VkPhysicalDeviceDescriptorIndexingPropertiesEXT that were previously
zero to their values from VkPhysicalDeviceLimits.  When using
VK_EXT_descriptor_indexing, the former limits will apply to all the
descriptor layout sets -- not only those using the new feature bits.

For reference, VK_EXT_descriptor_indexing says

    "There are new descriptor set layout and descriptor pool creation
    flags that are required to opt in to the update-after-bind
    functionality, and there are separate maxPerStage* and
    maxDescriptorSet* limits that apply to these descriptor set
    layouts which may be much higher than the pre-existing limits. The
    old limits only count descriptors in non-updateAfterBind
    descriptor set layouts, and the new limits count descriptors in
    all descriptor set layouts in the pipeline layout."

Fixes: 6e230d7607 "anv: Implement VK_EXT_descriptor_indexing"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-10 15:15:11 -07:00