Commit graph

4157 commits

Author SHA1 Message Date
Jason Ekstrand
3d33c13eca anv/descriptor_set: Only vma_heap_finish if we have a descriptor buffer
Fixes: 7bb34ecff9 "anv: release memory allocated by bo_heap when..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-24 05:40:27 +00:00
Jason Ekstrand
0bc1942c9d anv/descriptor_set: Destroy sets before pool finalization
Fixes: 105002bd2d "anv: destroy descriptor sets when pool gets..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-24 05:40:27 +00:00
Jason Ekstrand
6be603edf7 anv/descriptor_set: Unlink sets from the pool in set_destroy
anv_descriptor_pool_free_set is called on the clean-up path of
anv_descriptor_set_create and the set may not have been added to the
pool's list of sets yet.  While we're here, we move adding it to that
list into set_create for symmetry.

Fixes: 105002bd2d "anv: destroy descriptor sets when pool gets..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-24 05:40:27 +00:00
Ian Romanick
21223acf7d intel/fs: Fix D to W conversion in opt_combine_constants
Found by GCC warning:

src/intel/compiler/brw_fs_combine_constants.cpp: In function ‘bool needs_negate(const fs_reg*, const imm*)’:
src/intel/compiler/brw_fs_combine_constants.cpp:306:34: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
       return ((reg->d & 0xffffu) < 0) != (imm->w < 0);
               ~~~~~~~~~~~~~~~~~~~^~~

The result of the bit-and is a 32-bit value with the top bits all zero.
This will never be < 0.  Instead of masking off the bits, just cast to
int16_t and let the compiler handle the actual conversion.

Fixes: e64be391dd ("intel/compiler: generalize the combine constants pass")
Cc: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-23 19:48:33 -07:00
Ian Romanick
26391cceaa intel/compiler: Lower ffma on Gen4 and Gen5
flrp32 is also a 3-source instruction, but there is another pending
series that handles that for Gen4 and Gen5.

v2: Rebase on "intel/compiler: Don't have sepearate, per-Gen
nir_options"

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-23 17:50:28 -07:00
Ian Romanick
fd1fa9afc7 intel/compiler: Don't have sepearate, per-Gen nir_options
Instead, just have separate scalar vs. vector nir_options and do
per-Gen "fix ups".

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-23 17:50:16 -07:00
Bas Nieuwenhuizen
f2e0f5c3c4 vulkan/wsi: Add X11 adaptive sync support based on dri options.
The dri options are optional. When the dri options are not provided
the WSI will not  use adaptive sync.

FWIW I think for xf86-video-amdgpu this still requires an X11 config
option, so only people who opt in can get possible regressions from this.

So then the remaining question is: why do this in the WSI?

It has been suggested in another MR that the application sets this.
However, I disagree with that as I don't think we'll ever get a
reasonable set of applications setting it.

The next questions is whether this can be a layer. It definitely
can be as implemented now. However, I think this generally fits
well with the function of the WSI. Furthemore, for e.g. the DISPLAY
WSI this is much harder to do in a layer.

Of course, most of the WSI could almost be a layer, but I think
this still fits best in the WSI.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-23 23:49:39 +00:00
Lionel Landwerlin
0fb0058f18 anv: fix argument name for vkCmdEndQuery
Doesn't fix anything but it's not the right function prototype.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 673f33c77d ("anv: Implement CmdBegin/EndQueryIndexed")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-04-24 04:33:26 +08:00
Lionel Landwerlin
b1ba7ffdbd intel: workaround VS fixed function issue on Gen9 GT1 parts
The issue is noticeable in the
dEQP-GLES31.functional.geometry_shading.layered.render_with_default_layer_3d
test where a triangle goes missing when we use the maximum number of
URB entries as specified by the documentation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107505
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-23 13:41:20 +08:00
Matt Turner
4ec258ac3c intel/compiler: Improve fix_3src_operand()
Allow ATTR and IMM sources unconditionally (ATTR are just GRFs, IMM will
be handled by opt_combine_constants(). Both are already allowed by
opt_copy_propagation().

Also allow FIXED_GRF if the regioning is 8,8,1. Could also allow other
stride=1 regions (e.g., 4,4,1) and scalar regions but I don't think
those occur. This is sufficient to allow a pass added in a future commit
(fs_visitor::lower_linterp) to avoid emitting extra MOV instructions.

I removed the 'src.stride > 1' case because it seems wrong: 3-src
instructions on Gen6-9 are align16-only and can only do stride=1 or
stride=0. A run through Jenkins with an assert(src.stride <= 1) never
triggers, so it seems that it was dead code.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-04-22 16:54:31 -07:00
Matt Turner
8aae7a3998 intel/compiler: Add unit tests for sat prop for different exec sizes
The two new unit tests verify that propagating a saturate between
instructions of different exec sizes does not happen.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-04-22 16:54:21 -07:00
Matt Turner
54d4d34b96 intel/compiler: Use SIMD16 instructions in fs saturate prop unit test
Will allow us to test that propagation between instructions of different
exec sizes does not happen (in the next commit).

The stray-looking change in intervening_dest_write is to adjust the size
of the texture result to keep the test functioning identically when the
instructions' exec sizes are doubled. Without the change, the texture
does not overwrite the destination fully as the unit test intends.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-04-22 16:54:17 -07:00
Rafael Antognolli
70e03e220c intel/fs: Remove fs_generator::generate_linterp from gen11+.
We now have a lowering pass that will do this at the fs_visitor level,
so we can remove this code from gen11+.

v2: Reduce size of the "i" array from 4 to 2 (Matt).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-22 16:54:00 -07:00
Rafael Antognolli
9ea90aae1e intel/fs: Add a lowering pass for linear interpolation.
On gen11, instead of using a PLN instruction, we convert
FS_OPCODE_LINTERP to 2 or 4 multiply adds. That is done in the
fs_generator code.

This patch adds a lowering pass that does the same thing at the
fs_visitor. It also drops the usage of NF types, since we don't need the
extra precision and it lets us skip the accumulator. With all that, some
optimizations will still be run on the generated code, and we should get
better scheduling.

v2: Update comment about saturation and conditional mod (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-22 16:54:00 -07:00
Rafael Antognolli
c0504569ea intel/fs: Move the scalar-region conversion to the generator.
Move the scalar-region conversion from the IR to the generator, so it
doesn't affect the Gen11 path. We need the non-scalar regioning
for a later lowering pass that we are adding.

v2: Better commit message (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-22 16:54:00 -07:00
Rafael Antognolli
0778748eba intel/fs: Only propagate saturation if exec_size is the same.
Otherwise it could propagate the saturation from a SIMD16 instruction
into a SIMD8 instruction. With that, only part of the destination
register, which is the source of the move with saturation, would have
been updated.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-22 16:53:55 -07:00
Ian Romanick
a6ccc4c0c8 intel/fs: Add support for float16 to the fsign optimizations
Commit ad98fbc217 ("intel/fs: Refactor code generation for nir_op_fsign
to its own function") criss-crossed with c2b8fb9a81 ("anv/device:
expose VK_KHR_shader_float16_int8 in gen8+"), and I was not paying
enough attention when I rebased.  This adds back the float16 changes and
enables the optimization.

v2: Incorporate more changes from 19cd2f5deb and a8d8b1a139 that I
missed in the previous version.

Fixes: ad98fbc217 ("intel/fs: Refactor code generation for nir_op_fsign to its own function")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110474
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
2019-04-20 20:49:34 -07:00
Jason Ekstrand
828ec41154 anv: Rework the descriptor set layout create loop
Previously, we were storing the per-binding create info pointer in the
immutable_samplers field temporarily so that we can switch the order in
which we walk the loop.  However, now that we have multiple arrays of
structs to walk, it makes more sense to store an index of some sort.
Because we want to leave immutable_samplers as NULL for undefined
bindings, we store index + 1 and then subtract one later.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 23:26:41 +00:00
Jason Ekstrand
2b388c3d04 anv: Ignore descriptor binding flags if bindingCount == 0
I missed this on the first go round.  The bindingCount field of
VkDescriptorSetLayoutBindingFlagsCreateInfoEXT is allowed to be zero
which means the flags array is ignored.

Fixes: d6c9bd6e01 "anv: Put binding flags in descriptor set layouts"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 23:26:41 +00:00
Jason Ekstrand
9ce7c29724 anv/nir: Add a central helper for figuring out SSBO address formats
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
6e230d7607 anv: Implement VK_EXT_descriptor_indexing
Now that everything is in place to do bindless for all resource types
except input attachments and UBOs, VK_EXT_descriptor_indexing is
"trivial".

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
d6c9bd6e01 anv: Put binding flags in descriptor set layouts
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
c0d9926df7 anv: Use bindless handles for images
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
83af92e593 intel/fs: Add support for bindless image load/store/atomic
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
e6803f6b6f anv: Use bindless textures and samplers
This commit changes anv to put bindless handles and sampler pointers
into the descriptor buffer and use those instead of bindful when we run
out of binding table space.  This "spilling" of descriptors allows to to
advertise an almost unbounded number of images and samplers.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
bf61f057f7 anv: Pass the plane into lower_tex_deref
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
f16fcb9db7 anv: Use write_image_view to initialize immutable samplers
Instead of setting it manually, call the helper.  When setting
descriptor sets becomes more complicated than just setting some struct
values, this will keep immutable sampler handling correct.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
e612c3b9bf anv: Count the number of planes in each descriptor binding
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
843286d324 intel/fs: Add support for bindless texture ops
We add two new texture sources for bindless surface and sampler handles.
Bindless surface handles are expected to be pre-shifted so that the
20-bit surface state table index is in the top 20 bits of the 32-bit
handle.  This lets us avoid any extra shifts in the shader.  Bindless
sampler handles are 32-byte aligned byte offsets from general state base
address.  We use 32-byte aligned instead of 16-byte aligned to avoid
having to use more indirect messages than needed.  It means we can't
tightly pack samplers but that's probably not a big deal.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
2edf29b933 intel,nir: Lower TXD with a bindless sampler
When we have a bindless sampler, we need an instruction header.  Even in
SIMD8, this pushes the instruction over the sampler message size maximum
of 11 registers.  Instead, we have to lower TXD to TXL.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
bd56ce8ce5 anv: Implement VK_KHR_shader_atomic_int64
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
79fb0d27f3 anv: Implement SSBOs bindings with GPU addresses in the descriptor BO
This commit adds a new way for ANV to do SSBO bindings by just passing a
GPU address in through the descriptor buffer and using the A64 messages
to access the GPU address directly.  This means that our variable
pointers are now "real" pointers instead of a vec2(BTI, offset) pair.
This carries a few of advantages:

 1. It lets us support a virtually unbounded number of SSBO bindings.

 2. It lets us implement VK_KHR_shader_atomic_int64 which we couldn't
    implement before because those atomic messages are only available
    in the bindless A64 form.

 3. It's way better than messing around with bindless handles for SSBOs
    which is the only other option for VK_EXT_descriptor_indexing.

 4. It's more future looking, maybe?  At the least, this is what NVIDIA
    does (they don't have binding based SSBOs at all).  This doesn't a
    priori mean it's better, it just means it's probably not terrible.

The big disadvantage, of course, is that we have to start doing our own
bounds checking for robustBufferAccess again have to push in dynamic
offsets.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
3cf78ec2bd anv: Lower some SSBO operations in apply_pipeline_layout
In order to avoid the potential overhead of A64 operations on all SSBO
ops, we look for those SSBO ops where we can get to the descriptor set
from the SSBO access operation and lower those to a binding-table
approach.  When robustBufferAccess is enabled, this lets the hardware do
the bounds checking for us.  It also avoids some potentially expensive
64-bit integer calculations.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
e7a1e8f735 anv: Add a has_a64_buffer_access to anv_physical_device
This is more descriptive and a bit nicer than checking for gen >= 8 &&
use_softpin everywhere.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
b1a633d9fb intel/nir: Re-run int64 lowering in postprocess_nir
We're about to start doing 64-bit pointer calculations in ANV.  They
will get applied after brw_preprocess_nir which is where we currently do
64-bit integer arithmetic lowering.  Because we're adding 64-bit integer
arithmetic after the initial lowering has happened, we need to lower
again.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
146deec9ef anv/pipeline: Add skeleton support for spilling to bindless
If the number of surfaces or samplers exceeds what we can put in a
table, we will want to spill out to bindless.  There is no bindless
support yet but this gets us the basic framework that will be used by
later commits.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
a7d4871846 anv/pipeline: Sort bindings by most used first
This commit just sorts the bindings by how often they're used vs the
array size of the binding.  This will let us make more nuanced decisions
about what goes in the binding table vs. what to make bindless.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
a5a0dc08f1 anv: Add a #define for the max binding table size
This also fixes a bug where we mis-calculate maximum binding table sizes
and may return true in vkGetDescriptorSetLayoutSupport even for sets too
large to fit in a binding table.

Fixes: ddc4069122 "anv: Implement VK_KHR_maintenance3"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
3b755b52e8 anv: Put image params in the descriptor set buffer on gen8 and earlier
This is really where they belong; not push constants.  The one downside
here is that we can't push them anymore for compute shaders.  However,
that's a general problem and we should figure out how to push descriptor
sets for compute shaders.  This lets us bump MAX_IMAGES to 64 on BDW and
earlier platforms because we no longer have to worry about push constant
overhead limits.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Jason Ekstrand
83b943cc2f anv: Make all VkDeviceMemory BOs resident permanently
We spend a lot of time in the driver adding things to hash sets to track
residency.  The reality is that a properly built Vulkan app uses large
memory objects and sub-allocates from them.  In a typical frame, most of
if not all of those allocations are going to be resident for the entire
frame so we're really not saving ourselves much by tracking fine-grained
residency.  Just throwing everything in the validation list does make it
a little bit more expensive inside the kernel to walk the list and
ensure that all our VA is in order.  However, without relocations, the
overhead of that is pretty small.

If we ever do run into a memory pressure situation where the fine-
grained residency could even potentially help, we would likely be
swapping one page out to make room for another within the draw call and
performance is totally lost at that point.  We're better off swapping
out other apps and just letting ours run a whole frame.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 19:56:42 +00:00
Lionel Landwerlin
0d46e40467 anv: limit URB reconfigurations when using blorp
If the last graphics pipeline bound to the command buffer has enough
space in its VS URB entries for Blorp then avoid reconfiguring the URB
partitions.

v2: s/0/MESA_SHADER_VERTEX/ (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-19 16:58:06 +01:00
Lionel Landwerlin
84e70556fb intel/devinfo: add basic sanity tests on device database
v2: #undef NDEBUG (Eric)
    Use inc_include & inc_src (Eric)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Anuj Phogat anuj.phogat@gmail.com
2019-04-19 15:56:21 +00:00
Lionel Landwerlin
773e6aa9fd intel/devinfo: fix missing num_thread_per_eu on ICL
There was an assumption that num_thread_per_eu would be set in the
Gen8 features. Since this is mostly the same of all gen8->11 (except
GEN9_LP that overwrites it) let's just factor it out.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Anuj Phogat anuj.phogat@gmail.com
2019-04-19 15:56:21 +00:00
Jason Ekstrand
cd4ffb376f intel/fs: Account for live range lengths in spill costs
The current register allocator has a concept of "spill benefit" which is
based on the number of nodes with which a given node interferes.  The
idea is that you want to spill stuff with high interference because
those are the most likely registers to help when spilling.  However,
this fails to take into account the length of the live range so the
allocator frequently picks "cheap" (not many uses) registers which are
actually very short lived and so spilling them doesn't help with the
pressure situation.

This commit takes into account the length of the live range to make
long-lived registers more likely to get spilled than short-lived ones.
This encourages the spill chooser to choose slightly larger registers
which will affect a larger area of the program and hopefully we have to
spill fewer of them to get the same reduction in over-all register
pressure.

Shader-db results on Kaby Lake:

    total spills in shared programs: 23664 -> 12050 (-49.08%)
    spills in affected programs: 19243 -> 7629 (-60.35%)
    helped: 296
    HURT: 8

    total fills in shared programs: 32028 -> 25139 (-21.51%)
    fills in affected programs: 20378 -> 13489 (-33.81%)
    helped: 295
    HURT: 16

Of course, most of that is in Deus Ex...

Shader-db results on Kaby Lake (without Deus Ex):

    total spills in shared programs: 6479 -> 5834 (-9.96%)
    spills in affected programs: 3231 -> 2586 (-19.96%)
    helped: 40
    HURT: 4

    total fills in shared programs: 17165 -> 17099 (-0.38%)
    fills in affected programs: 6951 -> 6885 (-0.95%)
    helped: 40
    HURT: 7

Even without Deus Ex, the spill help is pretty respectable.  The worst
hurt shaders were one compute shader in Aztec Ruins and one fragment
shader in KSP that were each hurt by around 13% fill 9% spill.

VkPipeline-db results on Kaby Lake:

    total spills in shared programs: 9149 -> 8069 (-11.80%)
    spills in affected programs: 5197 -> 4117 (-20.78%)
    helped: 27
    HURT: 16

    total fills in shared programs: 26390 -> 25477 (-3.46%)
    fills in affected programs: 12662 -> 11749 (-7.21%)
    helped: 24
    HURT: 22

The Vulkan results were decidedly more mixed but we don't have nearly as
many apps in that database yet.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 23:04:45 +00:00
Lionel Landwerlin
dfd79079da anv: fix uninitialized pthread cond clock domain
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 843775bab7 ("anv: Rework fences")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 23:23:03 +01:00
Jason Ekstrand
db4a70e678 anv: Drop some unneeded ANV_FROM_HANDLE for physical devices
Ever since 48ed2a7bb0, we've had one at the top of the function.

Reviewed-by: Caio Marcelo de Oliveira Filho caio.oliveira@intel.com
2019-04-18 20:12:57 +00:00
Jason Ekstrand
981209d175 anv: Re-sort the GetPhysicalDeviceFeatures2 switch statement
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-04-18 20:12:57 +00:00
Ian Romanick
1711bf6cf2 intel/fs: Generate better code for fsign multiplied by a value
v2: Rebase on v2 changes in previous two commits.

v3: Rebase on 85c35885b3 ("nir: Rework nir_src_as_alu_instr to not take
a pointer").

shader-db results:

Skylake and Broadwell had similar results. (Skylake shown)
total instructions in shared programs: 15297100 -> 15282141 (-0.10%)
instructions in affected programs: 956685 -> 941726 (-1.56%)
helped: 4527
HURT: 0
helped stats (abs) min: 1 max: 221 x̄: 3.30 x̃: 2
helped stats (rel) min: 0.07% max: 10.53% x̄: 1.85% x̃: 1.37%
95% mean confidence interval for instructions value: -3.48 -3.12
95% mean confidence interval for instructions %-change: -1.88% -1.81%
Instructions are helped.

total cycles in shared programs: 372809551 -> 372597886 (-0.06%)
cycles in affected programs: 13645512 -> 13433847 (-1.55%)
helped: 4362
HURT: 125
helped stats (abs) min: 1 max: 2088 x̄: 50.73 x̃: 28
helped stats (rel) min: 0.01% max: 28.20% x̄: 2.77% x̃: 2.39%
HURT stats (abs)   min: 1 max: 1836 x̄: 76.90 x̃: 28
HURT stats (rel)   min: <.01% max: 34.36% x̄: 3.03% x̃: 1.42%
95% mean confidence interval for cycles value: -50.98 -43.37
95% mean confidence interval for cycles %-change: -2.67% -2.55%
Cycles are helped.

total spills in shared programs: 23465 -> 23463 (<.01%)
spills in affected programs: 42 -> 40 (-4.76%)
helped: 1
HURT: 0

total fills in shared programs: 31766 -> 31763 (<.01%)
fills in affected programs: 69 -> 66 (-4.35%)
helped: 1
HURT: 0

Haswell
total instructions in shared programs: 13839992 -> 13828311 (-0.08%)
instructions in affected programs: 712503 -> 700822 (-1.64%)
helped: 3477
HURT: 0
helped stats (abs) min: 1 max: 221 x̄: 3.36 x̃: 2
helped stats (rel) min: 0.07% max: 10.64% x̄: 1.96% x̃: 1.52%
95% mean confidence interval for instructions value: -3.58 -3.14
95% mean confidence interval for instructions %-change: -2.01% -1.92%
Instructions are helped.

total cycles in shared programs: 387026330 -> 386872483 (-0.04%)
cycles in affected programs: 11329966 -> 11176119 (-1.36%)
helped: 3307
HURT: 139
helped stats (abs) min: 2 max: 1776 x̄: 49.58 x̃: 18
helped stats (rel) min: 0.01% max: 20.38% x̄: 2.27% x̃: 1.79%
HURT stats (abs)   min: 1 max: 2314 x̄: 72.68 x̃: 20
HURT stats (rel)   min: <.01% max: 33.99% x̄: 2.28% x̃: 0.96%
95% mean confidence interval for cycles value: -49.31 -39.98
95% mean confidence interval for cycles %-change: -2.15% -2.01%
Cycles are helped.

LOST:   1
GAINED: 0

Ivy Bridge
total instructions in shared programs: 12045602 -> 12038463 (-0.06%)
instructions in affected programs: 623837 -> 616698 (-1.14%)
helped: 2498
HURT: 0
helped stats (abs) min: 1 max: 39 x̄: 2.86 x̃: 2
helped stats (rel) min: 0.05% max: 10.00% x̄: 1.30% x̃: 1.05%
95% mean confidence interval for instructions value: -2.96 -2.75
95% mean confidence interval for instructions %-change: -1.34% -1.26%
Instructions are helped.

total cycles in shared programs: 181025675 -> 180891323 (-0.07%)
cycles in affected programs: 11329329 -> 11194977 (-1.19%)
helped: 2439
HURT: 47
helped stats (abs) min: 1 max: 1565 x̄: 57.06 x̃: 26
helped stats (rel) min: 0.02% max: 24.56% x̄: 2.02% x̃: 1.64%
HURT stats (abs)   min: 1 max: 1269 x̄: 102.51 x̃: 43
HURT stats (rel)   min: 0.11% max: 52.94% x̄: 4.15% x̃: 1.34%
95% mean confidence interval for cycles value: -59.91 -48.17
95% mean confidence interval for cycles %-change: -1.99% -1.82%
Cycles are helped.

Sandy Bridge, Iron Lake, and GM45 had similar results. (Sandy Bridge shown)
total instructions in shared programs: 10896368 -> 10896339 (<.01%)
instructions in affected programs: 3767 -> 3738 (-0.77%)
helped: 17
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.71 x̃: 1
helped stats (rel) min: 0.13% max: 9.52% x̄: 3.58% x̃: 2.73%
95% mean confidence interval for instructions value: -2.27 -1.14
95% mean confidence interval for instructions %-change: -5.14% -2.03%
Instructions are helped.

total cycles in shared programs: 155091109 -> 155091021 (<.01%)
cycles in affected programs: 47241 -> 47153 (-0.19%)
helped: 15
HURT: 8
helped stats (abs) min: 2 max: 81 x̄: 15.73 x̃: 4
helped stats (rel) min: 0.03% max: 10.59% x̄: 1.55% x̃: 0.71%
HURT stats (abs)   min: 14 max: 32 x̄: 18.50 x̃: 17
HURT stats (rel)   min: 0.32% max: 2.79% x̄: 2.43% x̃: 2.71%
95% mean confidence interval for cycles value: -14.59 6.93
95% mean confidence interval for cycles %-change: -1.41% 1.08%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]
2019-04-18 12:38:05 -07:00
Ian Romanick
06d2c11641 intel/fs: Add a scale factor to emit_fsign
Normally fsign generates -1, 0, or +1.  The new scale factor, S, causes
fsign to generate -S, 0, or +S.

v2: Rebase on v2 changes in previous commit.

v3: Rebase on 85c35885b3 ("nir: Rework nir_src_as_alu_instr to not take
a pointer").

Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]
2019-04-18 12:37:48 -07:00
Ian Romanick
ad98fbc217 intel/fs: Refactor code generation for nir_op_fsign to its own function
v2: Call emit_fsign from inside the existing switch statement.
Suggested by Matt.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 12:37:48 -07:00