Commit graph

2416 commits

Author SHA1 Message Date
Jason Ekstrand
4639cc716e intel/blorp: Use mocs.tex for depth stencil
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Tested-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-11-13 19:39:57 -08:00
Kenneth Graunke
866158b4b6 intel/tools/error: Decode compute shaders.
This is a bit more annoying than your average shader - we need to look
at MEDIA_INTERFACE_DESCRIPTOR_LOAD in the batch buffer, then hop over
to the dynamic state buffer to read the INTERFACE_DESCRIPTOR_DATA, then
hop over to the instruction buffer to decode the program.

Now that we store all the buffers before decoding, we can actually do
this fairly easily.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-13 17:11:02 -08:00
Kenneth Graunke
7049c38655 intel/tools/error: Use do-while for field iterator loops.
while loops skip the first field of the instruction/structure, which
is not what the code intended.  It works out because the field we're
looking for doesn't happen to be first, but we ought to do it right
regardless.

Found while writing the next patch, where Kernel Start Pointer is
the first field of INTERFACE_DESCRIPTOR_DATA.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-13 17:11:02 -08:00
Kenneth Graunke
8b749ee0ea intel/tools/error: Decode shaders while decoding batch commands.
This makes aubinator_error_decode's shader dumping work like aubinator.
Instead of printing them after the fact, it prints them right inside the
3DSTATE_VS/HS/DS/GS/PS packet that references them.  This saves you the
effort of cross-referencing things and jumping back and forth.

It also reduces a bunch of book-keeping, and eliminates the limitation
that we could only handle 4096 programs.  That code was also broken and
failed to print any shaders if there were under 4096 programs.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-13 17:11:02 -08:00
Kenneth Graunke
4979bf2728 intel/tools/error: Save error state sections and decode them later.
This lets us complete parsing and storing of each buffer's data before
we begin decoding the batchbuffer.  This makes it possible to inspect
the state buffer and program buffer, so we can properly decode any
indirect state or shader programs.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:11:02 -08:00
Kenneth Graunke
eb8ad56ed2 intel/tools/error: Fix null termination of ring name string.
Ported from intel_error_decode.  We don't want to run off the end.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:11:01 -08:00
Kenneth Graunke
ac17b38e79 intel/tools/error: Drop unused MAX_RINGS #define.
Dead code.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:11:01 -08:00
Kenneth Graunke
596e860317 intel/tools/error: Refactor buffer matching, add more buffers.
Based on a similar patch to intel_error_decode by Chris Wilson.

While we're de-duplicating the gtt_offset calculation, we can simplify
it to assume two hex digits are there - the kernel has done this since
v4.6, and we already require error states from v4.10.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:10:51 -08:00
Kenneth Graunke
4bb119f00b intel/tools/error: Only decode a few sections of error states.
These three are the only we can reasonably decode with genxml.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:10:38 -08:00
Kenneth Graunke
00981e7c47 intel/tools/error: Drop unused parameters from decode() helper.
Also change count from a pointer into a value.  We were supposed to
be resetting it to 0 (and failed to), but that's gone since we dropped
the pre-ascii85 handling.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:10:38 -08:00
Kenneth Graunke
1898bf11a8 intel/tools/error: Drop support for non-ascii85 encoded error states.
Error state files used to look like:

   render ring --- gtt_offset = 0x0e8f6000
   00000000 :  69040000
   00000004 :  79090000
   ...
   00007ffc :  00000000
   --- ringbuffer = 0x00001000

There were thousands of lines between sections.  The file format changed
with Kernel 4.10, and now has a single ascii85-encoded line following
each section heading.  This is much easier to parse.

There are a bunch of bugs in our handling of the old style format,
where we'd decode the wrong data, at the wrong time.  Fixing all of
these is going to be a giant pain.  It's also a lot of extra code
complexity.  In order to properly decode indirect state, or compute
shaders, we'll also need to parse data in advance of decoding, which
is going to be a giant pain with this ad-hoc "decode everywhere!"
mentality.  So, let's just drop support for the older file format.

This unfortunately requires an error state generated by Kernel 4.10 or
later.  That's probably not the end of the world, as we encourage users
to upgrade to the latest kernel when encountering GPU hangs anyway.  It
might be a giant pain for people with LTS kernels, though...

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:10:38 -08:00
Kenneth Graunke
53586f88d7 intel/tools/error: Do ascii85 decode first.
The dashes "---" may occur within an ascii85 block, but only an ascii85
block starts with ':' or '~'.

Ported from Chris Wilson's intel-gpu-tools commit:
bceec7e1d8a160226b783c6344eae8cbf4ece144

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-11-13 17:10:38 -08:00
Dylan Baker
49fa074726 meson: Don't build intel shared components by default
It's a neat idea, and still useful in some cases, but the intel common
code is used by i965 and anvil only, this is a little clearer.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-11-13 13:43:15 -08:00
Jason Ekstrand
93200ea26d aubinator: Don't skip the first field in each subgroup
The previous iteration algorithm would advance the field pointer right
after we advance the group.  This meant that you would end up with
skipping the first field of the group.  In the common case, where the
only field is a struct (e.g. 3DSTATE_VERTEX_BUFFERS), it would get
skipped entirely.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-13 07:37:23 -08:00
Jason Ekstrand
74a9e51696 intel/genxml: Delete empty groups
They serve no purpose other than to just fill empty space in the packet
so each dword has something.  Just disallowing empty groups is a bit
easier on some of the tools.  This does not change the generated packing
headers in any way.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-13 07:37:23 -08:00
Jason Ekstrand
54a6f7eaca anv: Don't crash on invalid heap sizes when the PCI ID is overriden 2017-11-13 07:37:23 -08:00
Kenneth Graunke
9a0465b3a3 intel/tools: Fix detection of enabled shader stages.
We renamed "Function Enable" to "Enable", which broke our detection
of whether shaders are enabled or not.  So, we'd see a bunch of HS/DS
packets with program offsets of 0, and think that was a valid TCS/TES.

Fixes: c032cae9ff (genxml: Rename "Function Enable" to "Enable".)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-12 00:16:40 -08:00
Dylan Baker
854455498c autotools: Set C++ visibility flags on Intel
These flags are set for C sources, but not C++. This causes symbol
visibility leaks from the C++ parts of the Intel compiler.

Fixes: 700bebb958 ("i965: Move the back-end compiler to src/intel/compiler")
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-11-10 09:41:55 -08:00
Chad Versace
cd6f79a71d anv/meson: Generate dev_icd.json
I tested this in a setup where the builddir was outside of the srcdir.

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-11-09 16:29:33 -08:00
Chad Versace
b7441ef252 anv: Fix architecture in intel_icd.{arch}.json
Use the host arch, not the target arch. In Meson and in recent
Autotools, the host arch is where the binary will be used. The target
arch is useful only when compiling a compiler.

See: http://mesonbuild.com/Cross-compilation.html
See: https://www.gnu.org/software/automake/manual/html_node/Cross_002dCompilation.html
Reported-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-11-09 16:29:31 -08:00
Chad Versace
2a4798ad98 anv: Refactor anv_GetImageSubresourceLayout()
Its helper function, anv_surface_get_subresource_layout(), was not very
helpful. So fold it into the main function.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00
Chad Versace
69e3f0b02e anv/image: Refactor choice of isl_tiling_flags_t
Instead of choosing the tiling flags inside make_surface(), which is
called once per aspect in a loop, and which chooses the same tiling for
each aspect, choose the tiling flags exactly once before entering the
aspect loop.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00
Chad Versace
7bb4387105 anv: Refactor anv_get_format_plane() - explicit unsupported
The same local variable, 'plane_format', was returned on success *and*
failure. Be more explicit in distinguishing the two cases: return
'plane_format' on success and return 'unsupported' on failure.

This simplifies the diff in upcoming patches for
VK_EXT_image_drm_format_modifier.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00
Chad Versace
3ee7f4bc2f anv: Remove anv_physical_device_get_format_properties()
Fold its body into its sole caller,
anv_GetPhysicalDeviceFormatProperties().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00
Chad Versace
891d237667 anv: Simplify anv_physical_device_get_format_properties()
Now that get_image_format_properties() returns the correct
VkFormatFeatureFlags, we can remove the unneeded if-branch and some
local variables.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00
Chad Versace
b3e2ce0580 anv: Simplify anv_get_image_format_properties()
Now that get_image_format_features() has a VkImageTiling parameter, we
can bypass anv_physical_device_get_format_properties() and call
get_image_format_features() directly.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00
Chad Versace
cd3fe376e0 anv: Rename get_image_format_properties()
The name is misleading. It looks like vkGetPhysicalDeviceImageFormatProperties(),
but it actually implement vkGetPhysicalDeviceFormatProperties. Let's
rename it to what it actually does, get_image_format_features(), because it
returns VkFormatFeatureFlags.

For consistency, also rename get_buffer_format_properties() to
get_buffer_format_features().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00
Chad Versace
17ac61a2c9 anv: Fix get_image_format_properties() - YCbCr
Teach it to calculate the format features for YCbCr.

The goal (which is completed in this patch) is to incrementally fix
get_image_format_properties() to return a correct result.  Previously,
it returned incorrect VkFormatFeatureFlags which the caller needed clean
up.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00
Chad Versace
eaa49ec3fc anv: Fix get_image_format_properties() - 3-channel formats
Teach it to calculate the format features for 3-channel formats.

The goal is to incrementally fix get_image_format_properties() to return
a correct result.  Currently, it returns incorrect VkFormatFeatureFlags
which the caller must clean up.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00
Chad Versace
6394e4a380 anv: Refactor get_image_format_properties() - Reduce params
Replace parameters 'enum isl_format' and 'struct anv_format_plane' with
new parameter 'const struct anv_format *'.

The goal is to incrementally fix get_image_format_properties() to return
a correct result.  Currently, it returns incorrect VkFormatFeatureFlags
which the caller must clean up.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-11-09 16:01:59 -08:00
Chad Versace
66647074a4 anv: Refactor get_image_format_properties() - base_isl_format
Rename parameter 'base' to 'base_isl_format'.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-11-09 16:01:59 -08:00
Chad Versace
c22a9f10be anv: Refactor get_image_format_properties() - plane_format
Rename parameter 'format' to 'plane_format'.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00
Chad Versace
096fc6915b anv: Refactor get_image_format_properties() - ASTC
Teach it to calculate the format features for ASTC.

The goal is to incrementally fix get_image_format_properties() to return
a correct result.  Currently, it returns incorrect VkFormatFeatureFlags
which the caller must clean up.

v2: New commit message

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00
Chad Versace
8ae4e97536 anv: Refactor get_image_format_properties() - depthstencil (v2)
Teach it to calculate the features of depthstencil formats.

The goal is to incrementally fix get_image_format_properties() to return
a correct result.  Currently, it returns incorrect VkFormatFeatureFlags
which the caller must clean up.

v2: New commit message

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-11-09 16:01:59 -08:00
Chad Versace
6720abf292 anv: Better types for 'aspect' function params
Some functions have a comment that says "Exactly one bit must be in
'aspect'". So change the type of their 'aspect' parameter from
VkImageAspectFlags to VkImageAspectFlagBits.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-11-09 16:01:59 -08:00
Chad Versace
342c811646 anv: Refactor get_buffer_format_properties()
Make it a stand-alone function. Pre-patch, for some formats the function
returned incorrect VkFormatFeatureFlags which were cleaned up by the
caller.

This prepares for a cleaner implementation of
VK_EXT_image_drm_format_modifier.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-11-09 16:01:59 -08:00
Nicolai Hähnle
ffc2060616 anv: fix build failure
Fixes: e3a8013de8 ("util/u_queue: add util_queue_fence_wait_timeout")
2017-11-09 14:49:19 +01:00
Jason Ekstrand
951a5dc4cc intel/nir: Use the correct indirect lowering masks in link_shaders
Previously, if we were linking a vec4 VS with a SIMD8/16 FS, we wouldn't
lower indirects on the fragment shader which is wrong.  Instead of using
a single indirect mask, take advantage of our new little helper.

Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-08 20:10:04 -08:00
Timothy Arceri
f98a2768ca mesa: Add new fast mtx_t mutex type for basic use cases
While modern pthread mutexes are very fast, they still incur a call to an
external DSO and overhead of the generality and features of pthread mutexes.
Most mutexes in mesa only needs lock/unlock, and the idea here is that we can
inline the atomic operation and make the fast case just two intructions.
Mutexes are subtle and finicky to implement, so we carefully copy the
implementation from Ulrich Dreppers well-written and well-reviewed paper:

  "Futexes Are Tricky"
  http://www.akkadia.org/drepper/futex.pdf

We implement "mutex3", which gives us a mutex that has no syscalls on
uncontended lock or unlock.  Further, the uncontended case boils down to a
cmpxchg and an untaken branch and the uncontended unlock is just a locked decr
and an untaken branch.  We use __builtin_expect() to indicate that contention
is unlikely so that gcc will put the contention code out of the main code
flow.

A fast mutex only supports lock/unlock, can't be recursive or used with
condition variables.  We keep the pthread mutex implementation around as
for the few places where we use condition variables or recursive locking.
For platforms or compilers where futex and atomics aren't available,
simple_mtx_t falls back to the pthread mutex.

The pthread mutex lock/unlock overhead shows up on benchmarks for CPU bound
applications.  Most CPU bound cases are helped and some of our internal
bind_buffer_object heavy benchmarks gain up to 10%.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-09 12:07:48 +11:00
Jason Ekstrand
3e63cf893f intel/nir: Break the linking code into a helper in brw_nir.c
Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-08 14:09:51 -08:00
Jason Ekstrand
7364f080f9 intel/nir: Add a helper for getting the NoIndirect mask
Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-08 14:09:49 -08:00
Emil Velikov
ba414dba4f automake: intel: correctly append to the LIBADD variable
Commit 05fc62d89f sets the variable, yet it forgot the update the
existing reference to append (instead of assign).

Thus as-is the expat library was discarded from the link chain when
building with Android.

Fixes: 05fc62d89f ("automake: intel: move expat handling where it's
used")
Cc: Hongxu Jia <hongxu.jia@windriver.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-11-08 14:23:57 +00:00
Jason Ekstrand
d002950e54 intel/fs/nir: Return Q types from brw_reg_type_for_bit_size
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-11-07 10:41:24 -08:00
Jason Ekstrand
dee58ecd2e intel/fs/nir: Use Q immediates for load_const on gen8+
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-11-07 10:41:24 -08:00
Jason Ekstrand
9bb34892bf intel/fs/nir: Setup immediates based on type in i2b and f2b
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-11-07 10:41:24 -08:00
Jason Ekstrand
1cb210f4bc intel/reg: Add helpers for 64-bit integer immediates
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-11-07 10:41:24 -08:00
Jason Ekstrand
ab9220edd6 nir,intel/compiler: Use a fixed subgroup size
The GL_ARB_shader_ballot spec says that gl_SubGroupSizeARB is declared
as a uniform.  This means that it cannot change across an invocation
such as a draw call or a compute dispatch.  For compute shaders, we're
ok because we only ever use one dispatch size.  For fragment, however,
the hardware dynamically chooses between SIMD8 and SIMD16 which violates
the spec.  Instead, let's just pick a subgroup size based on the shader
stage.  The fixed size we choose for compute shaders is a bit higher
than strictly needed but there's no real harm in that.  The advantage is
that, if they do anything interesting with the value, NIR will see it as
an immediate and can optimize better.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand
a026458020 nir/lower_subgroups: Lower ballot intrinsics to the specified bit size
Ballot intrinsics return a bitfield of subgroups.  In GLSL and some
SPIR-V extensions, they return a uint64_t.  In SPV_KHR_shader_ballot,
they return a uvec4.  Also, some back-ends would rather pass around
32-bit values because it's easier than messing with 64-bit all the time.
To solve this mess, we make nir_lower_subgroups take a new parameter
called ballot_bit_size and it lowers whichever thing it gets in from the
source language (uint64_t or uvec4) to a scalar with the specified
number of bits.  This replaces a chunk of the old lowering code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand
28da82f978 nir: Add a new subgroups lowering pass
This commit pulls nir_lower_read_invocations_to_scalar along with most
of the guts of nir_opt_intrinsics (which mostly does subgroup lowering)
into a new nir_lower_subgroups pass.  There are various other bits of
subgroup lowering that we're going to want to do so it makes a bit more
sense to keep it all together in one pass.  We also move it in i965 to
happen after nir_lower_system_values to ensure that because we want to
handle the subgroup mask system value intrinsics here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00
Jason Ekstrand
1ca3a94427 intel/fs: Don't use automatic exec size inference
The automatic exec size inference can accidentally mess things up if
we're not careful.  For instance, if we have

add(4)    g38.2<4>D    g38.1<8,2,4>D    g38.2<8,2,4>D

then the destination register will end up having a width of 2 with a
horizontal stride of 4 and a vertical stride of 8.  The EU emit code
sees the width of 2 and decides that we really wanted an exec size of 2
which doesn't do what we wanted.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2017-11-07 10:37:52 -08:00