Commit graph

4529 commits

Author SHA1 Message Date
Caio Marcelo de Oliveira Filho
f7d90c67c7 intel/compiler: Silence maybe-uninitialized warning in GCC 9.1.1
Compiler can't see that d is initialized.

    ../src/intel/compiler/brw_vec4_nir.cpp: In function ‘int brw::try_immediate_source(const nir_alu_instr*, brw::src_reg*, bool, const gen_device_info*)’:
    ../src/intel/compiler/brw_vec4_nir.cpp:984:12: warning: ‘d’ may be used uninitialized in this function [-Wmaybe-uninitialized]
      984 |          d = MAX2(-d, d);

Assert that we expect at least one component -- hence d going to be
set.  That by itself is not enough, so also zero initialize the
variable.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-23 13:25:27 -07:00
Jason Ekstrand
021fa28163 intel/nir: Add a helper for getting BRW_AOP from an intrinsic
So many duplicated switch statements....

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-21 17:19:55 +00:00
Jason Ekstrand
951cf94521 nir: Add explicit signs to image min/max intrinsics
This better matches all the other atomic intrinsics such as those for
SSBOs and shared variables where the sign is part of the intrinsic
opcode.  Both generators (GLSL and SPIR-V) know the sign from the type
of the image variable or handle.  In SPIR-V, signed min/max are separate
opcodes from unsigned.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-21 17:19:55 +00:00
Arcady Goldmints-Orlov
3835535537 anv: inline uniforms blocks don't count toward descriptor set limits
In a descriptor set inline uniform blocks don't use up any bindings.
However, the presence of any inline uniform blocks doed require the
use of the descriptor buffer, which takes up one binding.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-20 16:48:45 +00:00
Kenneth Graunke
f741de236b isl: Enable Unorm Path in Color Pipe
Improves performance on my Icelake 8x8 locked to 700Mhz.  For example,
some GfxBench5 subtests have the following results:

- [i965] gl_manhattan: ................ 7.01119% +/- 0.180971% (n=5)
- [i965] gl_4 (Car Chase):              4.24351% +/- 0.175622% (n=5)
- [i965] gl_blending:  ................ 3.36327% +/- 0.180267% (n=5)
- [i965] gl_5_normal (Aztec Ruins):     1.67962% +/- 0.243534% (n=10)
- [iris] gl_manhattan: ................ 3.92357% +/- 0.073965% (n=25)
- [iris] gl_4 (Car Chase):              2.17746% +/- 0.0826858% (n=5)
- [iris] gl_blending:  ................ 2.79599% +/- 0.803652% (n=15)
- [iris] gl_5_normal (Aztec Ruins):     1.30930% +/- 0.106523% (n=25)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-15 10:39:09 -07:00
Rafael Antognolli
ceeaf93c8e anv: Properly initialize device->slice_hash.
When subslices_delta == 0 and we take the early return,
device->slice_hash is not initialized on GEN11. It then causes a
segfault when going through anv_DestroyDevice, if compiled with
valgrind.

Fixes: 7bc022b4bb ("anv/gen11: Emit SLICE_HASH_TABLE when pipes are
                    unbalanced.)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-15 09:42:48 -07:00
Danylo Piliaiev
72354d43d4 intel/compiler: Fix resource leak in error path
CID: 1452261

Fixes: 04a99515 "intel/compiler: add ability to override shader's assembly"

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-08-15 08:17:36 +00:00
Caio Marcelo de Oliveira Filho
1021abab07 intel/tools: Fix aub_file initialization in intel_dump_gpu
The `device` can be set earlier either by a command line or a by
intercepting an ioctl call to get the I915_PARAM_CHIPSET_ID done by
the application early.  In both cases `aub_file` and `devinfo` would
not be initialized.

Fix by splitting the conditions

- `device == 0`: use the FD to get both device and devinfo.
- Or `devinfo.gen == 0`: use `device` to initialize it.

And separatedly, initialize aub_file the first time it is needed.

Fixes: d594d2a052 ("intel/tools: use device info initializer")
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-12 19:18:26 -07:00
Rafael Antognolli
7bc022b4bb anv/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.
If the pixel pipes have a different number of subslices, emit a slice
hashing table that will ensure proper workload distribution.

v2: Don't need to set the mask - it's mbo (Ken).
2019-08-12 16:19:08 -07:00
Rafael Antognolli
ad513fd386 intel: Get information about pixel pipes subslices.
v2: Use 1 instead of 1UL (Ken).
2019-08-12 16:19:08 -07:00
Rafael Antognolli
32344dc581 intel/gen_decoder: Decode SLICE_HASH_TABLE. 2019-08-12 16:19:08 -07:00
Rafael Antognolli
e1cb71c182 intel/genxml: Update 3D_MODE and add SLICE_HASH_TABLE.
Add these fields and the 3DSTATE_SLICE_TABLE_STATE_POINTERS instruction
so we can properly configure the slice and subslice hashing on ICL+

v2: Make 'Mask' field a mbo (Ken).
2019-08-12 16:19:08 -07:00
Jason Ekstrand
d787a2d05e anv: Implement VK_KHR_pipeline_executable_properties
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
67cb55ad11 anv: Add a ralloc context to anv_pipeline
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
fec4bdff40 anv: Force a full re-compile when CAPTURE_INTERNAL_REPRESENTATION_TEXT is set
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
651fbbf9b8 anv/pipeline: Split setting up per-stage keys into its own loop
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
78f3dfb4a2 anv: Record shader compile stats in the pipeline cache
We're going to want these to be available regardless of caching.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
2af380d20f anv/pipeline: Stash generated code in the pipeline stage
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
8d3cbd0393 intel/fs: Add SLM size to brw_cs_prog_data
We don't need it for state setup but it's a useful statistic we want to
pass on to developers.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Jason Ekstrand
134607760a intel/compiler: Fill a compiler statistics struct
This commit is all annoying plumbing work which just adds support for a
new brw_compile_stats struct.  This struct provides a binary driver
readable form of the same statistics we dump out to stderr when we
INTEL_DEBUG is set with a shader stage.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-12 22:56:07 +00:00
Paulo Zanoni
866bb775de intel/fs: add 64 bit integer multiplication lowering
While NIR's lower_imul64() solves the case of 64 bit integer multiplications
generated early, we don't have a way to lower such instructions when they are
generated by our own backend, such as the scan/reduce intrinsics. We'll need
this soon, so implement it now.

An easy way to test this is to simply disable nir_lower_imul64 to let
those operations reach the backend.

v2:
  - Fix Q/UQ copy/paste errors (Caio).
  - Transform an 'if' into 'else if' (Caio).
  - Add an extra comment to clarify the need for 64b = 32b * 32b
    (Caio).
  - Make private functions private (Caio).
v3:
  - Remove ambiguity with 'b' and 'd' variables (Caio).
  - Allocate potentially less regs for the dwords (Caio).

Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Matt Turner <matt.turner@intel.com>
Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-08-12 15:16:23 -07:00
Paulo Zanoni
9217cf3b5e intel/compiler: invert the logic of lower_integer_multiplication()
Invert the logic of how progress is handled: remove the continue
statements and mark progress inside the places where it actually
happens.

We're going to add a new lowering that also looks for BRW_OPCODE_MUL,
so inverting the logic here makes the resulting code much easier to
follow.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-08-12 15:16:23 -07:00
Paulo Zanoni
6ba4717924 intel/compiler: don't instantiate a builder for each instruction
Don't instantiate a builder for each instruction during
lower_integer_multiplication(). Instantiate one only when needed.

On the other hand, these unneeded builders don't seem to cost much to
init, so I don't expect any significant difference in performance:
this is mostly about code organization.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-08-12 15:16:23 -07:00
Paulo Zanoni
75b3868dcc intel/compiler: extract subfunctions of lower_integer_multiplication()
The lower_integer_multiplication() function is already a little too
big. I want to add more to it, so let's reorganize the existing code
first. Let's start with just extracting the current code to
subfunctions. Later we'll change them a little more.

v2: Make private functions private (Caio).
v3: Fix typo (Caio).

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-08-12 15:16:23 -07:00
Rhys Perry
7740149852 nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_ubo
v2: add to series
v3: update Makefile.sources
v4: don't remove a comment and break statement
v4: use nir_can_move_instr

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
Francisco Jerez
c2fe7a0fb8 anv/gen9: Optimize slice and subslice load balancing behavior.
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale.  According to Jason, improves Aztec Ruins
performance by 2.7%.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)

v2: Undo CPU performance micro-optimization done in i965 and iris due
    to lack of data justifying it on anv.  Use
    cmd_buffer_apply_pipe_flushes wrapper instead of emitting pipe
    control command directly.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-12 14:40:21 -07:00
Francisco Jerez
03cba9f5d9 intel/genxml: Add GT_MODE hashing defs for Gen9.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-12 13:17:58 -07:00
Jason Ekstrand
14c96a6300 anv: Implement VK_EXT_subgroup_size_control version 2
The version bump adds a proper features struct.

Fixes: d10de25309 "anv: Implement VK_EXT_subgroup_size_control"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-12 14:56:33 +00:00
Eric Engestrom
e4aa0fc63a anv: add missing break
Fixes: f6e7de41d7 ("anv: Implement VK_EXT_line_rasterization")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-09 23:34:31 +01:00
Lionel Landwerlin
cefb4341b7 anv: drop unused code
We stopped using this when we moved to Jason's mi_builder.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-09 17:01:38 +03:00
Tapani Pälli
5e38db0c47 anv/android: disable shared representable image support explicitly
Android 9 loader conditionally advertises VK_KHR_shared_presentable_image
extension based on this property and it looks like it does not
initialize the struct before query.

Pragmas are added to ignore warnings with Android specific structure
types in same manner as commit 8d386e6eef  did.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-09 08:53:54 +03:00
Greg V
ac1561088d intel/perf: use MAJOR_IN_SYSMACROS/MAJOR_IN_MKDEV
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes: 134e750e16 ("i965: extract performance query metrics")
2019-08-08 21:44:33 +01:00
Greg V
c00ee00031 i965/tiled_memcpy: avoid creating bswap32 if it exists as a macro (e.g. on FreeBSD)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-08 21:44:33 +01:00
Greg V
7b520dc74f anv: add MAP_POPULATE fallback define for portability
FreeBSD does not have MAP_POPULATE

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-08 21:44:33 +01:00
Greg V
2be3f16600 anv: remove unused Linux-specific include
Fixes: 4201cc2dd3 ("anv: Implement VK_KHX_external_semaphore_fd")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-08-08 21:44:33 +01:00
Rhys Perry
c52c54a746 anv,i965,iris: deduplicate setting of total_shared
v5: add patch

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-08 12:10:39 -05:00
Rhys Perry
024a46a407 anv: use derefs for shared memory access
vkpipeline-db for my Skylake GPU:
total instructions in shared programs: 8847602 -> 8847896 (<.01%)
instructions in affected programs: 10165 -> 10459 (2.89%)
helped: 8
HURT: 2

total cycles in shared programs: 1606273555 -> 1606251634 (<.01%)
cycles in affected programs: 2201803 -> 2179882 (-1.00%)
helped: 7
HURT: 3

The shaders with more instructions is due to a loop over a shared array
in Three Kingdoms being unrolled (and creating a lot of nested ifs). Not sure
if that's good or bad.

One of the shaders with worse cycles is only worse by 0.04% and the other
two are the shaders with loops unrolled.

v2: add patch
v4: don't set spirv_options.shared_addr_format
v4: move comment concerning the shared address format used and NULL
v4: add vkpipeline-db results
v5: rename to nir_lower_vars_to_explicit_types
v5: move setting of total_shared to outside brw_compile_cs
v6: set shared_addr_format
v6: formatting changes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (v5)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-08 12:10:39 -05:00
Tapani Pälli
aba57b11ee anv: support GetSwapchainGrallocUsage2ANDROID for Android
New function supports gralloc1 usage flags that get set separately
for producer and consumer. As we still need to support old method too,
let's share common code and use android_convertGralloc0To1Usage helper.
Bump the VK_ANDROID_native_buffer version to indicate support for the
new call.

Changes were tested on Android Celadon P with Basemark GPU and various
Sascha Willems Vulkan demos.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-08 05:08:01 +00:00
Mark Janes
61c54a8878 intel/perf: fix debug typo
Misspelling was seen with INTEL_DEBUG=perfmon.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
2df1ab4d48 intel/perf: make gen_perf_query_object private
Encapsulate the details of this structure within the perf implemenation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
deea3798b6 intel/perf: make perf context private
Encapsulate the details of this data structure.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
1f4f421ce0 intel/perf: print debug information
INTEL_DEBUG=perfmon will iterate over the perf queries, printing
information about the state of each query.  Some of this information
will be private to intel/perf, and needs to a dump routine that can be
called from i965.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
a663c8c26e intel/perf: make internal methods private
Now that all references from i965 have been moved to perf, we can make
internal methods private again.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
be8b466cff intel/perf: make oa_sample_buffers private
All references to this data structure have been moved inside the perf
subsystem.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
f2a049b4e3 intel/perf: expose method to create query
By encapsulating this implementation within perf, we can eventually
make struct gen_perf_ctx private.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
9f5c160d82 intel/perf: move initialization of pipeline statistics metrics to gen_perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
9f84efb452 intel/perf: move get_query_data into gen_perf
This refactor moves several helper functions for get_query_data as
well:

 - accumulate_oa_reports
 - read_gt_frequency
 - get_pipeline_stats_data
 - get_oa_counter_data

Functions which are no longer referenced in brw_performance_query.c
have been removed.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
73eccdc4a5 intel/perf: move delete_query to gen_perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
8c9eac1234 intel/perf: move is_query_ready to gen_perf
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00
Mark Janes
a9be292722 intel/perf: move wait_query to perf
The following methods have duplicate implementation of read_oa_samples_until in
brw_performance_query.c:

 - read_oa_samples_for_query
 - read_oa_samples_until

They ar still referenced by other methods in the file and will be
removed on the subsequent commit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-07 21:33:56 -07:00