Commit graph

91 commits

Author SHA1 Message Date
Eric Anholt
13ea7db760 mesa: Promote Intel's simple logging façade for Android to util/
I'm bringing up freedreno Vulkan on an Android phone, and my pains are
exactly what Chad said when working on Intel's vulkan for Android in
aa716db0f6 ("intel: Add simple logging façade for Android (v2)"):

    On Android, stdio goes to /dev/null. On Android, remote gdb is even
    more painful than the usual remote gdb. On Android, nothing works like
    you expect and debugging is hell. I need logging.

    This patch introduces a small, simple logging API that can easily wrap
    Android's API. On non-Android platforms, this logger does nothing
    fancy.  It follows the time-honored Unix tradition of spewing
    everything to stderr with minimal fuss.

    My goal here is not perfection. My goal is to make a minimal, clean API,
    that people hate merely a little instead of a lot, and that's good
    enough to let me bring up Android Vulkan.  And it needs to be fast,
    which means it must be small. No one wants to their game to miss frames
    while aiming a flaming bow into the jaws of an angry robot t-rex, and
    thus become t-rex breakfast, because some fool had too much fun desiging
    a bloated, ideal logging API.

Compared to trusty fprintf, _mesa_log[ewi]() is actually usable on
Android.  Compared to os_log_message(), this has different error levels
and supports format arguments.

The only code change in the move is wrapping flockfile/funlockfile in
!DETECT_OS_WINDOWS, since mingw32 doesn't have it.  Windows likely wants
different logging code, anyway.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6806>
2020-09-28 09:14:44 -07:00
Marcin Ślusarz
990b3782bc intel/compiler: fix Android build
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Fixes: 689acc7398 ("intel/compiler: Extract control barriers from scoped barriers")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3087
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5354>
2020-06-06 18:42:03 +00:00
Kenneth Graunke
615270502c intel: Move anv_gem_supports_syncobj_wait to common code.
This will let me use this in iris.

We leave the existing anv function for anv_gem_stubs.c faking, but
move the contents to a helper in a new src/intel/common/gen_gem.c file.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
2020-05-01 19:00:02 +00:00
Francisco Jerez
188a3659ae intel/ir: Import shader performance analysis pass.
This introduces an analysis pass intended to estimate several
performance statistics of the shader, including cycle count latency
and throughput values, based on static modeling.  It has instruction
performance information more comprehensive than the current scheduling
pass for all platforms between Gen4-11, and works on both the FS and
VEC4 back-end.

The most immediate purpose of this pass is to implement a heuristic
meant to determine whether using SIMD32 dispatch for a fragment shader
can be expected to help more than it hurts.  In addition this will
allow the effect of passes run after scheduling (e.g. the TGL software
scoreboard pass and the VEC4 dependency control pass) to be visible in
shader-db statistics.

But that isn't the end of the story, other potential applications of
this pass (not part of this MR) I've been playing around with are:

 - Implement a similar SIMD16 heuristic allowing the identification of
   inefficient SIMD16 fragment shaders.

 - Implement similar SIMD16 and SIMD32 heuristics for the compute
   shader stage -- Currently compute shader builds always use the
   SIMD16 shader if available and never use the SIMD32 shader unless
   strictly necessary, which is suboptimal under certain conditions.

 - Hook up to the instruction scheduler in order to improve the
   accuracy of its timing information.

 - Use as heuristic in order to drive the selection of scheduling
   modes (Matt was experimenting with that).

 - Plug to the TGL software scoreboard pass in order to implement a
   more effective SBID token allocation algorithm, since in general
   the optimal token allocation depends on the timings of all
   instructions in the program.

 - Use its bottleneck detection functionality in order to implement a
   heuristic computing a more optimal bound for the number of fragment
   shader threads executed in parallel (by adjusting the
   MaximumNumberofThreadsPerPSD control of 3DSTATE_PS).

As a follow-up I'm planning to submit updated timing information for
Gen12 platforms -- Everything else required to support Gen12 like SWSB
handling is already included in this patch, but there were some IP
concerns regarding the TGL timing parameters since they cannot
currently be obtained with the documentation and hardware which is
publicly available.  The timing parameters for any previous Gen7-11
platforms can be obtained by anyone by sampling the timestamp register
using e.g. shader_time, though I have some more convenient
instrumentation coming up.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-04-28 23:01:03 -07:00
Lionel Landwerlin
dde96d31b7 intel/perf: move mdapi query definitions to their own file
Where they belong.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4344>
2020-03-27 14:14:49 +00:00
Lionel Landwerlin
33b9c7a7f6 intel/perf: break GL query stuff away
This stuff is somewhat specific to the GL extension & drivers. On
Vulkan we won't use this, it also made a rather large file.

v2: Fix Android build (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4344>
2020-03-27 14:14:49 +00:00
Jason Ekstrand
6fbe3f40a9 intel/isl: Add isl_aux_info.c to Makefile.sources
This should fix the Android build.

Fixes: 58d4749e56 "isl: Add a module which manages aux resolves"
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reported-by: Clayton Craft <clayton.a.craft@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3934>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3934>
2020-02-25 00:41:15 +00:00
Lionel Landwerlin
397ff2976b intel: Implement Gen12 workaround for array textures of size 1
Gen12 does not support RENDER_SURFACE_STATE::SurfaceArray = true &&
RENDER_SURFACE_STATE::Depth = 0. SurfaceArray can only be set to true
if Depth >= 1.

We workaround this limitation by adding the max(value, 1) snippet in
the shaders on the 3 components for texture array sizes.

Tested on Gen9 with the following Vulkan CTS tests :
dEQP-VK.image.image_size.2d_array.*

v2: Drop debug print (Tapani)
    Switch to GEN:BUG instead of Wa_

v3: Fix dEQP-VK.image.image_size.1d_array.* cases (Lionel)

v4: Fix dEQP-VK.glsl.texture_functions.query.texturesize.* cases
    (Missing tex_op handling) (Lionel)

v5: Missing break statement (Lionel)

v6: Fixup comment (Tapani)

v7: Fixup comment again (Tapani)

v8: Don't use sample_dim as index (Jason)
    Rename pass
    Simplify control flow

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v7)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3362>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3362>
2020-01-26 22:27:03 +02:00
Jason Ekstrand
9baa33cef0 anv: Rework push constant handling
This substantially reworks both the state setup side of push constant
handling and the pipeline compile side.  The fundamental change here is
that we're no longer respecting the prog_data::param array and instead
are just instructing the back-end compiler to leave the array alone.
This makes the state setup side substantially simpler because we can now
just memcpy the whole block of push constants and don't have to
upload one DWORD at a time.

This also means that we can compute the full push constant layout
up-front and just trust the back-end compiler to not mess with it.
Maybe one day we'll decide that the back-end compiler can do useful
things there again but for now, this is functionally no different from
what we had before this commit and makes the NIR handling cleaner.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Jason Ekstrand
aecde23519 anv: Pre-compute push ranges for graphics pipelines
It turns off that emitting push constants is one of the hottest paths in
the driver and ANY work we do there costs us.  By pre-computing things a
bit ahead of time, we shave 5% off the runtime of a CPU-limited example
running with the Dawn WebGPU implementation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Lionel Landwerlin
c061185e17 intel/perf: add EHL performance query support
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-11-15 13:14:30 +00:00
Lionel Landwerlin
b087b7bd90 intel/perf: fix Android build
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 15b7b56eb2 ("intel/perf: add TGL support")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
2019-10-31 11:20:30 +00:00
Jordan Justen
0d0290bb3f
intel/common: Add surface to aux map translation table support
Reworks:
 * Add ISL_FORMAT_B8G8R8X8_UNORM_SRGB to get_format_encoding (Nanley)
 * ralloc_free aux_map_buffer entries in gen_aux_map_finish. (Rafael)
 * verify_aligned_space => align_and_verify_space (Rafael)
 * Add mutex to aux-map code. (Rafael, Nanley)
 * Add gen_aux_map_fill_bos (Ken)
 * Make gen_aux_map_get_state_num lockless

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-28 00:09:13 -07:00
Jordan Justen
c848ab45f3
intel/common: Add interface to allocate device buffers
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-28 00:09:13 -07:00
Lionel Landwerlin
2b5f30b1d9 anv: implement VK_INTEL_performance_query
v2: Introduce the appropriate pipe controls
    Properly deal with changes in metric sets (using execbuf parameter)
    Record marker at query end

v3: Fill out PerfCntr1&2

v4: Introduce vkUninitializePerformanceApiINTEL

v5: Use new execbuf extension mechanism

v6: Fix comments in genX_query.c (Rafael)
    Use PIPE_CONTROL workarounds (Rafael)
    Refactor on the last kernel series update (Lionel)

v7: Only I915_PERF_IOCTL_CONFIG when perf stream is already opened (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-10-23 05:41:15 +00:00
Sagar Ghuge
7fb75ddfa7 intel: Add missing entry for brw_nir_lower_alpha_to_coverage in Makefile
Fixes: 7ecfbd4f6d ("nir: Add alpha_to_coverage lowering pass")

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-10-21 16:19:24 -07:00
Jordan Justen
be89fbd51e
intel/isl: Add gen12 depth/stencil surface alignments
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-10-17 14:47:23 -07:00
Francisco Jerez
265c7c8971 intel/fs/gen12: Introduce software scoreboard lowering pass.
Gen12+ hardware lacks the register scoreboard logic that used to
guarantee data coherency between register reads and writes in previous
generations.  This lowering pass runs after register allocation in
order to make up for it.

It works by performing global dataflow analysis in order to determine
the set of potential dependencies of every instruction in the shader,
and then inserts any required SWSB annotations and additional SYNC
instructions in order to guarantee data coherency.

v2: Drop unnecessary _safe list iteration (Caio).

v3: Temporarily workaround potential WaR hazard between FPU
    instruction and subsequent out-of-order write, pending
    clarification from the hardware team.  Drop redundant tracking of
    implicit access of acc0-1, since the hardware guarantees coherency
    of these (but not the other accumulators...).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-11 12:24:16 -07:00
Francisco Jerez
25dd67099d intel/eu: Rework opcode description tables to allow efficient look-up by either HW or IR opcode.
This rewrites the current opcode description tables as a more compact
flat data structure.  The purpose is to allow efficient constant-time
look-up by either HW or IR opcode, which will allow us to drop the
hard-coded correspondence between HW and IR opcodes -- See the next
commits for the rationale.

brw_eu.c is now built as C++ source so we can take advantage of
pointers to member in order to make the look-up function work
regardless of the opcode_desc member used as look-up key.

v2: Optimize devinfo struct comparison (Caio)

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Jordan Justen
181be14d43
anv: Build for gen12
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-28 13:38:34 -07:00
Jordan Justen
6d63fd8a69
intel/isl: Build gen12 using gen11 code paths
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-28 13:38:33 -07:00
Jordan Justen
b42a05b436
intel/genxml: Build gen12 genxml
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-08-28 13:38:33 -07:00
Daniel Schürmann
c31f470066 anv,nir: Move lower_input_attachments pass from ANV to NIR.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:02:50 +02:00
Jason Ekstrand
13f0c278c5 i965,iris: Move guardband calculations to a common location
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-21 14:18:59 +00:00
Iago Toral Quiroga
3e377c68f8 intel/compiler: add a NIR pass to lower conversions
Some conversions are not directly supported in hardware and need to be
split in two conversion instructions going through an intermediary type.
Doing this at the NIR level simplifies a bit the complexity in the backend.

v2:
 - Consider fp16 rounding conversion opcodes
 - Properly handle swizzles on conversion sources.

v3
 - Run the pass earlier, right after nir_opt_algebraic_late (Jason)
 - NIR alu output types already have the bit-size (Jason)
 - Use 'is_conversion' to identify conversion operations (Jason)

v4:
 - Be careful about the intermediate types we use so we don't lose
   range and avoid incorrect rounding semantics (Jason)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 11:05:18 +02:00
Lionel Landwerlin
b48d6d7471 i965: move mdapi result data format to intel/perf
We want to reuse this in Anv.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2019-04-17 14:10:42 +01:00
Lionel Landwerlin
f6bba7760f i965: move mdapi data structure to intel/perf
We'll want to reuse those structures later on.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2019-04-17 14:10:42 +01:00
Lionel Landwerlin
134e750e16 i965: extract performance query metrics
We would like to reuse performance query metrics in other APIs. Let's
make the query code dealing with the processing of raw counters into
human readable values API agnostic.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-17 14:10:42 +01:00
Kenneth Graunke
fad7801afd i965: Move program key debugging to the compiler.
The i965 driver has a bunch of code to compare two sets of program keys
and print out the differences.  This can be useful for debugging why a
shader needed to be recompiled on the fly due to non-orthogonal state
dependencies.  anv doesn't do recompiles, so we didn't need to share
this in the past - but I'd like to use it in iris.

This moves the bulk of the code to the compiler where it can be reused.
To make that possible, we need to decouple it from i965 - we can't get
at the brw program cache directly, nor use brw_context to print things.
Instead, we use compiler->shader_perf_log(), and simply pass in keys.

We put all of this debugging code in brw_debug_recompile.c, and only
export a single function, for simplicity.  I also tidied the code a
bit while moving it, now that it all lives in one file.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-04-16 09:01:15 -07:00
Mark Janes
2393cc7f00 intel/common: move gen_debug to intel/dev
libintel_common depends on libintel_compiler, but it contains debug
functionality that is needed by libintel_compiler.  Break the circular
dependency by moving gen_debug files to libintel_dev.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-10 13:15:33 -07:00
Jason Ekstrand
9d437f9482 intel/fs: Drop the fs_surface_builder
All of the actual abstraction (except possibly setting size_written)
happens as part of the logical opcodes.  The only thing that the surface
builder is providing at this point is extra levels of functions to call
through.  I'm going to be adding bindless image support soon and all the
extra abstraction here is just getting in the way.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Tapani Pälli
864cc419eb intel/isl: move tiled_memcpy static libs from i965 to isl
Patch moves intel_tiled_memcpy[_sse41] libraries to isl, renames some
functions and types and makes the required build system changes for
meson, automake and Android. No functional changes are introduced.

v2: code cleanups, move isl_get_memcpy_type to i965 (Jason)
v3: move isl_mem_copy_fn to priv header, cleanups (Jason, Dylan)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2019-01-10 08:02:30 +02:00
Francisco Jerez
2c99c7a56c intel/fs: Remove existing lower_conversions pass.
It's redundant with the functionality provided by lower_regioning now.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-09 12:03:09 -08:00
Francisco Jerez
efa4e4bc5f intel/fs: Introduce regioning lowering pass.
This legalization pass is meant to handle situations where the source
or destination regioning controls of an instruction are unsupported by
the hardware and need to be lowered away into separate instructions.
This should be more reliable and future-proof than the current
approach of handling CHV/BXT restrictions manually all over the
visitor.  The same mechanism is leveraged to lower unsupported type
conversions easily, which obsoletes the lower_conversions pass.

v2: Give conditional modifiers the same treatment as predicates for
    SEL instructions in lower_dst_modifiers() (Iago).  Special-case a
    couple of other instructions with inconsistent conditional mod
    semantics in lower_dst_modifiers() (Curro).

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-09 12:03:09 -08:00
Jason Ekstrand
6339aba775 intel/compiler: Lower SSBO and shared loads/stores in NIR
We have a bunch of code to do this in the back-end compiler but it's
fairly specific to typed surface messages and the way we emit them.
This breaks it out into NIR were it's easier to do things a bit more
generally.  It also means we can easily share the code between the vec4
and FS back-ends if we wish.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-11-15 19:59:49 -06:00
Lionel Landwerlin
b43f955037 anv: stub internal android code
This reduces the amount of #ifdef ANDROID we'll have to have inside
the driver. Potentially offering better coverage of the android
extensions.

v2: Move anv_android.h include before anv_entrypoints.h (Tapani)
    Fix autotools android build (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-11-06 15:28:07 +00:00
Jason Ekstrand
37f7983bcc intel/compiler: Do image load/store lowering to NIR
This commit moves our storage image format conversion codegen into NIR
instead of doing it in the back-end.  This has the advantage of letting
us run it through NIR's optimizer which is pretty effective at shrinking
things down.  In the common case of rgba8, the number of instructions
emitted after NIR is done with it is half of what it was with the
lowering happening in the back-end.  On the downside, the back-end's
lowering is able to directly use predicates and the NIR lowering has to
use IFs.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166910 -> 15166872 (<.01%)
    instructions in affected programs: 5895 -> 5857 (-0.64%)
    helped: 15
    HURT: 0

Clearly, we don't have that much image_load_store happening in the
shaders in shader-db....

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:02 -05:00
Keith Packard
54d0daa481 anv: Add KHR_display extension to anv [v7]
This adds support for the KHR_display extension to the anv Vulkan
driver. The driver now attempts to open the master DRM node when the
KHR_display extension is requested so that the common winsys code can
perform the necessary operations.

v2: Make sure primary fd is usable

	When KHR_display is selected, we try to open the primary node
	instead of the render node in case the user wants to use
	KHR_display for presentation. However, if we're actually going
	to end up using RandR leases, then we don't care if the
	resulting fd can't be used for display, but the kernel also
	prevents us from using it for drawing when someone else has
	master.

v3:
	Simplify addition of VK_USE_PLATFORM_DISPLAY_KHR to vulkan_wsi_args

	Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>

v4:
	Adapt primary node usage to new wsi_device_init API

v5:
	Adopt Jason Ekstrand's coding conventions

        Declare variables at first use, eliminate extra whitespace between
        types and names. Wrap lines to 80 columns.

	Remove spurious MM_PER_PIXEL define

        Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>

v6:
	Open DRM master before initializing WSI layer.

	The DRM master FD is passed to the WSI layer during
	initialization, so we need to open the device slightly earlier
	in the function.

	Close DRM master in device_finish.

	Use anv_gem_get_param to detect working master_fd instead of
	directly using the ioctl.

        Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>

v7:
	Add vkCreateDisplayModeKHR. This doesn't actually create
	new modes, it only looks to see if the requested parameters
	matches an existing mode and returns that.

    	Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>

Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-19 14:17:46 -07:00
Scott D Phillips
4714784dae anv: move canonical_address calculation into a separate function
A later patch will make use of this in other places. Also, remove
dependency on undefined behavior of left-shifting a signed value.

v2: - move function into a separate header (Chris)
v3: (by Ken) Add new header to the various build systems.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-05-27 19:24:33 -07:00
Kenneth Graunke
7c22c150c4 intel: Move batch decoder/disassembler from tools/ to common/
Making these part of libintel_common allows us to use them in the DRI
driver.  The standalone tool binaries already link against the common
library, too, so it's no harder for them.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-05-02 09:27:56 -07:00
Jason Ekstrand
dfe18be09e anv: Implement vkCmdDispatchBase
This is part of the device groups extension/feature but it's a decent
chunk of work in its own right so it's worth breaking into its own
patch.  The mechanism we use is fairly straightforward: we just push the
base work group id into the shader and add it to the work group id we
get from dispatch.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-03-07 12:13:47 -08:00
Jordan Justen
272bef0601 intel: Split gen_device_info out into libintel_dev
Split out the device info so isl doesn't depend on intel/common. Now
it will depend on the new intel/dev device info lib.

This will allow the decoder in intel/common to use isl, allowing us to
apply Ken's patch that removes the genxml duplication of surface
formats.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-03-05 09:47:37 -08:00
Tapani Pälli
4449a1f80d intel: add new common header gen_defines.h
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2018-02-28 14:36:57 +02:00
Anuj Phogat
9673c21d4f anv/icl: Build anv libs for gen11
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-02-16 11:10:32 -08:00
Anuj Phogat
bff24e2173 intel/isl/icl: Build and use gen11 surface state emit functions
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-02-15 16:14:55 -08:00
Anuj Phogat
165a68b05a intel/genxml/icl: Generate packing headers
Move build system changes in to one patch (Ken, Emil)

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-02-15 16:14:55 -08:00
Jason Ekstrand
dd088d4bec anv/extensions: Generate a header file with extension tables
This allows us better introspection into extensions.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-01-23 00:15:40 -08:00
Bas Nieuwenhuizen
e5b1bd6ab8 vulkan: move anv VK_EXT_debug_report implementation to common code.
For also using it in radv. I moved the remaining stubs back to
anv_device.c as they were just trivial.

This does not move the vk_errorf/anv_perf_warn or the object
type macros, as those depend on anv types and logging.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-01-17 11:27:52 +01:00
Francisco Jerez
af2c320190 intel/fs: Implement GRF bank conflict mitigation pass.
Unnecessary GRF bank conflicts increase the issue time of ternary
instructions (the overwhelmingly most common of which is MAD) by
roughly 50%, leading to reduced ALU throughput.  This pass attempts to
minimize the number of bank conflicts by rearranging the layout of the
GRF space post-register allocation.  It's in general not possible to
eliminate all of them without introducing extra copies, which are
typically more expensive than the bank conflict itself.

In a shader-db run on SKL this helps roughly 46k shaders:

   total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
   conflicts in affected programs: 816222 -> 407702 (-50.05%)
   helped: 46234
   HURT: 72

The running time of shader-db itself on SKL seems to be increased by
roughly 2.52%±1.13% with n=20 due to the additional work done by the
compiler back-end.

On earlier generations the pass is somewhat less effective in relative
terms because the hardware incurs a bank conflict anytime the last two
sources of the instruction are duplicate (e.g. while trying to square
a value using MAD), which is impossible to avoid without introducing
copies.  E.g. for a shader-db run on SNB:

   total conflicts in shared programs: 944636 -> 623185 (-34.03%)
   conflicts in affected programs: 853258 -> 531807 (-37.67%)
   helped: 31052
   HURT: 19

And on BDW:

   total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
   conflicts in affected programs: 1179787 -> 748933 (-36.52%)
   helped: 47592
   HURT: 70

On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%
±0.33% with n=16.

NOTE: This patch intentionally disregards some i965 coding conventions
      for the sake of reviewability.  This is addressed by the next
      squash patch which introduces an amount of (for the most part
      boring) boilerplate that might distract reviewers from the
      non-trivial algorithmic details of the pass.

The following patch is squashed in:

SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.

Acked-by: Matt Turner <mattst88@gmail.com>
2017-12-07 15:56:06 -08:00
Matt Turner
821ec473a8 i965: Rename intel_asm_annotation -> brw_disasm_info
It was the only file named intel_* in the compiler.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-11-17 12:14:38 -08:00