Some conversions are not directly supported in hardware and need to be
split in two conversion instructions going through an intermediary type.
Doing this at the NIR level simplifies a bit the complexity in the backend.
v2:
- Consider fp16 rounding conversion opcodes
- Properly handle swizzles on conversion sources.
v3
- Run the pass earlier, right after nir_opt_algebraic_late (Jason)
- NIR alu output types already have the bit-size (Jason)
- Use 'is_conversion' to identify conversion operations (Jason)
v4:
- Be careful about the intermediate types we use so we don't lose
range and avoid incorrect rounding semantics (Jason)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We'll want to reuse those structures later on.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
We would like to reuse performance query metrics in other APIs. Let's
make the query code dealing with the processing of raw counters into
human readable values API agnostic.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The i965 driver has a bunch of code to compare two sets of program keys
and print out the differences. This can be useful for debugging why a
shader needed to be recompiled on the fly due to non-orthogonal state
dependencies. anv doesn't do recompiles, so we didn't need to share
this in the past - but I'd like to use it in iris.
This moves the bulk of the code to the compiler where it can be reused.
To make that possible, we need to decouple it from i965 - we can't get
at the brw program cache directly, nor use brw_context to print things.
Instead, we use compiler->shader_perf_log(), and simply pass in keys.
We put all of this debugging code in brw_debug_recompile.c, and only
export a single function, for simplicity. I also tidied the code a
bit while moving it, now that it all lives in one file.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
libintel_common depends on libintel_compiler, but it contains debug
functionality that is needed by libintel_compiler. Break the circular
dependency by moving gen_debug files to libintel_dev.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All of the actual abstraction (except possibly setting size_written)
happens as part of the logical opcodes. The only thing that the surface
builder is providing at this point is extra levels of functions to call
through. I'm going to be adding bindless image support soon and all the
extra abstraction here is just getting in the way.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Patch moves intel_tiled_memcpy[_sse41] libraries to isl, renames some
functions and types and makes the required build system changes for
meson, automake and Android. No functional changes are introduced.
v2: code cleanups, move isl_get_memcpy_type to i965 (Jason)
v3: move isl_mem_copy_fn to priv header, cleanups (Jason, Dylan)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This legalization pass is meant to handle situations where the source
or destination regioning controls of an instruction are unsupported by
the hardware and need to be lowered away into separate instructions.
This should be more reliable and future-proof than the current
approach of handling CHV/BXT restrictions manually all over the
visitor. The same mechanism is leveraged to lower unsupported type
conversions easily, which obsoletes the lower_conversions pass.
v2: Give conditional modifiers the same treatment as predicates for
SEL instructions in lower_dst_modifiers() (Iago). Special-case a
couple of other instructions with inconsistent conditional mod
semantics in lower_dst_modifiers() (Curro).
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We have a bunch of code to do this in the back-end compiler but it's
fairly specific to typed surface messages and the way we emit them.
This breaks it out into NIR were it's easier to do things a bit more
generally. It also means we can easily share the code between the vec4
and FS back-ends if we wish.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This reduces the amount of #ifdef ANDROID we'll have to have inside
the driver. Potentially offering better coverage of the android
extensions.
v2: Move anv_android.h include before anv_entrypoints.h (Tapani)
Fix autotools android build (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This commit moves our storage image format conversion codegen into NIR
instead of doing it in the back-end. This has the advantage of letting
us run it through NIR's optimizer which is pretty effective at shrinking
things down. In the common case of rgba8, the number of instructions
emitted after NIR is done with it is half of what it was with the
lowering happening in the back-end. On the downside, the back-end's
lowering is able to directly use predicates and the NIR lowering has to
use IFs.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166910 -> 15166872 (<.01%)
instructions in affected programs: 5895 -> 5857 (-0.64%)
helped: 15
HURT: 0
Clearly, we don't have that much image_load_store happening in the
shaders in shader-db....
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This adds support for the KHR_display extension to the anv Vulkan
driver. The driver now attempts to open the master DRM node when the
KHR_display extension is requested so that the common winsys code can
perform the necessary operations.
v2: Make sure primary fd is usable
When KHR_display is selected, we try to open the primary node
instead of the render node in case the user wants to use
KHR_display for presentation. However, if we're actually going
to end up using RandR leases, then we don't care if the
resulting fd can't be used for display, but the kernel also
prevents us from using it for drawing when someone else has
master.
v3:
Simplify addition of VK_USE_PLATFORM_DISPLAY_KHR to vulkan_wsi_args
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
v4:
Adapt primary node usage to new wsi_device_init API
v5:
Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace between
types and names. Wrap lines to 80 columns.
Remove spurious MM_PER_PIXEL define
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v6:
Open DRM master before initializing WSI layer.
The DRM master FD is passed to the WSI layer during
initialization, so we need to open the device slightly earlier
in the function.
Close DRM master in device_finish.
Use anv_gem_get_param to detect working master_fd instead of
directly using the ioctl.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v7:
Add vkCreateDisplayModeKHR. This doesn't actually create
new modes, it only looks to see if the requested parameters
matches an existing mode and returns that.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A later patch will make use of this in other places. Also, remove
dependency on undefined behavior of left-shifting a signed value.
v2: - move function into a separate header (Chris)
v3: (by Ken) Add new header to the various build systems.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Making these part of libintel_common allows us to use them in the DRI
driver. The standalone tool binaries already link against the common
library, too, so it's no harder for them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is part of the device groups extension/feature but it's a decent
chunk of work in its own right so it's worth breaking into its own
patch. The mechanism we use is fairly straightforward: we just push the
base work group id into the shader and add it to the work group id we
get from dispatch.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Split out the device info so isl doesn't depend on intel/common. Now
it will depend on the new intel/dev device info lib.
This will allow the decoder in intel/common to use isl, allowing us to
apply Ken's patch that removes the genxml duplication of surface
formats.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
For also using it in radv. I moved the remaining stubs back to
anv_device.c as they were just trivial.
This does not move the vk_errorf/anv_perf_warn or the object
type macros, as those depend on anv types and logging.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Unnecessary GRF bank conflicts increase the issue time of ternary
instructions (the overwhelmingly most common of which is MAD) by
roughly 50%, leading to reduced ALU throughput. This pass attempts to
minimize the number of bank conflicts by rearranging the layout of the
GRF space post-register allocation. It's in general not possible to
eliminate all of them without introducing extra copies, which are
typically more expensive than the bank conflict itself.
In a shader-db run on SKL this helps roughly 46k shaders:
total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
conflicts in affected programs: 816222 -> 407702 (-50.05%)
helped: 46234
HURT: 72
The running time of shader-db itself on SKL seems to be increased by
roughly 2.52%±1.13% with n=20 due to the additional work done by the
compiler back-end.
On earlier generations the pass is somewhat less effective in relative
terms because the hardware incurs a bank conflict anytime the last two
sources of the instruction are duplicate (e.g. while trying to square
a value using MAD), which is impossible to avoid without introducing
copies. E.g. for a shader-db run on SNB:
total conflicts in shared programs: 944636 -> 623185 (-34.03%)
conflicts in affected programs: 853258 -> 531807 (-37.67%)
helped: 31052
HURT: 19
And on BDW:
total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
conflicts in affected programs: 1179787 -> 748933 (-36.52%)
helped: 47592
HURT: 70
On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%
±0.33% with n=16.
NOTE: This patch intentionally disregards some i965 coding conventions
for the sake of reviewability. This is addressed by the next
squash patch which introduces an amount of (for the most part
boring) boilerplate that might distract reviewers from the
non-trivial algorithmic details of the pass.
The following patch is squashed in:
SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.
Acked-by: Matt Turner <mattst88@gmail.com>
It was the only file named intel_* in the compiler.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm bringing up Vulkan in the Android container of Chrome OS (ARC++).
On Android, stdio goes to /dev/null. On Android, remote gdb is even more
painful than the usual remote gdb. On Android, nothing works like you
expect and debugging is hell. I need logging.
This patch introduces a small, simple logging API that can easily wrap
Android's API. On non-Android platforms, this logger does nothing fancy.
It follows the time-honored Unix tradition of spewing everything to
stderr with minimal fuss.
My goal here is not perfection. My goal is to make a minimal, clean API,
that people hate merely a little instead of a lot, and that's good
enough to let me bring up Android Vulkan. And it needs to be fast,
which means it must be small. No one wants to their game to miss frames
while aiming a flaming bow into the jaws of an angry robot t-rex, and
thus become t-rex breakfast, because some fool had too much fun desiging
a bloated, ideal logging API.
If people like it, perhaps we should quickly promote it to src/util.
The API looks like this:
#define INTEL_LOG_TAG "intel-vulkan"
#define DEBUG
intel_logd("try hard thing with foo=%d", foo);
n = try_foo(...);
if (n < 0) {
intel_loge("%s:%d: foo failed bigtime", __FILE__, __LINE__);
return VK_ERROR_DEVICE_LOST;
}
And produces this on non-Android:
intel-vulkan: debug: try hard thing with foo=93
intel-vulkan: error: anv_device.c:182: foo failed bigtime
v2: Fix meson build. [for dcbaker]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It's already only ever called from brw_compile_cs and only handles
compute intrinsics. Let's just make it CS-specific. We can always
make it handle other stages again later if we want.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This pass implements all the implicit conversions required by the
VK_KHR_sampler_ycbcr_conversion specification.
It also inserts plane sources onto sampling instructions that we then
let the pipeline layout pass deal with, when mapping things correctly
to descriptors.
v2: Add new file to meson build (Lionel)
Use nir_frcp() rather than (1.0f / x) (Jason)
Reuse nir_tex_instr_dest_size() rather than handwritten one (Jason)
Return progress (Jason)
Account for array of samplers (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes:
CC isl/isl_format_layout.lo
In file included from
../../../../src/intel/isl/isl_storage_image.c:24:0:
../../../../src/intel/isl/isl_priv.h:170:29: fatal error:
isl_genX_priv.h: No such file or directory
compilation terminated.
Makefile:2936: recipe for target 'isl/isl_storage_image.lo' failed
make[5]: *** [isl/isl_storage_image.lo] Error 1
make[5]: *** Waiting for unfinished jobs....
In file included from ../../../../src/intel/isl/isl.c:36:0:
../../../../src/intel/isl/isl_priv.h:170:29: fatal error:
isl_genX_priv.h: No such file or directory
compilation terminated.
make[5]: *** [isl/isl.lo] Error 1
Makefile:2936: recipe for target 'isl/isl.lo' failed
make[4]: *** [all] Error 2
when running `make distcheck`.
v2: Fix commit title (Emil)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Patch adds required functionality for extension to manage a list of
application provided callbacks and handle debug reporting from driver
and application side.
v2: remove useless helper anv_debug_report_call
add locking around callbacks list
use vk_alloc2, vk_free2
refactor CreateDebugReportCallbackEXT
fix bugs found with crucible testing
v3: provide ANV_FROM_HANDLE and use it
misc fixes for issues Jason found
use vk_find_struct_const for finding ctor_cb
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I'm going to encapsulate all of the logic dealing with register types in
this file.
Rename the parameters for the hardware encodings from type -> hw_type at
the same time.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
this change reverts commit 4f695731, we want to be able to build
with -DDEBUG and gen_decoder on Android.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
As time goes on, extension advertising is going to get more complex.
Today, we either implement an extension or we don't. However, in the
future, whether or not we advertise an extension will depend on kernel
or hardware features. This commit introduces a python codegen framework
that generates the anv_EnumerateFooExtensionProperties functions as well
as a pair of anv_foo_extension_supported functions for querying for the
support of a given extension string. Each extension has an "enable"
predicate that is any valid C expression. For device extensions, the
physical device is available as "device" so the expression could be
something such as "device->has_kernel_feature". For instance
extensions, the only option is VK_USE_PLATFORM defines.
This mechanism also means that we have a single one-line-per-entry table
for all extension declarations instead of the two tables we had in
anv_device.c and the one we had in anv_entrypoints_gen.py. The Python
code is smart and uses the XML to determine whether an extension is an
instance extension or device extension.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This adds a NIR pass that decides which portions of UBOS we should
upload as push constants, rather than pull constants.
v2: Switch to uint16_t for the UBO block number, because we may
have a lot of them in Vulkan (suggested by Jason). Add more
comments about bitfield trickery (requested by Matt).
v3: Skip vec4 stages for now...I haven't finished wiring up support
in the vec4 backend, and so pushing the data but not using it
will just be wasteful.
Reviewed-by: Matt Turner <mattst88@gmail.com>
I want to remove vc4's dependency on headers from libdrm as well, but
storing multiple copies of drm_fourcc.h in our tree would be silly.
v2: Update Android.mk as well, move distcheck drm*.h references to
top-level noinst_HEADERS.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Daniel Stone <daniels@collabora.com> (v1)
Reviewed-by: Rob Herring <robh@kernel.org>
I want to use these in the OpenGL driver as well.
v2: Add to COMMON_FILES in Makefile.sources (caught by Emil)
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Matt Turner <mattst88@gmail.com>
With Ken's work to drop the library dependency on libdrm_intel, we now
only depend on libdrm for the kernel uapi headers it provides. It
seems like we're better off just embeddeding those headers ourselves,
making the lives of people developping news features tightly
integrated with the kernel a tiny bit easier.
This change also makes it a bit more obvious what cflags/libs are
required by the i915 drivers vs i965, by renaming INTEL_CFLAGS/LIBS
into I915_CFLAGS/LIBS.
Headers were generated from drm-tip on the following commit :
commit 6d61e70ccc21606ffb8a0a03bd3aba24f659502b
Merge: 338ffbf7cb5e c0bc126f97fb
Author: Dave Airlie <airlied@redhat.com>
Date: Tue Jun 27 07:24:49 2017 +1000
Backmerge tag 'v4.12-rc7' into drm-next
v2: Use installed files from the kernel (Daniel Vetter)
v3: Use headers from drm-next rather than drm-tip (Dave/Daniel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch just enables building Vulkan libs for gen10. We
still don't have gen 10 support enabled on Vulkan.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2 (Jason Ekstrand):
- Take a view_mask rather than a whole subpass
- Build the view mask into the VS shader key
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
v2:
- Change the name to lower_conversions.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Combining all the files into a single string didn't make any
difference in the size of the aubinator binary.
With this change we now also embed gen4/4.5/5 descriptions, which
increases the aubinator size by ~16Kb.
v2 (Lionel): rebase makefiles
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>