This belongs to the protected memory feature but there's nothing about
it that's specific to protected memory.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is part of the device groups extension/feature but it's a decent
chunk of work in its own right so it's worth breaking into its own
patch. The mechanism we use is fairly straightforward: we just push the
base work group id into the shader and add it to the work group id we
get from dispatch.
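
As a rough illustration of the mechanism described above (not the
driver code itself), the arithmetic can be modeled in a few lines of
Python. The function name dispatch_base and the tuple layout are
assumptions made for this sketch only.

```python
from itertools import product


def dispatch_base(base, group_count):
    """Yield the work group ids a shader would observe for a based dispatch.

    base        -- (x, y, z) base work group id, pushed to the shader
    group_count -- (x, y, z) number of work groups launched by the dispatch
    """
    bx, by, bz = base
    nx, ny, nz = group_count
    # The dispatch itself numbers groups from zero in each dimension;
    # the shader adds the pushed base to get the id the API promises.
    for z, y, x in product(range(nz), range(ny), range(nx)):
        yield (bx + x, by + y, bz + z)
```

A dispatch of two groups in X with base (4, 0, 0) therefore sees ids
(4, 0, 0) and (5, 0, 0) rather than (0, 0, 0) and (1, 0, 0).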
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This advertises the VK_KHR_shader_draw_parameters functionality as a
"core optional feature" in Vulkan 1.1.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This requires us to rename any Vulkan API entrypoints which became core
in 1.1 to no longer have the KHR suffix.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
In this case, we say an entrypoint is supported if ANY of the extensions
is supported. This is because, in the XML, entrypoints don't require
extensions so much as extensions require entrypoints.
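
A minimal sketch of the "ANY" rule, assuming the XML has already been
parsed into a dict from extension name to the entrypoints it requires.
The helper names (invert_requirements, entrypoint_supported) and the
data shape are hypothetical and only meant to show the inversion.

```python
from collections import defaultdict


def invert_requirements(extension_requires):
    """Turn {extension: [entrypoints]} into {entrypoint: {extensions}}."""
    required_by = defaultdict(set)
    for ext, entrypoints in extension_requires.items():
        for name in entrypoints:
            required_by[name].add(ext)
    return required_by


def entrypoint_supported(name, required_by, enabled_extensions):
    """An entrypoint is supported if ANY extension requiring it is enabled."""
    return any(ext in enabled_extensions
               for ext in required_by.get(name, ()))
```

Because the XML lists entrypoints under each extension, the same
entrypoint can show up under several extensions, which is why the
check is an "any" rather than an "all".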
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The original string map assumed that the mapping from strings to
entrypoints was a bijection. This will not be true the moment we
add entrypoint aliasing. This reworks things to be an arbitrary map
from strings to non-negative signed integers. The old one also had a
potential bug if we ever had a hash collision because it didn't do the
strcmp inside the lookup loop. While we're at it, we break things out
into a helpful class.
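
For illustration, a string map along these lines could look like the
sketch below. It is an assumption-laden model, not the class added by
this patch: the class and method names, the hash function, and the
table sizing are all made up for the example. The points it shows are
that two strings may map to the same integer (aliasing) and that the
lookup loop compares the stored string before accepting a slot.

```python
class StringIntMap:
    """Open-addressed hash table from strings to non-negative integers."""

    def __init__(self):
        self.entries = {}  # string -> int, collected before baking
        self.table = []    # baked slots: (string, value) or None

    def add(self, string, value):
        assert value >= 0
        assert string not in self.entries
        self.entries[string] = value

    @staticmethod
    def _hash(string):
        # Arbitrary multiplicative hash, truncated to 32 bits.
        h = 0
        for c in string:
            h = (h * 5024183 + ord(c)) & 0xFFFFFFFF
        return h

    def bake(self):
        # Power-of-two size at least twice the entry count, so there is
        # always an empty slot to terminate a failed lookup.
        size = 1
        while size < 2 * max(len(self.entries), 1):
            size *= 2
        self.table = [None] * size
        for string, value in self.entries.items():
            i = self._hash(string) & (size - 1)
            while self.table[i] is not None:
                i = (i + 1) & (size - 1)  # linear probing
            self.table[i] = (string, value)

    def lookup(self, string):
        """Return the mapped integer, or -1 if the string is unknown."""
        size = len(self.table)
        if size == 0:
            return -1
        i = self._hash(string) & (size - 1)
        while self.table[i] is not None:
            key, value = self.table[i]
            if key == string:  # the string compare inside the loop
                return value
            i = (i + 1) & (size - 1)
        return -1
```

With this shape, an alias is just a second add() call with the same
integer, for example both "vkTrimCommandPool" and
"vkTrimCommandPoolKHR" mapping to one entrypoint index.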
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Fixes the following build errors:
external/mesa/src/intel/vulkan/anv_device.c:300: error: undefined reference to 'gen_get_pci_device_id_override'
external/mesa/src/intel/vulkan/anv_device.c:312: error: undefined reference to 'gen_get_device_name'
external/mesa/src/intel/vulkan/anv_device.c:313: error: undefined reference to 'gen_get_device_info'
clang.real: error: linker command failed with exit code 1 (use -v to see invocation)
Fixes: 272bef0601 "intel: Split gen_device_info out into libintel_dev"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This adds a missing library to the i965/Android.mk file, and updates
intel/Android.mk to include the new library. Without this, mesa does not
build on Android.
Fixes: 272bef0601 "intel: Split gen_device_info out into libintel_dev"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Android, surface/swapchain extensions are implemented by the loader.
This patch modifies both the anv and radv extension scripts to disable
the ones currently exposed. See also earlier commit 9f763c1f9b.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Just like commit 2ffe395 does for radv.
Fixes the following dEQP test on i965:
dEQP-VK.api.info.android.no_unknown_extensions
v2: make it !ANDROID since this extension is not about
surfaces/swapchain
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We want people to be using ISL_FORMAT_*, rather than the genxml format
enumerations. This patch drops 10 separate copies, and drops a bunch
of ugly casting.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
[jordan.l.justen@intel.com: Minor changes for rebase]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Split out the device info so isl doesn't depend on intel/common. Now
it will depend on the new intel/dev device info lib.
This will allow the decoder in intel/common to use isl, allowing us to
apply Ken's patch that removes the genxml duplication of surface
formats.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reduces my build from 6451 warnings to 6301 warnings by silencing 150
instances of
../../SOURCE/master/src/intel/compiler/brw_inst.h: In function ‘brw_reg_type brw_inst_src1_type(const gen_device_info*, const brw_inst*)’:
../../SOURCE/master/src/intel/compiler/brw_inst.h:802:55: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
unsigned file = __builtin_strcmp("dst", #reg) == 0 ? \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
BRW_GENERAL_REGISTER_FILE : \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
brw_inst_##reg##_reg_file(devinfo, inst); \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../SOURCE/master/src/intel/compiler/brw_inst.h:811:1: note: in expansion of macro ‘REG_TYPE’
REG_TYPE(src1)
^~~~~~~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
These days, we're just passing a pointer to a prog_data field, which
we already have access to. We can just use it directly.
(In the past, it was a pointer to a separate value.)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The commit bit (bit 13) in the message descriptor must always be set
on CNL+ for memory fence messages. This also fixes a piglit GPU hang
on CNL+ in the simulation environment.
Piglit test: arb_shader_image_load_store-shader-mem-barrier
See HSD ES # 1404612949
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This reverts commit a4031bdfa9. It's redundant with the sample mask
predication done at this point by the common logical send lowering
infrastructure. It's also rather buggy: it wasn't applying the correct
sample mask in shaders using discard, because the dispatch mask
returned by FS_OPCODE_MOV_DISPATCH_TO_FLAGS doesn't reflect samples
discarded by the shader. That could have led to data corruption in
fragment shader invocations that execute discard based on a
non-dynamically uniform condition.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The main motivation is to enable HDC surface opcodes on ICL, which no
longer allows the sample mask to be provided in a message header.
This is nevertheless enabled all the way back to IVB when possible
because it significantly decreases the instruction count of some
shaders using HDC messages. For example, one of the SynMark2 CSDof
compute shaders decreases instruction count by about 40% due to the
removal of header setup boilerplate, which in turn makes a number of
send message payloads more easily CSE-able. Shader-db results on SKL:
total instructions in shared programs: 15325319 -> 15314384 (-0.07%)
instructions in affected programs: 311532 -> 300597 (-3.51%)
helped: 491
HURT: 1
Shader-db results on BDW where the optimization needs to be disabled
in some cases due to hardware restrictions:
total instructions in shared programs: 15604794 -> 15598028 (-0.04%)
instructions in affected programs: 220863 -> 214097 (-3.06%)
helped: 351
HURT: 0
The FPS of SynMark2 CSDof improves by 5.09% ±0.36% (n=10) on my SKL
laptop with this change. According to Eero this improves performance
of the same test by 9% on BYT and by 7-8% on BXT J4205 and on SKL GT2
desktop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-By: Eero Tamminen <eero.t.tamminen@intel.com>
This makes sure that the header-present bit of the message descriptor
is in sync with the IR instruction fields, which gives the optimizer
more control to avoid the overhead of setting up a message header when
it's possible to do so.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This shouldn't cause any functional change at this point. It changes
SHADER_OPCODE_FIND_LIVE_CHANNEL to use the flag register specified at
the IR level instead of the hard-coded f1.0, now that it can be
represented in backend_instruction::flag_subreg. This will be
necessary for scheduling to behave correctly once more things start
making use of f1.0.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows representing conditional mods and predicates on f1.0-f1.1
at the IR level by adding an extra bit to the flag_subreg
backend_instruction field.
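
As a sketch of what the extra bit buys, assume flag_subreg is a
two-bit field whose high bit selects the flag register and whose low
bit selects the subregister. That encoding and the helper name
decode_flag_subreg are assumptions made for this illustration, not
something stated in the patch description.

```python
def decode_flag_subreg(flag_subreg):
    """Split a 2-bit flag_subreg value into (flag register, subregister).

    Assumed encoding: 0 -> f0.0, 1 -> f0.1, 2 -> f1.0, 3 -> f1.1.
    """
    if not 0 <= flag_subreg <= 3:
        raise ValueError("flag_subreg must fit in two bits")
    return flag_subreg // 2, flag_subreg % 2
```

With a single bit only f0.0 and f0.1 could be named. The second bit
is what makes f1.0 and f1.1 representable at the IR level.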
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
SLM has a chunk of special-purpose memory separate from L3 on ICL+, so
we shouldn't allocate a partition for it on L3 anymore.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This gives the scheduler visibility into the headers which should
improve scheduling. More importantly, however, it lets the scheduler
know that the header gets written. As-is, the scheduler thinks that a
texture instruction only reads its payload and is unaware that it may
write to the first register so it may reorder it with respect to a read
from that register. This is causing issues in a couple of Dota 2 vertex
shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104923
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This speeds up the Sascha Willems multisampling demo by around 25% when
using 8x or 16x MSAA.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
We'll want to re-use the complex resolve predicate computations for MCS
resolves so it's nice to have them as helper functions.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This doesn't actually do anything because att_state->fast_clear is
determined based on the return value of anv_layout_to_fast_clear_type
which currently returns NONE for multisampled images.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This is a bit complicated because we have to get the indirect clear
color in there somehow. In order to not do any more work in the shader
than needed, we set it up as its own vertex binding which points
directly at the clear color address specified by the client.
Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
There are enough #ifs in there that it's kind of pointless to duplicate
it for each buffer.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>