Otherwise, only the first vec4 of a matrix or other complex type will
get marked as flat and we'll interpolate the others. This was caught by
a dEQP test which started failing because it did a SSO vs. non-SSO
comparison. Previously, we did the interpolation wrong consistently in
both versions. However, with one of Tim Arceri's NIR linkingpatches, we
started splitting the matrix input into vectors at link time in the
non-SSO version and it started getting correctly interpolated which
didn't match the broken SSO version. As of this commit, they both get
correctly interpolated.
Fixes: e61cc87c75 "i965/fs: Add a flat_inputs field to prog_data"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Gen10+ has an additional bit in MI_BATCH_BUFFER_END to signal the end
of the context image.
We select the largest size for the context image regardless of the
generation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
In both Python 2 and 3, zlib.Compress.compress() takes a byte string,
and returns a byte string as well.
In Python 2, the script was working because:
1. string literalls were byte strings;
2. opening a file in unicode mode, reading from it, then passing the
unicode string to compress() would automatically encode to a byte
string;
On Python 3, the above two points are not valid any more, so:
1. zlib.Compress.compress() refuses the passed unicode string;
2. compressed_data, defined as an empty unicode string literal, can't be
concatenated with the byte string returned by compress();
This commit fixes this by explicitly using byte strings where
appropriate, so that the script works on both Python 2 and 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The XML parser wants byte strings, not unicode strings.
In both Python 2 and 3, opening a file without specifying the mode will
open it for reading in text mode ('r').
On Python 2, the read() method of the file object will return byte
strings, while on Python 3 it will return unicode strings.
Explicitly specifying the binary mode ('rb') makes the behaviour
identical in both Python 2 and 3, returning what the XML parser
expects.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
In Python 2, iterating over a byte-string yields single-byte strings,
and we can pass them to ord() to get the corresponding integer.
In Python 3, iterating over a byte-string directly yields those
integers.
Transforming the byte string into a bytearray gives us a list of the
integers corresponding to each byte in the string, removing the need to
call ord().
This makes the script compatible with both Python 2 and 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The hardware doesn't support byte immediates, so similar to setup_imm_df()
for doubles, these helpers work by loading the constant value into a
VGRF.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Gen 11 workarounds table #2056 WABTPPrefetchDisable suggests to
disable prefetching of binding tables for ICLLP A0 and B0
steppings. It fixes multiple gpu hangs in
ext_framebuffer_multisample* tests on ICLLP B0 h/w.
Anuj: Add comments and commit message.
Add gen 11 checks in the code.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The pass can create a temporary result for the instruction and then
moves from it to the original destination, however, if the original
instruction was predicated, the mov has to be predicated as well.
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Until now, we had separate passes for lowering gl_PatchVerticesIn to
a statically known constant (for TES inputs when linked against a TCS),
and a uniform in the other cases. Annoyingly, one had to be run before
nir_lower_system_values, and the other afterward. This simplified the
passes, but made life painful for the callers.
This patch combines both into a single pass. If you give it a non-zero
static count, it uses that. If you give it Mesa state slots, it turns
it back into a built-in uniform. Otherwise, it does nothing.
This also moves the i965 uniform lowering out to shared code.
v2: Make token arrays const.
Reviewed-by: Eric Anholt <eric@anholt.net>
These are lowered by brw_nir_lower_vs_inputs(). If they weren't, we
would have already hit the unreachable() in emit_system_values_block().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The various base addresses are simply addresses. There may or may not
be a buffer located at those addresses. So, it doesn't make much sense
to request one. Just save the raw address so we can add it later, when
asking about BOs at the final <base + offset> address.
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Normally, i965 programs STATE_BASE_ADDRESS every batch, and puts all
state for a given base in a single buffer.
I'm working on a prototype which emits STATE_BASE_ADDRESS only once at
startup, where each base address is a fixed 4GB region of the PPGTT.
State may live in many buffers in that 4GB region, even if there isn't
a buffer located at the actual base address itself.
To handle this, we need to save the STATE_BASE_ADDRESS values across
multiple batches, rather than assuming we'll see the command each time.
Then, each time we see a pointer, we need to ask the driver for the BO
map for that data. (We can't just use the map for the base address, as
state may be in multiple buffers, and there may not even be a buffer
at the base address to map.)
v2: Fix things caught in review by Lionel:
- Drop bogus bind_bo.size check.
- Drop "get the BOs again" code - we just get the BOs as needed
- Add a message about interface descriptor data being unavailable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
CovID: 1438132
Fixes: a99c9e63a0 "anv: finish the binding_table_pool on
destroyDevice when use_softpin"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
We might fail on master node drm fd because we won't have the right
permissions.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Since various options within INTEL_DEBUG could impact code generation,
we need to set the disk cache driver_flags parameter based on the
INTEL_DEBUG flags in use.
An example that will affect the program generated by i965 is the
INTEL_DEBUG=nocompact option.
The DEBUG_DISK_CACHE_MASK value is added to mask the settings of
INTEL_DEBUG that can affect program generation.
v2:
* Use driver_flags (Tim)
* Also update Anvil (Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This extra character should not be used by snprintf, but we make it
available to verify that we printed the exact number we wanted, and
didn't overflow.
v2:
* Also update Anvil
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
we need rounding modes on other conversions involving floats and it is easier
to rename f2f16_undef than renaming all the other ones.
v2: rebased on master
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Python 2 has a range() function which returns a list, and an xrange()
one which returns an iterator.
Python 3 lost the function returning a list, and renamed the function
returning an iterator as range().
As a result, using range() makes the scripts compatible with both Python
versions 2 and 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In Python 2, dictionaries have 2 sets of methods to iterate over their
keys and values: keys()/values()/items() and iterkeys()/itervalues()/iteritems().
The former return lists while the latter return iterators.
Python 3 dropped the method which return lists, and renamed the methods
returning iterators to keys()/values()/items().
Using those names makes the scripts compatible with both Python 2 and 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The original pass only looked for load_uniform intrinsics but there are
a number of other places that could end up loading a push constant. One
obvious omission was images which always implicitly use a push constant.
Legacy VS clip planes also get pushed into the shader. This fixes some
new Vulkan CTS tests that test random combinations of bindings and, in
particular, test lots of UBOs and images together.
Cc: mesa-stable@lists.freedesktop.org
Cc: Kenneth Graunke <kenneth@whitecape.org>
It's actually also a bit safer, since now the compiler will warn if
the string is larger than the `.name` array.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
According to the spec, these should apply to all read/write access
types (so would be equivalent to specifying all other access types
individually). Currently, they were doing nothing.
v2: Handle VK_ACCESS_MEMORY_WRITE_BIT in dstAccessMask.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The first fix attempt contained a nasty typo which somehow didn't get
caught in review. It also didn't work as intended because the sRGB
conversion was happening but then throwing away all but the red channel
because it dind't know it was RGB. Really, it's my fault for trying to
fix a bug without first writing tests. I've now written tests and they
pass with this change. :)
Fixes: 11712b9ca1 "intel/blorp: Fix blits to R8G8B8_UNORM_SRGB"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We've had several broadwell hangs that have come down to this bit just
not working correctly. Most recently, we've had a pile of hangs
reported with apps running under DXVK:
https://github.com/doitsujin/dxvk/issues/469
Instead, use the bit that doesn't try to imply weird D3D coherency
things and just force-enables the PS like we want.
cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We support mipmapped and arrayed linear images so we need to support
vkGetImageSubresourceLayout on them. Fortunately, it's just a trivial
call into ISL.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Note that the use of ICMS_INNER_CONSERVATIVE disagrees with the GL driver.
Perhaps it's more performant than ICMS_NORMAL and is otherwise permitted?
Not sure, so I left it as-is.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When running gdb, make sure to pass the LD_PRELOAD variable only to
the executed program, not the debugger. Otherwise the debugger will
run the preloaded constructor/destructor too and bad things will
happen.
Suggested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The problem with passing the configuration of the dump lib through a
file descriptor is that it can be read only once. But under gdb you
might want to rerun your program multiple times.
This change hands the configuration through a temporary file that is
deleted once the command line passes to intel_dump_gpu has exited.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Rendering to a linear depth buffer on gen4 is causing a GPU hang in the
CI system. Until a better explanation is found, assume that errata is
applicable to all gen4 platforms.
Fixes fbe01625f6
("i965/miptree: Share tiling_flags in miptree_create").
Reported-by: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107248
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In commit 86cb05a6d3 ("intel: aubinator: remove standard input
processing option") we removed the ability to process aub as an input
stream because we're now rely on mmapping the aub file to back the
buffers aubinator is parsing.
intel_aubdump was the provider of the standard input data and since
we've copied/reworked intel_aubdump into intel_dump_gpu within Mesa,
we don't need that code anymore.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This lets us move the glBlitFramebuffer nonsense into the GL driver and
make the usage of BLORP mutch more explicit and obvious as to what it's
doing.
Reviewed-by: Chad Versace <chadversary@chromium.org>
At the moment, this is entirely internal but we'll expose it to clients
of the BLORP API in the next commit.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Instead of having quite so many singletons, we use a struct aub_file to
organize the bits we need for writing an aub file.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
For large buffers which span an entire l1 page table, we got the range
calculations wrong. In this case, we end up with an l1_start which is
the first byte represented by the given l1 table and an l1_end which is
the first byte after the range represented by the l1 table. Then
l2_start_index == L2_index(l2_end) due to roll-over. Instead, compute
lN_end using (1Ull << shift) - 1 so that lN_end is the last byte in the
range represented by the Nth level page table. When we do this, we
don't need the conditional expression anymore.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Code assumes that all the necessary fields will exist, but compiler
doesn't know about this. Provide zero as default values, like in other
decoding functions.
Fixes warnings
../../src/intel/common/gen_batch_decoder.c: In function ‘handle_media_interface_descriptor_load’:
../../src/intel/common/gen_batch_decoder.c:347:7: warning: ‘binding_entry_count’ may be used uninitialized in this function [-Wmaybe-uninitialized]
dump_binding_table(ctx, binding_table_offset, binding_entry_count);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../src/intel/common/gen_batch_decoder.c:347:7: warning: ‘binding_table_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
../../src/intel/common/gen_batch_decoder.c:346:7: warning: ‘sampler_count’ may be used uninitialized in this function [-Wmaybe-uninitialized]
dump_samplers(ctx, sampler_offset, sampler_count);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../src/intel/common/gen_batch_decoder.c:346:7: warning: ‘sampler_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
../../src/intel/common/gen_batch_decoder.c:343:7: warning: ‘ksp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
ctx_disassemble_program(ctx, ksp, "compute shader");
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../src/intel/common/gen_batch_decoder.c: In function ‘decode_dynamic_state_pointers’:
../../src/intel/common/gen_batch_decoder.c:663:54: warning: ‘state_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
const uint32_t *state_map = ctx->dynamic_base.map + state_offset;
~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
../../src/intel/common/gen_batch_decoder.c: In function ‘gen_print_batch’:
../../src/intel/common/gen_batch_decoder.c:856:13: warning: ‘next_batch.map’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (next_batch.map == NULL) {
^
../../src/intel/common/gen_batch_decoder.c:860:13: warning: ‘next_batch.addr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
gen_print_batch(ctx, next_batch.map, next_batch.size,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next_batch.addr);
~~~~~~~~~~~~~~~~
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>