Instead of returning valid types as just a number, we now walk the list
and check the buffer's usage against the usage flags we store in the new
anv_memory_type structure. Currently, valid_buffer_usage == ~0.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Before, we were just comparing the type index to 0. Now we actually
look the type up in the table and check its properties to determine what
kind of mapping we want to do.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
This doesn't matter right now since it only affects whether or not we
set the kernel bit but, if we ever do anything else based on it, we'll
want it to be correct per-gen.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Up until now, we've been memsetting the auxiliary surface to 0 at
BindImageMemory time to ensure that it is properly initialized.
However, this isn't correct because apps are allowed to freely alias
memory between different images and buffers so long as they properly
track whether or not a particular image is valid and, if it isn't,
transition from UNINITIALIZED to something else before using it. We
now implement those transitions so we can drop the hack.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
This causes dEQP-VK.api.copy_and_blit.resolve_image.partial.* to start
failing due to test bugs. See CL 1031 for a test fix.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
A few cleanups and in particular initializing properly
the new pipe_draw_info fields.
This should fix the regression caused by
330d0607ed
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101088
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Mark the functions 'exec="skip"' in the XML instead. libGL will still
have the functions, but the driver won't try to use them. I verified
that this commit works with piglit's 'object-namespace-pollution glClear
vertex-array' on x64 with a driver built from mesa-12.0.3 tag.
In fairness, this test also works with a libGL built from 7927d03. I
believe it continues to work because on non-Windows platforms we
generate some extra, dummy dispatch functions that can be used when a
driver requests a function unknown to libGL. This was done to provide
some "forward" compatibility with drivers that need more functions.
This doesn't work on Windows because the Windows calling convention is
for the callee to clean up the stack. That's the theory anyway.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Now that we lower vars to regs, we no longer regress for anything that
does complex dereferences. (With tgsi, derefers are already lowered
before tgsi_to_nir, but not with glsl_to_nir.) In fact it actually
fixes a few things to bypass tgsi.
So make NIR the default (finally!)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Instead of using load/store_var intrinsics, which can have complex
derefs in the case of multi-dimensional arrays, lower these to regs
and handle the direct/indirect loads in get_src() and stores in
put_dst().
This should let us switch to using nir by default.
Signed-off-by: Rob Clark <robdclark@gmail.com>
standalone_compiler_cleanup() frees the glsl types, among other things,
so it needs to come after nir->ir3. But since we exit after dumping the
disassembly, it is easier to just not call it at all.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Won't ever hit this w/ a420 gpu, so this is dead code. Need to get astc
working to know whether to rip this out entirely or not.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Maybe there is a better way to do this. But by the time we get to
assigning uniform locs, we want the atomic_uint's to all be gone,
otherwise we assert in st_glsl_attrib_type_size().
Signed-off-by: Rob Clark <robdclark@gmail.com>
I wasn't sure if I should filter the flags so that we only use
flags that actually change the shader output. To avoid manual
updates we just pass in everything for now.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will be used for things such as adding driver specific environment
variables to the key. Allowing us to set environment vars that change
the shader and not have the driver ignore them if it finds existing
shaders in the cache.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
This prevents spurious failures when libtxc-dxtn-s2tc is installed.
Note: lp_test_format doesn't need any change since we were already
ignoring S3TC failures there.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
Not really what the fast depth clear does, no matter whether you use
EXPCLEAR or not. Seems the fast clear using the DB HW always touches
the main buffer.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Did some RE'ing what several HTILE words give when read from a descriptor
with HTILE compression enabled.
Seems to align with -pro usage for D16 too.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
And correct implementation to specify only what we support.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Coverity caught the use of dead code copy-paste for
found_colors[] and num_found_colors.
CID: 1341850
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We're already verified that 'window' wasn't NULL, I'm guessing this
allocation error is about the newly created queue.
CID: 1409754
Fixes: 03dd9a88b0 ("egl/wayland: Use per-surface event queues")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
APPLE_vertex_array_object support was removed in 7927d0378f.
However it turns out we can't remove the functions because this
can cause issues when libglapi is used together with DRI
drivers built prior to said commit
Fixes: 7927d0378f ("mesa: drop APPLE_vertex_array_object support")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Potentially more efficient as it may avoid the struct being initialised
twice.
Also add var to the initialisation list while we are here.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
If the str is long or isn't null-terminated, strlen() could take a lot
of time or even crash. I don't know why was it used in the first place,
maybe for platforms without strnlen(), but strnlen() is already used
inside of ralloc_strndup(), so this change should not additionally
break anything.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Overwhelming majority of shaders don't use line continuations. In my
shader-db only shaders from the Talos Principle and Serious Sam used
them, less than 1% out of all shaders. Optimize for this case, don't
do any copying if no line continuation was found.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
strcmp() is slow. Initiate comparison with "__LINE__" or "__FILE__"
only if the identifier starts with '_', which is rare.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is shorter and easier on the eyes. At the same time this
also ensures that we are always asserting that the table pointer
is not NULL. Currently that was not done for all situations.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Use our knowledge that pointers are at least 4 byte aligned to remove
the useless digits. Then shift by 6, 10, and 14 bits and add this to
the original pointer, effectively folding in the entropy of the higher
bits of the pointer into a 4-bit section. Stopping at 14 means we can
add the entropy from 18 bits, or at least a 600Kbyte section of memory.
Assuming that ralloc allocates from a linearly allocated heap less than
this we can make a very efficient pointer hashing function for our usecase.
Even if we are not on an architecture that is 4 byte aligned, there is
still a high big chance that the thing we are allocating is at least
8 bytes in size, so even then we will have entropy into the third bit.
The 4 bit increment on the shifts is chosen rather arbitrarily; if we
had chosen a 3 bit increment we would need to add another xor to
cover a decently sized memorypool. Increasing it to 5 bits would
spread our entropy more, possibly hurting us with more collisions on
hash tables of size less than 32. With a hash table of size 16 there
are a max of 11 entries, and we can assume that with such a small table
collisions are not that painfull.
This allows us to hash the whole 32 or 64 bit pointer at once,
instead of running FNV1a, looping through each byte and doing
increments, decrements, muls, and xors on every byte. This cuts
_mesa_hash_data from 1.5 % on profiles, to making _mesa_hash_pointer
show up with a 0.09% share. Collisions on insertion actually seems to be
ever so slightly lower with this hash function, as found by printing
a loop counter and sorting the data.
perf stat shows a 1.5% reduction in instruction count,
and a 5% reduction in stalled cycles. Shader-db runtime goes
from 225 to 220 seconds.
No instruction-count changes in shader-db, but there are some minor
changes in cycle-count that is likely caused by nir walking a set
in some of its passes, and this causing a different ordering.
That might eventually lead to a difference in register allocation.
However, the effect is a net positive;
total cycles in shared programs: 24739550 -> 24738482 (-0.00%)
cycles in affected programs: 374468 -> 373400 (-0.29%)
helped: 178
HURT: 49
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Before the swapchain event queue is destroyed, all proxy objects that reference
it must be dropped. Otherwise we risk a use-after-free if a frame callback event
or buffer release events are received afterwards.
This happens when an application destroys and recreates a swapchain in FIFO
mode between two frames without using the VkSwapchainCreateInfoKHR::oldSwapchain
mechanism to keep the old swapchain until after the next redraw.
Fixes: 5034c61558 ("vulkan/wsi/wayland: Use proxy wrappers for swapchain")
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Cc: mesa-stable@lists.freedesktop.org