This keeps the flags out of v3d_decode.c's output. In the generated code,
only the unpack functions see any change (where they now get the
restricted start value), and vc4 doesn't use the unpack functions yet.
I was writing the XML such that the address field overlapped various flags
in the alignment bits, which caused pain when trying to unpack for decode.
Instead, keep the XML matching the docs (address fields don't overlap),
and just infer the appropriate shift value during decode.
During pack, the address is just applied to the appropriate bits
already, ignoring the sub-byte start/end fields.
We simply pick r4 if available (anything else would force a MOV), then
round-robin through accumulators (avoids physical regfile RAW delay
slots), then round-robin through the physical regfile.
The effect on instruction count is pretty impressive:
total instructions in shared programs: 76563 -> 74526 (-2.66%)
instructions in affected programs: 66463 -> 64426 (-3.06%)
and we could probably do better with a little heuristic of "if we're going
to choose a physical reg, and other operands of instructions using this as
a src have the same physical regfile, then use the other regfile".
VC4 has had a tension, similar to pre-Sandybridge Intel, where we want to
use low-numbered registers (more parallelism on Intel, fewer delay slots
on vc4), but in order to give instruction scheduling the most freedom to
avoid delays we want to round-robin between registers of the same cost.
Our two heuristics so far have chosen one end or the other of that
tradeoff.
The callback, instead, hands the driver the set of registers that are
available, and the driver gets to make its own choice. This will be used
in vc4 to round-robin between registers of the same cost, and might be
used in the future for improving bank selection.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
All the paths looping over adjacency had guards against considering
themselves (the non-obvious one was ra_any_neighbors_conflict(), which has
in_stack set).
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
I was going to indent this code another level, and decided it would be
easier to read as a helper.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Without this, a BlitFramebuffer would mark the whole framebuffer as being
changed (so we emit loads/stores of all of it) rather than just the
modified subset.
I don't know how I managed to leave this here for so long. Found when
working on a 1:1 overlapping blit extension for X11.
Cc: mesa-stable@lists.freedesktop.org
This gets us automatic CL decoding to a floating-point value, and drops a
magic number from the emit code. 250x250 shader runner tests now say they
have a center of 125.0 instead of 2000.
The device doesn't directly support this feature so we implement it with
additional shader code which sets the color output(s) w component to
1.0 (or max_int or max_uint).
Fixes 16 Piglit ext_framebuffer_multisample/*alpha-to-one* tests.
v2: only support unorm/float buffers, not int/uint, per Roland.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
When we forcibly write white to FS outputs (for XOR mode emulation)
we were using a temp register. But that's not really necessary.
This also fixes the case of writing white to multiple color buffers.
Subsequent changes will build on this.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Performance delta on Core i5-4570 + Radeon R9 270:
Overlord: +20% in certain locations
Overlord II: +20% in certain locations
Oil Rush: +12% in most locations
War Thunder: +4-9% in benchmarks
Saints Row 2: +10-35% in certain locations
As Chris commented, it makes more sense to have batch buffer flushes
before the query. Usually applications like frame_retrace do a series
of queries and in that case, with flushes at the end of the queries,
we might still have the first query contained in 2 different batchs.
More generally it would be quite usual to have the query contained in
2 batch buffers because we never now what's the fill rate of the
current batch buffer.
If we move the flushing at the beginning of the queries, it's pretty
much guaranteed that queries will be contained in a single batch
buffer (unless the amount of commands is huge, but then it's only fair
to include reloading request times in the measurements).
Fixes: adafe4b733 ("i965: perf: minimize the chances to spread queries across batchbuffers")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Always initialise whandle.modifier for DRIImage modifier queries, so if
the driver doesn't support it then we return false for the query.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: d33fe8b84e ("st/dri: enable DRIimage modifier queries")
In the DRIImage queryImage hook, check if resource_get_handle() failed
and return FALSE if so.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For textures we must not approximate the calculation with `stride *
height`, or `slice_stride * depth`, as that can easily lead to buffer
overflows, particularly for partial transfers.
This should address the issue that Bruce Cherniak found and diagnosed.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Add uintptr_t cast to fix 'cast to pointer from integer of different size'
warning on 32bit build (build error on Android M).
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Until we support sync fd, don't report the info.
Fixes CTS dEQP-VK.api.external.semaphore.sync_fd.* from crashing.
Fixes: eaa56eab6 (radv: initial support for shared semaphores (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
No need for all that switching when we can just assign a nice little
variable with the number of layers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gen4 have commands which start with KernelStartPointer, which is a
struct, so if we initialize it struct = { 0 }, we get warnings on some
compilers:
"GCC (pre 4.9?) can throw a Wmissing-braces on[1] while clang
-Wmissing-field-initializers [2]." - Emil
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53119
[2] https://bugs.llvm.org/show_bug.cgi?id=21689
This change works around that and will silence such warnings. It is both
a GCC and a clang extension.
v2:
- Use {} instead of memset macro (Matt)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
With commit 5124bf9823, a framebuffer interface hash table is
created in st_gl_api_create(), which is called in
dri_init_screen_helper() for each screen. When the hash table is
overwritten with multiple calls to st_gl_api_create(), it can cause
race condition. This patch fixes the problem by creating a
framebuffer interface hash table per state tracker manager.
Fixes crash with steam.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101876
Fixes: 5124bf9823 ("st/mesa: add destroy_drawable interface")
Tested-by: Christoph Haag <haagch@frickel.club>
Reviewed-by: Brian Paul <brianp@vmware.com>
If the underlying driver does not support modifiers, dmabuf will still
advertise formats through the 'modifier' event, but send them with an
invalid modifier. Ignore them if this is the case, rather than passing
them through to the driver.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
This is similar to other functions that create objects.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
To return GL_OUT_OF_MEMORY if NewSamplerObject fails.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
To avoid inlining compressed_tex_sub_image() a bunch of times.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Otherwise we'll attemt to generate the header even we don't need to.
In that case the dependencies may not be met, leading to build failure.
Fixes: 166852e "configure.ac: rework wayland-protocols handling"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The extension should be in the list as returned by getExtensions().
Seems to have gone unnoticed since close to nobody wants to change the
vblank mode for the software driver.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This change updates wayland-egl-abi-check.c with the latest changes to
wl_egl_window.
Signed-off-by: Miguel A. Vico <mvicomoya@nvidia.com>
Reviewed-by: James Jones <jajones@nvidia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We need wl_egl_window to be a versioned struct in order to keep track of
ABI changes.
This change makes the first member of wl_egl_window the version number.
An heuristic in the wayland driver is added so that we don't break
backwards compatibility:
- If the first field (version) is an actual pointer, it is an old
implementation of wl_egl_window, and version points to the wl_surface
proxy.
- Else, the first field is the version number, and we have
wl_egl_window::surface pointing to the wl_surface proxy.
Signed-off-by: Miguel A. Vico <mvicomoya@nvidia.com>
Reviewed-by: James Jones <jajones@nvidia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>