There's no reason why we need to use the calculated payload_node_count
value which is just first_non_payload_grf aligned up. The grf_used
value will be aligned up to 16 anyway (which is a much bigger alignment)
before being handed off to hardware.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We only have one node per VGRF so this was adding way too much
interference. No idea how we didn't catch this before.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15311100 -> 15311100 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 355468050 -> 355543197 (0.02%)
cycles in affected programs: 2472492 -> 2547639 (3.04%)
helped: 17
HURT: 20
Fixes: 014edff0d2 "intel/fs: Add interference between SENDS sources"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We want to be able to call ra_allocate() and, when it fails, mutate the
graph and try again rather than re-building the graph from scratch.
This commit moves all the scratch bits except the final register
allocation (which is really an out value not scratch) into sub-structs
named "tmp" to make it clear which things are scratch. It also adds
bits to the ra_select() initialization loop to initialize things (since
we can't trust rzalloc anymore) and copy q_test and forced_reg over.
Reviewed-by: Eric Anholt <eric@anholt.net>
Unfortunately, we can't quite follow the standard C conventions for
these because ralloc doesn't know the sizes of pointers.
Reviewed-by: Eric Anholt <eric@anholt.net>
In the last phase of the schedule and RA loop, the RA call is redundant
if we spill. Immediately afterwards, we're going to see that we
couldn't allocate without spilling and call back into RA and tell it to
go ahead and spill. We've known about it for a while but we've always
brushed over it on the theory that, if you're going to spill, you'll be
calling RA a bunch anyway and what does one extra RA hurt? As it turns
out, it hurts more than you'd expect. Because the RA interference graph
gets sparser with each spill and the RA algorithm is more efficient on
sparser graphs, the RA call that we're duplicating is actually the most
expensive call in the RA-and-spill loop.
There's another extra RA call we do that's a bit harder to see which
this also removes. If we try to compile a shader that isn't the minimum
dispatch width and it fails to allocate without spilling we call fail()
to set an error but then go ahead and do the first spilling RA pass and
only after that's complete do we detect the fail and bail out. By
making minimum dispatch widths part of the spill condition, we side-step
this problem.
Getting rid of these extra spills takes the compile time of a nasty
Aztec Ruins shader from about 28 seconds to about 26 seconds on my
laptop. It also makes shader-db 1.5% faster
Shader-db results on Kaby Lake:
total instructions in shared programs: 15311100 -> 15311100 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 355468050 -> 355468050 (0.00%)
cycles in affected programs: 0 -> 0
helped: 0
HURT: 0
Total CPU time (seconds): 2524.31 -> 2486.63 (-1.49%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The most expensive part of register allocation is the ra_simplify step
which is a fixed-point algorithm with a worst-case complexity of O(n^2)
which adds the registers to a stack which we then use later to do the
actual allocation. This commit uses bit sets and changes the core loop
of ra_simplify to first walk 32-node chunks and then walk each chunk.
This lets us skip whole 32-node chunks in one go based on bit operations
and compute the minimum q value potentially 32x as fast. Of course, the
algorithm still has the same fundamental O(n^2) run-time but the
constant is now much lower.
In the nasty Aztec Ruins compute shader, this shaves a full four seconds
off the 30s compile time for a release build of mesa. In a debug build
(needed for accurate stack traces), perf says that ra_select takes 20%
of runtime before this patch and only 5-6% of runtime after this patch.
It also makes shader-db runs faster.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15311100 -> 15311100 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 355468050 -> 355468050 (0.00%)
cycles in affected programs: 0 -> 0
helped: 0
HURT: 0
Total CPU time (seconds): 2602.37 -> 2524.31 (-3.00%)
Reviewed-by: Eric Anholt <eric@anholt.net>
We only use q_total if the reg is not assigned so there's no point in
updating it if the reg is not assigned. This has no known perf benefit
but it will reduce churn in a future commit.
Reviewed-by: Eric Anholt <eric@anholt.net>
This shaves about half a second off the 30 second compile time of one of
the compute shaders in Aztec ruins.
Reviewed-by: Eric Anholt <eric@anholt.net>
Both apps and we (see virgl_buffer_transfer_flush_region) might
flush regions that are unmodified. We have to read back for those
flushes.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
It currently has no user and is probably incorrect (resource_wait is
required in some more cases). Remove it so that we can focus on
transfers first.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
No surface requires an auxiliary surface to operate correctly. Fall back
to an uncompressed surface if mesa fails to create and allocate an
auxiliary surface. This enables adding more restrictions to ISL without
having to update iris.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
intel_tiling_supports_hiz() and intel_miptree_supports_hiz() duplicate
much the work done by isl_surf_get_hiz_surf(). Replace them with simple
expressions.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Import some restrictions from intel_tiling_supports_hiz() and
intel_miptree_supports_hiz().
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
intel_tiling_supports_ccs() and intel_miptree_supports_ccs() duplicate
much the work done by isl_surf_get_ccs_surf(). Drop them both and index
a boolean array to choose CCS_D in intel_miptree_choose_aux_usage().
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
This function duplicates much the work done by isl_surf_get_mcs_surf().
Replace it with a simple expression.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Import some restrictions from intel_miptree_supports_mcs() and don't
assume that the caller knows which device generations are supported.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
No surface requires an auxiliary surface to operate correctly. Fall back
to an uncompressed surface if mesa fails to create and allocate an
auxiliary surface. This enables adding more restrictions to ISL without
having to update i965.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Instead of checking the API variant on entry of set_varying_vp_inputs
to check if we can ever be interrested in fixed function processing
or not, we can check if we are actually fixed function processing.
To check this we can use the immediately updated
gl_context::VertexProgram._VPMode value that tells us if we have a
user provided shader program or if we are in fixed function processing
either through an internal TNL shader of directly through hardware.
When doing so, we also need to recheck the varying_vp_inputs variable
at the time gl_context::VertexProgram._VPMode is set to VP_MODE_FF.
Put asserts at the consumers of gl_context::varying_vp_inputs to make
sure gl_context::VertexProgram._VPMode is set to VP_MODE_FF. By that
gl_context::varying_vp_inputs should be up to date then.
By not looking at the opengl api for this decision we should actually
catch more cases where we can avoid setting a state change flag, including
the ones where we cannot get into VP_MODE_FF by the choice of the api.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The precondition stated in the comment is not true. The values mentioned are
only set from _mesa_update_state which in turn may not yet be called.
For now set the _NEW_VARYING_VP_INPUTS flag a bit more often, we will narrow
that down to a minimum again in a later patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Whenever a buffer is allocated, e.g. by the first draw call or EGL call after a
buffer swap, make sure the size is up to date. Prior to this commit, we
failed to do so when querying the buffer age, or swapping buffers
without any prior EGL call or draw call.
Signed-off-by: Jonas Ådahl <jadahl@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The spec says we can't create another surface if we already created a
surface with the given window or pixmap. Implement this check.
This behavior is exercised by piglit/egl-create-surface.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Each platform stores this in a different place:
- platform_drm uses dri2_surf->gbm_surf->base
- platform_android uses dri2_surf->window
- platform_wayland uses dri2_surf->wl_win
- platform_x11 uses dri2_surf->drawable
- platform_x11_dri3 uses dri3_surf->loader_drawable.drawable
- haiku doesn't even store it!
We need access to the native surface since the specification asks us
to refuse creating a new surface if there's already an EGLSurface
associated with native_surface.
An alternative to this patch would be to create a new
API.GetNativeWindow callback that each platform would have to
implement. While that's something we can definitely do, I prefer
this approach.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Otherwise we risk to read past the end of the buffer.
In addition, change the loop counters to unsigned to be consistent
with the types.
Fixes: afa8707ba9
softpipe: add SSBO/shader atomics support.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
As with the previous value of 5000 we seemed to be reaching OOM in some
circumstances.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Since last Friday, these two tests have been fixed:
dEQP-GLES2.functional.shaders.functions.control_flow.return_in_nested_loop_fragment
dEQP-GLES2.functional.shaders.linkage.varying_7
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
It sure looks like we just want both of them to be nonzero, and && is
probably going to be cheaper than * anyway.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
My gcc can't see that the uninitialized value from the PIPE_BUFFER case
isn't used from the !PIPE_BUFFER cases later.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
__u64 is a ulonglong on x86_64, not uint64_t, so my gcc was complaining
about the wrong type being passed in.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
The .editorconfig helps with the tabs, but we've got this
two-tabs-from-previous-indentation line continuation style that requires
whacking the c-file-offsets. This will throw emacs warnings when first
opening a file in the directory, press '!' to shut it up for the future.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
The editorconfig takes precedence over dir-locals in emacs26 with
editorconfig enabled, so the /.editorconfig was affecting these
directories.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>