Walk the IR instead. This happens when preprocessing so it doesn't really
matter, but it complicates the nir_variable audit.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
Prior to this change, agx_opt_cse is our most expensive backend pass, due to the
time spent hashing instructions. hash_instr was calling into XXH32 a massive
number of times, often to hash only a single bit. It's much faster to hash
entire blocks of memory at a time. Optimize to do just that.
With this change, agx_opt_cse is now cheaper than instruction selection as
it should be.
No shader-db changes (except CPU time decrease).
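As a rough illustration of the difference described above, the sketch below contrasts hashing a key one field at a time with hashing the whole key in a single call. Everything here is an assumption made for the example: the struct layout is hypothetical (not the real AGX instruction), and a small FNV-1a loop stands in for XXH32 so the sketch has no external dependency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical instruction key, laid out without padding. NOT the real IR. */
struct sketch_instr_key {
   uint16_t op;
   uint8_t nr_srcs;
   uint8_t flags;
   uint32_t srcs[4];
};

/* FNV-1a over a byte range; a stand-in for XXH32 in this sketch only. */
static uint32_t
sketch_hash_bytes(uint32_t hash, const void *data, size_t size)
{
   const uint8_t *bytes = data;

   assert(data != NULL);
   for (size_t i = 0; i < size; ++i) {
      hash ^= bytes[i];
      hash *= 16777619u;
   }
   return hash;
}

/* Slow shape: one hash call per field, many tiny inputs. */
static uint32_t
sketch_hash_per_field(const struct sketch_instr_key *key)
{
   uint32_t hash = 2166136261u;

   hash = sketch_hash_bytes(hash, &key->op, sizeof(key->op));
   hash = sketch_hash_bytes(hash, &key->nr_srcs, sizeof(key->nr_srcs));
   hash = sketch_hash_bytes(hash, &key->flags, sizeof(key->flags));
   for (size_t s = 0; s < 4; ++s)
      hash = sketch_hash_bytes(hash, &key->srcs[s], sizeof(key->srcs[s]));
   return hash;
}

/* Fast shape: a single call over the whole block of memory. */
static uint32_t
sketch_hash_whole(const struct sketch_instr_key *key)
{
   return sketch_hash_bytes(2166136261u, key, sizeof(*key));
}

/* Zero the key first so every byte that gets hashed is deterministic. */
static void
sketch_init_key(struct sketch_instr_key *key, uint16_t op,
                uint32_t src0, uint32_t src1)
{
   memset(key, 0, sizeof(*key));
   key->op = op;
   key->nr_srcs = 2;
   key->srcs[0] = src0;
   key->srcs[1] = src1;
}
```

Hashing raw memory only works if unused and padding bytes are zeroed consistently, which is why the key is cleared with memset before it is filled in.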
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
We do need to use undefs instead of zeroes in this internal collect. While this
vector gets copypropped out, it'd cause us to fail compilation if noopt is on.
Fix that.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
This will be used for compute kernels (and transform feedback) in the (near)
future. For now, let's get the opcode plumbed in the backend to reduce some of
the rebase pain.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
If a render target isn't written to, we don't use the sample mask. Avoid
generating the intermediate instructions, common with gl_FragColor. It will get
DCE'd, but this means less work for DCE, which should help for shader jank since
this pass gets called per-variant.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
These are useful for layered staging resources. Tested by forcing linear
textures and running dEQP-GLES3.functional.texture.format.sized.2d_array.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
Use the freedreno lowering. It'll be slow but I don't know of any apps that
actually use this and it's required for GL 3.0.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
We can end up logging both the buffer from which the toplevel cmdstream is
allocated and the sub-allocated part of that buffer. Possibly
the kernel could do better about this, but to avoid undecodable
cmdstream dumps and devcores, detect this case and deal with it.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20496>
We also want to ensure we don't hit the limit of max suballoc BOs.
Piglit drawoverhead would manage to hit this.
Fixes: 4861067689 ("freedreno/drm: Add sub-allocator")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20496>
Some CTS tests enable all extensions ... , which combined with having
no shader cache on some platforms results in some CTS tests timing
out (in particular tests recreating the device all the time).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20422>
Nothing significant in shader-db on RV530:
total instructions in shared programs: 134963 -> 134957 (<.01%)
instructions in affected programs: 1108 -> 1102 (-0.54%)
helped: 7
HURT: 1
total temps in shared programs: 17153 -> 17154 (<.01%)
temps in affected programs: 38 -> 39 (2.63%)
helped: 2
HURT: 3
Just some fluctuations from pair scheduling due to different code order.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20208>
There are still some ftruncs left, as most of them originate in
nir_lower_int_to_float, which is currently called after nir_opt_algebraic
in ntt.
No change in shader-db.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20208>
Negligible number of instructions saved on RV530:
total instructions in shared programs: 134970 -> 134963 (<.01%)
instructions in affected programs: 2273 -> 2266 (-0.31%)
helped: 9
HURT: 1
The one hurt shader is a case where we fail to recognize the x - ffract(x)
pattern, due to some non-trivial swizzles going on, and so skip the
ftrunc-elision optimization implemented in the previous patch.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20208>
We already skip emitting ftrunc in nir_lower_int_to_float when there is
ffloor, fround or any other integer-making opcode preceding f2i32. However,
if lower_ffloor is set for a driver that doesn't support integers, the lowered
x - ffract(x) patterns would not be recognized and an extra ftrunc would be
emitted, doing unnecessary rounding.
This optimization only works if there is no non-trivial swizzling used for
the fadd, fneg and ffract involved, which seems to be 99% of the cases according
to my testing.
This is needed to enable NIR ffloor lowering on the r300 driver without
regressions. I'm not sure if this helps anybody else: only some old etnaviv
cards set lower_ffloor and convert ints to floats (and can't do trunc), so
maybe it will help there a bit.
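To see why the extra ftrunc is redundant, here is a small standalone sketch in plain C (not NIR) of the lowered pattern. The helper names are made up for the example, and the cast-based truncation assumes the input fits in a 32-bit int.

```c
#include <assert.h>

/* Truncate toward zero via an int cast; only valid for values that fit
 * in a 32-bit int. */
static float
sketch_ftrunc(float x)
{
   assert(x > -2147483648.0f && x < 2147483648.0f);
   return (float)(int)x;
}

/* Round toward negative infinity, built on the truncation above. */
static float
sketch_ffloor(float x)
{
   float t = sketch_ftrunc(x);
   return (t > x) ? t - 1.0f : t;
}

/* Fractional part: x - floor(x). */
static float
sketch_ffract(float x)
{
   return x - sketch_ffloor(x);
}

/* The shape ffloor takes after lowering: fadd(x, fneg(ffract(x))). */
static float
sketch_lowered_ffloor(float x)
{
   return x + (-sketch_ffract(x));
}
```

For the inputs checked here the lowered expression is already a whole number, so truncating it again changes nothing. That is the property the optimization relies on once it recognizes the fadd/fneg/ffract chain.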
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20208>
I think we should distinguish between dynamic states (applications) and
hardware states, and this will allow us to use vk_viewport_state
instead of our own structs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20314>
dangling_attr_ref=true can be set when the following happens:
glBegin(GL_TRIANGLES)
glVertex(...)
glVertex(...)
glColor4(...)
glVertex(...)
When glColor4 is hit, the first 2 vertices are copied to the vertex_store
by upgrade_vertex, but since this is done before the new glColor4 values are
copied, we make a note to fix up these attributes later using dangling_attr_ref.
This causes very slow rendering. What this commit does instead in this
situation is backport the new attribute values to the vertex store for the
copied vertices after upgrade_vertex is done updating the layout.
This avoids the slow corner case.
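The backporting step can be pictured with a simplified sketch. It assumes a toy vertex store with fixed position and color slots; the real vertex_store layout is dynamic, and all names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_VERTS 8

/* Toy vertex with a fixed layout. The real store has a dynamic layout. */
struct sketch_vertex {
   float position[4];
   float color[4];
};

/* Toy store: the first 'copied_count' entries are the vertices that were
 * copied over when the layout was upgraded. */
struct sketch_vertex_store {
   struct sketch_vertex verts[SKETCH_MAX_VERTS];
   size_t copied_count;
};

/* Write the new attribute value into every copied vertex right away,
 * rather than flagging the vertices for a later fixup pass. */
static void
sketch_backport_color(struct sketch_vertex_store *store, const float color[4])
{
   assert(store->copied_count <= SKETCH_MAX_VERTS);
   for (size_t v = 0; v < store->copied_count; ++v) {
      for (size_t c = 0; c < 4; ++c)
         store->verts[v].color[c] = color[c];
   }
}
```

Only the copied vertices are touched, so the cost is proportional to the handful of vertices carried across the layout change.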
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7912
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20495>
Given a GPU platform, there are multiple device ids. This commit
adds the ability to specify a device id for the shim, instead of using
one of the hard-coded device ids per platform.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20526>
Currently, process_singlesync_signals() checks if fd == -1 to handle
possible errors in the drmSyncobjExportSyncFile function. But, fd is not
initialized, which means that drmSyncobjExportSyncFile might fail and
the error will not be handled as fd might not be equal to -1.
Therefore, initialize the fd variable with value -1 to ensure proper
error handling.
cc: mesa-stable
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20475>
This is for streamout of varyings in VARYING_SLOT_VARx_16BIT slots.
OpenGL ES stores 16-bit medium-precision varyings in these slots.
Vulkan does not allow streamout of varyings narrower than 32 bits.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20157>