We can use the pointer stored in the temps array directly.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We can use the pointer stored in the temps array directly.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In the alloca'd array case, no longer create redundant and unused allocas
for the individual elements; create getelementptrs instead.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This won't affect the output, but it was, technically, wrong.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This won't affect the output, but it was, technically, wrong.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ir_unop_fract already forbade integer types in ir_validate. ir_unop_rcp,
ir_unop_rsq, and ir_unop_sqrt should also forbid them in ir_validate.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Some shaders include code that looks like:
uniform int i;
uniform vec4 bones[...];
foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]);
CSE would do some work on this:
x = i * 3
foo(bones[x], bones[x + 1], bones[x + 2]);
The compiler may then add '<< 4 + base' to the index calculations.
This results in expressions like
x = i * 3
foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]);
Just rearranging the math to produce (i * 48) + 16 saves an
instruction, and it allows CSE to do more work.
x = i * 48;
foo(bones[x], bones[x + 16], bones[x + 32]);
So, ~6 instructions becomes ~3.
Some individual shader-db results look pretty bad. However, I have a
really, really hard time believing the change in estimated cycles in,
for example, 3dmmes-taiji/51.shader_test after looking that change in
the generated code.
G45
total instructions in shared programs: 4020840 -> 4010070 (-0.27%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0
total cycles in shared programs: 98829000 -> 98784990 (-0.04%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0
Ironlake
total instructions in shared programs: 6418887 -> 6408117 (-0.17%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0
total cycles in shared programs: 143504542 -> 143460532 (-0.03%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0
Sandy Bridge
total instructions in shared programs: 8357887 -> 8339251 (-0.22%)
instructions in affected programs: 432715 -> 414079 (-4.31%)
helped: 2795
HURT: 0
total cycles in shared programs: 118284184 -> 118207412 (-0.06%)
cycles in affected programs: 6114626 -> 6037854 (-1.26%)
helped: 2478
HURT: 317
Ivy Bridge
total instructions in shared programs: 7669390 -> 7653822 (-0.20%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0
total cycles in shared programs: 68381982 -> 68263684 (-0.17%)
cycles in affected programs: 1972658 -> 1854360 (-6.00%)
helped: 2458
HURT: 307
Haswell
total instructions in shared programs: 7082636 -> 7067068 (-0.22%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0
total cycles in shared programs: 68282020 -> 68164158 (-0.17%)
cycles in affected programs: 1891820 -> 1773958 (-6.23%)
helped: 2459
HURT: 261
Broadwell
total instructions in shared programs: 9002466 -> 8985875 (-0.18%)
instructions in affected programs: 658784 -> 642193 (-2.52%)
helped: 2795
HURT: 5
total cycles in shared programs: 78503092 -> 78450404 (-0.07%)
cycles in affected programs: 2873304 -> 2820616 (-1.83%)
helped: 2275
HURT: 415
Skylake
total instructions in shared programs: 9156978 -> 9140387 (-0.18%)
instructions in affected programs: 682625 -> 666034 (-2.43%)
helped: 2795
HURT: 5
total cycles in shared programs: 75591392 -> 75550574 (-0.05%)
cycles in affected programs: 3192120 -> 3151302 (-1.28%)
helped: 2271
HURT: 425
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There's no guarantee that there is one, and we don't need one anyway.
Fixes piglit tests:
glx@glx-fbconfig-bad
glx@glx_ext_import_context@import context, multi process
glx@glx_ext_import_context@import context, single process
Fixes: 2e3f067458 ("glx: fix error code when there is no context bound")
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
It's fairly rare that the BB layout puts BBs after the exit block, which
is likely the reason these issues lingered for so long.
This fixes a fraction of issues with the giant pixmark piano shader.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
The current logic used to determine the execution size of sampler
messages was based on special-casing several argument and opcode
combinations, which unsurprisingly missed the possibility that some
messages could exceed the payload size limit or not depending on the
number of coordinate components present. In particular:
- The TXL, TXB and TEX messages (the latter on non-FS stages only)
would attempt to use SIMD16 on Gen7+ hardware even if a shadow
reference was present and the texture was a cubemap array, causing
it to overflow the maximum supported sampler payload size and
crash.
- The TG4_OFFSET message with shadow comparison was falling back to
SIMD8 regardless of the number of coordinate components, which is
unnecessary when two coordinates or less are present.
Both cases have been handled incorrectly ever since cubemap arrays and
texture gather were respectively enabled (the current logic used by
the SIMD lowering pass is almost unchanged from the previous no16
fall-back logic used pre-SIMD lowering times).
Fixes the following GL4.5 conformance test on Gen7-8 (the bug also
affects Gen9+ in principle, but SKL passes the test by luck because it
manages to use the TXL_LZ message instead of TXL):
GL45-CTS.texture_cube_map_array.sampling
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97267
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes it easier for the caller to find out how many scalar
components are actually read by the instruction. As a bonus we no
longer need to special-case BAD_FILE in the implementation of
fs_inst::regs_read.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This simplifies the code slightly and will allow the SIMD lowering
pass to find out easily what the actual texturing opcode is in order
to determine the maximum execution size of texturing instructions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All three fields of the vbuffer_attrs[] array are assigned in the following
loop. The remaining elements of the array are not used.
Tested with full Piglit run, Heaven 4.0, etc.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We don't need to loop over the max number of color buffers, just the
current number (which is usually one).
Tested with full Piglit run, Heaven 4.0, etc.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Fixes regression with team_fortress_2 trace.
This change has been in our in-house tree for some time.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Since clears are more or less just normal draws, there isn't that much
benefit in having hand-rolled clear path. Add support to use u_blitter
instead if gen specific backend doesn't implement ctx->clear().
Signed-off-by: Rob Clark <robdclark@gmail.com>
Not (currently) state that is overwridden by u_blitter itself, but
drivers with custom blit/clear which are reusing part of the u_blitter
infrastructure will use it.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
eglMakeCurrent can also be used to change the active display. In that
case, we need to decrement ref_count of the previous display (possibly
destroying it), and increment it on the next display.
Also, old_dsurf/old_rsurf cannot be non-NULL if old_ctx is NULL, so
we only need to test if old_ctx is non-NULL.
v2: Save the old display before destroying the context.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97214
Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reported-by: Alexandr Zelinsky <mexahotabop@w1l.ru>
Tested-by: Alexandr Zelinsky <mexahotabop@w1l.ru>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Apply the median and matrix filter before the compostioning
we apply the deinterlacing first to avoid the extra overhead
in processing the past and the future surfaces in deinterlacing.
v2: apply the filters on all the surfaces (Christian)
v3: use get_sampler_view_planes() instead of
get_sampler_view_components() and iterate over
VL_MAX_SURFACES (Christian)
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Ian recently changed the preprocessor to allow this in most GLSL
versions, but not GLSL ES 3.00+. This patch converts the existing
test that expects a failure to a #version 300 es shader, and adds
a #version 110 shader to make sure that it's allowed.
Fixes 'make check'.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97307
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
I'm not sure if anything even uses this, but I found this on radv, so
just fix it on anv for consistency.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Avoid use-after-free on error.
Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Tested-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Avoid use-after-free on error.
Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Tested-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Avoid use-after-free on error.
Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Tested-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>