This optimization is incorrect with 64-bit operations, because the
channel-splitting logic in emit_asm ends up being applied twice to
the source operands.
A lucky coincidence of how the writemask test works resulted in this
optimization basically never being applied anyway. As far as I can tell,
the only case where it would (incorrectly) have been applied is something
like
dvec2 d;
float x = (float)d.y;
which nobody seems to have ever done. But the moral equivalent does occur
in one of the component layout piglit test.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Empty writemasks mean "copy everything", so we can always just use the number
of vector elements (which uses the GLSL meaning here, i.e. each double is a
single element/writemask bit).
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
For specifying an exact location/component.
v2: change the order of parameters (Dave)
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
v2: change the order of parameters (Dave)
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
This is a screen cap because drivers are expected to support it either
for all shader types or for none of them.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
radeonsi no longer supports pixel shaders without interpolation optimizations,
which led to assertion failures in si_shader_ps when running shader-db.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
total instructions in shared programs :2286901 -> 2284473 (-0.11%)
total gprs used in shared programs :335256 -> 335273 (0.01%)
total local used in shared programs :31968 -> 31968 (0.00%)
local gpr inst bytes
helped 0 41 852 852
hurt 0 44 23 23
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
We shouldn't be using wildcard here in the first place, but changing that
is some effort. As it stands, make -p confirms that glapi_gen_mapi_deps only
contains mapi_abi.py when building outside the Mesa tree.
As a result, only some of the tables were updated when XML files change, but
not the tables for shared glapi. This change ensures that we pick up the
XML files and scripts from the source tree as dependencies also for shared
glapi.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This should make the code more robust if a shader tries to use inputs which
aren't defined by the vertex element layout (which usually shouldn't happen).
No piglit change.
Reviewed-by: Brian Paul <brianp@vmware.com>
Relocate a 'default:' to the end of a case stmt and fix an
indent issue.
Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
If pPhysicalDevices is too small for all physical devices,
the driver must return VK_INCOMPLETE.
Since only a single physical device is supported, this is only the case
when pPhysicalDeviceCount == 0 && pPhysicalDevices != NULL.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
pViewportState may be NULL if rasterization is disabled.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Many of these can be "NULL if the pipeline has rasterization disabled."
Also, we should assert that pMultisampleState exists.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It's supposed to be how much at least we want to grow the cs, not the
minimum size of the cs after growth.
v2: Unbreak use_ib_bos.
Don't mask the ib_size when !use_ib_bos, since it's not needed.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This checks the kernel driver name is amdgpu before calling
libdrm_amdgpu.
This avoids the following error:
amdgpu_device_initialize: DRM version is 1.6.0 but this driver is only compatible with 3.x.x
when run on a machine with i915 graphics as well as amdgpu.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Probably unlikely however ensure we don't leak a heap allocation
on the fail path.
V.2:
also fix missing 'amdgpu_device_deinitialize()' calls (Emil Velikov).
Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Drop/add a few newlines where appropriate and drop a couple of
unnessary braces.
Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
SwrStoreTiles now takes a mask of surfaces to store. Reduces
overhead when storing multiple render targets.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
Align and use streaming store instructions for BE fifo queues.
Provides slightly faster enqueue and doesn't pollute the caches.
Add appropriate memory fences to ensure streaming writes are
globally visible.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>