Prep work to reduce the noise in the next patch.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Going to convert this pass to parameterized lower_io_to_temporaries, and
we want the user to be able to specify whether to lower outputs or
inputs or both. The restriction of running this pass before validate
to avoid output reads no longer applies.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A pass to lower complex (struct/array/mat) inputs/outputs to primitive
types. This allows, for example, linking that removes unused components
of a larger type which is not indirectly accessed.
In the near term, it is needed for gallium (mesa/st) support for NIR,
since only used components of a type are assigned VBO slots, and we
otherwise have no way to represent that to the driver backend. But it
should be useful for doing shader linking in NIR.
v2: use glsl_count_attribute_slots() rather than passing a type_size
fxn pointer
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Handled by tgsi_emulate for glsl->tgsi case.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The goal is to allow the pipe driver to request something other than
TGSI, but detect whether what is getting is TGSI vs what it requested.
The pipe drivers will always have to support TGSI (and convert that into
whatever it is that they prefer), but in some cases we should be able to
skip the TGSI intermediate step (such as glsl->nir vs glsl->tgsi->nir).
I think pipe_compute_state should get similar treatment. Currently,
afaict, it has one user and one consumer, which has allowed it to be
sloppy wrt. supporting alternative IR's.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The use of transfer_inline_write() in TexSubImage path (see fb9fe352ea)
exposed a bug for "layer_first" resources (ie. a4xx) not setting correct
layer_stride.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Currently, when cross validating global variables, all global variables
seen in the shaders that are part of a program are saved in a table.
When checking a variable this already exist in the table, we check both
are initialized to the same value. If the already saved variable does
not have an initializer, we copy it from the new variable.
Unfortunately this is wrong, as we are modifying something it is
constant. Also, if this modified variable is used in
another program, it will keep the initializer, when it should have none.
Instead of copying the initializer, this commit replaces the old
variable with the new one. So if we see again the same variable with an
initializer, we can compare if both are the same or not.
v2: convert tabs in whitespaces (Kenenth Graunke)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Juha-Pekka found this back in May 2015:
<1430915727-28677-1-git-send-email-juhapekka.heikkila@gmail.com>
From the discussion, obviously it would be preferable to make
ralloc_size no longer return zeroed memory, but Juha-Pekka found that
it would break Mesa.
In <56AF1C57.2030904@gmail.com>, Juha-Pekka mentioned that patches
exist to fix i965 when ralloc_size is fixed to not zero memory, but
the patches have not made their way to mesa-dev yet.
For now, let's stop doing the double zeroing of rzalloc buffers.
v2:
* Move ralloc_size code to rzalloc_size, and add a comment as
suggested by Ken.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
% pattern rules are a GNU extension. Convert the use of one to a
inference rule to allow this to build on OpenBSD.
v2: inference rules can't have additional prerequisites so add a target
rule to still depend on gen_pack_header.py
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Use GALLIVM_DEBUG=dumpbc for dumping of modules as bitcode.
Instead of a fixed llvmpipe.bc name, use ir_<modulename>.bc so multiple
modules can be dumped (albeit it might still overwrite previous modules,
particularly the modules from draw tend to always have the same name).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch fixes this build error.
CXX rasterizer/memory/libswrAVX_la-ClearTile.lo
In file included from rasterizer/memory/ClearTile.cpp:34:0:
./rasterizer/memory/Convert.h: In function ‘uint16_t Convert32To16Float(float)’:
./rasterizer/memory/Convert.h:170:9: error: ‘__builtin_isnan’ is not a member of ‘std’
if (std::isnan(val))
^
./rasterizer/memory/Convert.h:170:9: note: suggested alternative:
<built-in>: note: ‘__builtin_isnan’
./rasterizer/memory/Convert.h:176:14: error: ‘__builtin_isinf_sign’ is not a member of ‘std’
else if (std::isinf(val))
^
./rasterizer/memory/Convert.h:176:14: note: suggested alternative:
<built-in>: note: ‘__builtin_isinf_sign’
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95180
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Otherwise constants which aren't live get an undefined constant location.
When we go to set up param and pull_param we end up assigning all unused
uniforms to slot 0. This cases the Vulkan driver to segfault because it
doesn't have pull_param.
This fixes bugs in the Vulkan driver introduced in c3fab3d000.
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
This fixes a crash (but not the test):
GL45-CTS.shader_texture_image_samples_tests.functional_test
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This was colliding badly and making
GL45-CTS.buffer_storage.map_persistent_texture
fail on radeonsi.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
When doing GetTexSubImage using a PBO, we should check if it involves
a signed/unsigned conversion and bail if it does, just like in the
other cases.
This fixes:
GL33-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo
on Haswell at least.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95324
Reviewed-by: Matt Turer <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
The calculated limit gave problems on SI as it was > 32 KiB
and the hardware LDS size on SI is only 32 KiB. It isn't
correct anyway when processing multiple patches in a threadgroup.
As we potentially have any number of patches such that the
used LDS is at most the hardware LDS size, and exact size
per patch is not known at compile time, this seems like
the only valid bound.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We index into these based on var->data.driver_location, which might have
gaps (ie. two inputs, one w/ drvloc 0 and other 2). This shows up in
(for example) 'bin/copyteximage 1D', but was only noticed recently due
to additional asserts.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Continue using ADD in the other case because a fragment shader backend
could fuse the ADD with a MUL to generate a MAD for ((x && y) || z).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There is nothing left that can generate them. These used to be
generated by ir_to_mesa or by the assembler for various NV extensions
that have been removed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Nothing that consumes the output of this backend consumes them
navtively. This is *not* the way i915 has implemented these
instructions, but, as far as I am able to tell, this is the way both the
Cg compiler and the HLSL compiler implement these operations.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Nothing that consumes the output of this backend consumes them
navtively. This is the way i915 has implemented these instructions
since it began consuming GLSL.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Compute support seems to be pretty stable now, and according to piglit
it doesn't seem to break 3D state.
As a side effect, this will expose ARB_compute_shader on GK110/GK208.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This prevents IB rejections due to insane memory usage from
many concecutive texture uploads.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This implements:
- Linear-to-linear partial copies. (unaligned)
- Tiled-to-linear and linear-to-tiled partial copies.
(unaligned except 1-2 Bpp)
- Tiled-to-tiled partial copies aligned to 8x8.
v2: Extend the SDMA L2T VM fault workaround to T2L.
- Same algorithm, just applied to T2L.
(and using a 0-based address and surface.bo_size instead of buf->size)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Most of this has never worked according to the new test.
The new code will be radically different.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2: - adjustments for exercising all important SDMA code paths
- decrease the probability of getting huge sizes (faster testing)
- increase the probability of getting power-of-two dimensions
- change the memory cap to 128MB (faster testing)
- better detect which engine has been used
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
this is more robust and probably fixes some bugs already
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
a staging cube texture with array_size % 6 != 0 doesn't work very well
just use 2D_ARRAY or 2D for all staging textures
Cc: 11.1 11.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It's for the buffer cache.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>