Add layer support to export individual array layers.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add offset support to handle NV12 offsets as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For exporting a specific layer of an array texture.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We are going to need this for EGL_EXT_image_dma_buf_import.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: Squash multiple commits addressing the new parameter in different
files so we don't break the build (Iago)
v3: Fix tgsi (Samuel)
v4: Fix nir_clone.c (Samuel)
v5: Fix vc4 and freedreno (Iago)
v6 (Sam)
- Fix build errors in nir_lower_indirect_derefs
- Use helper to get type size from nir_alu_type.
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Some opcodes need explicit bitsizes, and sometimes we need to use the
double version when constant folding.
v2: fix output type for u2f (Iago)
v3: do not change vecN opcodes to be float. The next commit will add
infrastructure to enable 64-bit integer constant folding so this is isn't
really necessary. Also, that created problems with source modifiers in
some cases (Iago)
v4 (Jason):
- do not change bcsel to work in terms of floats
- leave ldexp generic
Squashed changes to handle different bit sizes when constant
folding since otherwise we would break the build.
v2:
- Use the bit-size information from the opcode information if defined (Iago)
- Use helpers to get type size and base type of nir_alu_type enum (Sam)
- Do not fallback to sized types to guess bit-size information. (Jason)
Squashed changes in i965 and gallium/nir drivers to support sized types.
These functions should only see sized types, but we can't make that change
until we make sure that nir uses the sized versions in all the relevant places.
A later commit will address this.
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Now that the field exists in the instruction, we can make discards less
special. As a bonus, that means that we should be able to merge some more
.sf instructions together when we get around to that.
This causes some scheduling changes, as it allows tlb_color_reads to be
delayed past the discard condition setup. Since the tlb_color_read ends
up later, this may mean performance improvements, but I haven't tested.
total instructions in shared programs: 78114 -> 78035 (-0.10%)
instructions in affected programs: 1922 -> 1843 (-4.11%)
total estimated cycles in shared programs: 234318 -> 234329 (0.00%)
estimated cycles in affected programs: 8200 -> 8211 (0.13%)
The register allocator doesn't really do anything about the temp, so it
doesn't seem like it should matter. However, the scheduler would think
that a new def is being created.
This doesn't change anything yet, but it avoids a bunch of regressions in
the next commit.
This is done instead of copy propagating the VPM reads into the
instructions using them, because VPM reads have to stay in order.
shader-db results:
total instructions in shared programs: 78509 -> 78114 (-0.50%)
instructions in affected programs: 5203 -> 4808 (-7.59%)
total estimated cycles in shared programs: 234670 -> 234318 (-0.15%)
estimated cycles in affected programs: 5345 -> 4993 (-6.59%)
Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Rhys Kidd <rhyskidd@gmail.com>
This file will contain optimization passes for both vpm reads
and writes.
Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Some rasterization code relies (for sse) on the first and third planes
(but not the second for now) being 128bit aligned, and we didn't get that
on 32bit - I mistakenly thought the 64bit number in the struct would get
the thing aligned to 64bit even on 32bit archs.
Stephane Marchesin really figured this out.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
CC: <mesa-stable@lists.freedesktop.org>
The logic was comparing actual ints, not true/false values.
This meant that it was emitting always multiple line segments instead of just
one even if the stipple test had the same result, which looks inefficient, and
the segments also overlapped thus breaking line aa as well.
(In practice, with the no-op default line stipple pattern, for a 10-pixel
long line from 0-9 it was emitting 10 segments, with the individual segments
ranging from 0-1, 0-2, 0-3 and so on.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94193
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
CC: <mesa-stable@lists.freedesktop.org>
All these img filter loops iterate through NUM_CHANNELS, not QUAD_SIZE.
In practice both are of course the same unchangeable value (4), but it
makes the code look a bit confusing. Moreover, some of the functions were
actually given an array of 4 values according to the declaration, yet the
code was addressing values 0/4/8/12 out of it, so fix this by just saying
it's a pointer to floats like the other functions.
While here, also add comment about not quite correct filtering.
There's no actual code difference.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The filt_args->offset wasn't assigned but was always used later leading
to a crash (as far as I can tell, texel offsets don't actually make much
sense with anisotropic filtering, but because there's no explicit setting
if offsets are enabled there the array is always accessed).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94481
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
CC: <mesa-stable@lists.freedesktop.org>
This allows drivers to make smarter decisions e.g. about whether the image
has to be decompressed.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Frontends should have this information readily available, and it simplifies
image LOAD/STORE/ATOM* handling especially with indirect image access.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
After pipe_grid_info.indirect was introduced, clover was not modified
to set it causing it to pass uninitialized memory for it to launch_grid.
This commit fixes this by zero-ing the entire pipe_grid_info struct when
declaring it, to avoid similar problems popping-up in the future.
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
[ Francisco Jerez: Trivial codestyle fix. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Better tracking of resource state and synchronization.
A follow on commit will clean up resource functions into a new
swr_resource.cpp file.
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
The following commits broke things by starting to feed us unhandled
extract_u16/extract_u8 opcodes:
commit 905ff86198
Author: Matt Turner <mattst88@gmail.com>
AuthorDate: Wed Feb 3 14:28:31 2016 -0800
Commit: Matt Turner <mattst88@gmail.com>
CommitDate: Fri Mar 4 11:52:34 2016 -0800
nir: Recognize open-coded extract_u16.
commit 76289fbfa8
Author: Matt Turner <mattst88@gmail.com>
AuthorDate: Thu Jan 21 09:09:48 2016 -0800
Commit: Matt Turner <mattst88@gmail.com>
CommitDate: Fri Mar 4 11:52:34 2016 -0800
nir: Recognize open-coded extract_u8.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
First off, st/mesa lowers DSQRT incorrectly (it uses CMP to attempt to
find out whether the input is less than 0). Secondly the current
approach (x * rsq(x)) behaves poorly for x = inf - a NaN is produced
instead of inf.
Instead we switch to the less accurate rcp(rsq(x)) method - this behaves
nicely for all valid inputs. We still don't do this for DSQRT since the
RSQ/RCP ops are *really* inaccurate, and don't even have Newton-Raphson
steps right now. Eventually we should have a separate library function
for DSQRT that does it more precisely (and perhaps move this lowering to
the post-opt phase).
This fixes a number of dEQP precision tests that were expecting better
behavior for infinite inputs.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The idea is that a single triangle will cover the whole area being
drawn, allowing the blit shader to do its work. However the max fb size
is 16384x16384, which means that the triangle we draw needs to be twice
that in order to cover the whole area fully. Increase the size of the
triangle to 32768x32768.
This fixes a number of dEQP tests that were failing because a blit was
involved which would miss some of the resulting texture.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Bitfields where shuffled around for the better on a4xx, so we don't need
any patching on this one. It appears to be something we set entirely in
the gmem code so no conflict between tiling and render state like we had
in a3xx.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Move where we pick dummy FS for binning pass, so the whole driver sees
the same dummy/no-op FS stage.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Sampler states don't really make sense with buffer textures, but they
can be set anyway, so we need to be defensive here. This bug was lurking
for a while and was finally noticed due to PBO uploads setting sampler
states.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94284
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Laurent Carlier <lordheavym@gmail.com>
Tested-by: Shawn Starr <shawn.starr@rogers.com>
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Clear DCC flags if necessary when binding a new sampler view.
v2: Do not reset DCC flags of bound sampler views.
v3: Check that we have a real texture (Nicolai)
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Found by inspection of the source based on a bisected bug report.
This bug has been in the code for a long time, but the more recent PBO upload
feature exposed it because it leads to more uses of buffer textures.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94388
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>