This task was finished as of:
d9079648d0.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Since scaling isn't involved, we don't need multiple extents.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Since we're using texelFetch with a sampled image, a sampler is no
longer needed. This agrees with the Vulkan Spec section 13.2.4
Descriptor Set Updates:
sampler is a sampler handle, and is used in descriptor updates for types
VK_DESCRIPTOR_TYPE_SAMPLER and VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER
if the binding being updated does not use immutable samplers.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The texelFetch operation requires that the sampled texture coordinates
be unnormalized integers. This will simplify the copy shader for
w-tiled images (stencil buffers).
v2 (Jason):
Use f2i for texel coords
Fix num_components indirectly
Use float inputs for interpolation
Nest tex_pos functions
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This reverts commit f391683922.
Some conflicts had to be resolved in order for this revert to be
successful.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* Add fields in meta struct
* Add support in meta init/teardown
* Switch to custom meta_emit_blit2d()
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
These will be customized for blit2d operations.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is cleaner than using glBindAttribLocation().
Not all drivers support the extension, but I don't think those drivers
use GLSL in the first place. Apparently some Meta shaders already use
GL_ARB_explicit_attrib_location, so I think it should be okay.
Honestly, I'm not sure how the old code worked anyway - we bound the
attribute location for "texcoords", while all the shaders capitalized
or spelled it differently.
v2: Convert another instance in brw_meta_fast_clear.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When a user defines a point size array and enables it, the point
size value set via glPointSize should be ignored. To achieve this,
we can simply toggle ctx->VertexProgram.PointSizeEnabled.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42187
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is done instead of copy propagating the VPM reads into the
instructions using them, because VPM reads have to stay in order.
shader-db results:
total instructions in shared programs: 78509 -> 78114 (-0.50%)
instructions in affected programs: 5203 -> 4808 (-7.59%)
total estimated cycles in shared programs: 234670 -> 234318 (-0.15%)
estimated cycles in affected programs: 5345 -> 4993 (-6.59%)
Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Rhys Kidd <rhyskidd@gmail.com>
This file will contain optimization passes for both vpm reads
and writes.
Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Some rasterization code relies (for sse) on the first and third planes
(but not the second for now) being 128bit aligned, and we didn't get that
on 32bit - I mistakenly thought the 64bit number in the struct would get
the thing aligned to 64bit even on 32bit archs.
Stephane Marchesin really figured this out.
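The pitfall described above can be sketched as follows. This is only an illustration under stated assumptions: `struct toy_plane_loose` and `struct toy_plane_aligned` are hypothetical stand-ins, not the real rasterizer structs. A 64-bit member only guarantees whatever alignment the ABI gives that type, which can be as low as 4 bytes on 32-bit x86, so 128-bit alignment has to be requested explicitly (here with C11 `alignas`).

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout that relies on the 64-bit member for alignment.
 * Its alignment is ABI-dependent and is not guaranteed to be 16. */
struct toy_plane_loose {
    int64_t c;
    int32_t dcdx;
    int32_t dcdy;
    uint32_t eo;
    uint32_t pad;
};

/* Same fields, but 128-bit alignment is requested explicitly, so the
 * struct (and every element of an array of it) is 16-byte aligned. */
struct toy_plane_aligned {
    alignas(16) int64_t c;
    int32_t dcdx;
    int32_t dcdy;
    uint32_t eo;
    uint32_t pad;
};

/* Returns nonzero if the pointer is suitable for aligned SSE loads. */
static int toy_is_sse_aligned(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

With the explicit request, `alignof(struct toy_plane_aligned)` is 16 on any conforming C11 compiler, regardless of the target's pointer width.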
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
CC: <mesa-stable@lists.freedesktop.org>
The logic was comparing actual ints, not true/false values.
This meant that it always emitted multiple line segments instead of just
one, even when the stipple test had the same result, which is inefficient;
the segments also overlapped, breaking line AA as well.
(In practice, with the no-op default line stipple pattern, for a 10-pixel
long line from 0-9 it was emitting 10 segments, with the individual segments
ranging from 0-1, 0-2, 0-3 and so on.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94193
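A minimal sketch of the int-versus-boolean comparison described above. The helper names are hypothetical and the sketch only counts segments, it does not reproduce the overlapping ranges of the real code. The raw stipple test yields a masked bit whose value changes with the counter, so comparing raw values reports a state change on every pixel; normalizing to true/false first does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Raw masked bit: nonzero results differ from pixel to pixel. */
static unsigned toy_stipple_raw(uint16_t pattern, unsigned counter)
{
    return pattern & (1u << (counter & 15u));
}

/* Buggy variant: compares raw ints, so every lit pixel looks like a
 * state change and starts a new segment. */
static unsigned toy_count_segments_raw(uint16_t pattern, unsigned length)
{
    unsigned segments = 0;
    unsigned prev = 0;
    for (unsigned i = 0; i < length; i++) {
        unsigned cur = toy_stipple_raw(pattern, i);
        if (cur != prev && cur)
            segments++;
        prev = cur;
    }
    return segments;
}

/* Fixed variant: compares true/false values, so a run of lit pixels
 * forms a single segment. */
static unsigned toy_count_segments_bool(uint16_t pattern, unsigned length)
{
    unsigned segments = 0;
    bool prev = false;
    for (unsigned i = 0; i < length; i++) {
        bool cur = toy_stipple_raw(pattern, i) != 0;
        if (cur != prev && cur)
            segments++;
        prev = cur;
    }
    return segments;
}
```

With the no-op pattern `0xffff` and a 10-pixel line, the raw variant counts 10 segments while the boolean variant counts 1.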
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
CC: <mesa-stable@lists.freedesktop.org>
All these img filter loops iterate through NUM_CHANNELS, not QUAD_SIZE.
In practice both are of course the same unchangeable value (4), but it
makes the code look a bit confusing. Moreover, some of the functions were
actually given an array of 4 values according to the declaration, yet the
code was addressing values 0/4/8/12 out of it, so fix this by just saying
it's a pointer to floats like the other functions.
While here, also add a comment about the not quite correct filtering.
There's no actual code difference.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The filt_args->offset wasn't assigned but was always used later, leading
to a crash (as far as I can tell, texel offsets don't actually make much
sense with anisotropic filtering, but because there's no explicit setting
for whether offsets are enabled, the array is always accessed).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94481
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
CC: <mesa-stable@lists.freedesktop.org>
This allows drivers to make smarter decisions e.g. about whether the image
has to be decompressed.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is required to preserve the image variable's coherent/restrict/volatile
qualifiers in TGSI.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Frontends should have this information readily available, and it simplifies
image LOAD/STORE/ATOM* handling especially with indirect image access.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The enums MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS and
MAX_COMBINED_SHADER_OUTPUT_RESOURCES are equal and should therefore only
appear once.
Noticed while implementing ARB_shader_image_load_store without previously
implementing SSBO.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Should have no functional change. The IP value of an instruction that
reads src_var cannot possibly be after the end of the live interval of
the variable it's reading from, by the definition of live interval.
Might save future readers a momentary WTF while trying to understand
this code.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bug found by the liveness analysis validation pass that will be
introduced in a later commit. The no-op MOV check in
opt_register_coalesce() was removing instructions which makes the
cached liveness analysis calculation inconsistent with the shader IR.
We were failing to set progress to true in that case though, which
means that invalidate_live_intervals() wouldn't necessarily be called
at the end of the function.
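The pattern behind this fix can be sketched with a toy pass. All types and functions below are hypothetical and are not the i965 backend's API. Whenever a pass changes the IR, including when it merely removes a no-op MOV, it has to report progress so that the cached analysis is invalidated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy IR: a MOV from register src to register dst. */
struct toy_inst {
    int dst;
    int src;
    bool removed;
};

struct toy_shader {
    struct toy_inst *insts;
    size_t count;
    bool live_intervals_valid; /* stands in for a cached analysis */
};

static void toy_invalidate_live_intervals(struct toy_shader *s)
{
    s->live_intervals_valid = false;
}

/* Removes MOVs whose source and destination are the same register. */
static bool toy_remove_noop_movs(struct toy_shader *s)
{
    bool progress = false;

    for (size_t i = 0; i < s->count; i++) {
        struct toy_inst *inst = &s->insts[i];
        if (!inst->removed && inst->dst == inst->src) {
            inst->removed = true;
            progress = true; /* the IR changed, so this must be recorded */
        }
    }

    /* Only reached with progress == true if every IR change set it. */
    if (progress)
        toy_invalidate_live_intervals(s);

    return progress;
}
```

If the `progress = true` line were missing, the instruction would still be removed but the cached analysis would be left marked valid, which is the inconsistency the validation pass detected.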
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bug found by the liveness analysis validation pass that will be
introduced in a later commit. fixup_3src_null_dest() was allocating
registers which makes the cached liveness analysis calculation
incomplete, so it must be invalidated.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bug found by the liveness analysis validation pass that will be
introduced in a later commit. opt_sampler_eot() was allocating
registers and inserting and removing instructions, which makes the
cached liveness analysis calculation inconsistent with the shader IR,
so it must be invalidated.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
After pipe_grid_info.indirect was introduced, clover was not modified
to set it, so it passed uninitialized memory for that field to launch_grid.
This commit fixes this by zeroing the entire pipe_grid_info struct when
declaring it, to avoid similar problems popping up in the future.
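A minimal sketch of the zero-at-declaration idea, written in C for illustration. `struct toy_grid_info` is a hypothetical stand-in and its fields are assumptions, not the real pipe_grid_info layout. Zeroing the whole struct means any field added later starts out as 0 or NULL even if the caller never learns about it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a grid launch description. */
struct toy_grid_info {
    unsigned block[3];
    unsigned grid[3];
    const void *indirect;     /* field added after the caller was written */
    unsigned indirect_offset;
};

static struct toy_grid_info
toy_make_grid_info(const unsigned block[3], const unsigned grid[3])
{
    /* Zero every member first; only the known fields are then filled in. */
    struct toy_grid_info info = { 0 };

    memcpy(info.block, block, sizeof info.block);
    memcpy(info.grid, grid, sizeof info.grid);
    return info;
}
```

The caller here never mentions `indirect`, yet it is reliably NULL rather than whatever happened to be on the stack.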
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
[ Francisco Jerez: Trivial codestyle fix. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This should help the next person working on hardware enabling figure out
where in the Intel PRMs to find the magic platform hardware values.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
When reading the source code, it's useful to indicate that a group of
fields in a struct are related in some way. There were several places
where people tried to group related structure members with the @{
syntax, without realizing they also needed to add the \name syntax in
order to generate correct doxygen html.
There are several files with groupings that look like this:
    struct foo {
            /**
             * Related fields description
             * @{
             */
            int bar;
            char baz;
            /** @} */
            long qux;
    }
However, the doxygen syntax for grouping is:
    struct foo {
            /**
             * \name Related fields description
             * @{
             */
            int bar;
            char baz;
            /** @} */
            long qux;
    }
https://www.stack.nl/~dimitri/doxygen/manual/grouping.html
Without the group name definition, the fields don't get properly
grouped. Instead, the group description is applied to the first field.
Fix the Intel hardware information structure, brw_device_info to
properly group the GPU hardware limitations and hardware quirks fields.
Once you've run `cd doxygen; make clean; make all`,
updated documentation can be found at
mesa/doxygen/i965/structbrw__device__info.html
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Better tracking of resource state and synchronization.
A follow on commit will clean up resource functions into a new
swr_resource.cpp file.
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
From the point it's constructed the CFG contains the only existing
copy of the program IR, and it never becomes invalid. Calling
backend_shader::invalidate_cfg would have destroyed the program
structure irrecoverably -- we weren't calling it at all, for good
reason.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The following commits broke things by starting to feed us unhandled
extract_u16/extract_u8 opcodes:
commit 905ff86198
Author: Matt Turner <mattst88@gmail.com>
AuthorDate: Wed Feb 3 14:28:31 2016 -0800
Commit: Matt Turner <mattst88@gmail.com>
CommitDate: Fri Mar 4 11:52:34 2016 -0800
nir: Recognize open-coded extract_u16.
commit 76289fbfa8
Author: Matt Turner <mattst88@gmail.com>
AuthorDate: Thu Jan 21 09:09:48 2016 -0800
Commit: Matt Turner <mattst88@gmail.com>
CommitDate: Fri Mar 4 11:52:34 2016 -0800
nir: Recognize open-coded extract_u8.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
First off, st/mesa lowers DSQRT incorrectly (it uses CMP to attempt to
find out whether the input is less than 0). Secondly the current
approach (x * rsq(x)) behaves poorly for x = inf - a NaN is produced
instead of inf.
Instead we switch to the less accurate rcp(rsq(x)) method - this behaves
nicely for all valid inputs. We still don't do this for DSQRT since the
RSQ/RCP ops are *really* inaccurate, and don't even have Newton-Raphson
steps right now. Eventually we should have a separate library function
for DSQRT that does it more precisely (and perhaps move this lowering to
the post-opt phase).
This fixes a number of dEQP precision tests that were expecting better
behavior for infinite inputs.
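The difference between the two lowerings at x = inf can be sketched in plain C. `toy_rsq` is a hypothetical software model, not the hardware RSQ op: it assumes rsq(inf) = 0 and rsq(0) = inf, approximates other inputs with Newton iterations, and is only meaningful for x >= 0.

```c
#include <math.h> /* INFINITY macro only; no libm functions are called */

/* Hypothetical model of a reciprocal square root op. */
static float toy_rsq(float x)
{
    if (x == 0.0f)
        return INFINITY;
    if (x == INFINITY)
        return 0.0f;

    /* Heron's method for sqrt(x), then take the reciprocal. */
    float s = x > 1.0f ? x : 1.0f;
    for (int i = 0; i < 100; i++)
        s = 0.5f * (s + x / s);
    return 1.0f / s;
}

/* Old lowering: sqrt(x) = x * rsq(x). At x = inf this is inf * 0. */
static float toy_sqrt_via_mul(float x)
{
    return x * toy_rsq(x);
}

/* New lowering: sqrt(x) = rcp(rsq(x)). At x = inf this is 1 / 0. */
static float toy_sqrt_via_rcp(float x)
{
    return 1.0f / toy_rsq(x);
}
```

Under IEEE 754 arithmetic, inf * 0 is NaN while 1 / 0 is inf, so only the second form returns inf for an infinite input.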
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The idea is that a single triangle will cover the whole area being
drawn, allowing the blit shader to do its work. However the max fb size
is 16384x16384, which means that the triangle we draw needs to be twice
that in order to cover the whole area fully. Increase the size of the
triangle to 32768x32768.
This fixes a number of dEQP tests that were failing because a blit was
involved which would miss some of the resulting texture.
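The factor of two can be checked with a small sketch. It assumes, for illustration only, a right triangle with vertices (0,0), (size,0) and (0,size); the actual vertex layout used by the blit code is not stated here, and the helper names are hypothetical. A point (x,y) with non-negative coordinates lies inside such a triangle exactly when x + y <= size, so the far corner of a w x h framebuffer is the binding constraint.

```c
#include <stdbool.h>

/* Point-in-triangle test for the assumed triangle (0,0), (size,0), (0,size). */
static bool toy_triangle_covers_point(unsigned size, unsigned x, unsigned y)
{
    return x + y <= size;
}

/* The whole w x h rectangle is covered iff its far corner (w,h) is. */
static bool toy_triangle_covers_fb(unsigned size, unsigned w, unsigned h)
{
    return toy_triangle_covers_point(size, w, h);
}
```

For a 16384x16384 framebuffer the far corner needs x + y = 32768, so a 16384-sized triangle misses it and a 32768-sized one covers it.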
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Bitfields were shuffled around for the better on a4xx, so we don't need
any patching on this one. It appears to be something we set entirely in
the gmem code so no conflict between tiling and render state like we had
in a3xx.
Signed-off-by: Rob Clark <robclark@freedesktop.org>