v2:
- make 16-bit be its own separate case (Jason)
v3:
- Drop the result_int temporary (Jason)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Extended math with half-float operands is only supported since gen9,
but it is limited to SIMD8. In gen8 we lower it to 32-bit.
v2: quashed together the following patches (Jason):
- intel/compiler: allow extended math functions with HF operands
- intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
- intel/compiler: extended Math is limited to SIMD8 on half-float
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(allow extended math functions with HF operands,
extended Math is limited to SIMD8 on half-float)
The hardware doesn't support half-float for these.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There are some hardware restrictions that brw_nir_lower_conversions should
have taken care of before we get here.
v2:
- rebased on top of regioning lowering pass
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Since we handle booleans as integers this makes more sense.
v2:
- rebased to incorporate new boolean conversion opcodes
v3:
- rebased on top regioning lowering pass
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v2)
Going forward having these split is a bit more convenient since these two
groups have different restrictions.
v2:
- Rebased on top of new regioning lowering pass.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Some conversions are not directly supported in hardware and need to be
split in two conversion instructions going through an intermediary type.
Doing this at the NIR level simplifies a bit the complexity in the backend.
v2:
- Consider fp16 rounding conversion opcodes
- Properly handle swizzles on conversion sources.
v3
- Run the pass earlier, right after nir_opt_algebraic_late (Jason)
- NIR alu output types already have the bit-size (Jason)
- Use 'is_conversion' to identify conversion operations (Jason)
v4:
- Be careful about the intermediate types we use so we don't lose
range and avoid incorrect rounding semantics (Jason)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This changes requires LLVM r356465.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This changes requires LLVM r356465.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
LLVM 9+ now supports 8-bit and 16-bit types.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
No support for 64-bit compare&swap atomic operations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Except compare&swap which is still buggy.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Use the raw version (ie. IDXEN=0) because vindex is unused.
Use the old intrinsic for compare&swap because the new one
hangs the GPU for some reasons.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
llvm 8 removed saturated unsigned add / sub x86 sse2 intrinsics, and
now llvm 9 removed the signed versions as well - they were proposed for
removal earlier, but the pattern to recognize those was very complex,
so it wasn't done then. However, instead of these arch-specific
intrinsics, there's now arch-independent intrinsics for saturated
add / sub, both for signed and unsigned, so use these.
They should have only advantages (work with arbitrary vector sizes,
optimal code for all archs), although I don't know how well they work
in practice for other archs (at least for x86 they do the right thing).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110454
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a race condition where anv_gen_files are executed before
genxml files, which causes a build failure
v2: add dependency on idep_genxml (Lionel)
Fixes: d1992255bb
("meson: Add build Intel "anv" vulkan driver")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We can deduct the size from another field, let's just save some space.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
The Gen10+ expected format adds an additional counter which we can't
disclose yet. We can still make the size of the expected query result
match.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
One more thing we want to share between the different APIs.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
We'll want to reuse this in our Vulkan extension.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
We'll want to reuse those structures later on.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
We would like to reuse performance query metrics in other APIs. Let's
make the query code dealing with the processing of raw counters into
human readable values API agnostic.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This blit can fail, but this is not new; in the old version we
didn't even try to blit in this case. So let's just document the
limitation for now, and leave this for another day.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
When running on OpenGL ES, we can't just map any format for reading,
because of limitations on glReadPixels. So let's fall back to the
blit code-path, and translate the pixels to the correct format in the
end.
This fixes the remaining failures of KHR-GL32.packed_pixels.* apart
from the sRGB tests.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Without this, we can't for instance convert between r8_sint and
r8g8b8a8_sint. But that's pretty useful, so let's support it as well.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Otherwise, virglrenderer will reject the resource.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
We currently don't support writing to resources that uses a temporary
staging-resource to resolve the pixels. If a write-bit was set, we
forgot to perform a blit back to the old resource, followed by trying to
update the wrong resource, which lacks backing-storage. The end-result
would be that nothing useful happened.
This approach also fixes a few smaller bugs, like using the wrong box
(without x y and z zeroed out), which means a partial update of a
multisampled texture could result in the wrong part of the texture being
updated.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
In case we're resolving, we need to wait for the resolved resource
instead of the original one.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
It's hard to read the code that decides if we want to queue up an unmap
or destroy the transfer right away. So let's make it a bit simpler, by
setting a bool in case we want to queue it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
There's no reason to keep an extra indentation level here, let's merge
the two if-conditions.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
This isn't the temporary resource itself, it's the template that we'll
create the resource from. So let's name it appropriately.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
This is only written to, never read. Let's just get rid of it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
This is a port of Jason's 8379bff6c4
from i965 to iris. We can't find anything relevant in the documentation
and no one we've talked to has been able to help us pin down a solution.
Unfortunately, we have to put the hack in both iris_blit() and
iris_copy_region(). st/mesa's CopyImage() implementation sometimes
chooses to use pipe->blit() instead of pipe->resource_copy_region().
For blits, we only do the hack if the blit source format doesn't match
the underlying resource (i.e. it's reinterpreting the bits). Hopefully
this should not be too common.
We were failing to set up payload[1] for use by LocalInvocationIndex/ID
and shared variable accesses if gl_WorkGroupID/gl_GlobalInvocationID
wasn't used (possibly because you only have one workgroup). You're always
going to use payload[1], and payload[0] is common enough and we have DCE
in the backend to clean it up if it happens to not be used.
Fixes assertion failures in the CTS since Karol's cleanup when NIR started
noticing that we were reading an invalid component.
Fixes: 5450f1c9fb ("v3d: prefer using nir_src_comp_as_int over nir_src_as_const_value")