The polynomial used for asin_expr() was suboptimal (and its source was
not documented).
A better approximation is found in the _Handbook_of_Mathematical_Functions_
by Abramowitz and Stegun, which is used in Nvidia's Cg toolkit. However,
while this approximation gives a good absolute error bound, its relative
error exceeds the 4096 ulp allowed by the Vulkan spec. Taking a page
from the spirv implementation of asin(), we implement a piecewise
approximation where a Taylor series is used for small values of |x|.
This patch also harmonizes the GLSL and Vulkan implementations by moving
the implementation to common code (nir_builder).
Running tests on asin() with a grid of 64000 samples between 0.0 and +1.0,
the original asin() at 32 bits has:
```
glsl spirv
RMSE: 1.756451e-04 1.609091e-04
worst abs error: 3.904104e-04 at 0.937001 3.904104e-04 at 0.937001
worst ulp error: 11800 at 6.2499e-05 3826 at 0.841331
```
whereas the new implementation has for both:
```
RMSE: 2.528056e-05
worst abs error: 4.962087e-05 at 0.451149
worst ulp error: 2379 at 0.215106
```
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40862>
This fixes a regression found in blender's unit testing, which called
fabs(-0.0) and invoked an NIR optimization that is was not valid for
the parameter -0.0. IEEE 754 requires that abs clear the sign bit
for the value -0.0.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41060>
This change extends r600_lds_constant_buffer to
implement a fully conformant gl_PrimitiveID at
the tes and tcs stages.
This change was tested on cayman and barts. Here are the tests fixed:
spec/arb_tessellation_shader/execution/tcs-primitiveid-instanced: fail pass
spec/arb_tessellation_shader/execution/tes-no-tcs-primitiveid-instanced: fail pass
spec/arb_tessellation_shader/execution/tes-primitiveid-instanced: fail pass
khr-gl4[4-6]/tessellation_shader/tessellation_shader_tessellation/gl_invocationid_patchverticesin_primitiveid: fail pass
khr-gles31/core/tessellation_shader/tessellation_shader_tessellation/gl_invocationid_patchverticesin_primitiveid: fail pass
khr-glesext/tessellation_shader/tessellation_shader_tessellation/gl_invocationid_patchverticesin_primitiveid: fail pass
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40297>
With the pass order shuffling, code like `(x & 0xf) + (x & 0xfffffff0)` gets
optimized to bitfield_select(0xF, x, x). But it would be much better to optimize
simply to x. nir_opt_algebraic would do that for us but we run this pass too
late for algebraic to save us from ourselves, so be smarter.
Observed on dEQP-GLES31.functional.compute.basic.image_atomic_op_local_size_8
with Jay, this saves an instruction there.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40956>
It turns out we need the color sysvals recorded in system_values_read,
and PARAM_GEN is for point smoothing.
Acked-by: Pierre-Eric
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40556>
Initial data is loaded via bindless_image_load, atomic swap via
bindless_image_atomic_swap.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39932>
This is a 2x16 bitpacked version of load_pixel_coord which maps directly to the
hardware value and is much easier for Jay to consume due to the sadness that is
true 16-bit on Intel. Jay will lower to this internally.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>
Jay will use this to lower & optimize subgroup shuffles. This is closer to
how Intel hardware works but still much higher level than the hardware
primitive. This gets us NIR optimizations on the multiply however.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>
This exposes the underlying render target write message directly, which Jay will
use to lower RT writes in NIR. I'm still on the fence about what exactly this
should look like but this is good enough for GLES3.0 (so, multiple render
targets but not necessarily dual source blending).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>
This maps directly to what Intel's thread payload gives us, allowing us to
optimize out frcp's in some cases. Jay will use this.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>
These lowered versions map to what Jay can deal with. The hardware is more
flexible but we're not due to data model restrictions. We choose to lower to get
us off the ground, we can revisit later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>
This will let us share a common scratch swizzling between brw and jay.
Changes by Ken:
- Use an immediate SIMD width when known so we don't need to re-lower
- Switch to load_simd_width_intel because it may not match
info->api_subgroup_size on Vulkan without VK_EXT_subgroup_size_control
- Stop using DWord Scattered Write messages for scratch. These take an
offset in DWords, and our offsets are now always in bytes. This also
means that we no longer create MEMORY_OPCODE_* IR with inconsistent
units of either bytes or dwords. Yikes. We use byte scattered
messages now.
fossil-db stats on Battlemage:
Instrs: 500477504 -> 500450056 (-0.01%); split: -0.01%, +0.00%
CodeSize: 7807432368 -> 7806786192 (-0.01%); split: -0.01%, +0.00%
Cycle count: 62404008370 -> 62398437734 (-0.01%); split: -0.01%, +0.00%
Fill count: 546690 -> 546695 (+0.00%); split: -0.00%, +0.00%
Max live registers: 141257956 -> 141258100 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 72350283 -> 72336544 (-0.02%)
Totals from 99 (0.01% of 1581969) affected shaders:
Instrs: 366593 -> 339145 (-7.49%); split: -7.58%, +0.09%
CodeSize: 6425936 -> 5779760 (-10.06%); split: -10.06%, +0.00%
Cycle count: 2412009876 -> 2406439240 (-0.23%); split: -0.26%, +0.03%
Fill count: 19675 -> 19680 (+0.03%); split: -0.02%, +0.04%
Max live registers: 17600 -> 17744 (+0.82%); split: -0.09%, +0.91%
Non SSA regs after NIR: 37894 -> 24155 (-36.26%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40843>
We are already at our limit of 31 texture opcodes, and cannot add any
more without expanding the opcode hashing in nir_instr_set. Thankfully,
it's at 29 bits, so adding one here is possible still.
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40833>
A set is large and expensive to iterate.
This is faster (overall fossilize-replay difference):
Difference at 95.0% confidence
-250 +/- 28.9257
-2.04849% +/- 0.235211%
(Student's t, pooled s = 34.1626)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40242>
This would have been fine for a set, but we're going to use a dynarray for
the predecessors in the future.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40242>
If we can't determine what variable is accessed, we should assume it could
be any.
This might make things worse with Vulkan since it does
vulkan_resource_index+load_vulkan_descriptor, but I don't think it matters
much. SSBO stores are rarely used in vertex shaders.
fossil-db (navi21):
Totals from 1 (0.00% of 202427) affected shaders:
Instrs: 442 -> 445 (+0.68%)
Latency: 2038 -> 2043 (+0.25%)
InvThroughput: 432 -> 437 (+1.16%)
VALU: 295 -> 298 (+1.02%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40740>
It's required by descriptor heap. There is already a NIR pass that
optimizes non-uniform access, so this should be mostly safe.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40768>
This isn't required for deref instructions because it's possible to
get the image format back from the variable but it will be useful for
descriptor heap.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40768>
These two new variable modes are used to relax restrictions on deref
casts through because it's possible to cast different modes from the
heap pointers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40768>