v2: use ARB_texture_multisample enable bit
Patch adds extension enable bit and enables required keywords
and builtin functions for the extension.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This got missed because the piglit test only tested int images, to avoid a
combinatorial explosion of formats, targets, stages and sizes that
takes more than 5 minutes to test on nvidia's driver.
This patch also drops IMAGE_FUNCTION_AVAIL_ATOMIC, which is not applicable
to the image_size codepath but was harmless.
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
See issue from the ARB_texture_query_lod spec for LOD vs Lod confusion:
(3) The core specification uses the "Lod" spelling, not "LOD". Should
this extension be modified to use "Lod"?
RESOLVED: The "Lod" spelling is the correct spelling for the core
specification and the preferred spelling for use. However, use of
"LOD" also exists, as the extension predated the core specification,
so this extension won't remove use of "LOD".
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
The code is heavily inspired by Francisco Jerez's code supporting the
image_load_store extension.
Backends willing to support this builtin should handle
__intrinsic_image_size.
v2: Based on the review of Ilia Mirkin
- Enable the extension for GLES 3.1
- Fix indentation
- Fix the return type (float to int, number of components for CubeImages)
- Add a warning related to GLES 3.1
v3: Based on the review of Francisco Jerez
- Refactor the code to share both add_image_function and _image with the other
image-related functions
v4: Based on Topi Pohjolainen's comments
- Do not add parentheses around the return value
v5: based on Francisco Jerez's comments:
- Fix a few indent issues
- Reduce the size of a condition by testing the dimension and array properties
instead of enumerating all the formats.
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Patch separates array samplers from the texture_multisample check so that we
can enable only [iu]sampler2DMS; [iu]sampler2DMSArray are not supported.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
v2: Change GL version from 400 to 420. Noticed by Tapani and Ilia.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ARB_shading_language_packing is part of GLSL 4.2, not 4.0 as I
mistakenly believed. The following functions are available only with
ARB_shading_language_packing, GLSL 4.2 (not GLSL 4.0), or ES 3.0:
- packSnorm2x16
- unpackSnorm2x16
- packHalf2x16
- unpackHalf2x16
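The availability rule above can be sketched as a small predicate. This is only an illustration, not the compiler's code: `ShaderState`, its fields, and `has_packing_functions` are hypothetical names, and versions are encoded as integers (420 for GLSL 4.20, 300 for ES 3.00).

```python
from dataclasses import dataclass


@dataclass
class ShaderState:
    """Hypothetical stand-in for the compiler's per-shader parse state."""
    version: int                                 # 400 = GLSL 4.00, 300 = ES 3.00
    es_shader: bool = False                      # True for GLSL ES shaders
    arb_shading_language_packing: bool = False   # extension enabled in shader


def has_packing_functions(state: ShaderState) -> bool:
    """True when packSnorm2x16, unpackSnorm2x16, packHalf2x16 and
    unpackHalf2x16 should be visible, following the rule in this message."""
    if state.arb_shading_language_packing:
        return True
    required = 300 if state.es_shader else 420   # ES 3.00 or GLSL 4.20
    return state.version >= required
```

With this rule a plain GLSL 4.00 shader does not see the four functions unless it enables the extension.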
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Create a new search function to look up built-in functions by name, and
use it to reject built-in function redefinition or overloading in GLSL ES 3.00.
GLSL ES 3.0 spec, chapter 6.1 "Function Definitions", page 71
"A shader cannot redefine or overload built-in functions."
In contrast, the GLSL ES 1.0 specification, chapter 8 "Built-in Functions", says:
"User code can overload the built-in functions but cannot redefine them."
So this check is specific to GLSL ES 3.00.
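A minimal sketch of the two rules quoted above, assuming a by-name lookup table. `BUILTINS`, `check_user_function`, and the error strings are hypothetical; the ES 1.00 branch is included only to contrast the two specification quotes and is not claimed to be part of this patch.

```python
# Hypothetical table: built-in name -> set of parameter-type tuples.
BUILTINS = {
    "sin": {("float",), ("vec2",), ("vec3",), ("vec4",)},
}


def check_user_function(name, param_types, es_version):
    """Return an error string, or None if the definition is allowed.

    es_version is 100 for GLSL ES 1.00 and 300 for GLSL ES 3.00.
    """
    signatures = BUILTINS.get(name)      # search built-ins by name only
    if signatures is None:
        return None                      # not a built-in name at all
    if es_version >= 300:
        # ES 3.00: neither redefinition nor overloading is permitted.
        return "cannot redefine or overload built-in function " + name
    if tuple(param_types) in signatures:
        # ES 1.00: overloading is fine, redefining a signature is not.
        return "cannot redefine built-in function " + name
    return None
```

Searching by name alone is what makes the ES 3.00 case simple: any user function that shares a name with a built-in is rejected, whatever its parameters.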
This patch fixes the following dEQP tests:
dEQP-GLES3.functional.shaders.functions.invalid.overload_builtin_function_vertex
dEQP-GLES3.functional.shaders.functions.invalid.overload_builtin_function_fragment
dEQP-GLES3.functional.shaders.functions.invalid.redefine_builtin_function_vertex
dEQP-GLES3.functional.shaders.functions.invalid.redefine_builtin_function_fragment
No piglit regressions.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This implements the bulk of the builtin functions for fp64 support.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Our current atan()-approximation is pretty inaccurate at 1.0, so
let's try to improve the situation by doing a direct approximation
without going through atan.
This new implementation uses an 11th degree polynomial to approximate
atan in the [-1..1] range, and the following identity to reduce the
entire range to [-1..1]:
atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x)
This range-reduction idea is taken from the paper "Fast computation
of Arctangent Functions for Embedded Applications: A Comparative
Analysis" (Ukil et al. 2011).
The polynomial that approximates atan(x) is:
x * 0.9999793128310355 - x^3 * 0.3326756418091246 +
x^5 * 0.1938924977115610 - x^7 * 0.1173503194786851 +
x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444
This polynomial was found with the following GNU Octave script:
    x = linspace(0, 1);
    y = atan(x);
    n = [1, 3, 5, 7, 9, 11];
    format long;
    polyfitc(x, y, n)
The polyfitc function is not built-in, but too long to include here.
It can be downloaded from the following URL:
http://www.mathworks.com/matlabcentral/fileexchange/47851-constraint-polynomial-fit/content/polyfitc.m
This fixes the following piglit test:
shaders/glsl-const-folding-01
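The polynomial and range reduction described above can be checked numerically with a short sketch. The coefficients are the ones quoted in this message; the function names and the use of Horner's scheme are illustrative choices, not the shader IR that the patch generates.

```python
import math

# Coefficients of x, x^3, ..., x^11 quoted in this message.
COEFFS = (
    0.9999793128310355,
    -0.3326756418091246,
    0.1938924977115610,
    -0.1173503194786851,
    0.0536813784310406,
    -0.0121323213173444,
)


def atan_poly(x):
    """Odd polynomial approximation, intended for x in [-1, 1]."""
    x2 = x * x
    acc = 0.0
    for c in reversed(COEFFS):           # Horner's scheme in x^2
        acc = acc * x2 + c
    return acc * x


def atan_approx(x):
    """Approximate atan(x) for any finite x using the range reduction."""
    if abs(x) <= 1.0:
        return atan_poly(x)
    # atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x), with |1.0 / x| < 1
    return math.copysign(0.5 * math.pi, x) - atan_poly(1.0 / x)
```

Comparing `atan_approx(x)` against `math.atan(x)` over a range of inputs is a quick way to see how closely the fit tracks the reference.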
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
According to the GLES spec (i.e. 1.0 and above), textureCubeLod and
texture2DProjLod are built-in functions. However, we seem to disable
support for these functions with GLES. This patch enables the support.
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84355
All of the GL image enums fit in 16-bits.
Also move the fields from the anonymous "image" structure to the next
higher structure. This will enable packing the bits with the other
bitfield.
Valgrind massif results for a trimmed apitrace of dota2:
                   n         time(i)    total(B)  useful-heap(B)  extra-heap(B)  stacks(B)
Before (32-bit):  76  40,572,916,873  68,831,248      63,328,783      5,502,465          0
After (32-bit):   70  40,577,421,777  68,487,584      62,973,695      5,513,889          0
Before (64-bit):  60  36,822,640,058  96,526,824      88,735,296      7,791,528          0
After (64-bit):   74  37,124,603,758  95,891,808      88,466,712      7,425,096          0
A real savings of 346KiB on 32-bit and 262KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Historically, we've implemented the rules for overriding built-in
functions by creating multiple ir_functions and relying on the symbol
table to hide the one containing built-in functions. That works, but
has a few drawbacks, so the next patch will change it.
Instead, we'll have a single ir_function for a particular name, which
will contain both built-in and user-defined signatures. Passing an
extra parameter to matching_signature makes it easy to ignore built-ins
when they're supposed to be hidden.
I didn't add the parameter to exact_matching_signature since it wasn't
necessary.
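The single-function design described above might look roughly like the following sketch. The `Signature` and `Function` classes and the `allow_builtins` parameter name are hypothetical, and real matching also has to consider implicit conversions, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Signature:
    param_types: tuple
    is_builtin: bool = False


@dataclass
class Function:
    """One function object per name, holding built-in and user signatures."""
    name: str
    signatures: list = field(default_factory=list)

    def matching_signature(self, param_types, allow_builtins):
        """Return the first signature whose parameter types are equal."""
        for sig in self.signatures:
            if sig.is_builtin and not allow_builtins:
                continue                 # built-ins are hidden for this lookup
            if sig.param_types == tuple(param_types):
                return sig
        return None
```

The caller decides per lookup whether built-ins are visible, so no second function object or symbol-table shadowing is needed.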
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
V2: - Don't assume everyone wants interpolateAtSample() lowered to
interpolateAtOffset. It turns out this isn't what we want most
of the time for i965. Lowering can be added later in an ir pass
which drivers opt into, rather than bolting it straight into the
builtin definition.
- Only expose the interpolateAt* builtins in the fragment language.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This will be necessary to implement EndStreamPrimitive().
EndPrimitive() will produce an ir_end_primitive with the default stream 0.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This will be necessary to implement EmitStreamVertex().
EmitVertex() will produce an ir_emit_vertex with the default stream 0.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The M_PI*f macros used a preprocessor paste to append 'f'
to M_PI defines, which works if the values are only numbers
but breaks on OpenBSD where M_PI definitions have casts
and brackets to meet requirements of a future version of POSIX:
http://austingroupbugs.net/view.php?id=801
http://austingroupbugs.net/view.php?id=828
Simplify the M_PI*f macros by using casts directly in the defines
as suggested by Kenneth Graunke.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78665
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
In all uses of dotlike() we're writing generic code that operates on 1-4
component vectors. That our IR requires ir_binop_dot expressions'
operands to be 2+ component vectors is an implementation detail that's
not important when implementing built-in functions with dot(), which is
defined for scalar floats in GLSL.
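As an illustration of the idea, here is a sketch that models operands as tuples of 1 to 4 floats. The tuple representation and the `length` helper are assumptions made for the example; only the scalar-versus-vector split reflects the description above.

```python
import math


def dotlike(a, b):
    """Dot product for 1-4 component operands given as equal-length tuples.

    A one-component operand is multiplied directly, since the IR-level dot
    operation described above only accepts vectors of two or more components.
    """
    assert len(a) == len(b) and 1 <= len(a) <= 4
    if len(a) == 1:
        return a[0] * b[0]               # scalar case: plain multiply
    return sum(x * y for x, y in zip(a, b))


def length(v):
    """Generic length() built on dotlike(), valid for scalars and vectors."""
    return math.sqrt(dotlike(v, v))
```

Code written against `dotlike()` stays generic: `length()` here never needs to know whether it was handed a scalar or a vector.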
Reviewed-by: Eric Anholt <eric@anholt.net>
ARB_gpu_shader5 and ES 3.0 expose different subsets of
ARB_shading_language_packing.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To fix MSVC compile breakage. Evidently, _restrict is an MSVC keyword,
though the docs only mention __restrict (with two underscores).
Note: we may want to also rename _volatile to volatile_flag to be
consistent.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74900
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Because of the combinatorial explosion of different image built-ins
with different image dimensionalities and base data types, enumerating
all the 242 possibilities would be annoying and a waste of .text
space. Instead use a special path in the built-in builder that loops
over all the known image types.
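The looping approach can be sketched as follows. The type names are the GLSL image types; the function names and the `supports_float_data_type` flag (borrowed from the v2 note below) are used here hypothetically, and the sketch does not try to reproduce the exact count mentioned above.

```python
from itertools import product

# GLSL image type names: an optional data-type prefix plus a dimensionality.
PREFIXES = ("", "i", "u")                # float, int, uint
DIMENSIONS = (
    "1D", "2D", "3D", "2DRect", "Cube", "Buffer",
    "1DArray", "2DArray", "CubeArray", "2DMS", "2DMSArray",
)


def image_types(supports_float_data_type):
    """Yield every image type name a built-in should be generated for."""
    for prefix, dim in product(PREFIXES, DIMENSIONS):
        if prefix == "" and not supports_float_data_type:
            continue                     # e.g. integer-only imageAtomicAdd
        yield prefix + "image" + dim


def generate(function_name, supports_float_data_type):
    """Return (function, image type) pairs instead of listing them by hand."""
    return [(function_name, t) for t in image_types(supports_float_data_type)]
```

One generic description per built-in plus a loop replaces a hand-written entry for every combination.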
v2: Generate built-ins on GLSL version 4.20 too. Rename
'_has_float_data_type' to '_supports_float_data_type'. Avoid
duplicating enumeration of image built-ins in create_intrinsics()
and create_builtins().
v3: Use a more orthodox approach for passing image built-in generator
parameters.
v4: Cosmetic changes.
Acked-by: Paul Berry <stereotype441@gmail.com>
Add predicates to query if a GLSL type is or contains an image.
Rename sampler_coordinate_components() to coordinate_components().
v2: Use assert instead of unreachable.
v3: No need to use a separate code-path for images in
coordinate_components() after merging image and sampler fields in
the glsl_type structure.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Consider a multithreaded program with two contexts A and B, and the
following scenario:
1. Context A calls initialize(), which allocates mem_ctx and starts
building built-ins.
2. Context B calls initialize(), which sees mem_ctx != NULL and assumes
everything is already set up. It returns.
3. Context B calls find(), which fails to find the built-in since it
hasn't been created yet.
4. Context A finally finishes initializing the built-ins.
This will break at step 3. Adding a lock ensures that subsequent
callers of initialize() will wait until initialization is actually
complete.
Similarly, if any thread calls release() while another thread is still
initializing or calling find(), the mem_ctx/shader would get freed out
from under it, leading to corruption or use-after-free crashes.
Fixes sporadic failures in Piglit's glx-multithread-shader-compile.
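The locking scheme can be sketched in a few lines. `BuiltinBuilder` and its table contents are hypothetical stand-ins; the point is only that initialize(), find() and release() all take the same lock, so a late caller waits until the table is complete.

```python
import threading


class BuiltinBuilder:
    """Hypothetical stand-in for the shared built-in function table."""

    def __init__(self):
        self._lock = threading.Lock()
        self._table = None               # plays the role of mem_ctx/shader

    def initialize(self):
        with self._lock:                 # late callers wait for completion
            if self._table is not None:
                return
            table = {}
            for name in ("sin", "cos", "atan"):
                table[name] = "<built-in " + name + ">"
            self._table = table          # published only once fully built

    def find(self, name):
        with self._lock:                 # cannot race with release()
            if self._table is None:
                return None
            return self._table.get(name)

    def release(self):
        with self._lock:
            self._table = None
```

Without the lock, the scenario in steps 1 to 4 corresponds to a second thread observing the table as initialized while it is still being filled in.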
Bugzilla: https://bugs.freedesktop.org/69200
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.1 10.0" <mesa-stable@lists.freedesktop.org>
The types of all three parameters are identical, so we don't need to
specify the type three times. The predicate is always identical too, so we
don't need to make it a parameter, either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Formal function parameters are always ir_variable objects, not an
arbitrary ir_instruction. So there's no need to dynamically cast here.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
foreach_iter and exec_list_iterators have been deprecated for some time now;
we just hadn't ever bothered to convert code to the newer foreach_list
and foreach_list_safe macros.
In these cases, we aren't editing the list, so we can use foreach_list
rather than foreach_list_safe.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, we had an enum called gl_shader_type which represented
pipeline stages in the order they occur in the pipeline
(i.e. MESA_SHADER_VERTEX=0, MESA_SHADER_GEOMETRY=1, etc), and several
inconsistently named functions for converting between it and other
representations:
- _mesa_shader_type_to_string: gl_shader_type -> string
- _mesa_shader_type_to_index: GLenum (GL_*_SHADER) -> gl_shader_type
- _mesa_program_target_to_index: GLenum (GL_*_PROGRAM) -> gl_shader_type
- _mesa_shader_enum_to_string: GLenum (GL_*_{SHADER,PROGRAM}) -> string
This patch tries to clean things up so that we use more consistent
terminology: the enum is now called gl_shader_stage (to emphasize that
it is in the order of pipeline stages), and the conversion functions are:
- _mesa_shader_stage_to_string: gl_shader_stage -> string
- _mesa_shader_enum_to_shader_stage: GLenum (GL_*_SHADER) -> gl_shader_stage
- _mesa_program_enum_to_shader_stage: GLenum (GL_*_PROGRAM) -> gl_shader_stage
- _mesa_progshader_enum_to_string: GLenum (GL_*_{SHADER,PROGRAM}) -> string
In addition, MESA_SHADER_TYPES has been renamed to MESA_SHADER_STAGES,
for consistency with the new name for the enum.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Also rename the "target" field of _mesa_glsl_parse_state and the
"target" parameter of _mesa_shader_stage_to_string to "stage".
Reviewed-by: Brian Paul <brianp@vmware.com>