Commit graph

66814 commits

Author SHA1 Message Date
Iago Toral Quiroga
e1ed4f2532 mesa: Recompute LegalTypesMask if the GL API has changed
The current code computes ctx->Array.LegalTypesMask just once,
however, computing this needs to consider ctx->API so we need
to make sure that the API for that context has not changed if
we intend to reuse the result.

The context API can change, at least, if we go through
_mesa_meta_begin, since that will always force
API_OPENGL_COMPAT until we call _mesa_meta_end. If any
operation in between these two calls triggers a call to
update_array_format, then we might be caching a value for
LegalTypesMask that will not be right once we have called
_mesa_meta_end and restored the context API.

Fixes the following 179 dEQP tests in i965:
dEQP-GLES3.functional.vertex_arrays.single_attribute.strides.fixed.*
dEQP-GLES3.functional.vertex_arrays.single_attribute.normalize.fixed.*
dEQP-GLES3.functional.vertex_arrays.single_attribute.output_types.fixed.*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.static_draw.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.stream_draw.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.dynamic_draw.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.static_copy.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.stream_copy.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.dynamic_copy.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.static_read.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.stream_read.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.dynamic_read.*fixed*
dEQP-GLES3.functional.vertex_arrays.multiple_attributes.input_types.3_*fixed2*
dEQP-GLES3.functional.draw.random.{2,18,28,68,83,106,109,156,181,191}

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-12-09 11:40:00 +01:00
Eduardo Lima Mitev
09cb149ba7 mesa: Returns zero samples when querying GL_NUM_SAMPLE_COUNTS when internal format is integer
From GL ES 3.0 specification, section 6.1.15 Internal Format Queries (page 236),
multisampling is not supported for signed and unsigned integer internal formats.

Fixes 19 dEQP tests under 'dEQP-GLES3.functional.state_query.internal_format.*'.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-09 11:40:00 +01:00
Eduardo Lima Mitev
7894278717 mesa: Enables GL_RGB and GL_RGBA unsized internal formats for OpenGL ES 3.0
GL_RGB and GL_RGBA are valid internal formats on a GLES3 profile. See
"Table 1. Unsized Internal Formats" at
https://www.khronos.org/opengles/sdk/docs/man3/html/glTexImage2D.xhtml.

Fixes 2 dEQP tests:
- dEQP-GLES3.functional.state_query.internal_format.rgb_samples
- dEQP-GLES3.functional.state_query.internal_format.rgba_samples

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-12-09 11:40:00 +01:00
Eduardo Lima Mitev
242ad32655 mesa: Considers GL_DEPTH_STENCIL_ATTACHMENT a valid argument for FBO invalidation under GLES3
In OpenGL and OpenGL-ES 3+, GL_DEPTH_STENCIL_ATTACHMENT is a valid attachment point for the family of functions
that invalidate a framebuffer object (e.g, glInvalidateFramebuffer, glInvalidateSubFramebuffer, etc).
Currently, a GL_INVALID_ENUM error is emitted for this attachment point.

Fixes 21 dEQP test failures under 'dEQP-GLES3.functional.fbo.invalidate.*'.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-09 11:40:00 +01:00
Eric Anholt
8420a95692 vc4: Reserve rb31 instead of r3 for raddr conflict spills.
This increases the cost of a raddr b conflict spill (save r3 to rb31, move
src1 to r3, move rb31 back to r3 when done, instead of just move src1 to
r3), but on average thanks to instruction pairing it's more worthwhile to
have another accumulator.

total instructions in shared programs: 46428 -> 46171 (-0.55%)
instructions in affected programs:     38030 -> 37773 (-0.68%)
2014-12-09 01:04:46 -08:00
Eric Anholt
ab1b1fa6fb vc4: Prioritize allocating accumulators to short-lived values.
The register allocator walks from the end of the nodes array looking for
trivially-allocatable things to put on the stack, meaning (assuming
everything is trivially colorable and gets put on the stack in a single
pass) the low node numbers get allocated first.  The things allocated
first happen to get the lower-numbered registers, which is to say the fast
accumulators that can be paired more easily.

When we previously made the nodes match the temporary register numbers,
we'd end up putting the shader inputs (VS or FS) in the accumulators,
which are often long-lived values.  By prioritizing the shortest-lived
values for allocation, we can get a lot more instructions that involve
accumulators, and thus fewer conflicts for raddr and WS.

total instructions in shared programs: 52870 -> 46428 (-12.18%)
instructions in affected programs:     52260 -> 45818 (-12.33%)
2014-12-09 00:55:14 -08:00
Dave Airlie
0d4272cd8e r600g: fix regression since UCMP change
Since d8da6decea where the
state tracker started using UCMP on cayman a number of tests
regressed.

this seems to be r600g is doing CNDGE_INT for UCMP which is >= 0,
we should be doing CNDE_INT with reverse arguments.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2014-12-09 11:54:46 +10:00
Matt Turner
2a0bef91ca program: Delete dead _mesa_realloc_instructions.
Dead since 2010 (commit 284ce209).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-08 17:02:19 -08:00
Matt Turner
811a1836c8 swrast: Remove 'inline' from tex filter functions.
Reduces .text size of mesa_dri_drivers.so (i965-only) by 62k, or 1.4%.

Note that we don't remove inline from lerp_2d(), which has a comment
above it saying it definitely should be inlined. Though, removing the
inline keyword from it doesn't actually change the compiled code for me.

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-12-08 17:02:19 -08:00
Matt Turner
8af4aaf351 Don't cast the return value of malloc/realloc
See commit 2b7a972e for the Coccinelle script.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-08 17:02:19 -08:00
Matt Turner
f0a8bcd84e Use calloc instead of malloc/memset-0
See commit 6bda027e for the Coccinelle script.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-08 17:02:19 -08:00
Matt Turner
9019e5e195 Remove useless checks for NULL before freeing
See commits 5067506e and b6109de3 for the Coccinelle script.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-08 17:02:19 -08:00
Kristian Høgsberg
cae7a2a031 i965/skl: Add Skylake PCI IDs
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
2014-12-08 16:33:59 -08:00
Damien Lespiau
5bad948fa8 i965/skl: Emit depth stall workaround for gen9 as well
The docs say that we shouldn't need this workaround for gen8+, but just
removing it, causes gpu hangs.  We'll revisit this, but for now, just
extend the workaround to gen9.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2014-12-08 16:33:59 -08:00
Ben Widawsky
9404494b9b i965/skl: Fix GS thread count location
SKL moves the GS threadcount to dw8 from dw7, and no longer does the
divide by 2 thing.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Kristian Høgsberg <krh@bitplanet.net>
2014-12-08 16:33:59 -08:00
Vinson Lee
d20235f79a i965: Fix union usage for G++ <= 4.6.
This patch fixes this build error with G++ <= 4.6.

  CXX    test_vf_float_conversions.o
test_vf_float_conversions.cpp: In function ‘unsigned int f2u(float)’:
test_vf_float_conversions.cpp:63:20: error: expected primary-expression before ‘.’ token

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86939
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-08 16:25:16 -08:00
Eric Anholt
70dd3df344 vc4: Interleave register allocation from regfile A and B.
The register allocator prefers low-index registers from vc4_regs[] in the
configuration we're using, which is good because it means we prioritize
allocating the accumulators (which are faster).  On the other hand, it was
causing raddr conflicts because everything beyond r0-r2 ended up in
regfile A until you got massive register pressure.  By interleaving, we
end up getting more instruction pairing from getting non-conflicting
raddrs and QPU_WSes.

total instructions in shared programs: 55957 -> 52719 (-5.79%)
instructions in affected programs:     46855 -> 43617 (-6.91%)
2014-12-08 16:08:13 -08:00
Eric Anholt
46741c1b87 vc4: Fix decision for whether the MIN operation writes to the B regfile. 2014-12-08 16:08:13 -08:00
Eric Anholt
24c5ab7bbb vc4: Drop dependency on r3 for color packing.
We can avoid it by carefully ordering the packing.  This is important as a
step in giving r3 to the register allocator.

total instructions in shared programs: 56087 -> 55957 (-0.23%)
instructions in affected programs:     18368 -> 18238 (-0.71%)
2014-12-08 16:08:13 -08:00
Eric Anholt
dfbf58c439 vc4: Add support for GL 1.0 logic ops. 2014-12-08 16:08:13 -08:00
Eric Anholt
5045d8ca42 vc4: Add support for TGSI_OPCODE_UCMP.
This is being emitted now from st_glsl_to_tgsi.cpp.
2014-12-08 16:08:13 -08:00
Tom Stellard
c16436149c radeonsi/compute: Clamp COMPUTE_TMPRING_SIZE.WAVES to: num_cu * 32
This is the maximum value allowed for this field.
2014-12-08 17:20:50 -05:00
Tom Stellard
0e1c085f17 winsys/radeon: Always report at least 1 compute unit
All uses of this require that the value be at least one, so it's
easier to report at least one than having to wrap all uses
in MAX2(max_compute_units, 1).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2014-12-08 17:20:50 -05:00
Tom Stellard
67dcbcd92c radeonsi: Program RASTER_CONFIG for harvested GPUs v5
Harvested GPUs have some of their render backends disabled, so
in order to prevent the hardware from trying to render things
with these disabled backends we need to correctly program
the PA_SC_RASTER_CONFIG register.

v2:
  - Write RASTER_CONFIG for all SEs.

v3:
  - Set GRBM_GFX_INDEX.INSTANCE_BROADCAST_WRITES bit.
  - Set GRBM_GFX_INFEX.SH_BROADCAST_WRITES bit when done setting
    PA_SC_RASTER_CONFIG.
  - Get num_se and num_sh_per_se from kernel.

v4:
  - Get correct value for num_se
  - Remove loop for setting PA_SC_RASTER_CONFIG
  - Only compute raster config when a backend has been disabled.

v5: Michel Dänzer
  - Fix computation for chips with multiple SEs

https://bugs.freedesktop.org/show_bug.cgi?id=60879

CC: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
2014-12-08 17:20:50 -05:00
Roland Scheidegger
fea5c2640b draw: (trivial): remove double semicolon 2014-12-09 00:10:41 +01:00
Abdiel Janulgue
49e0431211 st/mesa: For vertex shaders, don't emit saturate when SM 3.0 is unsupported
There is a bug in the current lowering pass implementation where we lower saturate
to clamp only for vertex shaders on drivers supporting SM 3.0. The correct behavior
is to actually lower to clamp only when we don't support saturate which happens
on drivers that don't support SM 3.0

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-12-08 20:14:26 +02:00
Abdiel Janulgue
4ea8c8d56c glsl: Don't optimize min/max into saturate when EmitNoSat is set
v3: Fix multi-line comment format (Ian)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-12-08 20:14:17 +02:00
Abdiel Janulgue
39f7b72428 ir_to_mesa: Remove sat to clamp lowering pass
Fixes an infinite loop in swrast where the lowering pass unpacks saturate into
clamp but the opt_algebraic pass tries to do the opposite.

v3 (Ian):
This is a revert of commit cfa8c1cb "ir_to_mesa: lower ir_unop_saturate" on
the ir_to_mesa.cpp portion. prog_execute.c can handle saturates in vertex
shaders, so classic swrast shouldn't need this lowering pass.

Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83463
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-12-08 20:14:10 +02:00
Michael Forney
5d64da401c loader: Add missing EXPAT_CFLAGS to libloader.la CPPFLAGS
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-08 08:50:27 -08:00
Matt Turner
f65200ccc9 i965: Remove default from brw_instruction_name switch to catch missing names.
The case-range extension is available in clang and gcc at least back to
3.4.0.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
2014-12-08 08:50:26 -08:00
Matt Turner
b6a71cbb64 i965: Add missing opcode names.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
2014-12-08 08:50:26 -08:00
Matt Turner
6383e206c0 i965: Add opcode names for set_omask and set_sample_id.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-12-08 08:50:26 -08:00
Chad Versace
7e8ba77c49 egl: Expose EGL_KHR_get_all_proc_addresses and its client extension
Mesa already implements the behavior of EGL_KHR_get_all_proc_addresses
and EGL_KHR_client_get_all_proc_addresses. This patch just exposes the
extension strings.

See: https://www.khronos.org/registry/egl/extensions/KHR/EGL_KHR_get_all_proc_addresses.txt
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2014-12-07 20:58:25 -08:00
Emil Velikov
0b6e0aa5ae docs: add news item and link release notes for mesa 10.3.5
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-12-07 19:22:11 +00:00
Emil Velikov
7409ad5147 docs: Add sha256 sums for the 10.3.5 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 1ba2029184)
2014-12-07 19:22:11 +00:00
Emil Velikov
8d235e0c70 Add release notes for the 10.3.5 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit c90b0db1ae)
2014-12-07 19:22:11 +00:00
Ilia Mirkin
043b79461f freedreno/a2xx: silence warning about missing DEPTH32X
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:53 -05:00
Ilia Mirkin
c416f49ebe freedreno/a3xx: handle index_bias (i.e. base_vertex)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:50 -05:00
Ilia Mirkin
b38b40d7bb freedreno/a3xx: add bgr565 texturing and rendering
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:47 -05:00
Ilia Mirkin
e02ed16cb5 freedreno/a3xx: add support for SRGB render targets
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:43 -05:00
Ilia Mirkin
39a7c049d3 freedreno/a3xx: output RGBA16_FLOAT from fs for certain outputs
Fixes R11G11B10F rendering, and is required for SRGB format support.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:40 -05:00
Ilia Mirkin
3674c76edf freedreno/a3xx: re-enable rgb10_a2 render targets
There were previously regressions regarding border colors, which the
updated swizzle logic resolves.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:37 -05:00
Ilia Mirkin
fc94b2c2a0 freedreno/a3xx: fix border color swizzle to match texture format desc
This is a hack since it uses the texture information together with the
sampler, but I don't see a better way to do it. In OpenGL, there is a
1:1 correspondence.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:33 -05:00
Ilia Mirkin
97fef2db5c freedreno/a3xx: fix alpha-blending on RGBX formats
Expert debugging assistance provided by Chris Forbes.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:20 -05:00
Chris Forbes
6b01969345 glcpp: Fix can not to cannot in error message
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
2014-12-07 11:49:28 +13:00
Chris Forbes
b49a069bd3 glcpp: Disallow undefining GL_* builtin macros.
Fixes the piglit test: spec/glsl-es-3.00/compiler/undef-GL_ES.vert

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-07 11:47:45 +13:00
Chris Forbes
ed56c16820 i965/Gen6-7: Fix point sprites with PolygonMode(GL_POINT)
This was an oversight in the original patch. When PolygonMode is
used, then front faces, back faces, or both may be rendered as
points and are affected by point sprite state.

Note that SNB/IVB can't actually be fully conformant here, for
a legacy context -- we don't have separate sets of pointsprite
enables for front and back faces. Haswell ignores pointsprite
state correctly in hardware for non-point rasterization, so can
do this correctly, but it doesn't seem worth it.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86764
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-07 11:46:42 +13:00
Chris Forbes
092c73a7c3 i965: Fix regs read for FS_OPCODE_INTERP_PER_SLOT_OFFSET
Dead code elimination was eating the Y offset.

Fixes the piglit test:
spec/ARB_gpu_shader5/arb_gpu_shader5-interpolateAtOffset-nonconst

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-12-07 10:29:26 +13:00
Chris Forbes
680f72d6f2 i965: Add opcode names for FS interpolation opcodes
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-12-07 10:29:20 +13:00
Roland Scheidegger
d8da6decea mesa/st: don't use CMP / I2F for conditional assignments with native integers
The original idea was to optimize away the condition by integrating it directly
into the CMP instruction. However, with native integers this requires an extra
I2F instruction. It is also fishy because the negation used didn't really honor
ieee754 float comparison rules, not to mention the CMP instruction itself
(being pretty much a legacy instruction) doesn't really have defined special
float value behavior in any case.
So, use UCMP and adjust the code trying to optimize the condition away
accordingly (I have absolutely no idea if such conditions are actually hit
or would be translated away somewhere else already).

v2: cosmetic changes

No piglit regressions on llvmpipe.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2014-12-06 18:03:25 +01:00