Commit graph

44765 commits

Author SHA1 Message Date
Chris Wilson
5b09cf5c57 i915: out-of-bounds write in calc_live_regs()
From a Coverity defect report.

src/mesa/drivers/dri/i915/i915_fragprog.c
   301  /*
   302   * TODO: consider moving this into core
   303   */
   304  static bool calc_live_regs( struct i915_fragment_program *p )
   305  {
   306      const struct gl_fragment_program *program = &p->FragProg;
   307      GLuint regsUsed = 0xffff0000;
-> 308      uint8_t live_components[16] = { 0, };
   309      GLint i;
   310
   311      for (i = program->Base.NumInstructions - 1; i >= 0; i--) {
   312          struct prog_instruction *inst =
&program->Base.Instructions[i];
   313          int opArgs = _mesa_num_inst_src_regs(inst->Opcode);
   314          int a;
   315
   316          /* Register is written to: unmark as live for this and
preceeding ops */
   317          if (inst->DstReg.File == PROGRAM_TEMPORARY) {
-> 318              if (inst->DstReg.Index > 16)
   319                 return false;
   320
-> 321              live_components[inst->DstReg.Index] &= ~inst->DstReg.WriteMask;
   322              if (live_components[inst->DstReg.Index] == 0)
   323                  regsUsed &= ~(1 << inst->DstReg.Index);
   324          }
   325
   326          for (a = 0; a < opArgs; a++) {
   327              /* Register is read from: mark as live for this and preceeding ops */
   328              if (inst->SrcReg[a].File == PROGRAM_TEMPORARY) {
   329                  unsigned c;
   330
   331                  if (inst->SrcReg[a].Index > 16)
   332                     return false;
   333
   334                  regsUsed |= 1 << inst->SrcReg[a].Index;
   335
   336                  for (c = 0; c < 4; c++) {
   337                      const unsigned field = GET_SWZ(inst->SrcReg[a].Swizzle, c);
   338
   339                      if (field <= SWIZZLE_W)
   340                          live_components[inst->SrcReg[a].Index] |= (1U << field);
   341                  }
   342              }
   343          }
   344
   345          p->usedRegs[i] = regsUsed;
   346      }

Reported-by: Vinson Lee <vlee@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40022
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 67582e6eef)
2011-10-14 17:28:45 -07:00
Carl Worth
eb0fd67f6a glcpp: Add a test for #elif with an undefined macro.
As written, this test correctly raises an error for #elif being used
with an undefined macro (and not as an argument to "defined"). If the
preceding #if were '#if 1' then this diagnositc would correctly be
hidden. That allows code such as the following to not raise an error:

	#ifndef MAYBE_UNDEFINED
	#elif MAYBE_UNDEFINED < 5
	...
	#endif

So this test case is working as expected already. We add it here just
to improve test coverage.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
(cherry picked from commit 201485bae0)
2011-10-14 17:28:45 -07:00
Carl Worth
c6dfde2136 glcpp: Raise error if defining any macro containing two consecutive underscores
The specification reserves any macro name containing two consecutive
underscores, (anywhere within the name). Previously, we only raised
this error for macro names that started with two underscores.

Fix the implementation to check for two underscores anywhere, and also
update the corresponding 086-reserved-macro-names test.

This also fixes the following two piglit tests:

	spec/glsl-1.30/preprocessor/reserved/double-underscore-02.frag
	spec/glsl-1.30/preprocessor/reserved/double-underscore-03.frag

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Carl Worth <cworth@cworth.org>
(cherry picked from commit c4aaf7943c)
2011-10-14 17:28:45 -07:00
Carl Worth
71bd5d424c glcpp: Implement token pasting for non-function-like macros
This is as simple as abstracting one existing block of code into a
function call and then adding a single call to that function for the
case of a non-function-like macro.

This fixes the recently-added 097-paste-with-non-function-macro test
as well as the following piglit tests:

	spec/glsl-1.30/preprocessor/concat/concat-01.frag
	spec/glsl-1.30/preprocessor/concat/concat-02.frag

Also, the concat-04.frag test now passes for the right reason. The
test is intended to fail the compilation, but before this commit it
was failing compilation (and hence passing the test) for the wrong
reason.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
(cherry picked from commit 28842c2331)
2011-10-14 17:28:45 -07:00
Carl Worth
33e1019d95 glcpp: Test a non-function-like macro using the token paste operator
Apparently we never implemented this, (but we've got a GLSL 1.30 test
in piglit that is exercising this case).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
(cherry picked from commit 7bb3403e01)
2011-10-14 17:28:45 -07:00
Carl Worth
ab94d6f902 glcpp: Fix two (or more) successive applications of token pasting
There was already a loop here to look for multiple token pastes, but
it was mistakenly incrementing the iterator counter after performing
one paste.

Instead, leave the loop iterator in place to coalesce as many tokens
as necessary into one.

This fixes the recently add 096-paste-twice test as well as the
following piglit test:

	spec/glsl-1.30/preprocessor/concat/concat-03.frag

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
(cherry picked from commit 3c01a58944)
2011-10-14 17:28:45 -07:00
Ian Romanick
5becf89a17 i915: Only emit program errors when INTEL_DEBUG=wm or INTEL_DEBUG=fallbacks
This makes piglit a lot more happy.  The errors are logged when
INTEL_DEBUG=fallbacks because the application is about to hit a big
software fallback.  We frequently ask people to run applications that
are hitting software fallbacks with INTEL_DEBUG=fallbacks so the we
can help them debug the reason for the software fallback.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 0290a018a5)
2011-10-14 17:28:45 -07:00
Ian Romanick
13a2d4a985 i915: Fail without crashing if a Mesa IR program uses too many registers
This can only happen in GLSL shaders because assembly shaders that use
too many temps are rejected by core Mesa.  It is easiest to make this
happen with shaders that contain flow-control that could not be lowered.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 3bb2f0dde1)
2011-10-14 17:28:45 -07:00
Ian Romanick
f2d166583d ir_to_mesa: Emit warnings instead of errors for IR that can't be lowered
Rely on the driver to do the right thing.  This probably means falling
back to software.  Page 88 of the OpenGL 2.1 spec specifically says:

    "A shader should not fail to compile, and a program object should
    not fail to link due to lack of instruction space or lack of
    temporary variables. Implementations should ensure that all valid
    shaders and program objects may be successfully compiled, linked
    and executed."

There is no provision for saying "No" to a valid shader that is
difficult for the hardware to handle, so stop doing that.

On i915 this causes a large number of piglit tests to change from FAIL
to WARN.  The warning is because the driver still emits messages to
stderr like "i915_program_error: Unsupported opcode: BGNLOOP".

It also fixes ES2 conformance CorrectFull_frag and CorrectParse1_frag
on i915 (and probably other hardware that can't handle loops).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 322c3bf9dc)
2011-10-14 17:28:42 -07:00
Ian Romanick
49d2c552a5 ir_to_mesa: Use Add linker_error instead of fail_link
The functions were almost identical.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 8aadd89d07)
2011-10-14 17:28:03 -07:00
Ian Romanick
a864b15b83 mesa: Ensure that gl_shader_program::InfoLog is never NULL
This prevents assertion failures in ralloc_strcat.  The ralloc_free in
_mesa_free_shader_program_data can be omitted because freeing the
gl_shader_program in _mesa_delete_shader_program will take care of
this automatically.

A bunch of this code could use a refactor to use ralloc a bit more
effectively.  A bunch of the things that are allocated with malloc and
owned by the gl_shader_program should be allocated with ralloc (using
the gl_shader_program as the context).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 89193933cb)
2011-10-14 17:27:17 -07:00
Ian Romanick
e458c3ddbb linker: Make linker_{error,warning} generally available
linker_warning is a new function.  It's identical to linker_error
except that it doesn't set LinkStatus=false and it prepends "warning: "
on messages instead of "error: ".

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 379a32f42e)
2011-10-14 17:27:17 -07:00
Ian Romanick
b8a46f910d linker: Make linker_error set LinkStatus to false
Remove the other places that set LinkStatus to false since they all
immediately follow a call to linker_error.  The function linker_error
was previously known as linker_error_printf.  The name was changed
because it may seem surprising that a printf function will set an
error flag.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 586e741ac1)
2011-10-14 17:27:17 -07:00
Kenneth Graunke
eca2a91a9b i965: Fix inconsistent indentation in brw_eu_emit.c.
Most of these functions used three spaces for the first level of
indentation, but four spaces for the next level.  One used tabs and then
three spaces.  Some used 3/4 in a then block but 3/3 in the else block.

Normally I try to avoid field days like this, but since the functions
were so inconsistent, even internally, it was making it difficult to
edit without introducing spurious whitespace changes.

So, just get it over with.  git diff -b shows 0 lines changed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit b861479f83)
2011-10-13 16:35:40 -07:00
Kenneth Graunke
1f083e1839 i965: Allow SIMD16 color writes on Ivybridge.
Again, the check was needlessly specific: this works fine on Gen7.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 7db874bf4c4273d2d46218b1490d312fe2654284)
2011-10-13 14:06:11 -07:00
Kenneth Graunke
9bbf2a343f i965/fs: Allow SIMD16 with control flow on Ivybridge.
The check was designed to forbid it on old generations (Gen5/Ironlake),
not on new ones.  It just works on Gen7/Ivybridge.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit ae5da817e2aeb9f9447fdd6d2eb4b22d6f8f6a87)
2011-10-13 14:06:11 -07:00
Kenneth Graunke
38dfedccb2 i965: Emit depth stalls and flushes before changing depth state on Gen6+.
Fixes OpenArena on Gen7.  Technically, adding only the first depth stall
fixes it, but the documentation says to do all three, and the Windows
driver seems to do it.

Not observed to fix anything on Gen6 yet.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38863
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 02c4dc807e91640c69c8addc3c797300a3c536ad)
2011-10-13 14:06:11 -07:00
Kenneth Graunke
1e0e116d6d i965: Fix incorrect maximum PS thread count shift on Ivybridge.
At one point, the documentation said that max thread count in 3DSTATE_PS
was at bit offset 23, but it's actually 24 on Ivybridge.  Not only did
this halve our thread count, it caused us to write 1 into a bit 23, which
is marked as MBZ (must be zero).  Furthermore, it made us write an even
number into this field, which is apparently not allowed.  Apparently we
were just lucky it worked.

NOTE: This is a candidate for the 7.11 branch.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 556e7eea80de778b44a37d51cb757ce32221d1e3)
2011-10-13 14:06:11 -07:00
Eric Anholt
aaadd4c111 i965: Fix polygon stipple offset state flagging.
_NEW_WINDOW_POS wasn't a real Mesa state flag, but we were missing
_NEW_BUFFERS to update the stipple offset when FBO binding or window
size changed, and _NEW_POLYGON to update when stippling gets enabled.

Fixes oglconform's tristrip test.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
(cherry picked from commit d598851d401f7f34d623c9cfbd85d7f5faccd7c2)
2011-10-13 13:59:06 -07:00
Eric Anholt
0d31b130bb i965: Add missing _NEW_POLYGON flag to polygon stipple upload.
Because we skip the pattern upload when stippling is disabled, we need
to check again when it might have been turned on.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
(cherry picked from commit e19541aa2ad05f687c859001b62713209787c9c8)
2011-10-13 13:58:57 -07:00
Kenneth Graunke
e4b1dce9ec i965: Use proper texture alignment units for cubemaps on Gen5+.
In particular, S3TC compressed textures need align_h == 4.

Fixes skybox errors in Quake 4 and FEAR.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34628
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit e0e688ca5441e2c8bc59ec7488bc1bc4ba196602)
2011-10-13 13:58:51 -07:00
Kenneth Graunke
0f87fe948a i965/gen5+: Fix incorrect miptree layout for non-power-of-two cubemaps.
For power-of-two sizes, h0 == mt->height0 since it's already a multiple
of two.  However, for NPOT, they're different; h1 should be computed
based on the original size.

Fixes piglit test "cubemap npot" and oglconform test "textureNPOT".

NOTE: This is a candidate for stable release branches.

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit bebc19448f45dbe8c3b016d440403f52e1036e15)
2011-10-13 13:58:45 -07:00
Eric Anholt
f484fc7476 i965/fs: Respect ARB_color_buffer_float clamping.
This was done in the old codegen path, but not the new one.  Caught by
piglit fbo tests after the conversion to GLSL ff_fragment_shader.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit da53ca641106e47f1d74386d8dc0f7eebeec5225)
2011-10-13 13:58:37 -07:00
Marek Olšák
b9c7773e0d r300g: fix rendering with a non-zero index bias in draw_elements_immediate
NOTE: This is a candidate for the stable branches.
(cherry picked from commit 5506f6ef96)
2011-10-04 17:48:33 +02:00
Paul Berry
7d2ff4ae77 glsl: improve the accuracy of the asin() builtin function.
The previous formula for asin(x) was algebraically equivalent to:

sign(x)*(pi/2 - sqrt(1-|x|)*(A + B|x| + C|x|^2))

where A, B, and C were arbitrary constants determined by a curve fit.

This formula had a worst case absolute error of 0.00448, an unbounded
worst case relative error, and a discontinuity near x=0.

Changed the formula to:

sign(x)*(pi/2 - sqrt(1-|x|)*(pi/2 + (pi/4-1)|x| + A|x|^2 + B|x|^3))

where A and B are arbitrary constants determined by a curve fit.  This
has a worst case absolute error of 0.00039, a worst case relative
error of 0.000405, and no discontinuities.

I don't expect a significant performance degradation, since the extra
multiply-accumulate should be fast compared to the sqrt() computation.

Fixes piglit tests {vs,fs}-asin-float and {vs,fs}-atan-*
(cherry picked from commit d4c80f5f85)
2011-10-02 21:39:30 +02:00
Paul Berry
1bbf124ff8 glsl hierarchical visitor: Do not overwrite base_ir for parameter lists.
This patch fixes a bug in ir_hirearchical_visitor: when traversing an
exec_list representing the formal or actual parameters of a function,
it modified base_ir to point to each parameter in turn, rather than
leaving it as a pointer to the enclosing statement.  This was a
problem, since base_ir is used by visitor classes to locate the
statement containing the node being visited (usually so that
additional statements can be inserted before or after it).  Without
this fix, visitors might attempt to insert statements into parameter
lists.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit cc81eb09b9)
2011-10-02 19:57:57 +02:00
Eric Anholt
ca7560765c glsl: When assiging from a whole array, mark it as used.
Fixes piglit link-uniform-array-size.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 407a1001ae)
2011-10-02 19:57:57 +02:00
Eric Anholt
878d701da4 glsl: When assigning to a whole array, mark the array as accessed.
The vs-varying-array-mat2-col-row-wr test writes a mat2[3] constant to
a mat2[3] varying out array, and also statically accesses element 1 of
it on the VS and FS sides.  At link time it would get trimmed down to
just 2 elements, and then codegen of the VS would end up generating
assignments to the unallocated last entry of the array.  On the new
i965 VS backend, that happened to land on the vertex position.

Some issues remain in this test on softpipe, i965/old-vs and
i965/new-vs on visual inspection, but i965 is passing because only one
green pixel is probed, not the whole split green/red quad.
2011-10-02 19:57:56 +02:00
Paul Berry
c19b963ad6 glsl: Remove field array_lvalue from ir_variable.
The array_lvalue field was attempting to enforce the restriction that
whole arrays can't be used on the left-hand side of an assignment in
GLSL 1.10 or GLSL ES, and can't be used as out or inout parameters in
GLSL 1.10.

However, it was buggy (it didn't work properly for built-in arrays),
and it was clumsy (it unnecessarily kept track on a
variable-by-variable basis, and it didn't cover the GLSL ES case).

This patch removes the array_lvalue field completely in favor of
explicit checks in ast_parameter_declarator::hir() (this check is
added) and in do_assignment (this check was already present).

This causes a benign behavioral change: when the user attempts to pass
an array as an out or inout parameter of a function in GLSL 1.10, the
error is now flagged at the time the function definition is
encountered, rather than at the time of invocation.  Previously we
allowed such functions to be defined, and only flagged the error if
they were invoked.

Fixes Piglit tests
spec/glsl-1.10/compiler/qualifiers/fn-{out,inout}-array-prohibited*
and
spec/glsl-1.20/compiler/assignment-operators/assign-builtin-array-allowed.vert.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 00792e3586)
2011-10-02 19:57:08 +02:00
Eric Anholt
95185c7fe2 glsl: Clarify error message about whole-array assignment in GLSL 1.10.
Previously, it would produce:

    Failed to compile FS: 0:6(7): error: non-lvalue in assignment

and now it produces:

    Failed to compile FS: 0:5(7): error: whole array assignment is not
    allowed in GLSL 1.10 or GLSL ES 1.00.

Also, add spec quotation to the two places we have code for array
lvalues in GLSL 1.10.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 525cec98a5)
2011-10-02 19:56:53 +02:00
Paul Berry
e1221a8811 glsl: Rework oversize array check for gl_TexCoord.
The check now applies both when explicitly declaring the size of
gl_TexCoord and when implicitly setting the size of gl_TexCoord by
accessing it using integral constant expressions.

This is prep work for adding similar size checks to gl_ClipDistance.

Fixes piglit tests texcoord/implicit-access-max.{frag,vert}.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 93b9758d01)
2011-10-02 19:21:49 +02:00
Paul Berry
f732b5a999 glsl: Fix type error when lowering integer divisions
This patch fixes a bug when lowering an integer division:

  x/y

to a multiplication by a reciprocal:

  int(float(x)*reciprocal(float(y)))

If x was a plain int and y was an ivecN, the lowering pass
incorrectly assigned the type of the product to be float, when in fact
it should be vecN.  This caused mesa to abort with an IR validation
error.

Fixes piglit tests {fs,vs}-op-div-int-ivec{2,3,4}.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit af501e2b29)
2011-10-02 19:19:49 +02:00
Paul Berry
0129d5297b glsl: Perform implicit type conversions on function call out parameters.
When an out parameter undergoes an implicit type conversion, we need
to store it in a temporary, and then after the call completes, convert
the resulting value.  In other words, we convert code like the
following:

void f(out int x);
float value;
f(value);

Into IR that's equivalent to this:

void f(out int x);
float value;
int out_parameter_conversion;
f(out_parameter_conversion);
value = float(out_parameter_conversion);

This transformation needs to happen during ast-to-IR convertion (as
opposed to, say, a lowering pass), because it is invalid IR for formal
and actual parameters to have types that don't match.

Fixes piglit tests
spec/glsl-1.20/compiler/qualifiers/out-conversion-int-to-float.vert and
spec/glsl-1.20/execution/qualifiers/vs-out-conversion-*.shader_test,
and bug 39651.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39651

Reviewed-by: Chad Versace <chad@chad-versace.us>
(cherry picked from commit 67b5a3267d)
2011-10-02 19:19:08 +02:00
Paul Berry
27f00df2b7 glsl: Check array size is const before asserting that no IR was generated.
process_array_type() contains an assertion to verify that no IR
instructions are generated while processing the expression that
specifies the size of the array.  This assertion needs to happen
_after_ checking whether the expression is constant.  Otherwise we may
crash on an illegal shader rather than reporting an error.

Fixes piglit tests array-size-non-builtin-function.vert and
array-size-with-side-effect.vert.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d4144a123b)
2011-10-02 19:17:42 +02:00
Paul Berry
8dcfe15a9a glsl: Constant-fold built-in functions before outputting IR
Rearranged the logic for converting the ast for a function call to
hir, so that we constant fold before emitting any IR.  Previously we
would emit some IR, and then only later detect whether we could
constant fold.  The unnecessary IR would usually get cleaned up by a
later optimization step, however in the case of a builtin function
being used to compute an array size, it was causing an assertion.

Fixes Piglit test array-size-constant-relational.vert.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38625
(cherry picked from commit 789ee6516b)
2011-10-02 19:16:47 +02:00
Paul Berry
2c0e00de23 glsl: Emit function signatures at toplevel, even for built-ins.
The ast-to-hir conversion needs to emit function signatures in two
circumstances: when a function declaration (or definition) is
encountered, and when a built-in function is encountered.

To avoid emitting a function signature in an illegal place (such as
inside a function), emit_function() checked whether we were inside a
function definition, and if so, emitted the signature before the
function definition.

However, this didn't cover the case of emitting function signatures
for built-in functions when those built-in functions are called from
inside the constant integer expression that specifies the length of a
global array.  This failed because when processing an array length, we
are emitting IR into a dummy exec_list (see process_array_type() in
ast_to_hir.cpp).  process_array_type() later checks (via an assertion)
that no instructions were emitted to the dummy exec_list, based on the
reasonable assumption that we shouldn't need to emit instructions to
calculate the value of a constant.

This patch changes emit_function() so that it emits function
signatures at toplevel in all cases.

This partially fixes bug 38625
(https://bugs.freedesktop.org/show_bug.cgi?id=38625).  The remainder
of the fix is in the patch that follows.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 0d81b0e184)
2011-10-02 19:03:22 +02:00
Paul Berry
1895de7a32 Revert "glsl: Skip processing the first function's body in do_dead_functions()."
opt_dead_functions contained a shortcut to skip processing the first
function's body, based on the assumption that IR functions are
topologically sorted, with callees always coming before their callers
(therefore the first function cannot contain any calls).

This assumption turns out not to be true in general.  For example, the
following code snippet gets translated to IR that violates this
assumption:

    void f();
    void g();
    void f() { g(); }
    void g() { ... }

In practice, the shortcut didn't cause bugs because of a coincidence
of the circumstances in which opt_dead_functions is called:

(a) we do inlining right before dead function elimination, and
    inlining (when successful) eliminates all calls.

(b) for user-defined functions, inlining is always successful, because
    previous optimization passes (during compilation) have reduced
    them to a form that is eligible for inlining.

(c) the function that appears first in the IR can't possibly call a
    built-in function, because built-in functions are always emitted
    before the function that calls them.

It seems unnecessarily fragile to have opt_dead_functions depend on
these coincidences.  And the next patch in this series will break (c).
So I'm reverting the shortcut.  The consequence will be a slight
increase in link time for complex shaders.

This reverts commit c75427f4c8.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 482338842d)
2011-10-02 19:02:30 +02:00
Paul Berry
7dc636dd77 glsl: improve the accuracy of the atan(x,y) builtin function.
The previous formula for atan(x,y) returned a value of +/- pi whenever
|x|<0.0001, and used a formula based on atan(y/x) otherwise.  This
broke in cases where both x and y were small (e.g. atan(1e-5, 1e-5)).

This patch modifies the formula so that it returns a value of +/- pi
whenever |x|<1e-8*|y|, and uses the formula based on atan(y/x)
otherwise.
(cherry picked from commit b1b4ea0b36)
2011-10-02 19:00:31 +02:00
Paul Berry
e42b822fec glsl: improve the accuracy of the radians() builtin function
The constant used in the radians() function didn't have enough
precision, causing a relative error of 1.676e-5, which is far worse
than the precision of 32-bit floats.  This patch reduces the relative
error to 1.14e-9, which is the best we can do in 32 bits.

Fixes piglit tests {fs,vs}-radians-{float,vec2,vec3,vec4}.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit fe33c886a7)
2011-10-02 18:50:28 +02:00
Paul Berry
0501cee136 glsl: Lower break instructions when necessary at the end of a loop.
Normally lower_jumps.cpp doesn't need to lower a break instruction
that occurs at the end of a loop, because all back-ends can produce
proper GPU instructions for a break instruction in this "canonical"
location.  However, if other break instructions within the loop are
already being lowered, then a break instruction at the end of the loop
needs to be lowered too, since after the optimization is complete a
new conditional break will be inserted at the end of the loop.

Without this patch, lower_jumps.cpp may require multiple passes in
order to lower all jumps.  This results in sub-optimal output because
lower_jumps.cpp produces a brand new set of temporary variables each
time it is run, and the redundant temporary variables are not
guaranteed to be eliminated by later optimization passes.

Fixes unit test test_lower_breaks_6.
(cherry picked from commit 067c9d7bd7)

Conflicts:

	src/glsl/lower_jumps.cpp
2011-10-02 18:48:27 +02:00
Paul Berry
38ae26b709 glsl: In lower_jumps.cpp, lower both branches of a conditional.
Previously, lower_jumps.cpp would break out of its loop after lowering
a jump instruction in just the then- or else-branch of a conditional,
and it would fail to lower a jump instruction occurring in the other
branch.

Without this patch, lower_jumps.cpp may require multiple passes in
order to lower all jumps.  This results in sub-optimal output because
lower_jumps.cpp produces a brand new set of temporary variables each
time it is run, and the redundant temporary variables are not
guaranteed to be eliminated by later optimization passes.

Fixes unit test test_lower_returns_4.
(cherry picked from commit e71b4ab8a6)
2011-10-02 18:45:10 +02:00
Paul Berry
de798938d4 glsl: Use foreach_list in lower_jumps.cpp
The visitor class in lower_jumps.cpp never removes or replaces the
instruction being visited, but it frequently alters or removes the
instructions that follow it.  Therefore, to make sure the altered IR
is visited, it needs to iterate through exec_lists using foreach_list
rather than visit_exec_list().

Without this patch, lower_jumps.cpp may require multiple passes in
order to lower all jumps.  This results in sub-optimal output because
lower_jumps.cpp produces a brand new set of temporary variables each
time it is run, and the redundant temporary variables are not
guaranteed to be eliminated by later optimization passes.

Also, certain invariants assumed by lower_jumps.cpp may fail to hold,
causing assertion failures.

Fixes unit tests test_lower_pulled_out_jump,
test_lower_unified_returns, test_lower_guarded_conditional_break,
test_lower_return_non_void_at_end_of_loop, and test_lower_returns_3.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 382cee91a4)
2011-10-02 18:44:50 +02:00
Paul Berry
acd2a03ffb glsl: lower unconditional returns and continues in loops.
Previously, lower_jumps.cpp would only lower return and continue
statements that appeared inside conditionals.  This patch makes it
lower unconditional returns and continue statements that occur inside
a loop.

Such unconditional flow control statements would be unlikely to be
explicitly coded by a reasonable user, however they might arise as a
result of other optimizations.

Without this patch, lower_jumps.cpp might not lower certain return and
continue statements, causing some backends to fail.

Fixes unit tests test_lower_return_void_at_end_of_loop and
test_remove_continue_at_end_of_loop.
(cherry picked from commit 03145ba655)

Conflicts:

	src/glsl/lower_jumps.cpp
2011-10-02 18:44:31 +02:00
Paul Berry
d1786cea1c glsl: Refactor logic for determining whether to lower return statements.
Previously, do_lower_jumps.cpp determined whether to lower return
statements in ir_lower_jumps_visitor::should_lower_jumps().  Moved
this logic to ir_lower_jumps_visitor::visit(ir_function_signature *),
so that it can be used in determining whether to lower a return
statement at the end of a function.
(cherry picked from commit dbaa2e627e)
2011-10-02 18:43:00 +02:00
Paul Berry
934c7a0661 glsl: Lower unconditional return statements.
Previously, lower_jumps.cpp only lowered return statements that
appeared inside of an if statement.

Without this patch, lower_jumps.cpp might not lower certain return
statements, causing some back-ends to fail (as in bug #36669).

Fixes unit test test_lower_returns_1.
(cherry picked from commit afc9a50fba)
2011-10-02 18:35:00 +02:00
Brian Paul
2ba0d0a5e8 mesa: add _NEW_CURRENT_ATTRIB in _mesa_program_state_flags()
If color material mode is enabled, constant buffer entries related
to the material coefficients will depend on glColor.  So add
_NEW_CURRENT_ATTRIB to the bitset returned for material-related
constants in _mesa_program_state_flags().

This fixes a bug exercised by the new piglit draw-arrays-colormaterial
test.

Note: This is a candidate for the 7.11 branch.
(cherry picked from commit 57169c4694)
2011-10-02 18:10:40 +02:00
Marek Olšák
1cf8f9599c r600g: add index_bias to index buffer bounds
This fixes ARB_draw_elements_base_vertex with max_index != ~0.

NOTE: This is a candidate for the 7.11 branch.
(cherry picked from commit 44afac04ea)
2011-10-02 18:10:19 +02:00
Brian Paul
2781baaa64 meta: fix broken sRGB mipmap generation
If we're generating a mipmap for an sRGB texture we need to bypass
sRGB->linear conversion.  Otherwise the destination mipmap level
(drawn with a textured quad) will have the wrong colors.
If we can't turn of sRGB->linear conversion (GL_EXT_texture_sRGB_decode)
we need to use the software fallback for mipmap generation.

Note: This is a candidate for the 7.11 branch.
(cherry picked from commit 1e939f5374)
2011-10-02 18:09:22 +02:00
Brian Paul
a74400ca30 mesa: fix PACK_COLOR_5551(), PACK_COLOR_1555() macros
The 1-bit alpha channel was incorrectly encoded.  Previously, any non-zero
alpha value for the ubyte alpha value would set A=1.  Instead, use the
most significant bit of the ubyte alpha to determine the A bit.  This is
consistent with the other channels and other OpenGL implementations.

Note: This is a candidate for the 7.11 branch.

Reviewed-by: Michel Dänzer <michel@daenzer.net>
(cherry picked from commit 4731a598f0)
2011-10-02 18:08:31 +02:00
Tom Stellard
a5e2074fdd r300/compiler: Fix regalloc for values with multiple writers
https://bugs.freedesktop.org/show_bug.cgi?id=40062
https://bugs.freedesktop.org/show_bug.cgi?id=36939

Note: This is a candidate for the 7.11 branch.
(applied diff manually from 2d1004d9aa)
2011-10-02 18:07:53 +02:00