v2: Do not wrap the code in ifdef HAVE_DRI3 (suggested by Keith)
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit eb2241f8a9)
This reverts commit a6860100b8.
Why this code didn't work in all circumstances is unknown and without a
working Ironlake simulator (which uses a different AUB format) we'll
probably never know, short of a lot of experimentation, and spending a
bunch of time to try to optimize a few instructions on Ironlake is not
time well spent.
Moreover, for mix(vec4, vec4, vec4) using the accumulator introduces a
dependence between the otherwise independent per-component calculations.
Not using the accumulator, even if it means an extra instruction per
component might be preferable. We don't know, we don't have data, and
we don't have the necessary register on Ironlake for shader_time to tell
us.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77707
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit c2c639ecf6)
This reverts commit 2dfbbeca50 with the
comment about MAC and implicit accumulator removed.
Why this code didn't work in all circumstances is unknown and without a
working Ironlake simulator (which uses a different AUB format) we'll
probably never know, short of a lot of experimentation, and spending a
bunch of time to try to optimize a few instructions on Ironlake is not
time well spent.
Moreover, for mix(vec4, vec4, vec4) using the accumulator introduces a
dependence between the otherwise independent per-component calculations.
Not using the accumulator, even if it means an extra instruction per
component might be preferable. We don't know, we don't have data, and
we don't have the necessary register on Ironlake for shader_time to tell
us.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77703
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit db42dd8952)
Note that predicated instructions with defs are still not supported
because transformation to SSA doesn't handle them yet.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 452a4151aa)
[imirkin: add logic to also clear the "regular" scissors]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 200382be85)
The glGetGraphicsResetStatusARB from ARB_robustness extension always
returns GUILTY_CONTEXT_RESET_ARB and never returns NO_ERROR for guilty
context with LOSE_CONTEXT_ON_RESET_ARB strategy. This is because Mesa
returns GUILTY_CONTEXT_RESET_ARB if batch_active !=0 whereas kernel
driver never reset batch_active and this variable always > 0 for guilty
context. The same behaviour also can be observed for batch_pending and
INNOCENT_CONTEXT_RESET_ARB.
But ARB_robustness spec says:
If a reset status other than NO_ERROR is returned and subsequent calls
return NO_ERROR, the context reset was encountered and completed. If a
reset status is repeatedly returned, the context may be in the process
of resetting.
8. How should the application react to a reset context event?
RESOLVED: For this extension, the application is expected to query the
reset status until NO_ERROR is returned. If a reset is encountered, at
least one *RESET* status will be returned. Once NO_ERROR is
encountered, the application can safely destroy the old context and
create a new one.
The main problem is the context may be in the process of resetting and
in this case a reset status should be repeatedly returned. But looks
like the kernel driver returns nonzero active/pending only if the
context reset has already been encountered and completed. For this
reason the *RESET* status cannot be repeatedly returned and should be
returned only once.
The reset_count and brw->reset_count variables can be used to control
that glGetGraphicsResetStatusARB returns *RESET* status only once for
each context. Note the i915 triggers reset_count twice which allows to
return correct reset count immediately after active/pending have been
incremented.
v2 (idr): Trivial reformatting of comments.
Signed-off-by: Pavel Popov <pavel.e.popov@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 8dc4a98c44)
glFramebufferRender(..., GL_DEPTH_STENCIL_ATTACHMENT, ..., 0) only
detached the depth buffer and not the stencil buffer.
Bugzilla: http://bugs.freedesktop.org/show_bug.cgi?id=79115
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 846c715abb)
If the source renderbuffer has a depth > 0, then send a Z texcoord
which is set to the source attachment Z offset.
This fixes piglit's gl-3.2-layered-rendering-gl-layer-render with the
GL_TEXTURE_2D_MULTISAMPLE_ARRAY case test on i965/gen8.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 57876fee38)
brw_color_buffer_write_enabled depends on brw->fragment_program, which
means we have to listen to BRW_NEW_FRAGMENT_PROGRAM.
On most generations, this was only called from a function that already
subscribed. However, on Broadwell, we failed to listen to the necessary
event in the atom that emits 3DSTATE_PS_BLEND.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 746921cbb4)
I forgot to disable writemasking on the OR and MOV which set the render
target index and "source 0 alpha present to render target" bit.
Using get_element_ud is equivalent and avoids a line-wrap.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 7d3985ca6c)
_mesa_meta_setup_blit_shader() currently generates a fragment shader
which, irrespective of the number of draw buffers, writes the color
to only one 'out' variable. Current shader rely on an undefined
behavior and possibly works by chance.
From OpenGL 4.0 spec, page 256:
"If a fragment shader writes to gl_FragColor, DrawBuffers specifies a
set of draw buffers into which the single fragment color defined by
gl_FragColor is written. If a fragment shader writes to gl_FragData,
or a user-defined varying out variable, DrawBuffers specifies a set
of draw buffers into which each of the multiple output colors defined
by these variables are separately written. If a fragment shader writes
to none of gl_FragColor, gl_FragData, nor any user defined varying out
variables, the values of the fragment colors following shader execution
are undefined, and may differ for each fragment color."
OpenGL 4.4 spec, page 463, added an additional line in this section:
"If some, but not all user-defined output variables are written, the
values of fragment colors corresponding to unwritten variables are
similarly undefined."
V2: Write color output to gl_FragColor instead of writing to multiple
'out' variables. This'll avoid recompiling the shader every time
draw buffers count is updated.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 46737cebd3)
In commit 4be146b1, I neglected to add the new property to the strings
array. This leads to the string '(null)' to be printed instead when
converting a GS shader to text.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
(cherry picked from commit cdeb7004e0)
Make sure to normalize the z coordinates as well as the x/y ones when
there are mipmaps present. Fixes 3d mipmap generation, which now uses
the blit path.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
(cherry picked from commit 28360fcad7)
These instructions can come in either through IMUL_HI/UMUL_HI TGSI
opcodes, or from OP_DIV constant folding.
Also make sure that the constant foldings which delete the original
instruction still get counted as having done something.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
(cherry picked from commit d2a3de19c6)
Retrieving the high 32 bits of a signed multiply is rather annoying. It
appears that the simplest way to do this is to compute the absolute
value of the arguments, and perform a u32 x u32 -> u64 operation. If the
arguments' signs differ, then negate the result. Since there is no u64
support in the cvt instruction, we have the perform the 2's complement
negation "by hand".
This logic can come into use by the IMUL_HI instruction (very unlikely
to be seen), as well as from constant folding of division by a constant.
Fixes dolphin's divisions by 255.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
(cherry picked from commit d3a5cf052c)
This is a replacement for bd44ac8b5c
that should actually work.
Fixes Piglit's copyteximage-border on swrast, as well as one of
es3conform's packed_pixels_pixelstore test.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78546
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77705
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 2ecc7268ba)
Separating the software fallbacks from the rest of the meta path (which
is usually hardware accelerated) gives callers better control over their
blitting options.
For example, i965 might want to try meta blit, hardware blits, then
swrast as a last resort. Splitting it makes that possible.
This updates all callers to maintain the existing behavior (even in the
few cases where it isn't desirable behavior - later patches can change
that).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 54540ea691)
These aren't necessary - all of the following code is predicated on mask
being non-zero, so no code will get executed anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Courtney Goeltzenleuchter <courtney@lunarg.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit d89ce333cc)
I don't have an ILK at hand but the fix should be trivial.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78872
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 21dddb22c1)
It's not properly implemented in the meta code, and we don't have time
to fix it for 10.2.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 0b96d362bf)
UNION appears to expect that all of its sources are conditionally
defined. Otherwise it inserts an unpredicated mov instruction which
overwrites the desired result. This fixes tests that use UMUL_HI, and
much less directly, unsigned integer division by a constant, which uses
this functionality in a peephole pass.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
(cherry picked from commit 5b8f1a0f7c)
I think a3xx and later should support (it is part of GLES3), but this
isn't needed for the time being and still needs to be reversed.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Gallium already gives us height==1 for these, so the texture state is
already setup correctly to emulate 1D textures as a Nx1 2D texture. We
just need to supply the .y coord.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
That was easy. Turns out it is just a matter of setting one bit.
Enable sampling from sRGB texture, and therefore enable GL 2.1 :-)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The loops for updating the multiple packed fields in SP_VS_OUT[] and
SP_VS_VPC_DST[] will zero out one register beyond the last that on
required. Which is normally not a problem (and is kinda convenient
when looking at cmdstream dumps) unless we have maximum (16) varyings.
Fix loop termination condition so that this does not happen.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We need to size input/output tables big enough for special inputs/
outputs (gl_Position, gl_FrontFacing, etc) which, while they don't
count towards the hw limit of 16 attributes or 16 varyings, we do
still need to track them all the same.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Hardware only supports 16. Which fd3_shader_variant properly reflected,
but the pipe cap did not, leading to array overflow (and shaders that
could not possibly work).
Also a bunch of asserts to make problems like this easier to see.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The KILL_IF opcode could potentially be merged in to the regular KILL
opcode function. It was a pain to do so, so I've left is separated
for cleanliness.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Adds a large sum of TGSI opcodes to the a3xx compiler.
For integer opcodes we have 28 opcodes added.
Adds 4 floating point compare opcodes
If GLSL 1.30 is enabled, this allows the GLSL 1.30 piglits to have a
completion amount of 432/641.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Real GPU queries need some infrastructure to track samples per tile and
accumulate the results. But fortunately this can be shared across GPU
generation.
See:
https://github.com/freedreno/freedreno/wiki/Queries#hardware-queries
Signed-off-by: Rob Clark <robclark@freedesktop.org>