Commit graph

86267 commits

Author SHA1 Message Date
Nanley Chery
a6fb62a864 isl: Fix RenderTargetViewExtent for mipmapped 3D surfaces
Match the comment stated above the assignment.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-04 13:20:44 -08:00
Nanley Chery
b80c8ebc45 isl: Get rid of isl_surf_fill_state_info::level0_extent_px
This field is no longer needed.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-04 13:20:03 -08:00
Jason Ekstrand
d154a5ebd6 anv/cmd_buffer: Let the pipeline set StencilBufferWriteEnable on gen9 2016-03-04 12:23:01 -08:00
Jason Ekstrand
f374765ce6 anv/cmd_buffer: Mask stencil reference values 2016-03-04 12:22:32 -08:00
Jason Ekstrand
d61dcec64d anv/clear: Pull the stencil write mask from the pipeline
The stencil write mask wasn't getting set at all, so we were using whatever
write mask happened to be left over by the application.
2016-03-04 12:03:00 -08:00
Jason Ekstrand
ec18fef88d anv/pipeline: Set StencilBufferWriteEnable from the pipeline
The hardware docs say that StencilBufferWriteEnable should only be set if
StencilTestEnable is set.  It seems reasonable to set them together.
2016-03-04 12:03:00 -08:00
Jason Ekstrand
fcd8e57185 anv/pipeline: More competent gen8 clipping 2016-03-04 12:03:00 -08:00
Jason Ekstrand
a8afd29653 anv/pipeline: Use the right provoking vertex for triangle fans 2016-03-04 12:03:00 -08:00
Jason Ekstrand
fa8539dd6b anv/pipeline: Respect pRasterizationState->depthBiasEnable 2016-03-04 12:03:00 -08:00
Matt Turner
1f862e923c i965/fs: Optimize float conversions of byte/word extract.
instructions in affected programs: 31535 -> 29966 (-4.98%)
   helped: 23

   cycles in affected programs: 272648 -> 266022 (-2.43%)
   helped: 14
   HURT: 1

The patch decreases the number of instructions in the two Unigine
programs by:

 #1721: 4374 -> 4155 instructions (-5.01%)
 #1706: 3582 -> 3363 instructions (-6.11%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-04 11:52:34 -08:00
Matt Turner
905ff86198 nir: Recognize open-coded extract_u16.
No shader-db changes, but does recognize some extract_u16 which enables
the next patch to optimize some code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-04 11:52:34 -08:00
Matt Turner
76289fbfa8 nir: Recognize open-coded extract_u8.
Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack
three bytes from an integer and convert each into a float:

   float((val >> 16u) & 0xffu)
   float((val >>  8u) & 0xffu)
   float((val >>  0u) & 0xffu)

Instead of shifting, masking, and type converting like this:

   shr(8)          g15<1>UD        g25<8,8,1>UD    0x00000010UD
   and(8)          g16<1>UD        g15<8,8,1>UD    0x000000ffUD
   mov(8)          g17<1>F         g16<8,8,1>UD

   shr(8)          g18<1>UD        g25<8,8,1>UD    0x00000008UD
   and(8)          g19<1>UD        g18<8,8,1>UD    0x000000ffUD
   mov(8)          g20<1>F         g19<8,8,1>UD

   and(8)          g21<1>UD        g25<8,8,1>UD    0x000000ffUD
   mov(8)          g22<1>F         g21<8,8,1>UD

i965 can simply extract a byte and convert to float in a single
instruction:

   mov(8)          g17<1>F         g25.2<32,8,4>UB
   mov(8)          g20<1>F         g25.1<32,8,4>UB
   mov(8)          g22<1>F         g25.0<32,8,4>UB

This patch implements the first step: recognizing byte extraction. A
later patch will optimize out the conversion to float.
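
The equivalence that this pattern match relies on can be checked on the CPU.
What follows is a minimal C sketch, not Mesa or NIR code: shift_mask_u8() and
extract_u8() are hypothetical helper names. It only shows that shifting by
8 * n and masking with 0xff yields the same value as selecting byte n of the
32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Open-coded form as written in the shaders: shift, then mask. */
uint32_t shift_mask_u8(uint32_t val, unsigned byte)
{
    assert(byte < 4);
    return (val >> (8u * byte)) & 0xffu;
}

/* Byte-extract form: view the word as four bytes (0 = least significant)
 * and select one of them directly. */
uint32_t extract_u8(uint32_t val, unsigned byte)
{
    assert(byte < 4);
    const uint8_t bytes[4] = {
        (uint8_t)val,
        (uint8_t)(val >> 8),
        (uint8_t)(val >> 16),
        (uint8_t)(val >> 24),
    };
    return bytes[byte];
}

/* The shader pattern: unpack one byte and convert it to float. */
float unpack_byte_to_float(uint32_t val, unsigned byte)
{
    return (float)extract_u8(val, byte);
}
```

Because both forms agree for every byte index, a compiler pass is free to
replace the first with the second wherever it recognizes the shift-and-mask
shape.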

   instructions in affected programs: 28568 -> 27450 (-3.91%)
   helped: 7

   cycles in affected programs: 210076 -> 203144 (-3.30%)
   helped: 7

This patch decreases the number of instructions in the two Unigine
programs by:

 #1721: 4520 -> 4374 instructions (-3.23%)
 #1706: 3752 -> 3582 instructions (-4.53%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-04 11:52:34 -08:00
Kenneth Graunke
9d7faadd8a anv: Fix backwards shadow comparisons
sample_c is backwards from what GL and Vulkan expect.

See intel_state.c in i965.

v2: Drop unused vk_to_gen_compare_op.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-04 11:35:46 -08:00
George Kyriazis
01e92e7010 st/xlib: Hang off screen destructor off main XCloseDisplay() callback.
This resolves some order dependencies between the already existing
callback and the newly created one.

Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-04 10:57:24 -07:00
George Kyriazis
51e562c3ea st/xlib: Support unlimited number of display connections
There is a limit of 10 display connections, which was a
problem for apps/tests that were continuously opening/closing display
connections.

This fix uses XAddExtension() and XESetCloseDisplay() to keep track
of the status of the display connections from the X server, freeing
mesa-related data as X displays get destroyed by the X server.

The poster child is the VTK "TimingTests".

Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-04 10:57:09 -07:00
Brian Paul
192ee9adb1 svga: add new command-buffer-size HUD query
To plot a graph of the command buffer size.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-04 07:57:41 -07:00
Brian Paul
1258f907f4 svga: add new svga_winsys_context::get_command_buffer_size()
To ask how large the current command buffer is.  Will be used for
a new GALLIUM_HUD graph.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-04 07:57:41 -07:00
Brian Paul
6fc8d90fa9 svga: reorder SVGA_QUERY_ switch cases to match declaration order
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-03-04 07:57:41 -07:00
Sinclair Yeh
f1410c5b91 svga: Force an RGBA view creation for an RGBA resource
glXCreatePixmap() may specify a GLX_TEXTURE_FORMAT_RGB_EXT format
for an RGBA resource, causing us to create an RGBX view for an
RGBA resource, a combination vgpu10 does not support.

When this is detected, change the request to create an RGBA view
instead.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-04 07:57:41 -07:00
Charmaine Lee
8366701f4c svga: fix an error in svga_texture_generate_mipmap
With this patch, make sure the shader resource view is properly created
before referencing it in the generate mipmap command.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-03-04 07:57:41 -07:00
Thomas Hellstrom
395c7b8fa1 winsys/svga: Increase the fence timeout
If running with a software renderer backend, the timeout may be
insufficient, and we don't want to release busy buffers too early.

In practice, SVGA gpu lockups are extremely rare.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-03-04 13:55:23 +01:00
Thomas Hellstrom
24ad7e16cd winsys/svga: Fix an uninitialized return value
Reported-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
2016-03-04 13:54:38 +01:00
Kenneth Graunke
9ec246796f i965: Set MaxFramebufferWidth/Height to 16384, not viewport.
dEQP-GLES31.functional.fbo.no_attachments.maximums.{all,height,size,width}
started hitting assertion failures when emitting SURFACE_STATE, after
commit e8fd60e789 where Samuel increased the maximum viewport size to
32768, from 16384.

MaxFramebufferWidth/Height were being set to the maximum viewport size,
but are actually limited by the SURFACE_STATE width/height field range,
which is 16384 on Gen7+ (where ARB_framebuffer_no_attachments is
exposed).  So, reduce these to 16384 explicitly.

Fixes assert fails in the above mentioned dEQP tests.  (Those tests
still fail, however.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-03 21:31:22 -08:00
Francisco Jerez
a6046d217d glsl: Improve the accuracy of the acos() approximation.
The adjusted polynomial coefficients come from the numerical
minimization of the L2 norm of the relative error.  The old
coefficients would give a maximum relative error of about 15000 ULP in
the neighborhood around acos(x) = 0; the new ones give a relative
error bounded by less than 2000 ULP in the same neighborhood.

Fixes four dEQP subtests:
dEQP-GLES31.functional.shaders.builtin_functions.precision.acos.
highp_compute.{scalar,vec2,vec3,vec4}

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-03 21:31:22 -08:00
Kenneth Graunke
2795fbcae3 glsl: Parameterize asin_expr() on the fit coefficients.
This will allow us to share the implementation while using different
polynomials for asin() and acos().

Francisco Jerez did this in the SPIR-V front-end; I'm merely porting
his idea to the GLSL world.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-03 21:31:22 -08:00
Kenneth Graunke
aa37cbdff7 mesa: Allow Get*() of several forgotten IsEnabled() pnames.
From section 6.2 ("State Tables") of the GL 2.1 specification
(the text also appears in the GL 3.0 and ES 3.1 specifications):
"However, state variables for which IsEnabled is listed as the query
 command can also be obtained using GetBooleanv, GetIntegerv, GetFloatv,
 and GetDoublev."

GL_DEBUG_OUTPUT, GL_DEBUG_OUTPUT_SYNCHRONOUS, and GL_FRAGMENT_SHADER_ATI
were missing from the glGet*() functions.  All other IsEnabled() pnames
look to be present, as far as I can tell.

Fixes 8 dEQP-GLES31.functional.debug.state_query subtests:
debug_output[_synchronous]_get{boolean,float,integer,integer64}.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-03-03 21:31:22 -08:00
Kenneth Graunke
b4b50b074b mesa: Make glGet queries initialize ctx->Debug when necessary.
dEQP-GLES31.functional.debug.state_query.debug_group_stack_depth_*
tries to call glGet on GL_DEBUG_GROUP_STACK_DEPTH right away, before
doing any other debug setup.  This should return 1.

However, because ctx->Debug wasn't allocated, we bailed and returned 0.

This patch removes the open-coded locking and switches the two glGet
functions to use _mesa_lock_debug_state(), which takes care of
allocating and initializing that state on first use.  It also
conveniently takes care of unlocking on failure for us, so we don't
need to handle that in every caller.
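
The locking pattern described above can be sketched in C as follows. This is
an illustration under stated assumptions, not the Mesa implementation: the
struct layouts and function names are hypothetical, a boolean stands in for
the real mutex, and simulate_oom exists only so the failure path can be
exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct debug_state {
    int group_stack_depth;       /* 1 means only the default group exists */
};

struct context {
    bool locked;                 /* stand-in for a real mutex */
    bool simulate_oom;           /* hypothetical hook to force alloc failure */
    struct debug_state *debug;   /* NULL until first use */
};

/* Take the lock and return the debug state, allocating it on first use.
 * On allocation failure the lock is released here and NULL is returned,
 * so callers unlock only on the success path. */
struct debug_state *lock_debug_state(struct context *ctx)
{
    assert(!ctx->locked);
    ctx->locked = true;

    if (!ctx->debug) {
        ctx->debug = ctx->simulate_oom ? NULL
                                       : calloc(1, sizeof(*ctx->debug));
        if (!ctx->debug) {
            ctx->locked = false;
            return NULL;
        }
        ctx->debug->group_stack_depth = 1;
    }
    return ctx->debug;
}

void unlock_debug_state(struct context *ctx)
{
    assert(ctx->locked);
    ctx->locked = false;
}

/* A query that works even when it is the first debug call on the context. */
int get_debug_group_stack_depth(struct context *ctx)
{
    struct debug_state *debug = lock_debug_state(ctx);
    if (!debug)
        return 0;

    int depth = debug->group_stack_depth;
    unlock_debug_state(ctx);
    return depth;
}
```

Putting the allocation inside the lock helper means a query issued before any
other debug setup still sees initialized state.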

Fixes dEQP-GLES31.functional.debug.state_query.debug_group_stack_depth_
{getboolean,getfloat,getinteger,getinteger64}.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-03-03 21:31:22 -08:00
Kenneth Graunke
3ed260f54c hack to make dota 2 menus work 2016-03-03 16:21:09 -08:00
Jason Ekstrand
56ba13c994 isl/surface_state: Set L2 bypass disable for certain BC* formats 2016-03-03 16:16:57 -08:00
Eduardo Lima Mitev
47392011c0 Update docs to advertise new support for ARB_internalformat_query2
Support in Mesa main and i965 has just been added.

v2: Include note in 'New Features' of docs/relnotes/11.3.0.html.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-03-03 22:19:35 +01:00
Kenneth Graunke
623ce595a9 anv: Compile shader stages in pipeline order.
Instead of the arbitrary order modules might be specified in.

Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:36:19 -08:00
Nanley Chery
8dddc3fb1e anv/meta: Delete unused functions
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:26:44 -08:00
Nanley Chery
d20f6abc85 anv/meta: Use blitter API for state-handling in Buffer Update/Copy
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:26:42 -08:00
Nanley Chery
318b67d157 anv/meta: Use blitter API in do_buffer_copy()
v2: Keep pitch in units of bytes (Jason)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:26:36 -08:00
Nanley Chery
96ff4d0679 anv/meta: Use blitter API in anv_CmdCopyImage()
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:26:35 -08:00
Nanley Chery
9b6c95d46e anv/meta: Use blitter API for copies between Images and Buffers
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:25:20 -08:00
Nanley Chery
91640c34c6 anv/meta: Add function which copies between Buffers and Images
v2: Keep pitch in units of bytes (Jason)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:25:15 -08:00
Nanley Chery
61ad78d0d1 anv/meta: Add function to create anv_meta_blit2d_surf from anv_image
v2: Keep pitch in units of bytes (Jason)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:25:10 -08:00
Nanley Chery
2e9b08b9b8 anv/meta: Implement the blitter API functions
Most of the code in anv_meta_blit2d() is borrowed from do_buffer_copy().

Create an image and image view for each rectangle.
Note: For tiled RGB images, ISL will align the image's row_pitch up to
the nearest tile width.

v2 (Jason):
    Keep pitch in units of bytes
    Make src_format and dst_format variables
    s/dest/dst/ in every usage
v3: Fix dst_image width

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:25:04 -08:00
Nanley Chery
032bf172b4 anv/meta: Modify blitter API fields
Some fields are unnecessary. The variables "pitch" and "bs" are used
for consistency with ISL.

v2: Keep pitch in units of bytes (Jason)

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:24:53 -08:00
Jason Ekstrand
654f79a045 anv/meta: Add the beginnings of a blitter API
This API is designed to be an abstraction that sits between the VkCmdCopy
commands and the hardware.  The idea is that it is simple enough that it
*should* be implementable using the blitter but with enough extra data that
we can implement it with the 3-D pipeline efficiently.  One design
objective is to allow the user to supply enough information that we can
handle most blit operations with a single draw call even if they require
copying multiple rectangles.
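
To make the shape of such an interface concrete, here is a CPU-only C sketch
of a call that copies several rectangles between two linear surfaces. The
struct and function names are hypothetical and are not the anv_meta API; the
only details borrowed from this log are that pitch is kept in bytes and that
"bs" denotes the block size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct blit2d_surf {
    uint8_t *base;    /* first byte of the surface */
    uint32_t pitch;   /* distance between rows, in bytes */
    uint8_t bs;       /* block size in bytes */
};

struct blit2d_rect {
    uint32_t src_x, src_y;    /* origin in the source, in blocks */
    uint32_t dst_x, dst_y;    /* origin in the destination, in blocks */
    uint32_t width, height;   /* extent, in blocks */
};

/* Copy every rectangle in one call; a GPU implementation could batch the
 * same list into a single draw. Rectangles are assumed to be in bounds. */
void blit2d_cpu(const struct blit2d_surf *src, const struct blit2d_surf *dst,
                unsigned num_rects, const struct blit2d_rect *rects)
{
    assert(src->bs == dst->bs);

    for (unsigned r = 0; r < num_rects; r++) {
        const struct blit2d_rect *rect = &rects[r];
        size_t row_bytes = (size_t)rect->width * src->bs;

        for (uint32_t y = 0; y < rect->height; y++) {
            const uint8_t *s = src->base +
                (size_t)(rect->src_y + y) * src->pitch +
                (size_t)rect->src_x * src->bs;
            uint8_t *d = dst->base +
                (size_t)(rect->dst_y + y) * dst->pitch +
                (size_t)rect->dst_x * dst->bs;
            memcpy(d, s, row_bytes);
        }
    }
}
```

Taking the rectangle list as an array, rather than one rectangle per call, is
what leaves room for handling a multi-rectangle copy in one pass.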
2016-03-03 11:24:45 -08:00
Nanley Chery
d1e48b9945 anv/meta: Remove redundancies in do_buffer_copy()
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:24:42 -08:00
Nanley Chery
cfe7036750 anv/meta: Replace copy_format w/ block size in do_buffer_copy()
This is a preparatory commit that will simplify the future usage of
this function.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:24:38 -08:00
Nanley Chery
d50ff250ec anv/meta: Add missing command to exit meta in anv_CmdUpdateBuffer()
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:24:21 -08:00
Nanley Chery
1d9d90d9a6 anv/image: Create a linear image when requested
If a linear image is requested, the only possible result should be a
linearly-tiled surface.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:24:17 -08:00
Nanley Chery
091f1da902 isl: Don't filter tiling flags if a specific tiling bit is set
If a specific bit is set, the intention to create a surface with a
specific tiling format should be respected.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 11:23:40 -08:00
Nanley Chery
456f5b0314 isl: Add function to get intratile offsets from x/y offsets
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-03-03 10:56:15 -08:00
Jason Ekstrand
206414f92e anv/util: Fix vector resizing
It wasn't properly handling the fact that wrap-around in the source may not
translate to wrap-around in the destination.  This really needs unit tests.
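
The failure mode is easiest to see in a small ring buffer. The C sketch below
is a hypothetical byte ring, not the anv vector code; it assumes a
power-of-two capacity and free-running head and tail counters. When growing,
each chunk is limited by the wrap point of both the old and the new buffer,
because data that wraps in the source may land contiguously in the
destination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct ring {
    uint32_t head;   /* bytes pushed so far (free-running) */
    uint32_t tail;   /* bytes popped so far (free-running) */
    uint32_t size;   /* capacity in bytes, a power of two */
    uint8_t *data;
};

bool ring_init(struct ring *r, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    r->head = r->tail = 0;
    r->size = size;
    r->data = malloc(size);
    return r->data != NULL;
}

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* Double the capacity, re-placing live bytes at their new masked offsets. */
bool ring_grow(struct ring *r)
{
    uint32_t new_size = r->size * 2;
    uint8_t *new_data = malloc(new_size);
    if (!new_data)
        return false;

    uint32_t pos = r->tail;
    while (pos != r->head) {
        uint32_t src = pos & (r->size - 1);
        uint32_t dst = pos & (new_size - 1);
        /* Stop at whichever comes first: end of live data, end of the old
         * buffer, or end of the new buffer. */
        uint32_t chunk = min_u32(r->head - pos,
                                 min_u32(r->size - src, new_size - dst));
        memcpy(new_data + dst, r->data + src, chunk);
        pos += chunk;
    }

    free(r->data);
    r->data = new_data;
    r->size = new_size;
    return true;
}

bool ring_push(struct ring *r, uint8_t value)
{
    if (r->head - r->tail == r->size && !ring_grow(r))
        return false;
    r->data[r->head++ & (r->size - 1)] = value;
    return true;
}

bool ring_pop(struct ring *r, uint8_t *value)
{
    if (r->head == r->tail)
        return false;
    *value = r->data[r->tail++ & (r->size - 1)];
    return true;
}
```

A unit test for this case fills the ring, pops a few elements so that later
pushes wrap, forces a grow, and then checks that elements come out in order.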
2016-03-03 08:17:36 -08:00
Antia Puentes
4f028bfcc0 i965: Enable the ARB_internalformat_query2 extension
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:08 +01:00
Eduardo Lima Mitev
cbbdf8612d i965/formatquery: Add support for INTERNALFORMAT_PREFERRED query
This pname is tricky. The spec states that an internal format should be
returned, that is compatible with the passed internal format, and has
at least the same precision. There is no clear API to resolve this.

The closest we have (and what other drivers, e.g. the NVidia proprietary one,
do) is to return the same internal format given as parameter. But we validate
first that the passed internal format is supported by i965.

To check for support, we have the 'TextureFormatSupported' map. But
this map expects a 'mesa_format', which takes a format+type. So, we must
first "come up" with a generic type that is suited for this internal format,
then get a mesa_format, and then do the validation.

The cleanest solution here is to add a method that does exactly what
the spec wants: a driver's preferred internal format from a given
internal format. But at this point we lack a clear view of what
defines this preference, and also there seems to be no API for it.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-03-03 15:14:08 +01:00