Commit graph

70885 commits

Author SHA1 Message Date
Grigori Goronzy
d15b32ebde clover: implement CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE
Work-group size should always be aligned to subgroup size; this is a
basic requirement, otherwise some work-items will be no-operation.

It might make sense to refine the value according to a kernel's
resource usage, but that's a possible optimization for the future.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-06-29 13:24:37 +02:00
Grigori Goronzy
249a9df7fc gallium: add PIPE_COMPUTE_CAP_SUBGROUP_SIZE
We need this to implement OpenCL's
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-06-29 13:24:22 +02:00
Neil Roberts
c0ca6c30ea i965: Don't try to print the GLSL IR if it has been freed
Since commit 104c8fc2c2 the GLSL IR will be freed if NIR is
being used. This was causing it to segfault if INTEL_DEBUG=wm is set.
This patch just makes it avoid dumping the GLSL IR in that case.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2015-06-29 11:33:34 +01:00
Emil Velikov
dd9ceb0219 docs: add news item and link release notes for mesa 10.6.1
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-06-29 09:03:19 +01:00
Emil Velikov
24df6cd0f7 docs: Add sha256 checksums for the 10.6.1 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 6ff3ae8deb)
2015-06-29 09:01:04 +01:00
Emil Velikov
07158c508a Add release notes for the 10.6.1 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit a871e80fc6)
2015-06-29 09:01:00 +01:00
Kenneth Graunke
6218c68bec Revert "glsl: clone inputs and outputs during linking"
This reverts commit c2ff3485b3.

Ilia and I noticed a memory leak caused by this patch: at least with
fixed-function programs, we clone things using ProgramResourceList as
the context before reralloc makes it non-NULL.

I believe Tapani found other bugs with these patches, so I'm just going
to revert them for now and let him pursue them further.
2015-06-28 22:20:27 -07:00
Kenneth Graunke
cae701fc8e Revert "i965: Delete linked GLSL IR when using NIR."
This reverts commit 104c8fc2c2.
2015-06-28 22:17:09 -07:00
Ilia Mirkin
61912036d1 nv30: avoid leaking blit fp/vp
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-06-29 00:46:53 -04:00
Ilia Mirkin
b5622313ea nv40: enable base vertex
Still appears to have issues with negative indices less than -1M, but
that's a corner case of a corner case.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-06-29 00:46:45 -04:00
Kenneth Graunke
19a0ba130f i965/vs: Move compute_clip_distance() out of emit_urb_writes().
Legacy user clipping (using gl_Position or gl_ClipVertex) is handled by
turning those into the modern gl_ClipDistance equivalents.

This is unnecessary in Core Profile: if user clipping is enabled, but
the shader doesn't write the corresponding gl_ClipDistance entry,
results are undefined.  Hence, it is also unnecessary for geometry
shaders.

This patch moves the call up to run_vs().  This is equivalent for VS,
but removes the need to pass clip distances into emit_urb_writes().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2015-06-28 19:44:34 -07:00
Kenneth Graunke
17e8fca626 i965: Write at least some data in SIMD8 URB write messages.
According to the "URB SIMD8 Write > Write Data Payload" documentation,
"The write data payload can be between 1 and 8 message phases long."

Apparently, the simulator considers it an error if you issue an URB
SIMD8 message with only a header and no actual data to write.

v2: Try to put in a better PRM citation, now that the Broadwell docs
    actually exist (requested by Jordan).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-06-28 19:44:33 -07:00
Samuel Pitoiset
b4b4406e1e gallium/hud: prevent NULL pointer dereference with pipe_query functions
The HUD doesn't check if query_create() fails and it calls other
pipe_query functions with NULL pointer instead of a valid query object.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-06-28 09:49:03 +02:00
Mario Kleiner
a98600b0eb nouveau: Use dup fd as key in drm-winsys hash table to fix ZaphodHeads.
The dup'ed fd owned by the nouveau_screen for a device node
must also be used as key for the winsys hash table, instead
of using the original fd passed in for a screen, to make
multi-x-screen ZaphodHeads configurations work on nouveau.

The original fd's lifetime differs from that of the nouveau_screen stored
in the hash. The hash key is the fd, and in order to compare hash entries
we fstat them, so the fd must be around for as long as the screen is.

This is an extension of the fix in commit a59f2bb1 (nouveau: dup fd
before passing it to device).

Cc: "10.3 10.4 10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-06-28 01:11:38 -04:00
Mike Stroyan
2a210b797e meta: Only change and restore viewport 0 in mesa meta mode
The meta code was setting a default depth range for all viewports
and 'restoring' all viewports to depth range values saved from viewport 0.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-06-27 11:29:56 -07:00
Dave Airlie
556dd4af76 radeonsi: add support for geometry shader invocations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-06-27 00:24:30 +01:00
Dave Airlie
7e5064360c radeonsi: add support for viewport array (v3)
This isn't pretty and I'd suggest it the pm4 interface builder
could be tweaked to do this more efficently, but I'd need
guidance on how that would look.

This seems to pass the few piglit tests I threw at it.

v2: handle passing layer/viewport index to fragment shader.
fix crash in blit changes,
add support to io_get_unique_index for layer/viewport index
update docs.
v3: avoid looking up viewport index and layer in es (Marek).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-06-27 00:24:07 +01:00
Kenneth Graunke
35d8379304 i965/fs: Fix ir_txs in emit_texture_gen4_simd16().
We were not emitting the LOD, which led to message lengths of 1 instead
of 3.  Setting has_lod makes us emit the LOD, but I had to make changes
to avoid emitting the non-existent coordinate as well.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91022
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-06-26 15:57:03 -07:00
Ilia Mirkin
ad62ec8316 nv50/ir: propagate modifier to right arg when const-folding mad
An immediate has to be the second arg of an ADD operation. However we
were mistakenly propagating the modifier of the non-folded value to the
folded immediate argument.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91117
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-06-26 18:42:29 -04:00
Boyan Ding
052b3d4e2f egl_dri2: Remove trailing whitespaces
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-06-26 17:05:21 +00:00
Neil Roberts
3cf90bb183 i965/skl: Fix aligning mt->total_width to the block size
brw_miptree_layout_2d tries to ensure that mt->total_width is a
multiple of the compressed block size, presumably because it wouldn't
be possible to make an image that has a fraction of a block. However
it was doing this by aligning mt->total_width to align_w. Previously
align_w has been used as a shortcut for getting the block width
because before Gen9 the block width was always equal to the alignment.
Commit 4ab8d59a2 tried to fix these cases to use the block width
instead of the alignment but it missed this case.

I think in practice this probably won't make any difference because
the buffer for the texture will be allocated to be large enough to
contain the entire pitch and libdrm aligns the pitch to the tile width
anyway. However I think the patch is worth having to make the
intention clearer.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2015-06-26 17:02:22 +01:00
Matt Turner
404a90b827 mesa: Enable subdir-objects globally.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-06-26 12:55:25 +01:00
Emil Velikov
229450520a mesa: fold duplicated GL/GL_CORE/GLES3 entry in get_hash_params.py
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-06-26 12:55:25 +01:00
Chia-I Wu
7de85694fa ilo: define ILO_IMAGE_MAX_LEVEL_COUNT
Define ILO_IMAGE_MAX_LEVEL_COUNT for ilo_image and remove unnecessary header
includes.
2015-06-26 13:45:28 +08:00
Chia-I Wu
cbdc26aa3f ilo: replace pipe_format by gen_surface_format
Replace pipe_format by gen_surface_format in ilo_image.  Change how depth
format is specified in ilo_state_zs.
2015-06-26 13:45:28 +08:00
Chia-I Wu
2ee95f6d64 ilo: always use the specified image format
Move silent promotion of PIPE_FORMAT_ETC1_RGB8 or combined depth/stencil out
of core.
2015-06-26 13:45:28 +08:00
Chia-I Wu
dc2e92b2d3 ilo: replace pipe_texture_target by gen_surface_type
Replace pipe_texture_target by gen_surface_type in ilo_image.  Change how
GEN6_SURFTYPE_CUBE is specified in ilo_state_surface and ilo_state_zs.
2015-06-26 13:45:28 +08:00
Chia-I Wu
934e4a469f ilo: initialize ilo_image from ilo_image_info
Convert pipe_resource to ilo_image_info for image initialization.
2015-06-26 13:45:28 +08:00
Chia-I Wu
f825fe8e13 ilo: remove ilo_image_disable_aux()
Fail resource creation when aux bo allocation fails.
2015-06-26 13:45:28 +08:00
Chia-I Wu
07acf9cb16 ilo: improve SURFTYPE_BUFFER validations
Reorganize the validations to make them more systematic.
2015-06-26 13:45:27 +08:00
Chia-I Wu
9871646c13 ilo: remove ilo_buffer
Since the addition of ilo_vma, it was used only to pad a bo for sampling
engine surfaces.  Replace it entirely with these functions

  ilo_state_surface_buffer_size()
  ilo_state_vertex_buffer_size()
  ilo_state_index_buffer_size()
  ilo_state_sol_buffer_size()
2015-06-26 13:45:27 +08:00
Chia-I Wu
36d107e92c ilo: introduce ilo_vma
This cleans up the code a bit and makes ilo_state_vector_resource_renamed()
simpler and more robust.  It also allows a single bo to back mulitple VMAs.
2015-06-26 13:45:27 +08:00
Iago Toral Quiroga
fbba25bba0 mesa: remove unnecessary checks in _mesa_readpixels_needs_slow_path
readpixels_can_use_memcpy will later call _mesa_format_matches_format_and_type
which does much tighter checks than these to decide if we can use
memcpy for readpixels.

Also, the checks do not seem to be extensive enough anyway, since we are
checking for signed/unsigned conversion only when the framebuffer has integers,
but the same checks could be done for other types anyway, since as long as
there is a signed/unsigned conversion we can't memcpy.

No regressions observed on i965/llvmpipe.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-06-26 07:42:47 +02:00
Jason Ekstrand
316206ee9e i965/vec4_live_variables: Do liveness analysis bottom-to-top
From Muchnick's Advanced Compiler Design and Implementation:

"To determine which variables are live at each point in a flowgraph, we
perform a backward data-flow analysis"

Previously, we were walking the blocks forwards and updating the livein and
then the liveout.  However, the livein calculation depends on the liveout
and the liveout depends on the successor blocks.  The net result is that it
takes one full iteration to go from liveout to livein and then another
full iteration to propagate to the predecessors.  This works out to an
O(n^2) computation where n is the number of blocks.  If we run things in
the other order, it's O(nl) where l is the maximum loop depth which is
practically bounded by 3.

In b2c6ba0c4b, we made this same change in
the FS backend to great effect.  Might as well keep it consistent and make
the same change for vec4.  Also, this took the time to run the test:

ES31-CTS.arrays_of_arrays.InteractionFunctionCalls1

from 6:49.62 to 3:31.40 on Timothy Arceri's machine.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-06-25 16:42:20 -07:00
Ben Widawsky
c1151b18f2 i965/skl: Use more compact hiz dimensions
gen8 had some special restrictions which don't seem to carry over to gen9.
Quoting the spec for SKL:
"The Z_Height and Z_Width values must equal those present in
3DSTATE_DEPTH_BUFFER incremented by one."

This fixes nothing in piglit (and regresses nothing).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2015-06-25 14:17:02 -07:00
Marek Olšák
101a73846b radeonsi: don't fail in si_shader_io_get_unique_index
Trivial. Picked from my tessellation branch.
2015-06-25 15:05:56 +02:00
Kenneth Graunke
c97105ee12 i965: Drop brw->depthstencil.stencil_offset from gen8_depth_state.c.
This is always 0 - only brw_workaround_depthstencil_alignment ever sets
it, and that doesn't run on Gen6+.  My initial Broadwell depth state
commit had this mistake.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2015-06-25 02:18:51 -07:00
Kenneth Graunke
6026f7e8fb nir: Recognize max(min(a, 1.0), 0.0) as fsat(a).
We already recognize min(max(a, 0.0), 1.0) as a saturate, but neglected
this variant (which is also handled by the GLSL IR pass).

shader-db results on Broadwell:
total instructions in shared programs: 7363046 -> 7362788 (-0.00%)
instructions in affected programs:     11928 -> 11670 (-2.16%)
helped:                                64
HURT:                                  0

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2015-06-25 02:12:32 -07:00
Marek Olšák
77a78c65f8 softpipe,llvmpipe: fix PIPE_SHADER_CAP_MAX_INPUTS value
PIPE_MAX_SHADER_INPUTS was recently bumped to 80 because of tessellation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91099
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91101

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-06-25 09:00:23 +02:00
Ben Widawsky
d1663ccb4c i965/bxt: Add basic Broxton infrastructure
The thread counts and URB information are all speculative numbers that were
based on some CHV numbers at the time.

v2:
Originally this patch had PCI IDs. I've moved that to a new patch at the end of
the series.
Remove is_cherryview hack.
Add PCI ids. These match the ones defined in the kernel. The only one tested by
us is 0x0a84.
Capitalize the hex string (Mark)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: "Lecluse, Philippe" <Philippe.Lecluse@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-06-24 16:37:12 -07:00
Ian Romanick
9f261dc18d radeon: Advertise correct GL_QUERY_COUNTER_BITS/GL_SAMPLES_PASSED value
Commit b765119c changed the default value of all the counter bits to
64.  However, older hardware only has 32 counter bits.

This has only been build-tested.  We don't have any tests that verify
the advertised value against implementation behavior, so I don't know
what additional testing could be done.

NOTE: It appears that many Gallium drivers (at least r300 and i915g)
have the same problem, but I don't see a way for the state-tracker to
determine the counter size.  Marek says, "For Gallium, a new PIPE_CAP or
new get_xxx_param function will be needed."

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
2015-06-24 16:33:32 -07:00
Jason Ekstrand
b2c6ba0c4b i965/fs_live_variables: Do liveness analysis bottom-to-top
From Muchnick's Advanced Compiler Design and Implementation:

"To determine which variables are live at each point in a flowgraph, we
perform a backward data-flow analysis"

Previously, we were walking the blocks forwards and updating the livein and
then the liveout.  However, the livein calculation depends on the liveout
and the liveout depends on the successor blocks.  The net result is that it
takes one full iteration to go from liveout to livein and then another
full iteration to propagate to the predecessors.  This works out to an
O(n^2) computation where n is the number of blocks.  If we run things in
the other order, it's O(nl) where l is the maximum loop depth which is
practically bounded by 3.

On my HSW desktop, one particular shadertoy test gets a 20% improvement in
compile times:

N           Min           Max        Median           Avg        Stddev
x  10        15.965        16.884        16.026       16.1822    0.34736846
+  10        12.813        13.052        12.876       12.8891    0.06913666
Difference at 95.0% confidence
        -3.2931 +/- 0.235316
        -20.3501% +/- 1.45417%
        (Student's t, pooled s = 0.250444)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-06-24 13:11:30 -07:00
Tapani Pälli
104c8fc2c2 i965: Delete linked GLSL IR when using NIR.
This is based on Kenneth's patch to delete 'most of the IR'. Due to
linker changes to clone variables, we can now free all of IR.

Saves 58MB of memory when replaying a Dota 2 trace on Broadwell.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2015-06-24 12:03:41 -07:00
Tapani Pälli
c2ff3485b3 glsl: clone inputs and outputs during linking
This increases memory pressure during linking but makes it easier
for backend to free IR after it is not needed anymore.

v2: use resource list as ralloc context in case of relink (Kenneth)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
2015-06-24 12:01:21 -07:00
Chris Wilson
4b35ab9bdb i965: Rename intel_emit* to reflect their new location in brw_pipe_control
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-06-24 10:35:04 -07:00
Chris Wilson
9d4b9f1e0c i965: Transplant PIPE_CONTROL routines to brw_pipe_control
Start trimming the fat from intel_batchbuffer.c. First by moving the set
of routines for emitting PIPE_CONTROLS (along with the lore concerning
hardware workarounds) to a separate brw_pipe_control.c

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-06-24 10:35:04 -07:00
Kenneth Graunke
147cdb53ec nir: Use a switch statement for detecting move-like operations.
Suggested by Jason Ekstrand.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2015-06-24 10:35:04 -07:00
Brian Paul
e31bce4041 svga: silence warnings about unexpected shader type
Trivial.
2015-06-24 10:42:19 -06:00
Brian Paul
c1de7df6d4 st/mesa: remove unneeded pipe_surface_release() in st_render_texture()
This caused us to always free the pipe_surface for the renderbuffer.
The subsequent call to st_update_renderbuffer_surface() would typically
just recreate it.  Remove the call to pipe_surface_release() and let
st_update_renderbuffer_surface() take care of freeing the old surface
if it needs to be replaced (because of change to mipmap level, etc).

This can save quite a few calls to pipe_context::create_surface() and
surface_destroy().

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-06-24 07:14:56 -06:00
Emil Velikov
a552c897ca st/wgl: add stw_nopfuncs.h to the sources lists
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-06-24 13:43:44 +01:00