Commit graph

77476 commits

Author SHA1 Message Date
Jordan Justen
de65d4dcaf anv: Fix build without VALGRIND
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-06 15:54:51 -08:00
Jason Ekstrand
5bbf060ece i965/compiler: Enable more lowering in NIR
We don't need these for GLSL or ARB, but we need them for SPIR-V
2016-01-06 15:30:53 -08:00
Jason Ekstrand
573351cb0f nir/algebraic: Add more lowering
This commit adds lowering options for the following opcodes:

 - nir_op_fmod
 - nir_op_bitfield_insert
 - nir_op_uadd_carry
 - nir_op_usub_borrow
2016-01-06 15:30:53 -08:00
Jason Ekstrand
1f503603d3 nir/opcodes: Fix the folding expression for usub_borrow 2016-01-06 15:30:53 -08:00
Jason Ekstrand
22804de110 nir/spirv: Properly implement Modf 2016-01-06 15:30:53 -08:00
Jason Ekstrand
1f3593d8a1 nir/builder: Add a helper for storing to a deref 2016-01-06 15:30:53 -08:00
Sarah Sharp
39c41be50d mesa: Add KBL PCI IDs and platform information.
Add PCI IDs for the Intel Kabylake platforms.  The IDs are taken
directly from the Linux kernel patches, which are under review:

http://lists.freedesktop.org/archives/intel-gfx/2015-October/078967.html
http://cgit.freedesktop.org/~vivijim/drm-intel/log/?h=kbl-upstream-v2

The Kabylake PCI IDs taken from the kernel are rearranged to be in order
of GT type, then PCI ID.

Please note that if this patch is backported, the following fixes will
need to be added before this patch:

commit 28ed1e08e8 "i965/skl: Remove early platform support"
commit c1e38ad370 "i965/skl: Use larger URB size where available."

Thanks to Ben for fixing a bug around setting urb.size, and being
patient with my questions about what the various fields mean.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (KBL-GT2)
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
2016-01-06 15:11:00 -08:00
Sinclair Yeh
0819287f56 svga: Rename SVGA_HINT_FLAG_DRAW_EMITTED
Rename SVGA_HINT_FLAG_DRAW_EMITTED to SVGA_HINT_FLAG_CAN_PRE_FLUSH
because preemptive flush can be unblocked by more commands than
draw.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 16:04:45 -07:00
Sinclair Yeh
9ccc716534 svga: allow preemptive flushing on DMA, update, and readback commands
The existing code effectively turns off preemptive flushing for all
but the regions used for draws.  This turns out to be overly
restrictive as some memory regions, e.g. GMR, may never get a draw
when used as a DMA upload staging area, causing problems for apps
that upload a large amount of textures, e.g. Unigine Heaven.

This patch fixes the Unigine Heaven memory allocation error and
has been verified to not cause a regression in the previous extended
retina display issue.

Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 16:03:33 -07:00
Charmaine Lee
b074a5b02d svga: skip vertex attribute instruction with zero usage_mask
In emit_input_declarations(), we are skipping declarations for those
registers that are not being used. But in emit_vertex_attrib_instructions(),
we are still emitting instructions to tweak the vertex attributes even if
they are not being used. This causes an assert in the backend because an
input register is not declared in the shader. This patch fixes the problem
by skipping the instruction if the vertex attribute is not being used.
Changes in this patch is originated from the code snippet from Jose as
suggested in bug 1530161.

Tested with piglit, Heaven, Turbine, glretrace.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 16:01:38 -07:00
Brian Paul
b59fad8478 st/mesa: minor clean-ups in st_atom.c
Remove useless comment.  Reformat code.
2016-01-06 15:53:47 -07:00
Brian Paul
85444ab08b st/mesa: replace bitmap size checks with assertion
The _mesa_Bitmap() caller already checks for zero-sized bitmaps.
2016-01-06 15:53:47 -07:00
Brian Paul
18038b9fd6 st/mesa: check texture target in allocate_full_mipmap()
Some kinds of textures never have mipmaps.  3D textures seldom have
mipmaps.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:47 -07:00
Brian Paul
c032ae85ee st/mesa: move mipmap allocation check logic into a function
Better readability and easier to extend.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
0d39b5fc3b main: s/GLuint/GLbitfield for state bitmasks
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
c81ddc2092 vbo: s/GLuint/GLbitfield/ for state bitmasks
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
3c0521cd0f st/mesa: use GLbitfield in st_state_flags, add comments
Use GLbitfield instead of GLuint to be consistent with other variables.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
4cd1bd46ed s/GLuint/GLbitfield/ for st_invalidate_state() parameter
To match dd_function_table::UpdateState().

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
2cc52801c0 st/mesa: be more careful about state validation in st_Bitmap()
If the only dirty state is mesa's _NEW_PROGRAM_CONSTANTS flag, we can
skip state validation before drawing a bitmap since that state doesn't
effect bitmap rendering.

This further increases the performance of the ipers demo on llvmpipe
to about what it was before commit 36c93a6fae.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
b6bcf08641 st/mesa: move bitmap cache flushing out of state validation
Just do it where needed (before drawing, clearing, etc).

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
c28d72a347 st/mesa: check state->mesa in early return check in st_validate_state()
We were checking the dirty->st flags but not the dirty->mesa flags.
When we took the early return, we didn't clear the dirty->mesa flags
so the next time we called st_validate_state() we'd often flush the
glBitmap cache.  And since st_validate_state() is called from
st_Bitmap(), it meant we flushed the bitmap cache for every glBitmap()
call.

This change seems to recover most of the performance loss observed
with the ipers demo on llvmpipe since commit commit 36c93a6fae.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2016-01-06 15:53:46 -07:00
Brian Paul
c75d00e054 st/mesa: protect debug printf() with a conditional instead of comment 2016-01-06 15:53:46 -07:00
Brian Paul
72d6bbca5b st/mesa: fix comment indentation in st_flush_bitmap_cache() 2016-01-06 15:53:46 -07:00
Timothy Arceri
e58be8ac0e glsl: fix varying slot allocation for blocks and structs with explicit locations
Previously each member was being counted as using a single slot,
count_attribute_slots() fixes the count for array and struct members.

Also don't assign a negitive to the unsigned expl_location variable.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-07 09:44:32 +11:00
Timothy Arceri
47dde2bd45 glsl: don't try adding built-ins to explicit locations bitmask
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 09:06:26 +11:00
Timothy Arceri
ac6e2c2056 glsl: fix overlapping of varying locations for arrays and structs
Previously we were only reserving a single location for arrays and
structs.

We also didn't take into account implicit locations clashing with
explicit locations when assigning locations for their arrays or
structs.

This patch fixes both issues.

V5: fix regression for patch inputs/outputs in tessellation shaders
V4: just use count_attribute_slots() to get the number of slots,
also calculate the correct number of slots to reserve for gs and
tess stages by making use of the new get_varying_type() helper.
V3: handle arrays of structs
V2: also fix for arrays of arrays and structs.

Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 09:06:20 +11:00
Timothy Arceri
5907a02ab6 glsl: create helper to remove outer vertex index array used by some stages
This will be used in the following patch for calculating array sizes correctly
when reserving explicit varying locations.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 09:06:16 +11:00
Timothy Arceri
30991d7389 glsl: remove unused varyings before packing them
Previously we would pack varyings before trying to remove them, this
relied on the packing pass not packing varyings with a location of -1
to avoid packing varyings that should be removed.
However this meant unused varyings with an explicit location would be
packed before they could be removed when we enable packing of them in a
later patch.

V2: fix regression in V1 removing unused varyings in multi-stage SSO,
fix regression with single stage programs.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-07 09:06:12 +11:00
Krzysztof Sobiecki
0d7477a289 gallium/r600: Replace ALIGN_DIVUP with DIV_ROUND_UP
ALIGN_DIVUP is a driver specific(r600g) macro that duplicates DIV_ROUND_UP functionality.
Replacing it with DIV_ROUND_UP eliminates this problems.

Signed-off-by: Krzysztof A. Sobiecki <sobkas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-06 16:09:12 -05:00
Eric Anholt
bbd29f1375 vc4: Fix driver build from last minute rebase fix.
I had the driver all tested for the last series, and in my last build I
noticed that get_swizzled_channel was unused now, and removed
it... apparently without testing to find that I removed the wrong channel
swizzle function.
2016-01-06 12:49:45 -08:00
Eric Anholt
25aa436e86 vc4: Optimize out a comparison for bcsel based on an ALU comparison
We routinely have code like:

	vec1 ssa_220 = fge ssa_104, ssa_61
	vec1 ssa_199 = bcsel ssa_220, ssa_106, ssa_105

and we would compare fge's args and choose between ~0 and 0 to generate
ssa_220, then compare ssa_220 to 0 and choose between bcsel's args.
Instead, try to notice the pattern and compare between fge's args to
select between bcsel's args.

total instructions in shared programs: 88019 -> 87574 (-0.51%)
instructions in affected programs:     9985 -> 9540 (-4.46%)
total estimated cycles in shared programs: 245752 -> 245237 (-0.21%)
estimated cycles in affected programs:     17232 -> 16717 (-2.99%)
2016-01-06 12:43:09 -08:00
Eric Anholt
7a9eb76786 vc4: Add missing sRGB decode to texel fetches.
We only see txf on MSAA textures, currently, and apparently this didn't
impact any of our piglit tests.
2016-01-06 12:43:09 -08:00
Eric Anholt
f01ca9eeda vc4: Add support for GL_ARB_texture_swizzle.
We already had the code supporting it, since it's needed for the depth
mode when doing shadow comparisons.
2016-01-06 12:43:09 -08:00
Eric Anholt
12519a972f vc4: Use NIR texture lowering for texture swizzling.
We can't use its other features currently (mostly because we don't want
Newton-Raphson on rcps for texture coordinates), but it gets us started.

This eliminates some comparisons with constants in GLB2.7 and ETQW traces
at the QIR level by moving the comparisons into NIR, where they get
constant-folded out.

instructions in affected programs:     165 -> 156 (-5.45%)
total uniforms in shared programs: 32087 -> 32085 (-0.01%)
total estimated cycles in shared programs: 245762 -> 245752 (-0.00%)
estimated cycles in affected programs:     461 -> 451 (-2.17%)
2016-01-06 12:43:08 -08:00
Eric Anholt
71db7d3dc5 vc4: Replace the SSA-style SEL operators with conditional MOVs.
I'm moving away from QIR being SSA (since NIR is doing lots of SSA
optimization for us now) and instead having QIR just be QPU operations
with virtual registers.  By making our SELs be composed of two MOVs, we
could potentially coalesce the registers for the MOV's src and dst and
eliminate the MOV.

total instructions in shared programs: 88448 -> 88028 (-0.47%)
instructions in affected programs:     39845 -> 39425 (-1.05%)
total estimated cycles in shared programs: 246306 -> 245762 (-0.22%)
estimated cycles in affected programs:     162887 -> 162343 (-0.33%)
2016-01-06 12:39:51 -08:00
Eric Anholt
0a89f307f9 vc4: Don't try the SF coalescing unless it's on a def.
If you want the SF of the value of a register produced from a series of
packing MOVs or conditional MOVs, we can't just SF on the last MOV into
the register.
2016-01-06 12:39:27 -08:00
Chad Versace
8284786c5d anv/gen9: Teach gen9_image_view_init() about 1D surface qpitch
QPitch is usually expressed as rows of surface elements (where a surface
element is an compression block or a single surface sample. Skylake 1D
is an outlier; there QPitch is expressed as individual surface
elements.
2016-01-06 09:38:57 -08:00
Chad Versace
e05b307942 isl: Add isl_surf_get_array_pitch_el()
Will be needed to program SurfaceQPitch for Skylake 1D arrays.
2016-01-06 09:38:57 -08:00
Chad Versace
c1e890541e isl/gen9: Support ISL_DIM_LAYOUT_GEN9_1D 2016-01-06 09:38:57 -08:00
Chad Versace
eea2d4d059 isl: Don't align phys_slice0_sa.width twice
It's already aligned to the format's block width. Don't align it again
in isl_calc_row_pitch().
2016-01-06 09:38:57 -08:00
Chad Versace
39d043f94a isl: Fix the documented units of isl_surf::row_pitch
It's the pitch between surface elements, not between surface samples.
2016-01-06 09:38:57 -08:00
Chad Versace
dcb9c11dc7 anv/gen9: Fix oob lookup of surface halign, valign
For 1D surfaces and for surfaces with Yf or Ys tiling, the hardware
ignores SurfaceVerticalAlignment and SurfaceHorizontalAlignment.
Moreover, the anv_halign[] and anv_valign[] lookup tables may not even
contain the surface's actual alignment values. So don't do the lookup
for those surfaces.
2016-01-06 09:38:57 -08:00
Chad Versace
94566d9b68 anv/meta: Teach meta how to blit from a 1D image
Meta needed a VkShader with a 1D sampler type.
2016-01-06 09:38:57 -08:00
Edward O'Callaghan
1953cee6d7 gallium/drivers/svga: Use unsigned for loop index
Fix a 's/unsigned int/unsigned/' consistency case while here.

Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
8e2a8ec731 gallium/drivers/r600: Use unsigned for loop index
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
76a7d6f412 gallium/drivers/ilo: Use unsigned for loop index
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
5071c192cc gallium: Use unsigned for loop index
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
bfabd5e74a gallium/drivers: Remove unnecessary semicolons
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Edward O'Callaghan
67d4b4b28c gallium: Remove unnecessary semicolons
Fix silly issue with MSVC case fall-though support to need
a extra 'break;'

Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-01-06 08:04:03 -07:00
Oded Gabbay
9d59b9d00c llvmpipe: Optimize lp_rast_triangle_32_3_16 for POWER8
This patch converts the SSE-optimized lp_rast_triangle_32_3_16()
to VMX/VSX.

I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.

                      FPS/Score
 Name            Before     After    Delta
------------------------------------------------
openarena        16.35      16.7     2.14%
xonotic          4.707      4.97     5.57%

glmark2 didn't show a significant (more than 1%) difference.

v2: Make sure code is build only on POWER8 LE machine

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-06 14:54:16 +02:00