Commit graph

98447 commits

Author SHA1 Message Date
Jordan Justen
7cf1037d5a main, glsl: Add UniformDataDefaults which stores uniform defaults
The ARB_get_program_binary extension requires that uniform values in a
program be restored to their initial value just after linking.

This patch saves off the initial values just after linking. When the
program is restored by glProgramBinary, we can use this to copy the
initial value of uniforms into UniformDataSlots.

V2 (Timothy Arceri):
 - Store UniformDataDefaults only when serializing GLSL as this
   is what we want for both disk cache and ARB_get_program_binary.
   This saves us having to come back later and reset the Uniforms
   on program binary restores.

Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-12-08 16:44:35 +11:00
Jordan Justen
ebd9e789c4 glsl: Split out shader program serialization
This will allow us to use the program serialization to implement
ARB_get_program_binary.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-12-08 16:44:35 +11:00
Jordan Justen
219628c118 include: Add GL_MESA_program_binary_formats to GL/GLES2 ext.h files
Thus was merged into the OpenGL Registry in version
667c5a253781834b40a6ae9eb19d05af4542cfe1.

Ref: https://github.com/KhronosGroup/OpenGL-Registry/pull/127
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-12-08 16:44:35 +11:00
Jordan Justen
0c48487893 mesa: add GL_PROGRAM_BINARY_FORMAT_MESA enum
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-12-08 16:44:35 +11:00
Francisco Jerez
4d1959e693 intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution.
This addresses a long-standing back-end compiler bug that could lead
to cross-channel data corruption in loops executed non-uniformly.  In
some cases live variables extending through a loop divergence point
(e.g. a non-uniform break) into a convergence point (e.g. the end of
the loop) wouldn't be considered live along all physical control flow
paths the SIMD thread could possibly have taken in between due to some
channels remaining in the loop for additional iterations.

This patch fixes the problem by extending the CFG with physical edges
that don't exist in the idealized non-vectorized program, but
represent valid control flow paths the SIMD EU may take due to the
divergence of logical threads.  This makes sense because the i965 IR
is explicitly SIMD, and it's not uncommon for instructions to have an
influence on neighboring channels (e.g. a force_writemask_all header
setup), so the behavior of the SIMD thread as a whole needs to be
considered.

No changes in shader-db.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-12-07 18:27:05 -08:00
Francisco Jerez
9355116bda intel/fs: Don't let undefined values prevent copy propagation.
This makes the dataflow propagation logic of the copy propagation pass
more intelligent in cases where the destination of a copy is known to
be undefined for some incoming CFG edges, building upon the
definedness information provided by the last patch.  Helps a few
programs, and avoids a handful shader-db regressions from the next
patch.

shader-db results on ILK:

  total instructions in shared programs: 6541547 -> 6541523 (-0.00%)
  instructions in affected programs: 360 -> 336 (-6.67%)
  helped: 8
  HURT: 0

  LOST:   0
  GAINED: 10

shader-db results on BDW:

  total instructions in shared programs: 8174323 -> 8173882 (-0.01%)
  instructions in affected programs: 7730 -> 7289 (-5.71%)
  helped: 5
  HURT: 2

  LOST:   0
  GAINED: 4

shader-db results on SKL:

  total instructions in shared programs: 8185669 -> 8184598 (-0.01%)
  instructions in affected programs: 10364 -> 9293 (-10.33%)
  helped: 5
  HURT: 2

  LOST:   0
  GAINED: 2

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-07 18:27:04 -08:00
Francisco Jerez
c3c1aa5aeb intel/fs: Restrict live intervals to the subset possibly reachable from any definition.
Currently the liveness analysis pass would extend a live interval up
to the top of the program when no unconditional and complete
definition of the variable is found that dominates all of its uses.

This can lead to a serious performance problem in shaders containing
many partial writes, like scalar arithmetic, FP64 and soon FP16
operations.  The number of oversize live intervals in such workloads
can cause the compilation time of the shader to explode because of the
worse than quadratic behavior of the register allocator and scheduler
when running out of registers, and it can also cause the running time
of the shader to explode due to the amount of spilling it leads to,
which is orders of magnitude slower than GRF memory.

This patch fixes it by computing the intersection of our current live
intervals with the subset of the program that can possibly be reached
from any definition of the variable.  Extending the storage allocation
of the variable beyond that is pretty useless because its value is
guaranteed to be undefined at a point that cannot be reached from any
definition.

According to Jason, this improves performance of the subgroup Vulkan
CTS tests significantly (e.g. the runtime of the dvec4 broadcast test
improves by nearly 50x).

No significant change in the running time of shader-db (with 5%
statistical significance).

shader-db results on IVB:

  total cycles in shared programs: 61108780 -> 60932856 (-0.29%)
  cycles in affected programs: 16335482 -> 16159558 (-1.08%)
  helped: 5121
  HURT: 4347

  total spills in shared programs: 1309 -> 1288 (-1.60%)
  spills in affected programs: 249 -> 228 (-8.43%)
  helped: 3
  HURT: 0

  total fills in shared programs: 1652 -> 1597 (-3.33%)
  fills in affected programs: 262 -> 207 (-20.99%)
  helped: 4
  HURT: 0

  LOST:   2
  GAINED: 209

shader-db results on BDW:

  total cycles in shared programs: 67617262 -> 67361220 (-0.38%)
  cycles in affected programs: 23397142 -> 23141100 (-1.09%)
  helped: 8045
  HURT: 6488

  total spills in shared programs: 1456 -> 1252 (-14.01%)
  spills in affected programs: 465 -> 261 (-43.87%)
  helped: 3
  HURT: 0

  total fills in shared programs: 1720 -> 1465 (-14.83%)
  fills in affected programs: 471 -> 216 (-54.14%)
  helped: 4
  HURT: 0

  LOST:   2
  GAINED: 162

shader-db results on SKL:

  total cycles in shared programs: 65436248 -> 65245186 (-0.29%)
  cycles in affected programs: 22560936 -> 22369874 (-0.85%)
  helped: 8457
  HURT: 6247

  total spills in shared programs: 437 -> 437 (0.00%)
  spills in affected programs: 0 -> 0
  helped: 0
  HURT: 0

  total fills in shared programs: 870 -> 854 (-1.84%)
  fills in affected programs: 16 -> 0
  helped: 1
  HURT: 0

  LOST:   0
  GAINED: 107

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-07 18:27:04 -08:00
Francisco Jerez
acf98ff933 intel/fs: Teach instruction scheduler about GRF bank conflict cycles.
This should allow the post-RA scheduler to do a slightly better job at
hiding latency in presence of instructions incurring bank conflicts.
The main purpuse of this patch is not to improve performance though,
but to get conflict cycles to show up in shader-db statistics in order
to make sure that regressions in the bank conflict mitigation pass
don't go unnoticed.

Acked-by: Matt Turner <mattst88@gmail.com>
2017-12-07 15:56:49 -08:00
Francisco Jerez
af2c320190 intel/fs: Implement GRF bank conflict mitigation pass.
Unnecessary GRF bank conflicts increase the issue time of ternary
instructions (the overwhelmingly most common of which is MAD) by
roughly 50%, leading to reduced ALU throughput.  This pass attempts to
minimize the number of bank conflicts by rearranging the layout of the
GRF space post-register allocation.  It's in general not possible to
eliminate all of them without introducing extra copies, which are
typically more expensive than the bank conflict itself.

In a shader-db run on SKL this helps roughly 46k shaders:

   total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
   conflicts in affected programs: 816222 -> 407702 (-50.05%)
   helped: 46234
   HURT: 72

The running time of shader-db itself on SKL seems to be increased by
roughly 2.52%±1.13% with n=20 due to the additional work done by the
compiler back-end.

On earlier generations the pass is somewhat less effective in relative
terms because the hardware incurs a bank conflict anytime the last two
sources of the instruction are duplicate (e.g. while trying to square
a value using MAD), which is impossible to avoid without introducing
copies.  E.g. for a shader-db run on SNB:

   total conflicts in shared programs: 944636 -> 623185 (-34.03%)
   conflicts in affected programs: 853258 -> 531807 (-37.67%)
   helped: 31052
   HURT: 19

And on BDW:

   total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
   conflicts in affected programs: 1179787 -> 748933 (-36.52%)
   helped: 47592
   HURT: 70

On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%
±0.33% with n=16.

NOTE: This patch intentionally disregards some i965 coding conventions
      for the sake of reviewability.  This is addressed by the next
      squash patch which introduces an amount of (for the most part
      boring) boilerplate that might distract reviewers from the
      non-trivial algorithmic details of the pass.

The following patch is squashed in:

SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.

Acked-by: Matt Turner <mattst88@gmail.com>
2017-12-07 15:56:06 -08:00
Dylan Baker
c34b53f133 meson: Fix building gallium media targets with gallium-xlib glx
To demonstrate this bug run meson with the options:
-Ddri-drivers= -Dglx=gallium-xlib

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-12-07 10:22:27 -08:00
Dylan Baker
2adc3817c6 meson: Add lmsensors to gallium libgl-xlib target.
Fixes: 5e71efef44 ("meson: Add lmsensors support")
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-12-07 10:20:58 -08:00
Eric Engestrom
4cba39331d meson: add dep_thread to every lib that includes threads.h
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104141
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-12-07 17:29:42 +00:00
Eric Engestrom
f0337f0f70 meson: fix pl111 dependency on vc4
src/gallium/winsys/pl111/drm/libpl111winsys.a(pl111_drm_winsys.c.o): In function `pl111_drm_screen_create':
pl111_drm_winsys.c:(.text+0x33): undefined reference to `vc4_drm_screen_create_renderonly'

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-12-07 17:21:03 +00:00
Samuel Pitoiset
5f81a43535 radv: use a faster version for nir_op_pack_half_2x16
This patch is ported from RadeonSI and it has two effects.

It fixes a rendering issue which affects F1 2017 and Dawn
of War 3 (Vega only) because LLVM was ending up by generating
the new v_mad_mix_{hi,lo} instructions which appear to be
buggy in some way. Not sure if Mesa is generating something
wrong or if the issue is in LLVM only. Anyway, that explains why
the DOW3 issue can't be reproduced with GL on Vega.

It also improves performance because v_cvt_pkrtz_f16 is faster,
and because I guess the rounding mode behaviour is similar between
GL and VK, we can use it. About performance, it improves Talos
by +3/4% but I don't see any other impacts.

No CTS regressions on Polaris.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-12-07 17:21:50 +01:00
Alejandro Piñeiro
25e56b2eba mesa/spirv: move and rename nir_spirv_supported_capabilities
To avoid any vulkan driver to include the GL mtypes.h. Renamed as
eventually this could be used by drivers not using nir.

v2: remove compiler/spirv/spirv.h from mtypes (Alejandro)
v3: added the definition at compiler/shader_info.h (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-07 17:15:11 +01:00
Vadym Shovkoplias
b2490a326c util/disk_cache: Remove unneeded free() on always null string
At this point dc_job->cache_item_metadata.keys always equals
NULL, so call to free() is useless

Fixes: b86ecea344 ("util/disk_cache: write cache item metadata to disk")
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-12-07 11:50:41 +00:00
Samuel Iglesias Gonsálvez
392638d6b5 spirv: fix bug when OpSpecConstantOp calls a conversion
In that case, nir_eval_const_opcode() will evaluate the conversion
but as it was using destination's bit_size, the resulting
value was just a cast of the source constant value. By passing the
source's bit size, it does the conversion properly.

Fixes:

dEQP-VK.spirv_assembly.instruction.*.opspecconstantop.*convert*

v2:
- Remove invalid conversion op cases.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-07 10:19:34 +01:00
Samuel Iglesias Gonsálvez
67ec314347 spirv: allow specialization constants with bitsize different than 32 bits
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-07 10:19:34 +01:00
James Legg
947470d10b nir/opcodes: Fix constant-folding of bitfield_insert
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104119
CC: <mesa-stable@lists.freedesktop.org>
CC: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-12-07 08:59:36 +00:00
Alex Smith
8fda98c4f1 radv: Add LLVM version to the device name string
Allows apps to determine the LLVM version so that they can decide
whether or not to enable workarounds for LLVM issues.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-12-07 08:58:34 +00:00
Alejandro Piñeiro
be2c434308 mesa: remove set_entry from forward type declarations
This type was used at gl_sync_object, but it is not used anymore.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-12-07 09:01:58 +01:00
Kenneth Graunke
8705ed13e3 meta: Fix ClearTexture with GL_DEPTH_COMPONENT.
We only handled unpacking for GL_DEPTH_STENCIL formats.

Cemu was hitting _mesa_problem() for an unsupported format in
_mesa_unpack_float_32_uint_24_8_depth_stencil_row(), because the
format was depth-only, rather than depth-stencil.

Cc: "13.0 12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94739
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103966
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-12-06 20:35:46 -08:00
Kenneth Graunke
d6d16c0218 meta: Initialize depth/clear values on declaration.
This helps avoid compiler warningss in the next commit - everything
was initialized, but it wasn't obvious to static analysis.

Suggested-by: Tapani Pälli <tapani.palli@intel.com>
2017-12-06 20:30:24 -08:00
Timothy Arceri
9d53ccccb2 glsl: get correct member type when processing xfb ifc arrays
This fixes a crash in:

KHR-GL45.enhanced_layouts.xfb_block_stride

Fixes: 0822517936 "glsl: add helper to process xfb qualifiers during linking"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-12-07 15:22:23 +11:00
Gert Wollny
6c268ea79a r600/sb: do not convert if-blocks that contain indirect array access
If an array is accessed within an if block, then currently it is not known
whether the value in the address register is involved in the evaluation of the
if condition, and converting the if condition may actually result in
out-of-bounds array access. Consequently, if blocks that contain indirect array
access should not be converted.

Fixes piglits on r600/BARTS:
spec/glsl-1.10/execution/variable-indexing/
  vs-output-array-float-index-wr
  vs-output-array-vec3-index-wr
  vs-output-array-vec4-index-wr

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104143

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-12-07 09:48:41 +10:00
Dave Airlie
81683c3d42 r600: add support for compute grid/block sizes. (v2)
We just pass these in from outside in a constant buffer.

The shader side stores them once they are accessed once.

v2: fix to not use a temp_reg.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-12-06 23:21:09 +00:00
Dave Airlie
4525cdb751 r600: handle image/buffer sizes correctly.
This adds support to compute for the resq workarounds (buffer/cube sizes)

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-12-06 23:21:06 +00:00
Dave Airlie
f51458637c r600/compute: add support for emitting compute image/buffer atoms
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-12-06 23:21:02 +00:00
Dave Airlie
a5a50d9c89 r600/compute: handle atomic counters in compute state.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-12-06 23:20:58 +00:00
Dave Airlie
c82934f212 r600/compute: add support for TGSI compute shaders. (v1.1)
This add paths to handle TGSI compute shaders and shader selection.

It also avoids emitting certain things on tgsi paths,
CBs, vertex buffers, config reg init (not required).

v1.1: fix rat mask calc

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-12-06 23:20:53 +00:00
Dave Airlie
08dc205c61 r600/shader: add compute support to shader assembler
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-12-06 23:20:50 +00:00
Dave Airlie
7b8e1c089d r600/texture: drop lowering 1d/2d images to linear.
This appears to cause hangs with compute images. Unless
we can find more specifics, just don't do this for now.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-12-06 23:20:20 +00:00
Alejandro Piñeiro
0398b31d1d mesa: define nir_spirv_supported_capabilities
Until now it was part of spirv_to_nir_options. But it will be used on
the implementation of ARB_gl_spirv and ARB_spirv_extensions, and added
to the OpenGL context, as a way to save what SPIR-V capabilities the
current OpenGL implementation supports.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2017-12-06 22:25:52 +01:00
Fredrik Höglund
5e1cb16768 anv: fix a case statement in GetMemoryFdPropertiesKHR
The handle type in the case statement is supposed to be VK_EXTERNAL_-
MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT.

Fixes: ab18e8e59b ("anv: Implement VK_EXT_external_memory_dma_buf")
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 20:04:39 +01:00
Fredrik Höglund
b055045378 radv: fix a case statement in GetMemoryFdPropertiesKHR
The handle type in the case statement is supposed to be VK_EXTERNAL_-
MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT.

Fixes: 546e747867 ("radv: Implement VK_EXT_external_memory_dma_buf")
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-12-06 20:04:39 +01:00
Eric Engestrom
31d403160f meson: fix keyword argument in declare_dependency()
`declare_dependency()` takes `compile_args`, not `c_args`.
It was correct in all the other `declare_dependency()` from that commit.

Fixes: 0bbecc5a85 "meson: define driver dependencies"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-12-06 18:31:33 +00:00
Emil Velikov
526945f7dc i965: include brw_pipe_control.h in the tarball
Fixes: bfe0f3a702 ("i965: Move PIPE_CONTROL defines and prototypes to
brw_pipe_control.h.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-12-06 17:33:57 +00:00
Emil Velikov
e964e01fdd mesa: document _mesa_extension_override_* variables
Currently there are no users of these outside of extensions.c.
Provide some information why they exist and how to use them.

Cc: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
2017-12-06 17:31:53 +00:00
Emil Velikov
d7ba4f41f9 docs: annotate MESA_program_debug as obsolete
It has been obsolete for years - state it explicitly.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-12-06 17:31:53 +00:00
George Kyriazis
bc75adcb1e swr/scons: Fix another intermittent build failure
gen_BackendPixelRate*.cpp depends on gen_ar_eventhandler.hpp.
Fix missing dependency.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-12-06 11:04:02 -06:00
Marek Olšák
4038db72d4 radeonsi: make const and stream uploaders allocate read-only memory
and anything that clones these uploaders, like u_threaded_context.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-12-06 15:19:02 +01:00
Marek Olšák
7a6643fb4c radeonsi: use a separate allocator for fine fences
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-12-06 15:19:02 +01:00
Marek Olšák
3e1287caef radeonsi/gfx9: make shader binaries use read-only memory
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-12-06 15:19:02 +01:00
Marek Olšák
fef51ebcea winsys/amdgpu: make IBs use read-only memory
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-12-06 15:19:02 +01:00
Marek Olšák
ba59064409 radeonsi: print the buffer list for CHECK_VM
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-12-06 15:19:02 +01:00
Marek Olšák
010214b403 radeonsi: allow DMABUF exports for local buffers
Cc: 17.3 <mesa-stable@lists.freedesktop.org>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-12-06 15:19:02 +01:00
Nicolai Hähnle
20ccb51ffc radeonsi: always place sparse buffers in VRAM
Together with "radeonsi: fix the R600_RESOURCE_FLAG_UNMAPPABLE check",
this ensures that sparse buffers are placed in VRAM.

Noticed by an assertion that started triggering with commit d4fac1e1d7
("gallium/radeon: enable suballocations for VRAM with no CPU access")

Fixes KHR-GL45.sparse_buffer_tests.BufferStorageTest in debug builds.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-12-06 11:19:00 +01:00
Nicolai Hähnle
5e2962c949 radeonsi: fix the R600_RESOURCE_FLAG_UNMAPPABLE check
The flag is on the pipe_resource, not the r600_resource.

I don't see an obvious bug related to this, but it could potentially lead
to suboptimal placement of some resources.

Fixes: a41587433c ("gallium/radeon: add R600_RESOURCE_FLAG_UNMAPPABLE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-12-06 11:18:14 +01:00
Jose Maria Casanova Crespo
a1e257a5bf i965/fs: Use untyped_surface_read for 16-bit load_ssbo
SSBO loads were using byte_scattered read messages as they allow
reading 16-bit size components. byte_scattered messages can only
operate one component at a time so we needed to emit as many messages
as components.

But for vec2 and vec4 of 16-bit, being multiple of 32-bit we can use the
untyped_surface_read message to read pairs of 16-bit components using only
one message. Once each pair is read it is unshuffled to return the proper
16-bit components. vec3 case is assimilated to vec4 but the 4th component
is ignored.

16-bit scalars are read using one byte_scattered_read message.

v2: Removed use of stride = 2 on sources (Jason Ekstrand)
    Rework optimization using unshuffle 16 reads (Chema Casanova)
v3: Use W and D types insead of HF and F in shuffle to avoid rounding
    erros (Jason Ekstrand)
    Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand)
v4: Use subscript insead of chaging type and stride  (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Jose Maria Casanova Crespo
ce2e572c4c i965/fs: Optimize 16-bit SSBO stores by packing two into a 32-bit reg
Currently, we use byte-scattered write messages for storing 16-bit
into an SSBO. This is because untyped surface messages have a fixed
32-bit size.

This patch optimizes these 16-bit writes by combining 2 values (e.g,
two consecutive components aligned with 32-bits) into a 32-bit register,
packing the two 16-bit words.

16-bit single component values will continue to use byte-scattered
write messages. The same will happens when the first consecutive
component is not aligned 32-bits.

This optimization reduces the number of SEND messages used for storing
16-bit values potentially by 2 or 4, which cuts down execution time
significantly because byte-scattered writes are an expensive
operation as they only write a component for message.

v2: Removed use of stride = 2 on sources (Jason Ekstrand)
    Rework optimization using shuffle 16 write and enable writes
    of 16bit vec4 with only one message of 32-bits. (Chema Casanova)
v3: - Fix coding style (Eduardo Lima)
    - Reorganize code to avoid duplication. (Jason Ekstrand)
    - Include new comments to explain the length calculations to
      fix alignment issues of components. (Jason Ekstrand)
    - Fix issues with writemask yz with 16-bit writes. (Jason Ektrand)
v4: (Jason Ekstrand)
    - Reorganize 64-bit ssbo-writes to avoid using slots_per_component.
    - Comment about why suffle is needed when using byte_scattered_write.

Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00