Francisco Jerez 37b647b979 i965: Don't tell the hardware about our UAV access.
The hardware documentation relating to the UAV HW-assisted coherency
mechanism and UAV access enable bits is scarce and sometimes
contradictory, and there's quite some guesswork behind this commit, so
let me summarize the background first: HSW and later hardware have
infrastructure to support a stricter form of data coherency between
shader invocations from separate primitives.  The mechanism is
controlled by the "Accesses UAV" bits on 3DSTATE_VS, _HS, _DS, _GS and
_PS (or _PS_EXTRA on BDW+), and the "UAV Coherency Required" bit on
the 3DPRIMITIVE command.

Regardless of whether "UAV Coherency Required" is set, the hardware
fixed-function units will increment a per-stage semaphore for each
request received if "Accesses UAV" is set for the same or any lower
stage.  An implicit DC flush is emitted by the lowermost stage with
"Accesses UAV" set once it's done processing the request; this also
happens regardless of the value of "UAV Coherency Required".  The
completion of the DC flush will cause the same stage and all previous
ones to decrement the semaphore, marking the UAV accesses for the
primitive as coherent with L3.

The "UAV Coherency Required" 3DPRIMITIVE bit will cause a pipeline
stall before any threads are dispatched for the first FF stage with
"Accesses UAV" set until the semaphore is cleared for the same stage.
Effectively this guarantees that UAV memory accesses performed by
previous primitives from any stage will be strictly ordered (and,
thanks to the implicit DC flush, visible in memory) with UAV accesses
from the following primitives.

None of this is required by the usual image, atomic counter and SSBO
GL APIs which have very relaxed cross-primitive coherency and ordering
requirements, so we don't actually ever set the "UAV Coherency
Required" bit.  Ordering with respect to shader invocations from
previous stages on the same primitive where there is a data dependency
is of course already guaranteed as the spec requires, regardless of
this mechanism being enabled.  We do set the "Accesses UAV" bits
though since my commit ac7664e493 (which
this patch partially reverts), mainly because of comments like the
following from the BDW PRM:

> 3DSTATE_GS
>[...]
> 12 Accesses UAV
>    Format: Enable
>    This field must be set when GS has a UAV access.

There are similar comments in the documentation for the other
3DSTATE_*S commands.  The "must" part is misleading and unjustified
AFAIK.  Most of the "Accesses UAV" bits don't seem to have any side
effects other than the implicit DC flushes and the related
book-keeping in anticipation of a subsequent primitive with "UAV
Coherency Required" set, so in most cases they are unnecessary and may
incur a performance penalty.  There is an exception though.  On Gen8+
the PS_EXTRA UAV access bit influences the calculation of the PS
UAV-only and ThreadDispatchEnable signals which on previous
generations were set explicitly by the driver, so we cannot always
avoid enabling it on the PS stage.

The primary motivation for this change is that in fact the hardware
coherency mechanism is buggy and will cause a rather non-deterministic
hang on Gen8 when VS is the only stage with "Accesses UAV" set and the
processing of a request terminates immediately after the implicit DC
flush is sent for a previous primitive with no additional vertices
being emitted for the second primitive, which will cause the hardware
to skip sending a second DC flush and cause the VS to stall
indefinitely waiting for a response from the DC (BDWGFX HSD 1912017).
This hardware bug can be reproduced on current master with the
spec@arb_shader_image_load_store@host-mem-barrier@Indirect/RaW piglit
subtest (if you have the patience to run it a few dozen times).

The proposed workaround is to insert CS STALLs speculatively between
3DPRIMITIVE commands when "Accesses UAV" is enabled for the VS stage
only.  Because this would affect one of the hottest paths in the
driver and likely decrease performance even further due to the
unnecessary serialization, and because we don't actually need the
implicit DC flushes, it seems better to just disable them.

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 5346c11670)
2015-10-21 14:23:20 +01:00
bin bugzilla_mesa.sh: sort the bugs list by number 2015-07-13 20:02:09 +01:00
docs docs: add sha256 checksums for 11.0.3 2015-10-10 17:02:46 +01:00
doxygen doxygen: Remove doxygen_sqlite3.db with 'make clean' 2015-07-11 20:48:25 +01:00
include GL: update glext to svn 31811 2015-08-20 18:42:03 +10:00
m4 configure.ac: move AC_MSG_RESULT reporting back into the m4 macro 2015-03-24 20:49:32 +00:00
scons scons: Always define __STDC_LIMIT_MACROS. 2015-08-15 01:44:33 -07:00
src i965: Don't tell the hardware about our UAV access. 2015-10-21 14:23:20 +01:00
.dir-locals.el dir-locals.el: Don't set variables for non-programming modes 2015-02-02 12:02:55 +00:00
.gitattributes Disable autocrlf for Visual Studio project files. 2008-02-28 12:34:01 +09:00
.gitignore mesa: add .mesa-install-links files to gitignore 2015-04-17 15:24:14 -04:00
Android.common.mk android: Always define __STDC_LIMIT_MACROS. 2015-09-11 19:19:31 +01:00
Android.mk egl: android: remove DRM_GRALLOC_TOP hack 2015-07-22 16:35:27 +01:00
autogen.sh autogen.sh: pass --force to autoreconf, quote ORIGDIR 2015-03-11 23:28:26 +00:00
CleanSpec.mk android: Depend on gallium_dri from EGL, instead of linking in gallium. 2015-06-09 11:38:45 -07:00
common.py common.py: Fix PEP 8 issues. 2015-03-16 22:55:08 -07:00
configure.ac configure.ac: Add support to enable read-only text segment on x86. 2015-09-23 21:07:03 +01:00
install-gallium-links.mk targets/radeonsi/vdpau: convert to static/shared pipe-drivers 2014-06-22 23:06:01 +01:00
install-lib-links.mk install-lib-links: remove the .install-lib-links file 2015-02-24 15:33:25 +00:00
Makefile.am automake: build all drivers but vc4 during distcheck 2015-08-22 11:23:58 +01:00
SConstruct scons: Don't use bundled C99 headers for VS 2013. 2014-05-02 22:04:46 +01:00
VERSION Update version to 11.0.3 2015-10-10 16:17:51 +01:00

File: docs/README.WIN32

Last updated: 21 June 2013


Quick Start
----- -----

Windows drivers are built with SCons.  Makefiles or Visual Studio projects are
no longer shipped or supported.

Run

  scons libgl-gdi

to build the gallium-based GDI driver.

This will work with both MSVS and MinGW.


Windows Drivers
------- -------

At this time, only the gallium GDI driver is known to work.

Source code also exists in the tree for other drivers in
src/mesa/drivers/windows, but the status of this code is unknown.

Recipe
------

Building on Windows requires several open-source packages. These are
steps that work as of this writing.

- install python 2.7
- install scons (latest)
- install mingw, flex, and bison
- install pywin32 from here: http://www.lfd.uci.edu/~gohlke/pythonlibs
  get pywin32-218.4.win-amd64-py2.7.exe
- install git
- download mesa from git
  see http://www.mesa3d.org/repository.html
- run scons
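
Before running scons, it can help to confirm that the tools from the
recipe above are actually reachable.  The following is only a sketch:
the executable names are assumptions (for instance "gcc" standing in
for mingw), it does not check for pywin32, and it uses shutil.which,
so the sketch itself needs Python 3.3 or later, separately from the
Python 2.7 used for the build.

```python
import shutil

# Executable names are assumptions for illustration; adjust them to match
# how the tools are installed on your machine (e.g. "gcc" for mingw).
REQUIRED_TOOLS = ["python", "scons", "gcc", "flex", "bison", "git"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that `which` cannot locate on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced when
experimenting; by default the real PATH is searched.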

General
-------

After building, you can copy the above DLL files to a place in your
PATH such as $SystemRoot/SYSTEM32.  If you don't like putting things
in a system directory, place them in the same directory as the
executable(s).  Be careful about accidentally overwriting files of
the same name in the SYSTEM32 directory.
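
One way to follow the advice above is to copy the DLLs with a small
script that refuses to overwrite anything already present.  This is a
minimal sketch, not part of the Mesa build: the function name and the
directory arguments are hypothetical, and where the built DLLs end up
depends on your SCons configuration.

```python
import shutil
from pathlib import Path


def copy_dlls(build_dir, dest_dir):
    """Copy *.dll from build_dir to dest_dir, never overwriting a file.

    Returns (copied, skipped) as two lists of file names.
    """
    build_dir, dest_dir = Path(build_dir), Path(dest_dir)
    copied, skipped = [], []
    for dll in sorted(build_dir.glob("*.dll")):
        target = dest_dir / dll.name
        if target.exists():
            skipped.append(dll.name)  # leave the existing file alone
        else:
            shutil.copy2(dll, target)
            copied.append(dll.name)
    return copied, skipped
```

Calling copy_dlls("path/to/build/output", "path/to/your/app") (both
paths are placeholders) returns the names that were copied and the
names that were skipped because a file of the same name already
existed, so nothing is replaced silently.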

The DLL files are built so that the external entry points use the
stdcall calling convention.

Static LIB files are not built.  The LIB files that are built are
the linker import files associated with the DLL files.

The si-glu sources are used to build the GLU libs.  This was done
mainly to get the better tessellator code.

If you have a Windows-related build problem or question, please post
to the mesa-dev or mesa-users list.