91544 commits

Author SHA1 Message Date
Tim Rowley
b228d2db18 swr: [rasterizer core] Implement SIMD16 GS and STREAMOUT
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
5830a0a6f8 swr: [rasterizer archrast] Add additional API events
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
d2759c1eb3 swr: [rasterizer core/scripts] Autogen backend initialization function(s)
Autogen functions that instantiate different BackendPixelRate templates.
Functions get split into separate files after reaching a user defined
threshold (currently 512 per file) to speed up compilation.

This change will enable the addition of more template flags in the pixel
back end.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
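
The file-splitting rule described in d2759c1eb3 can be sketched as follows. This is a minimal illustration, not the actual SWR generator script: `split_into_files` and the placeholder instantiation names are hypothetical, and only the threshold of 512 per file comes from the commit message.

```python
def split_into_files(instantiations, per_file=512):
    """Group instantiation names into chunks of at most per_file entries.

    Each chunk stands for one generated source file (hypothetical layout).
    """
    return [instantiations[i:i + per_file]
            for i in range(0, len(instantiations), per_file)]


# Placeholder names standing in for BackendPixelRate template instantiations.
names = [f"BackendPixelRate<{i}>" for i in range(1300)]
files = split_into_files(names)
print([len(chunk) for chunk in files])  # [512, 512, 276]
```

Smaller files compile independently, which is presumably where the compile-time gain mentioned in the message comes from.
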
Tim Rowley
2c820d22cf swr: [rasterizer core] backend.h declares gBackendPixelRateTable
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
50d491e22d swr: [rasterizer core] Finish SIMD16 PA OPT including tessellation
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
9d3442575f swr: [rasterizer core] Finish SIMD16 PA OPT except tessellation
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Tim Rowley
7b94e5e1fa swr: [rasterizer core] Support sparse numa id values on all OSes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-03-20 18:04:53 -05:00
Kenneth Graunke
5e29af5f77 i965: Skip register write detection when possible.
Detecting register write support by trial and error introduces a
stall at screen creation time, which it would be nice to avoid.
Certain command parser versions guarantee this will work (see the
giant comment in intelInitScreen2 below, or a few commits ago):

- Ivybridge: version >= 1 (kernel v3.16)
- Baytrail:  version >= 2 (kernel v3.19)
- Haswell:   version >= 7 (kernel v4.8)

For simplicity, we don't bother with version 1 in this patch.

This assumes that the user hasn't disabled aliasing PPGTT via a kernel
command line parameter.  Don't do that - you're only breaking things.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-20 15:58:05 -07:00
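
The version thresholds listed in 5e29af5f77 can be read as a lookup table. The sketch below is illustrative only: `can_skip_detection` and the platform strings are hypothetical names, not i965 identifiers, and the table encodes the guarantees as stated in the message (the patch itself says it does not bother with version 1).

```python
# Minimum command parser version that guarantees register writes work,
# copied from the list in the commit message.
MIN_PARSER_VERSION = {
    "ivybridge": 1,  # kernel v3.16
    "baytrail": 2,   # kernel v3.19
    "haswell": 7,    # kernel v4.8
}


def can_skip_detection(platform, cmd_parser_version):
    """Return True if trial-and-error register write detection is unnecessary."""
    required = MIN_PARSER_VERSION.get(platform)
    if required is None:
        return False  # unknown platform: fall back to detection
    return cmd_parser_version >= required


print(can_skip_detection("haswell", 7))   # True
print(can_skip_detection("haswell", 5))   # False
```
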
Kenneth Graunke
31693a13f8 i965: Set screen->cmd_parser_version to 0 if we can't write registers.
If we can't write registers, then the effective command parser version
is 0 - it may exist, but it's not usefully enabling anything.

See kernel commit 1ca3712ca3429a617ed6c5f87718e4f6fe4ae0c6 (in v4.8)
where the kernel starts doing this for us.  This makes us do more or
less the same thing on older kernels.

This should preserve a bit of sanity by allowing us to perform a
screen->cmd_parser_version > N check to determine that we really can
use the features promised by command parser version N.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-20 15:58:05 -07:00
Kenneth Graunke
4a2ad6b145 i965: Document the sad story of the kernel command parser.
This should help us figure out the complexities of which kernel
versions we need to get various features on various platforms.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-20 15:58:05 -07:00
Kenneth Graunke
9b324e4dca i965: Fall back to GL 4.2/4.3 on Haswell if the kernel isn't new enough.
In commit d2590eb65f I enabled GL 4.5
on Haswell...but failed to check if we could do indirect compute
shader dispatch...and query buffer objects.

Indirect compute shader dispatch requires command parser version 5
(kernel commit 7b9748cb513a6bef4af87b79f0da3ff7e8b56cd8, which is in
Linux v4.4).  On earlier kernels we would have disabled
ARB_compute_shader, which is a mandatory part of OpenGL 4.3+.

Query buffer objects currently require MI_MATH and MI_LOAD_REGISTER_REG,
which mean command parser version 7 (Linux v4.8).  On earlier kernels
we would have disabled ARB_query_buffer_object, which is a mandatory
part of OpenGL 4.4+.

The new version support looks like:

- Kernel 4.1 and older => OpenGL 3.3
- Kernel 4.2-4.3       => OpenGL 4.2
- Kernel 4.4-4.7       => OpenGL 4.3
- Kernel 4.8+          => OpenGL 4.5

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-03-20 15:58:05 -07:00
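
The kernel-to-OpenGL mapping listed in 9b324e4dca can be written as a small function. This is a sketch of the table only; `haswell_gl_version` is a hypothetical name, and the real driver checks the command parser version rather than the kernel version directly.

```python
def haswell_gl_version(kernel):
    """Map a (major, minor) kernel version tuple to the exposed GL version."""
    if kernel >= (4, 8):
        return "4.5"  # command parser v7: query buffer objects available
    if kernel >= (4, 4):
        return "4.3"  # command parser v5: indirect compute dispatch available
    if kernel >= (4, 2):
        return "4.2"
    return "3.3"


for k in [(4, 1), (4, 3), (4, 7), (4, 8)]:
    print(k, "=>", haswell_gl_version(k))
```
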
Constantine Kharlamov
99d400b78f r600g/sb: Fix memory leak by reworking uses list (rebased)
The author is Heiko Przybyl (CC'ing); the patch is rebased on top of Bartosz Tomczyk's one per Dieter Nützel's comment.
Tested-by: Constantine Charlamov <Hi-Angel@yandex.ru>

v2: Resend the patch again through git-email. The prev. rebase was sent
through Thunderbird, which screwed up tab characters, making the patch
not apply.

--------------
When fixing the stalls on evergreen I introduced leaking of the useinfo
structure(s). Sorry. Instead of allocating a new object to hold 3 values
where only one is actually used, rework the list to just store the node
pointer. Thus no allocation or deallocation is needed. Since use_info
and use_kind aren't used anywhere, drop them and reduce code complexity.
This might also save some small amount of cycles.

Thanks to Bartosz Tomczyk for finding the bug.

Reported-by: Bartosz Tomczyk <bartosz.tomczyk86 at gmail.com>
Signed-off-by: Heiko Przybyl <lil_tux at web.de>
Supersedes: https://patchwork.freedesktop.org/patch/135852
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-03-20 23:23:50 +01:00
Marek Olšák
827ae79b2c radeonsi: check the IR type before waiting for a compute compilation fence
This should fix OpenCL getting stuck.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100288
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-03-20 23:17:14 +01:00
Kenneth Graunke
4084083124 aubinator: Move the guts of decode_group() to decoder.c.
This lets us use it outside of the aubinator binary itself.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Kenneth Graunke
aa1ef0b984 aubinator: Drop spec parameter to decode_group().
No longer necessary - the iterator gets it from the group.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Kenneth Graunke
b2c0c1d9a5 aubinator: Make the iterator store a pointer to structure descriptions.
When the iterator encounters a structure field, it now looks up the
gen_group for that structure definition and saves a pointer to it.

This lets us drop a lot of ridiculous code in the caller, which looked
at item->value (<struct NAME dword>), strtok'd the structure name back
out, and looked it up itself.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Kenneth Graunke
a1aa78cb45 aubinator: Track the current field's starting dword offset.
The iterator code already computed this value, then we stored it in
the structure name, strtok'd it back out, and also manually computed
it when printing dword headers.

Just put the value in the struct and use it.  Way simpler.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Kenneth Graunke
e6f7357cab aubinator: Drop decode_structure() helper.
It made more sense when decode_group() took a bunch of extra options,
but now that there's only one...we may as well pass 0 and call it a day.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Kenneth Graunke
a8d4184b00 aubinator: Drop unused print_dword_headers flag.
I added this flag in 65a9d5eabb but
it was completely unused.  Both callers appear to have printed dword
headers, so we can just drop the flag and continue doing it
unconditionally.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Kenneth Graunke
7f21cb56b8 aubinator: Store a pointer from gen_group back to gen_spec.
When decoding a structure field within a group, we may want to look up
that structure type.  Having a gen_spec pointer makes it easy to do so.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Kenneth Graunke
2c6c760a4b aubinator: Store enum textual name in iter->value.
gen_field_iterator_next() produces a string representing the value of
the field.  For enum values, it also produced a separate "description"
string containing the textual name of the enum.

The only caller of this function combines the two, printing enums as
"<numeric value> (<textual enum name>)".  We may as well just store
that in item->value directly, eliminating the description field, and
a layer of wrapping.

v2: Use non-overlapping source and destination strings in snprintf.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-03-20 11:20:51 -07:00
Julien Isorce
a6e2124402 si_descriptor: move velems nullity check before dereference
CID 1399479: Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking velems suggests that it may be null,
but it has already been dereferenced on all paths leading to the check.

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-20 18:01:51 +00:00
Julien Isorce
521860b2a9 radeon_drm_bo: explicitly check return value of drmCommandWriteRead
CID 1313492

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-20 18:01:51 +00:00
Julien Isorce
dac124466a si_pipe: remove nullity check after dereference
sscreen cannot be NULL

CID 1354483

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-20 18:01:41 +00:00
Julien Isorce
ce27b27c38 radeon: initialize hole variable before calling container_of
Like in a few other places in that radeon_drm_bo.c file.

CID 715739.

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-20 16:47:31 +00:00
Nanley Chery
7c50f9903f intel: Correct the BDW surface state size
The PRMs state that this packet is 16 DWORDS long. Ensure that the last
three DWORDS are zeroed as required by the hardware when allocating a
null surface state.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2017-03-20 09:43:44 -07:00
Bartosz Tomczyk
f4b23589da r600g: Fix out of bounds access
The fc_sp variable should indicate the number of elements in the
fc_stack array, but fc_sp was incremented at the beginning of the
fc_pushlevel function. As a result, idx=0 was never used, and the
32nd element was stored outside the fc_stack array.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-03-20 17:32:53 +01:00
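
The off-by-one described in f4b23589da can be reproduced with a fixed-size list. This is a sketch, not the r600g code: `push_buggy` and `push_fixed` are hypothetical helpers, the capacity of 32 is assumed from the message, and Python raises `IndexError` where C would silently write past the array.

```python
FC_STACK_SIZE = 32  # assumed capacity, based on the "32" in the message


def push_buggy(stack, sp, value):
    sp += 1            # incremented first, so slot 0 is never written
    stack[sp] = value  # once sp reaches 32 this is out of bounds
    return sp


def push_fixed(stack, sp, value):
    stack[sp] = value  # sp counts elements, so it is also the next free slot
    return sp + 1


stack = [None] * FC_STACK_SIZE
sp = 0
for i in range(FC_STACK_SIZE):
    sp = push_fixed(stack, sp, i)
print(sp, stack[0], stack[-1])  # 32 0 31
```
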
Constantine Kharlamov
f9190f3e65 r600g: update sb documentation
v2: s/r600/r600g in the title

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-03-20 17:11:15 +01:00
Constantine Kharlamov
64cbbd2888 r600g: make condition clearer
The second check in the old code looked pretty much unreachable, esp.
because it's not obvious that "max_entries" could be zero. To find out
that it was intentional I had to run some checks, and to dig into
the old versions of the file.

So, rewrite the check to make the intention clear.

v2: s/r600/r600g in the title, and per Dieter Nützel's comment wrap
lines of condition.

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-03-20 17:11:15 +01:00
Emil Velikov
36e029d356 docs: add news item and link release notes for 13.0.6/17.0.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-20 14:25:18 +00:00
Emil Velikov
54fd78f637 docs: add sha256 checksums for 17.0.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 9b66351f5b)
2017-03-20 14:20:32 +00:00
Emil Velikov
887ad468b5 docs: add release notes for 17.0.2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 373d88a711)
2017-03-20 14:20:31 +00:00
Emil Velikov
9bad99742f docs: add sha256 checksums for 13.0.6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 879d24c497)
2017-03-20 14:20:26 +00:00
Emil Velikov
0babb9e091 docs: add release notes for 13.0.6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit fcef88d13a)
2017-03-20 14:20:25 +00:00
Xu,Randy
57595cb073 anv/genX: Solve the vkCreateGraphicsPipelines crash
The crash is due to NULL pColorBlendState, which is legal if the
pipeline has rasterization disabled or if the subpass of the render pass
the pipeline is created against does not use any color attachments.

Test: Sample subpasses from LunarG can run without crash

Signed-off-by: Xu,Randy <randy.xu@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
2017-03-20 08:31:18 +02:00
Dave Airlie
e70e7cc7ff radv: fix logic for when to flush on multiple CS emission
The current code always evaluated to true, but we only want to flush
on the first submit. Rename the variable to do_flush, and only
emit on the first iteration.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-20 14:17:43 +10:00
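
The flush-once logic described in e70e7cc7ff amounts to a flag that is cleared after the first iteration. The sketch below is illustrative: `flush_flags` is a hypothetical helper and only the `do_flush` name comes from the commit message.

```python
def flush_flags(num_submits):
    """Return, per submit, whether a flush would be emitted."""
    flags = []
    do_flush = True
    for _ in range(num_submits):
        flags.append(do_flush)
        do_flush = False  # only the first submit flushes
    return flags


print(flush_flags(3))  # [True, False, False]
```
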
Jason Ekstrand
fcca6a83cd spirv: Implement IsInf using an integer comparison
Since we already do fabs on the one source, we're guaranteed to get
positive infinity if we get any infinity at all.  Since +inf only has
one IEEE 754 representation, we can use an integer comparison and avoid
all of the ordered/unordered issues.

Cc: Dave Airlie <airlied@redhat.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-03-20 14:08:19 +10:00
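
The idea in fcca6a83cd can be checked on 32-bit floats outside the compiler. This is a sketch of the technique, not the SPIR-V to NIR code: `float_bits` and `is_inf` are hypothetical helpers, and clearing the sign bit stands in for fabs.

```python
import struct


def float_bits(x):
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def is_inf(x):
    abs_bits = float_bits(x) & 0x7FFFFFFF  # fabs: clear the sign bit
    return abs_bits == 0x7F800000          # the single encoding of +inf


print(is_inf(float("inf")), is_inf(float("-inf")))  # True True
print(is_inf(float("nan")), is_inf(1.0))            # False False
```

Because the comparison is on integers, a NaN input needs no ordered or unordered special case; its bit pattern simply differs from that of +inf.
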
Dave Airlie
e0208949d1 radv/meta: fix image clears for r4g4 format.
This just uses an 8-bit clear and packs the values.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-20 13:41:31 +10:00
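
Packing an r4g4 clear value into a single byte, as e0208949d1 describes, might look like the sketch below. The bit layout (R in the high nibble, G in the low nibble) and the rounding are assumptions for illustration, and `pack_r4g4` is a hypothetical helper rather than a radv function.

```python
def pack_r4g4(r, g):
    """Pack two floats in [0, 1] into one byte (assumed: R high, G low)."""
    r4 = round(max(0.0, min(1.0, r)) * 15)  # clamp, then scale to 4 bits
    g4 = round(max(0.0, min(1.0, g)) * 15)
    return (r4 << 4) | g4


print(hex(pack_r4g4(1.0, 0.0)))  # 0xf0
print(hex(pack_r4g4(0.0, 1.0)))  # 0xf
```
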
Dave Airlie
10c2b588c4 Revert "radv: fallback to an in-memory cache when no pipline cache is provided"
This reverts commit 2845a108a9.

This breaks VK-GL-CTS randomly.
./deqp-vk --deqp-case=dEQP-VK.texture.filtering.3d.formats.r4g4b4a4*

bounces around here from 6/6 to 3/6 or 4/6 to hanging.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-20 13:41:31 +10:00
Timothy Arceri
72fa447d45 mesa: disable glthread when glNewList() is called
glNewList() swaps dispatch tables, and we don't have anything in
place to handle that in glthread.

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2017-03-20 10:22:20 +11:00
Dave Airlie
d06e168b87 radv: fix primitive reset index emission
This was meant to be checking the index type to get the correct
index not the last emitted one. This fixes:
dEQP-VK.pipeline.input_assembly.primitive_restart.index_type_uint32.triangle_strip_with_adjacency

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-20 08:47:03 +10:00
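
The fix in d06e168b87 selects the restart index from the index type of the current draw. A minimal sketch, assuming the usual all-ones restart values for 16-bit and 32-bit indices; `restart_index_to_emit` and the type strings are hypothetical, not radv names.

```python
RESTART_INDEX = {"uint16": 0xFFFF, "uint32": 0xFFFFFFFF}


def restart_index_to_emit(index_type, last_emitted):
    """Return the restart index to program, or None if it is already set."""
    wanted = RESTART_INDEX[index_type]  # derived from the current index type
    return wanted if wanted != last_emitted else None


print(restart_index_to_emit("uint32", 0xFFFF))  # 4294967295
print(restart_index_to_emit("uint16", 0xFFFF))  # None
```
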
Grazvydas Ignotas
274aaa331c util/disk_cache: check rename result
I haven't seen this causing problems in practice, but for correctness
we should also check if rename succeeded to avoid breaking accounting
and leaving a .tmp file behind.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-20 08:24:46 +11:00
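
The pattern behind 274aaa331c is to treat the final rename as fallible and clean up on failure. The sketch below shows the general idea only; `publish_cache_entry` is a hypothetical helper, not the Mesa disk cache API, and the real code also updates its size accounting.

```python
import os


def publish_cache_entry(tmp_path, final_path):
    """Move a finished .tmp file into place; remove it if the rename fails."""
    try:
        os.rename(tmp_path, final_path)
    except OSError:
        try:
            os.unlink(tmp_path)  # do not leave the .tmp file behind
        except FileNotFoundError:
            pass
        return False  # caller should skip any cache-size accounting
    return True
```
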
Grazvydas Ignotas
67911fa4b8 util/disk_cache: delete .tmp if target exists
At the time of target file check, .tmp file is already created and file
lock is held, so we should remove the .tmp, like in other error paths.

With this, piglit no longer leaves a large number of empty .tmp files
behind, which waste directory entries and may interfere with eviction.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-20 08:24:38 +11:00
Grazvydas Ignotas
bd93cea691 util/disk_cache: fix stored_keys index
It seems there is a bug because:
- 20 bytes are compared, but only 1 byte stored_keys step is used
- entries can overlap each other by 19 bytes
- index_mmap is ~1.3M in size, but only first 64K is used

With this fix for Deus Ex:
- startup time (from launch to Feral logo): ~38s -> ~16s
- disk_cache_has_key() hit rate: ~50% -> ~96%

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-20 08:14:31 +11:00
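
The indexing problem in bd93cea691 comes down to the stride used when addressing 20-byte entries. The sketch below assumes 65536 entries selected by the first two key bytes (65536 * 20 bytes is about 1.3M, matching the size in the message); the names and the index derivation are illustrative, not the actual disk cache code.

```python
KEY_SIZE = 20            # bytes compared per key, from the commit message
INDEX_ENTRIES = 0x10000  # assumed entry count


def entry_offset(key):
    i = int.from_bytes(key[:2], "little")  # assumed: index from leading bytes
    return i * KEY_SIZE  # a 1-byte step here would make entries overlap


def put_key(index, key):
    off = entry_offset(key)
    index[off:off + KEY_SIZE] = key


def has_key(index, key):
    off = entry_offset(key)
    return bytes(index[off:off + KEY_SIZE]) == key


index = bytearray(INDEX_ENTRIES * KEY_SIZE)
k1 = bytes([1, 0]) + b"a" * 18
k2 = bytes([2, 0]) + b"b" * 18
put_key(index, k1)
put_key(index, k2)
print(has_key(index, k1), has_key(index, k2))  # True True
```

With a stride of one byte, storing `k2` would overwrite 19 bytes of `k1`, which is consistent with the low hit rate reported above.
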
Ilia Mirkin
663e7c25f5 nv30: create uploader after pipe->screen is set
Fixes crashes after recent upload rework.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-19 01:24:06 -04:00
Ilia Mirkin
0e9232dbcc nv50,nvc0: enable TEX_LZ and TXF_LZ
There should be minimal gain, if any, for nvc0, but nv50 may end up
noticing more often that the lod argument is uniform. This, in turn,
will remove the need for some unnecessary transformations, which were
being hit due to the checks being done pre-ssa.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-03-18 20:37:52 -04:00
Ilia Mirkin
dab88e9af7 st/mesa: set result writemask based on ir type
This prevents textureQueryLevels, which maps as LODQ, from ending up
with a xyzw writemask, which is illegal.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100061
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-03-18 20:16:45 -04:00
Karol Herbst
09f16de7e6 nvc0/ir: treat FMA like MAD for operand propagation
Helps mainly Feral-ported games, due to their use of fma()

shader-db changes:
total instructions in shared programs : 3901147 -> 3842505 (-1.50%)
total gprs used in shared programs    : 471258 -> 467359 (-0.83%)
total local used in shared programs   : 27405 -> 27361 (-0.16%)
total bytes used in shared programs   : 35749888 -> 35214176 (-1.50%)

                local        gpr       inst      bytes
    helped          17        1829        4091        4091
      hurt           4          44           3           3

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2017-03-18 20:15:45 -04:00
Alan Swanson
a7eb7984bf util/disk_cache: pass predicate functions file stats directly (v4)
Since switching to LRU eviction the only user of these predicate
functions now resolves directory entry stats itself so pass them
directly saving calling fstat and strlen twice (and the
expensive strlen is skipped entirely if access time is newer).

v2: Update for empty cache dir detection changes
v3: Fix passing string length to predicate with the +1 for NULL
    termination and also pass sb as pointer
v4: Missed ampersand for passing sb as pointer

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2017-03-18 14:32:57 +11:00
Timothy Arceri
bf8bc6190e glsl: use set for copy propagation kills
Previously each time we saw a variable we just created a duplicate
entry in the list. This is particularly bad for loops, where we add
everything twice, and then throw nested loops into the mix and the
list was growing exponentially.

This stops the glsl-vs-unroll-explosion test which has 16 nested
loops from reaching the test's mem usage limit in this pass. The
test now hits the mem limit in opt_copy_propagation_elements()
instead.

I suspect this was also part of the reason this pass can be so
slow with some shaders.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2017-03-18 14:21:09 +11:00
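
The growth described in bf8bc6190e is easy to reproduce with a toy model. This sketch assumes each loop level processes its body twice, as the message says; `visit_loops` is a hypothetical stand-in for the pass, and a depth of 10 is used instead of 16 to keep the list small.

```python
def visit_loops(depth, record):
    """Visit `depth` nested loops, recording one killed variable at the core."""
    if depth == 0:
        record("x")
        return
    for _ in range(2):  # each loop body is processed twice
        visit_loops(depth - 1, record)


kill_list, kill_set = [], set()
visit_loops(10, kill_list.append)  # duplicates pile up: 2**10 entries
visit_loops(10, kill_set.add)      # a set keeps one entry per variable
print(len(kill_list), len(kill_set))  # 1024 1
```
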