108832 commits

Author SHA1 Message Date
Brian Paul
a2689ebcd6 nir: no-op C99 _Pragma() with MSVC
This fixes a build failure on MSVC.

BTW, it looks like clang supports _Pragma() but I don't know if it
understands the "gcc unroll N" directive.

Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-11-23 10:34:24 -07:00
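
One way to keep `_Pragma("GCC unroll N")` hints away from compilers that reject them, in the spirit of commit a2689ebcd6, is to hide the hint behind a macro that expands to nothing elsewhere. The sketch below is illustrative only: `LOOP_UNROLL_4` and `sum4` are hypothetical names that do not come from the Mesa tree, and the hint is enabled only for GCC 8 or newer because the commit message leaves clang's handling of the directive open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: expands to an unroll hint on GCC >= 8 and to
 * nothing on every other compiler (MSVC, clang, older GCC). */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
#define LOOP_UNROLL_4 _Pragma("GCC unroll 4")
#else
#define LOOP_UNROLL_4
#endif

/* Sums four table entries; the hint only affects code generation,
 * never the result. */
int sum4(const int table[4])
{
   int total = 0;

   assert(table != NULL);

   LOOP_UNROLL_4
   for (size_t i = 0; i < 4; i++)
      total += table[i];

   return total;
}
```

Because the macro is empty on unsupported compilers, the loop compiles everywhere and only the generated code differs.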
Michel Zou
02d63ee5a4 disk_cache_get_function_timestamp: check for dladdr
instead of dlopen

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-11-23 12:01:11 +01:00
Marek Olšák
ad40715f35 nir/serialize: support any num_components for remaining instructions
Only NPOT vectors greater than vec4 use the extra uint32.

This is for instructions that share the dest code.
load_const and undef already support 1-16 in the header.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
c028449c01 nir/serialize: use 3 unused bits in intrinsic for packed_const_indices
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
3d44aed09e nir/serialize: don't serialize redundant nir_intrinsic_instr::num_components
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
a2df670b14 nir/serialize: serialize writemask for vec8 and vec16
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
a5c5388234 nir/serialize: serialize swizzles for vec8 and vec16
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
f1a48d54ea nir/serialize: reuse the writemask field for 2 src X swizzles of SSA ALU
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
487a495cc0 nir/serialize: remove up to 3 consecutive equal ALU instruction headers
vec4 scalarized ALUs typically have 4 equal instruction headers, so remove
the last 3.

There are no bits left in the ALU header for more flags, so future
extensions of NIR will have to use something like instr_type == 15
to describe more complex ALU instructions.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
c3fa9de2a9 nir/serialize: try to pack both deref array src into 32 bits
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
ed6b01d5e0 nir/serialize: cleanup - fold nir_deref_type_var cases into switches
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
a0cd67d292 nir/serialize: try to put deref->var index into the unused bits of the header
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
ca201bfe70 nir/serialize: don't serialize mode for deref non-cast instructions
It can be derived from src and var. This frees 10 bits in the header
that will be used later.

"mode" is moved in the structure, because those bits will be used for
something else later.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
2286340fde nir/serialize: don't store deref types if not needed
- type_cast: deduplicate types if the last one is the same
- derive the type from the parent for other derefs

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
70a7f85149 nir/serialize: try to pack two alu srcs into 1 uint32
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
ef4630cf4f nir/serialize: pack nir_intrinsic_instr::const_index[] better
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
d3346b275a nir/serialize: pack 1-component constants into 20 bits if possible
The majority of constants can be packed like this.

v2: - use enum for the packing encoding,
    - trim packed_value to 20 bits and add 1 bit to last_component,
      which simplifies a later commit

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
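
The idea behind commit d3346b275a, fitting a single-component constant into 20 bits when possible, can be sketched roughly as follows. This is an assumption-laden illustration, not the code from nir_serialize.c: the enum values, `pack_const20` and `unpack_const20` are hypothetical, and the two encodings shown (sign-extended low bits, or high bits with zero low bits) are just one plausible way to cover small integers and common floats such as 1.0f (0x3F800000).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encodings; the real enum in nir_serialize.c may differ. */
enum const_packing {
   CONST_FULL,      /* does not fit: all 32 bits are stored separately */
   CONST_LO20_SEXT, /* low 20 bits, sign-extended when unpacked */
   CONST_HI20,      /* high 20 bits, the low 12 bits are zero */
};

/* Picks an encoding for a 32-bit value and writes the 20-bit payload. */
enum const_packing pack_const20(uint32_t value, uint32_t *packed)
{
   assert(packed != NULL);

   /* Small signed integers: bits 19..31 are all zero or all one. */
   uint32_t upper = value >> 19;
   if (upper == 0 || upper == 0x1fff) {
      *packed = value & 0xfffff;
      return CONST_LO20_SEXT;
   }

   /* Values whose low 12 bits are zero, e.g. many simple floats. */
   if ((value & 0xfff) == 0) {
      *packed = value >> 12;
      return CONST_HI20;
   }

   *packed = 0;
   return CONST_FULL;
}

/* Reverses pack_const20 for the two packed encodings. */
uint32_t unpack_const20(enum const_packing kind, uint32_t packed)
{
   switch (kind) {
   case CONST_LO20_SEXT:
      /* Portable sign extension from bit 19. */
      return (packed ^ 0x80000u) - 0x80000u;
   case CONST_HI20:
      return packed << 12;
   default:
      assert(!"CONST_FULL values are not stored in the 20-bit field");
      return 0;
   }
}
```

A value that matches neither pattern falls back to `CONST_FULL`, so the packed form is purely an optimization for the common cases.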
Marek Olšák
75f7c38863 nir/serialize: pack load_const with non-64-bit constants better
v2: use blob_write_uint8/16

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
a572ba673b nir/serialize: try to store a diff in var data locations instead of var data
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
c8314678ee nir/serialize: deduplicate serialized var types by reusing the last unique one
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
545415f45f nir/serialize: don't serialize var->data for temporaries
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
c358c2b2bf nir/serialize: pack src better and limit the object count to 1M from 1G
We need to limit the object count to 1M to free 10 bits for the src
modifiers.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
35655865cb nir/serialize: pack instructions better
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Marek Olšák
4fe1d7822b util/blob: add 8-bit and 16-bit reads and writes
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-23 00:02:10 -05:00
Markus Wick
dba903ed0b drirc: Enable glthread for dolphin/citra/yuzu.
Dolphin: 75 fps -> 88 fps - Super Mario Galaxy
Citra:   81 fps -> 91 fps - A Link Between Worlds
Yuzu:    21 fps -> 27 fps - Super Mario Odyssey

Dolphin still has many syncs because of glFenceSync and glClientWaitSync.
Moving them to the dispatcher thread might yield another speedup.

Yuzu uses a compatibility profile by default. This benchmark used the variable
MESA_GL_VERSION_OVERRIDE=4.5FC to override this behavior.

This profiling was done on a mobile i7-8550U CPU with i965.

Signed-off-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-11-22 15:29:29 -05:00
Markus Wick
f4c61d422d mesa/glthread: Implement ARB_multi_bind.
Signed-off-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-11-22 15:29:07 -05:00
Rhys Perry
517728477c aco: fix waitcnts for barriers at block ends
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: d1b9deee ('aco: improve waitcnt insertion around loops')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-11-22 19:56:31 +00:00
Zebediah Figura
a3c8bc10aa Revert "draw: revert using correct order for prim decomposition."
This reverts commit f97b731c82.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/250

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-11-22 20:37:42 +01:00
Kenneth Graunke
acd36e488d iris: Change keybox parenting
For temporary lookups, just allocate out of the NULL ralloc context,
so we don't have to edit the linked list of ralloc children to add it
and then immediately remove it again.

When uploading a new shader, allocate the keybox off the shader, so
if we delete the shader the keybox also goes away.  Less manual cleanup.
2019-11-22 09:50:59 -08:00
Ian Romanick
ca353285cb nir/range_analysis: Make sure the table validation only occurs once
All of the tables are static const, so they only need to be validated
once.  As noted in the previous commit, the compiler should be able to
eliminate all of this code when the assertions would pass.  Even with
the help of the previous commit, this does not always occur.

-Og: -95.688 +/- 3.91935 (-24.9562% +/- 1.0222%) N=5
-O1: No difference proven at 95.0% confidence. N=5
-O2: -1.962 +/- 0.85001 (-0.860013% +/- 0.372589%) N=5

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-22 08:16:06 -08:00
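
The "validate only once" pattern from commit ca353285cb can be sketched as below. The table, `validate_table`, `lookup` and the `validation_runs` counter are hypothetical stand-ins chosen for illustration; the sketch assumes single-threaded use and makes no claim about how nir_range_analysis.c actually guards its checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lookup table standing in for the static const tables. */
static const int table[3][3] = {
   { 0, 1, 2 },
   { 1, 1, 2 },
   { 2, 2, 2 },
};

/* Counts how often the expensive check ran; for demonstration only. */
unsigned validation_runs = 0;

/* Asserts that the table is symmetric. */
static void validate_table(void)
{
   validation_runs++;
   for (size_t r = 0; r < 3; r++)
      for (size_t c = 0; c < 3; c++)
         assert(table[r][c] == table[c][r]);
}

/* Looks up an entry, validating the table on the first call only.
 * Not thread-safe: the flag is a plain static bool. */
int lookup(size_t row, size_t col)
{
   static bool validated = false;

   if (!validated) {
      validated = true;
      validate_table();
   }

   return table[row][col];
}
```

Since the table never changes, every call after the first skips the nested validation loops entirely.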
Ian Romanick
ccefce46cb nir/range-analysis: Add pragmas to help loop unrolling
I was pretty liberal with these assertions when I wrote this code
because I had assumed that GCC would unroll the loops, inline the lookups
of static const arrays with now-constant indices, and then eliminate
all the actual assertions.  It seems none of this happens even at -O3.

Adding the pragmas helps encourage loop unrolling at some optimization
levels.  I tested by running shader-db with NIR_VALIDATE=false on a Core
i7 Haswell desktop system.

-Og: No difference proven at 95.0% confidence. N=5
-O1: -48.304 +/- 1.221 (-16.3343% +/- 0.412888%) N=5
-O2: -49.94 +/- 1.23521 (-17.9634% +/- 0.444303%) N=5

v2: Add a _Pragma to an inner loop that was accidentally dropped during
a rebase.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-22 08:16:06 -08:00
Danylo Piliaiev
25a00b449f glsl: Add varyings to "zero-init of uninitialized vars" workaround
Varyings are similar to the cases already handled, and the workaround's
name, "glsl_zero_init", already suggests that it should include varyings.

The issue was observed in GiMark subtest from GpuTest.

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-22 15:25:56 +00:00
Alyssa Rosenzweig
4c43b354c3 pan/midgard: Use lower_tex_without_implicit_lod
Just a bit of cleanup. lower_tex can do this lowering for us, which
should also eliminate some special cases (one less thing to fix if we
ever need texturing in tess/geom/etc, perhaps?)

Closes #2133

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 08:38:57 -05:00
Christian Gmeiner
47c7c4263c etnaviv: use a more self-explanatory param name
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-11-22 10:47:13 +00:00
Christian Gmeiner
a949fa9d5d etnaviv: drop unused config_out function param
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-11-22 10:47:13 +00:00
Alyssa Rosenzweig
2e14fe6490 panfrost: Add lcra.c to Android.mk
This was forgotten.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 05:07:19 +00:00
Alyssa Rosenzweig
bda2bb31b1 pan/midgard: Enable LOD lowering only on buggy chips
T720 and earlier need this workaround, so check the quirk before
lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 05:07:19 +00:00
Alyssa Rosenzweig
68c2c7962a pan/midgard: Describe quirk MIDGARD_BROKEN_LOD
Corresponds to errata #10471, applies to T6xx and T720. Fixed in T760.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 05:07:19 +00:00
Alyssa Rosenzweig
d32d4acf68 pan/midgard: Add LOD bias/clamp lowering
We fetch the info with the new intrinsic and lower with ALU ops for txl
instructions, which seemingly correspond to "TEXGRD" instructions (what
we call textureLod).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 05:07:19 +00:00
Alyssa Rosenzweig
4e07e7b232 pan/midgard: Implement load_sampler_lod_paramaters_pan
We can stuff this information in as parametrized system values, like we
currently do for texture sizes and SSBO addresses.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 05:07:19 +00:00
Alyssa Rosenzweig
deaebc82a7 nir: Add load_sampler_lod_paramaters_pan intrinsic
This loads in the <min_lod, max_lod, lod_bias> settings for a given
sampler, which is necessary for lowering clamps/biases on certain
Midgard chips.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 05:07:19 +00:00
Markus Wick
b1156ecdf2 mapi/glapi: Generate sizeof() helpers instead of fixed sizes.
Generating source code with a fixed size leads to issues with platform-dependent types.
We either hard-code 4 or 8 bytes there, and either choice is wrong on the other platform.
So this patch solves this issue by generating e.g. sizeof(GLsizeiptr), which is valid
on both 32-bit and 64-bit platforms.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-11-21 22:52:55 -05:00
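
The difference that commit b1156ecdf2 describes, a hard-coded byte count versus a `sizeof()` expression, can be illustrated with a small sketch. Everything here is hypothetical: `sizeiptr_t` merely stands in for a pointer-sized type such as GLsizeiptr, and the two functions are not the generated glapi helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pointer-sized type such as GLsizeiptr. */
typedef ptrdiff_t sizeiptr_t;

/* Fixed-size variant: only correct where sizeiptr_t is 8 bytes wide. */
size_t payload_size_fixed(void)
{
   return 4 /* uint32_t field */ + 8 /* assumed sizeiptr_t width */;
}

/* sizeof-based variant: follows the target's actual type width. */
size_t payload_size_portable(void)
{
   /* The two widths the commit message discusses. */
   assert(sizeof(sizeiptr_t) == 4 || sizeof(sizeiptr_t) == 8);

   return sizeof(uint32_t) + sizeof(sizeiptr_t);
}
```

On a 32-bit build the fixed variant over-counts by four bytes, while the portable variant tracks the compiler's view of the type.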
Ian Romanick
e51eda99df intel/fs: Disable conditional discard optimization on Gen4 and Gen5
The CMP instruction on Gen4 and Gen5 generates one bit (the LSB) of
valid data and 31 bits of junk.  Results of comparisons that are used as
Boolean values need to have a fixup applied to generate the proper 0/~0
values.

Calling fs_visitor::nir_emit_alu with need_dest=false prevents the fixup
code from being generated.  This results in a sequence like:

        cmp.l.f0.0(16)  g8<1>F          g14<8,8,1>F     0x0F  /* 0F */
        ...
        cmp.l.f0.0(16)  g4<1>F          g6<8,8,1>F      0x0F  /* 0F */
(+f0.1) or.z.f0.1(16) null<1>UD g4<8,8,1>UD     g8<8,8,1>UD

instead of

        cmp.l.f0.0(16)  g8<1>F          g14<8,8,1>F     0x0F  /* 0F */
        ...
        cmp.l.f0.0(16)  g4<1>F          g6<8,8,1>F      0x0F  /* 0F */
        or(16) g4<1>UD g4<8,8,1>UD     g8<8,8,1>UD
(+f0.1) and.z.f0.1(16) null<1>UD g4<8,8,1>UD     1UD

I examined a couple of the shaders hurt by this change, and ALL of them
would have been affected by this bug. :(

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1836
Fixes: 0ba9497e66 ("intel/fs: Improve discard_if code generation")

Iron Lake
total instructions in shared programs: 8122757 -> 8122957 (<.01%)
instructions in affected programs: 8307 -> 8507 (2.41%)
helped: 0
HURT: 100
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.84% max: 6.67% x̄: 2.81% x̃: 2.76%
95% mean confidence interval for instructions value: 2.00 2.00
95% mean confidence interval for instructions %-change: 2.58% 3.03%
Instructions are HURT.

total cycles in shared programs: 188510100 -> 188510376 (<.01%)
cycles in affected programs: 76018 -> 76294 (0.36%)
helped: 0
HURT: 55
HURT stats (abs)   min: 2 max: 12 x̄: 5.02 x̃: 4
HURT stats (rel)   min: 0.07% max: 3.75% x̄: 0.86% x̃: 0.56%
95% mean confidence interval for cycles value: 4.33 5.71
95% mean confidence interval for cycles %-change: 0.60% 1.12%
Cycles are HURT.

GM45
total instructions in shared programs: 4994403 -> 4994503 (<.01%)
instructions in affected programs: 4212 -> 4312 (2.37%)
helped: 0
HURT: 50
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.84% max: 6.25% x̄: 2.76% x̃: 2.72%
95% mean confidence interval for instructions value: 2.00 2.00
95% mean confidence interval for instructions %-change: 2.45% 3.07%
Instructions are HURT.

total cycles in shared programs: 128928750 -> 128928982 (<.01%)
cycles in affected programs: 67442 -> 67674 (0.34%)
helped: 0
HURT: 47
HURT stats (abs)   min: 2 max: 12 x̄: 4.94 x̃: 4
HURT stats (rel)   min: 0.09% max: 3.75% x̄: 0.75% x̃: 0.53%
95% mean confidence interval for cycles value: 4.19 5.68
95% mean confidence interval for cycles %-change: 0.50% 1.00%
Cycles are HURT.
2019-11-21 16:40:50 -08:00
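
The Gen4/Gen5 problem described in commit e51eda99df, where a comparison yields one valid bit and 31 junk bits, can be modelled loosely in C. The function names below are hypothetical and the sketch only mirrors the two instruction sequences quoted in the message (OR with a zero test versus OR followed by AND with 1); it is not derived from the fs_visitor code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Turns a raw comparison result (valid LSB, junk elsewhere) into the
 * canonical Boolean values 0 and ~0. */
uint32_t canonical_bool(uint32_t raw)
{
   uint32_t result = 0u - (raw & 1u);

   assert(result == 0u || result == ~0u);
   return result;
}

/* Models "or.z" applied directly to the raw results: the junk bits
 * leak into the zero test. */
bool both_false_buggy(uint32_t raw_a, uint32_t raw_b)
{
   return (raw_a | raw_b) == 0;
}

/* Models "or" followed by "and.z ... 1UD": only the valid LSB is
 * tested. */
bool both_false_fixed(uint32_t raw_a, uint32_t raw_b)
{
   return ((raw_a | raw_b) & 1u) == 0;
}
```

With two raw results that are both false in the LSB but carry junk in the upper bits, the buggy variant reports "not both false" while the fixed variant gives the intended answer.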
Marek Olšák
0b1452ffdd nir/serialize: do ctx = {0} instead of manual initializations
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-21 18:49:57 -05:00
Marek Olšák
ff71fae440 nir: strip as we serialize to remove the nir_shader_clone call
Serializing stripped NIR is faster now.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-21 18:49:57 -05:00
Christian Gmeiner
8acaab1aa7 etnaviv: add drm-shim
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-21 22:56:04 +00:00
Eric Engestrom
609a6ae23e vk_util: drop duplicate formats in vk_format_map[]
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-21 22:52:40 +00:00
Jonathan Marek
773d640efa turnip: implement UBWC
This enables UBWC for everything except 3D textures.

It breaks many image_to_image copies but those aren't important and it can
be worked around later (image_to_image copy needs to be done in two steps,
decode from the source format and then encode to the destination format).

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-21 22:21:57 +00:00
Jonathan Marek
91fd83d142 freedreno/regs: update UBWC related bits
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-21 22:21:57 +00:00
Vinson Lee
6613a4a029 swr: Fix build with llvm-10.0.
Fix build error after llvm-10.0 commit 1dfede3122ee ("Move
CodeGenFileType enum to Support/CodeGen.h").

../src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp: In member function ‘void JitManager::DumpAsm(llvm::Function*, const char*)’:
../src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp:428:45: error: ‘CGFT_AssemblyFile’ is not a member of ‘llvm::TargetMachine’
             *pMPasses, filestream, nullptr, TargetMachine::CGFT_AssemblyFile);
                                             ^

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
2019-11-21 13:20:08 -08:00