Commit graph

4201 commits

Author SHA1 Message Date
Jason Ekstrand
0b6da5bac6 intel/fs/copy-prop: Bump the hash table size to 64
While the number of ACPs is generally not huge compared to the number of
blocks, 16 does seem a bit small.  Bumping it to 64 takes the execution
time of the piglit vs-isnan-dvec test from about 1:18.1 on an unoptimized
debug build (what we run in CI) with NIR_VALIDATE=0 to about 1:16.2.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-10 09:10:17 -05:00
Caio Marcelo de Oliveira Filho
f7d53fffa2 anv: Remove special allocation for anv_push_constants
The key reason for that mechanism is gone: all the extra optional data
that could be in the anv_push_constants was moved elsewhere.  At this
point, just put anv_push_constants directly in anv_cmd_state (part of
anv_cmd_buffer).

v2: Remove a NULL check we don't need anymore in
    anv_cmd_buffer_push_constants().  (Lionel)
    Fix size we consider for valid push params.  (Lionel)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-09 19:01:14 -07:00
Lionel Landwerlin
f2f6ac1c08 anv: Use corresponding type from the vector allocation
We didn't notice this issue much because the 2 struct share a similar
layout, expect for the additional fields...

We run into that issue in Anv :

==15236== Invalid write of size 8
==15236==    at 0x8CF3939C: anv_state_table_expand_range (anv_allocator.c:211)
==15236==    by 0x8CF394D5: anv_state_table_grow (anv_allocator.c:264)
==15236==    by 0x8CF3967E: anv_state_table_add (anv_allocator.c:312)
==15236==    by 0x8CF3B13C: anv_state_pool_alloc_no_vg (anv_allocator.c:1167)
==15236==    by 0x8CF3B2B0: anv_state_pool_alloc (anv_allocator.c:1190)
==15236==    by 0x8CF60871: alloc_surface_state (anv_image.c:1122)
==15236==    by 0x8CF61FF9: anv_CreateImageView (anv_image.c:1519)
==15236==    by 0x8BCBD2ED: vkCreateImageView (trampoline.c:1358)
==15236==  Address 0x8994ef10 is 0 bytes after a block of size 128 alloc'd
==15236==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15236==    by 0x8D2578E6: u_vector_init (u_vector.c:47)
==15236==    by 0x8CF3929A: anv_state_table_init (anv_allocator.c:168)
==15236==    by 0x8CF3A99A: anv_state_pool_init (anv_allocator.c:921)
==15236==    by 0x8CF56517: anv_CreateDevice (anv_device.c:1909)
==15236==    by 0x8BCB4FBA: terminator_CreateDevice (loader.c:6073)
==15236==    by 0x8DD2CB3D: ??? (in /home/djdeath/.steam/ubuntu12_64/libVkLayer_steam_fossilize.so)
==15236==    by 0x8DF4D241: vkCreateDevice (in /home/djdeath/.steam/ubuntu12_64/steamoverlayvulkanlayer.so)
==15236==    by 0x8BCB35C6: loader_create_device_chain (loader.c:5449)
==15236==    by 0x8BCBC230: vkCreateDevice (trampoline.c:838)

v2: Rename mmap_cleanups to avoid confusion (Caio)

v3: s/fail_mmap_cleanups/fail_cleanups/ (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110648
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-09 21:57:26 +01:00
Eric Engestrom
6c6af0c8b0 i965_asm: avoid free()ing uninitialized pointers
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-09 10:03:15 +00:00
Eric Engestrom
51597eca84 i965_asm: fix memleak
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-09 10:03:15 +00:00
Lionel Landwerlin
43596e5f34 anv: fix use after free
Once mem->bo is removed from the cache, it is likely to be freed.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b80930a6fe ("anv: add support for VK_EXT_memory_budget")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-08 12:02:13 +01:00
Lionel Landwerlin
a07d06f103 anv: rework queries writes to ensure ordering memory writes
We use a mix of MI & PIPE_CONTROL commands to write our queries' data
(results & availability). Those commands' memory write order is not
guaranteed with regard to their order in the command stream, unless CS
stalls are inserted between them. This is problematic for 2 reasons :

   1. We copy results from the device using MI commands even though
      the values are generated from PIPE_CONTROL, meaning we could
      copy unlanded values into the results and then copy the
      availability that is inconsistent with the values.

   2. We allow the user to poll on the availability values of the
      query pool from the CPU. If the availability lands in memory
      before the values then we could return invalid values.

This change does 2 things to address this problem :

      - We use either PIPE_CONTROL or MI commands to write both
        queries values and availability, so that the ordering of the
        memory writes guarantees that if availability is visible,
        results are also visible.

      - For the occlusion & timestamp queries we apply a CS stall
        before copying the results on the device, to ensure copying
        with MI commands see the correct values of previous
        PIPE_CONTROL writes of availability (required by the Vulkan
        spec).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-08 09:49:09 +00:00
Matt Turner
e8c74a1e16
intel/compiler: Unset flag reg when FB write is not predicated
In the FS IR we pretend that the instruction is predicated with (+f0.1)
just for flag dependency tracking purposes. Since the instruction
doesn't support predication before Haswell, we unset the predicate so we
should also unset the flag register so that we can round-trip the
disassembly.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
5d7a9e0811
intel/disasm: Disassemble immediate value properly for dim
On haswell, for dim instruction we encode immediate float value operand
into double float,

v2: Fix comment (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
6c83a68ebc
intel/disasm: Disassemble JIP offset for while
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
9db616e8a2
intel/compiler: Replicate 16 bit immediate value correctly
For the W or UW (signed or unsigned word) source types, the 16-bit value
must be replicated in both the low and high words of the 32-bit
immediate value.

v2: Fix replication in other places as well
V3: fix a few nits (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
5211159b5b
intel/compiler: Print quad value in hex format
Print quad value same as unsigned quad so that we can distinguish in
between quater control disassembled values for e.g 1/2/3[Q] and
immediate quad value for e.g 1Q. This allows round-tripping through the
assembler/disassembler.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
4e828bb48a
intel/tools: Add unit tests for assembler
v1: Pass executable object from meson to test(Dylan Baker)
v2: Ignore generated output files from git status(Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-05-07 14:33:48 -07:00
Mika Kuoppala
1fb5ce0a11
intel/tools: Initialize offset correctly for i965_asm
If we leave offset uninitialized, access to store
will be random depending on stack value and can
segfault.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Mika Kuoppala
85da1194ec
intel/tools: Add meson pthread dependancy for i965_asm
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-07 14:33:48 -07:00
Sagar Ghuge
70308a5a8a
intel/tools: New i965 instruction assembler tool
Tool is inspired from igt's assembler tool. Thanks to Matt Turner, who
mentored me through out this project.

v2: Fix memory leaks and naming convention (Caio)
v3: Fix meson changes (Dylan Baker)
v4: Fix usage options (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/141
2019-05-07 14:33:38 -07:00
Samuel Iglesias Gonsálvez
bc66cebc0d anv: fix alphaToCoverage when there is no color attachment
There are tests in CTS for alpha to coverage without a color attachment
that are failing. This happens because we remove the shader color
outputs when we don't have a valid color attachment for them, but when
alpha to coverage is enabled we still want to preserve the the output
at location 0 since we need the alpha component. In that case we will
also need to create a null render target for RT 0.

v2:
  - We already create a null rt when we don't have any, so reuse that
    for this case (Jason)
  - Simplify the code a bit (Iago)

v3:
  - Take alpha to coverage from the key and don't tie this to depth-only
    rendering only, we want the same behavior if we have multiple render
    targets but the one at location 0 is not used. (Jason).
  - Rewrite commit message (Iago)

v4:
  - Make sure we take into account the array length of the shader outputs,
    which we were no handling correctly either and make sure we also
    create null render targets for any invalid array entries too.

v5:
  - Simplify removal of unused outputs by using rt_used[] so we don't have
    to special case alpha to coverage there too.

Fixes the following CTS tests:
dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.*

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 09:35:47 +02:00
Ian Romanick
c866500525 intel/compiler: Don't always require precise lowering of flrp
No changes on any other Intel platforms.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8164367 -> 8135551 (-0.35%)
instructions in affected programs: 3271235 -> 3242419 (-0.88%)
helped: 13636
HURT: 90
helped stats (abs) min: 1 max: 30 x̄: 2.13 x̃: 1
helped stats (rel) min: 0.04% max: 10.77% x̄: 1.16% x̃: 0.97%
HURT stats (abs)   min: 1 max: 4 x̄: 1.80 x̃: 2
HURT stats (rel)   min: 0.26% max: 11.11% x̄: 1.76% x̃: 0.78%
95% mean confidence interval for instructions value: -2.13 -2.07
95% mean confidence interval for instructions %-change: -1.16% -1.13%
Instructions are helped.

total cycles in shared programs: 188719974 -> 188586222 (-0.07%)
cycles in affected programs: 70415766 -> 70282014 (-0.19%)
helped: 12563
HURT: 515
helped stats (abs) min: 2 max: 600 x̄: 10.90 x̃: 6
helped stats (rel) min: <.01% max: 5.48% x̄: 0.48% x̃: 0.27%
HURT stats (abs)   min: 2 max: 54 x̄: 6.07 x̃: 4
HURT stats (rel)   min: 0.01% max: 4.48% x̄: 0.24% x̃: 0.08%
95% mean confidence interval for cycles value: -10.56 -9.90
95% mean confidence interval for cycles %-change: -0.47% -0.45%
Cycles are helped.

LOST:   0
GAINED: 13

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Ian Romanick
dd7135d55d intel/compiler: Use the flrp lowering pass for all stages on Gen4 and Gen5
Previously lower_flrp32 was only set for vertex shaders.  Fragment
shaders performed a(1-c)+bc lowering during code generation.

The shaders with loops hurt are SIMD8 and SIMD16 shaders for a
text-identical fragment shader.

v2: Rebase on 26391cceaa ("intel/compiler: Lower ffma on Gen4 and
Gen5").

v3: Rebase on a004e95dd7 ("radeonsi/nir: create si_nir_opts() helper")

Iron Lake
total instructions in shared programs: 8211385 -> 8185974 (-0.31%)
instructions in affected programs: 2503898 -> 2478487 (-1.01%)
helped: 9936
HURT: 921
helped stats (abs) min: 1 max: 155 x̄: 2.86 x̃: 2
helped stats (rel) min: 0.10% max: 35.48% x̄: 1.67% x̃: 1.11%
HURT stats (abs)   min: 1 max: 12 x̄: 3.24 x̃: 2
HURT stats (rel)   min: 0.21% max: 13.64% x̄: 1.86% x̃: 0.89%
95% mean confidence interval for instructions value: -2.43 -2.25
95% mean confidence interval for instructions %-change: -1.41% -1.33%
Instructions are helped.

total cycles in shared programs: 188523186 -> 188401198 (-0.06%)
cycles in affected programs: 71541604 -> 71419616 (-0.17%)
helped: 11649
HURT: 1871
helped stats (abs) min: 2 max: 930 x̄: 12.62 x̃: 6
helped stats (rel) min: <.01% max: 44.61% x̄: 0.68% x̃: 0.25%
HURT stats (abs)   min: 2 max: 138 x̄: 13.38 x̃: 8
HURT stats (rel)   min: <.01% max: 10.99% x̄: 0.49% x̃: 0.17%
95% mean confidence interval for cycles value: -9.42 -8.63
95% mean confidence interval for cycles %-change: -0.54% -0.50%
Cycles are helped.

total loops in shared programs: 852 -> 856 (0.47%)
loops in affected programs: 0 -> 4
helped: 0
HURT: 4
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00%
95% mean confidence interval for loops value: 1.00 1.00
95% mean confidence interval for loops %-change: 0.00% 0.00%
Loops are HURT.

LOST:   3
GAINED: 12

GM45
total instructions in shared programs: 5046407 -> 5033694 (-0.25%)
instructions in affected programs: 1303584 -> 1290871 (-0.98%)
helped: 5010
HURT: 464
helped stats (abs) min: 1 max: 155 x̄: 2.85 x̃: 2
helped stats (rel) min: 0.10% max: 34.38% x̄: 1.63% x̃: 1.08%
HURT stats (abs)   min: 1 max: 75 x̄: 3.39 x̃: 2
HURT stats (rel)   min: 0.20% max: 13.04% x̄: 1.84% x̃: 0.87%
95% mean confidence interval for instructions value: -2.45 -2.20
95% mean confidence interval for instructions %-change: -1.40% -1.28%
Instructions are helped.

total cycles in shared programs: 128889476 -> 128812366 (-0.06%)
cycles in affected programs: 44845402 -> 44768292 (-0.17%)
helped: 6079
HURT: 940
helped stats (abs) min: 2 max: 930 x̄: 15.16 x̃: 8
helped stats (rel) min: <.01% max: 41.03% x̄: 0.71% x̃: 0.25%
HURT stats (abs)   min: 2 max: 138 x̄: 16.01 x̃: 8
HURT stats (rel)   min: <.01% max: 10.99% x̄: 0.50% x̃: 0.17%
95% mean confidence interval for cycles value: -11.63 -10.34
95% mean confidence interval for cycles %-change: -0.58% -0.52%
Cycles are helped.

total loops in shared programs: 633 -> 635 (0.32%)
loops in affected programs: 0 -> 2
helped: 0
HURT: 2

total spills in shared programs: 60 -> 69 (15.00%)
spills in affected programs: 54 -> 63 (16.67%)
helped: 0
HURT: 1

total fills in shared programs: 92 -> 105 (14.13%)
fills in affected programs: 80 -> 93 (16.25%)
helped: 0
HURT: 1

LOST:   15
GAINED: 15

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]
2019-05-06 22:52:29 -07:00
Ian Romanick
d41cdef2a5 nir: Use the flrp lowering pass instead of nir_opt_algebraic
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(

i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc.  On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.

No changes on any other Intel platforms.

v2: Add panfrost changes.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Christian Gmeiner
4e110eca42 nir: nir_shader_compiler_options: drop native_integers
Driver which do not support native integers should use a lowering
pass to go from integers to floats.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 07:35:52 +02:00
Jason Ekstrand
30fa15e36b anv,i965: Stop warning about incomplete gen11 support
Both drivers are feature-complete and should be running more-or-less at
perf at this point.  Drop the warning.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2019-05-03 22:57:35 +00:00
Lionel Landwerlin
80dc78407d anv: fix crash when application does not provide push constants
Found while running Talos Principle.

As far as I can tell running a draw call with a pipeline having push
constants without the application having called vkCmdPushConstants
gives undefined push constant values.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
2019-05-03 10:21:40 +01:00
Caio Marcelo de Oliveira Filho
aa675cef5e intel/fs: Assert when brw_fs_nir sees a nir_deref_instr
Since 09f1de97a7 "anv,i965: Lower away image derefs in the driver"
the backend compiler is not expected to handle any derefs, so let's
assert on it.

This helps identifying problems when a deref is not lowered and
"leaks" into the backend compiler.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-02 23:25:30 -07:00
Jason Ekstrand
be7e9870d6 anv: Stop including POS in FS input limits
It is an input but it comes in as part of the shader payload and doesn't
count towards the limits.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-02 18:56:51 -05:00
Eric Engestrom
b80930a6fe anv: add support for VK_EXT_memory_budget
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-30 15:40:33 +00:00
Juan A. Suarez Romero
8d621e8ff7 anv: enable descriptor indexing capabilities
This enables the remaining capabilities in SPV_EXT_descriptor_indexing.

Fixes: 6e230d7607 "anv: Implement VK_EXT_descriptor_indexing"

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-30 09:23:46 +02:00
Rafael Antognolli
9175c7058e intel/blorp: Make blorp update the clear color in gen11.
Hardware docs say that Gen11 requires the use of two MI_ATOMICs of size
QWORD when updating the clear color. The second MI_ATOMIC also needs CS
Stall and Return Data Control set.

v2: Remove include of srgb header (Lionel)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-29 21:19:59 +00:00
Rafael Antognolli
f8c3f408a6 intel/genxml: Update MI_ATOMIC genxml definition.
Change some of the single bit fields to booleans, and add an enum with
the definition of the ATOMIC_OPCODE.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-29 21:19:59 +00:00
Jordan Justen
38ffd7ce79 intel/genxml: Support base-16 in value & start fields in gen_sort_tags.py
With python's int(), if the optional second parameter is 0, then
python will support the 0x prefix for hex numbers.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-29 21:19:58 +00:00
Plamena Manolova
232c0f6489 isl: Set ClearColorConversionEnable.
The ClearColorConversionEnable bit needs to be set
for GEN11 when inderect clear colors are used.

Signed-off-by: Plamena Manolova <plamena.n.manolova@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-04-29 21:19:58 +00:00
Eric Engestrom
7ca8ba199f delete autotools .gitignore files
One special case, `src/util/xmlpool/.gitignore` is not entirely deleted,
as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`).

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-04-29 21:17:19 +00:00
Lionel Landwerlin
9628631a38 Revert "anv: limit URB reconfigurations when using blorp"
In commit 0d46e404 ("anv: limit URB reconfigurations when using
blorp") we tried to limit the number of URB reconfiguration by
checking if the last allocation is large enough to fit the blorp
dispatch.

We used the last bound pipeline to compare the allocation. The problem
with this is that the pipeline is bound but its commands might not
have been emitted into the command buffer yet.

Let's just revert commit 0d46e40467
since it didn't seem to yield any performance improvement.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0d46e404 ("anv: limit URB reconfigurations when using blorp")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110535
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-29 11:41:27 +00:00
Kenneth Graunke
9dcf90d7ba intel/fs: Don't emit empty ELSE blocks.
While we can clean this up later, it's trivial to not generate the
stupid code in the first place, which saves some optimization work.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-28 22:36:09 -07:00
Tapani Pälli
376c3e8f87 anv: expose VK_EXT_queue_family_foreign on Android
VK_ANDROID_external_memory_android_hardware_buffer requires this
extension. It is safe to enable it since currently aux usage is
disabled for ahw buffers.

Fixes following dEQP extension dependency test on Android:
   dEQP-VK.api.info.device#extensions

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-29 07:31:02 +03:00
Jason Ekstrand
934f178341 anv/descriptor_set: Don't fully destroy sets in pool destroy/reset
In 105002bd2d, we fixed a memory leak bug where we weren't properly
destroying descriptor when destroying/resetting a descriptor pool.
However, the only real leak that happened was that we we take a
reference to the descriptor set layout in the descriptor set and we
weren't dropping our reference.  Everything else in the descriptor set
is tied to the pool itself and doesn't need to be freed on a per-set
basis.  This commit changes the destroy/reset functions to only bother
walking the list of sets to unref the layouts and otherwise we just
assume that the whole-pool destroy/reset takes care of the rest.

Now that we're doing more non-trivial things with descriptor sets such
as allocating things with util_vma_heap, per-set destruction is starting
to show up on perf traces.  This takes reset back to where it's supposed
to be as a cheap whole-pool operation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-26 05:40:28 +00:00
Jason Ekstrand
baf4802e3e anv: Better handle 32-byte alignment of descriptor set buffers
In c520f4dec9, we chose to align the sizes of descriptor set buffers to
32 bytes.  We have to align the descriptor set buffer to 32B so that
it's valid for using with push constants.  We align the size as well so
we don't leave lots of holes with util_vma_heap_alloc.  Unfortunately,
we were only aligning it for alloc and not for free so we were still
creating piles of holes when we delete descriptor sets.  This causes
terrible perf for the allocator once we've deleted piles of descriptor
sets.

This commit reworks the code so that we align the descriptor set buffer
size to 32B for both alloc and free.  The result is that it takes the
new crucible vkResetDescriptorPool from 104.567719 to 2.898354 seconds.

Fixes: c520f4dec9 "anv: Add a concept of a descriptor buffer"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110497
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-26 05:40:28 +00:00
Caio Marcelo de Oliveira Filho
055f6281d4 intel/fs: Don't handle texop_tex for shaders without implicit LOD
These will be lowered by nir_lower_tex() with the
lower_tex_when_implicit_lod_not_supported, so don't need the extra
handling here.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-25 12:13:06 -07:00
Topi Pohjolainen
ff642fb0e6 intel/compiler/fs/icl: Use dummy masked urb write for tess eval
One cannot write the URB arbitrarily and therefore the message
has to be carefully constructed. The clever tricks originate
from Kenneth and Jason, I'm just writing the patch.

Fixes GPU hangs on ICL with Vulkan CTS.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2019-04-25 22:00:43 +03:00
Juan A. Suarez Romero
b06ae53606 Revert "intel/compiler: split is_partial_write() into two variants"
This reverts commit 40b3abb4d1.

It is not clear that this commit was entirely correct, and unfortunately
it was pushed by error.

CC: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-25 09:19:10 +02:00
Lionel Landwerlin
f15409ee55 i965: fix icelake performance query enabling
This was a rebase issue which lost of change to a file moved from i965
to src/intel/perf.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 134e750e16 ("i965: extract performance query metrics")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-25 01:11:54 +00:00
Dave Airlie
3323cf08f0 intel/compiler: fix uninit non-static variable. (v2)
Pointed out by coverity.

v2: init nir_locals also.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-25 06:06:57 +10:00
Rafael Antognolli
f2041d2a92 intel/isl: Resize clear color buffer to full cacheline
Fixes MCS fast clear gpu hangs with Vulkan CTS on ICL in CI.

v2 (Nanley): In the title s/Align/Resize/

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Tested-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-04-24 08:56:42 +03:00
Jason Ekstrand
45957c05b0 anv/descriptor_set: Properly align descriptor buffer to a page
Instead of aligning and then taking inline uniforms into account, we
need to take inline uniforms into account and then align to a page.
Otherwise, we may not be aligned to a page and allocation may fail.

Fixes: 43f40dc7cb "anv: Implement VK_EXT_inline_uniform_block"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-24 05:40:27 +00:00
Jason Ekstrand
3d33c13eca anv/descriptor_set: Only vma_heap_finish if we have a descriptor buffer
Fixes: 7bb34ecff9 "anv: release memory allocated by bo_heap when..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-24 05:40:27 +00:00
Jason Ekstrand
0bc1942c9d anv/descriptor_set: Destroy sets before pool finalization
Fixes: 105002bd2d "anv: destroy descriptor sets when pool gets..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-24 05:40:27 +00:00
Jason Ekstrand
6be603edf7 anv/descriptor_set: Unlink sets from the pool in set_destroy
anv_descriptor_pool_free_set is called on the clean-up path of
anv_descriptor_set_create and the set may not have been added to the
pool's list of sets yet.  While we're here, we move adding it to that
list into set_create for symmetry.

Fixes: 105002bd2d "anv: destroy descriptor sets when pool gets..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-24 05:40:27 +00:00
Ian Romanick
21223acf7d intel/fs: Fix D to W conversion in opt_combine_constants
Found by GCC warning:

src/intel/compiler/brw_fs_combine_constants.cpp: In function ‘bool needs_negate(const fs_reg*, const imm*)’:
src/intel/compiler/brw_fs_combine_constants.cpp:306:34: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
       return ((reg->d & 0xffffu) < 0) != (imm->w < 0);
               ~~~~~~~~~~~~~~~~~~~^~~

The result of the bit-and is a 32-bit value with the top bits all zero.
This will never be < 0.  Instead of masking off the bits, just cast to
int16_t and let the compiler handle the actual conversion.

Fixes: e64be391dd ("intel/compiler: generalize the combine constants pass")
Cc: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-23 19:48:33 -07:00
Ian Romanick
26391cceaa intel/compiler: Lower ffma on Gen4 and Gen5
flrp32 is also a 3-source instruction, but there is another pending
series that handles that for Gen4 and Gen5.

v2: Rebase on "intel/compiler: Don't have sepearate, per-Gen
nir_options"

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-23 17:50:28 -07:00
Ian Romanick
fd1fa9afc7 intel/compiler: Don't have sepearate, per-Gen nir_options
Instead, just have separate scalar vs. vector nir_options and do
per-Gen "fix ups".

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-23 17:50:16 -07:00