Commit graph

3672 commits

Author SHA1 Message Date
Jason Ekstrand
19c608fe43 intel/blorp: Be more conservative about copying clear colors
In 92eb5bbc68 we attempted to avoid copying clear colors whenever
we weren't doing a resolve.  However, this broke MSAA resolves because
we need the clear color in the source.  This patch makes blorp much more
conservative such that it only avoids the clear color copy if either
aux_usage == NONE or it's explicitly doing a fast-clear.
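A minimal C sketch of the rule described above. The enum and the `should_copy_clear_color()` helper are hypothetical simplifications, not blorp's real API. The clear color copy is skipped only when there is no aux surface, or when the operation is itself a fast clear.

```c
#include <stdbool.h>

/* Hypothetical stand-in for ISL aux usage values; only NONE matters here. */
enum aux_usage_sketch {
   AUX_USAGE_NONE_SKETCH,
   AUX_USAGE_CCS_E_SKETCH,
   AUX_USAGE_MCS_SKETCH,
};

/* Hypothetical helper: returns true when the clear color should be copied.
 * Being conservative means copying in every case except the two where it
 * is known to be unnecessary. */
static bool
should_copy_clear_color(enum aux_usage_sketch aux_usage, bool is_fast_clear)
{
   if (aux_usage == AUX_USAGE_NONE_SKETCH)
      return false; /* no aux surface, so no clear color is ever read */
   if (is_fast_clear)
      return false; /* a fast clear sets a new clear color instead */
   return true;     /* everything else, e.g. MSAA resolves, copies it */
}
```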

Fixes: 92eb5bbc68 "intel/blorp: Only copy clear color when doing..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107728
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-01-04 17:57:43 -06:00
Lionel Landwerlin
da634a4acb intel/blorp: emit VF caching workaround before 3DSTATE_VERTEX_BUFFERS
Probably no difference, but it's nice to have i965 & blorp emit things
in the same order.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-01-04 11:18:51 +00:00
Timothy Arceri
50de3f80a8 nir: rename nir_link_constant_varyings() nir_link_opt_varyings()
The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 12:19:17 +11:00
Erik Faye-Lund
86089a7316 anv/autotools: make sure tests link with -msse2
Without this, I get the following error when building the tests with
autotools on i686:

---8<---
src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’:
src/intel/common/gen_clflush.h:37:7: warning: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Wimplicit-function-declaration]
       __builtin_ia32_clflush(p);
       ^~~~~~~~~~~~~~~~~~~~~~
       __builtin_ia32_pause
src/intel/common/gen_clflush.h: In function ‘gen_flush_range’:
src/intel/common/gen_clflush.h:45:4: warning: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Wimplicit-function-declaration]
    __builtin_ia32_mfence();
    ^~~~~~~~~~~~~~~~~~~~~
    __builtin_ia32_fnclex
---8<---

The errors are generated for each of these files:
- mesa/src/intel/vulkan/tests/state_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool.c
- mesa/src/intel/vulkan/tests/block_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool_free_list_only.c

This is obviously because gen_clflush.h contains code that uses
intrinsics that are only available with SSE2. Since the driver already
uses SSE2, it seems reasonable to add this to the tests as well.
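The following sketch shows a cacheline-flush helper that is similar in spirit to gen_clflush_range(). It uses the `_mm_clflush()` and `_mm_mfence()` intrinsics from `<emmintrin.h>`, which are only usable when SSE2 is enabled (for example with `-msse2` on i686). The helper name, the 64-byte line size, and the fence placement are assumptions for illustration, not Mesa's implementation.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_clflush() and _mm_mfence() are SSE2 intrinsics */
#endif

#define CACHELINE_SIZE_SKETCH 64 /* assumed cacheline size */

/* Flush every cacheline overlapping [start, start + size) and return how
 * many lines were visited. Without SSE2 (e.g. an i686 build lacking
 * -msse2) the intrinsics are unavailable, so this sketch only counts. */
static size_t
flush_range_sketch(void *start, size_t size)
{
   if (size == 0)
      return 0;

   uintptr_t first = (uintptr_t)start & ~(uintptr_t)(CACHELINE_SIZE_SKETCH - 1);
   uintptr_t end = (uintptr_t)start + size;
   size_t lines = 0;

   for (uintptr_t p = first; p < end; p += CACHELINE_SIZE_SKETCH) {
#if defined(__SSE2__)
      _mm_clflush((const void *)p);
#endif
      lines++;
   }
#if defined(__SSE2__)
   _mm_mfence(); /* order the flushes against later loads and stores */
#endif
   return lines;
}
```

On x86-64, SSE2 is part of the baseline ISA, so `__SSE2__` is normally defined without extra flags. On i686 it is not, which is what the missing `-msse2` exposed.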

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Eric Engeström <eric@engestrom.ch>
2018-12-31 17:28:21 +01:00
Erik Faye-Lund
89679e18a9 anv/meson: make sure tests link with -msse2
Without this, I get the following error when building the tests using
meson on i686:

---8<---
In file included from ../../../mesa/src/intel/vulkan/anv_private.h:46,
                 from ../../../mesa/src/intel/vulkan/tests/state_pool_no_free.c:26:
../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’:
../../../mesa/src/intel/common/gen_clflush.h:37:7: error: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Werror=implicit-function-declaration]
       __builtin_ia32_clflush(p);
       ^~~~~~~~~~~~~~~~~~~~~~
       __builtin_ia32_pause
../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_flush_range’:
../../../mesa/src/intel/common/gen_clflush.h:45:4: error: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Werror=implicit-function-declaration]
    __builtin_ia32_mfence();
    ^~~~~~~~~~~~~~~~~~~~~
    __builtin_ia32_fnclex
---8<---

The errors are generated for each of these files:
- mesa/src/intel/vulkan/tests/state_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool.c
- mesa/src/intel/vulkan/tests/block_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool_free_list_only.c

This is obviously because gen_clflush.h contains code that uses
intrinsics that are only available with SSE2. Since the driver already
uses SSE2, it seems reasonable to add this to the tests as well.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engeström <eric@engestrom.ch>
2018-12-31 17:27:33 +01:00
Lionel Landwerlin
f7bccf6ab4 intel/aub_viewer: highlight true booleans
Useful to spot PIPE_CONTROL flags.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-12-28 16:48:46 +00:00
Lionel Landwerlin
6ba61ea391 intel/aub_viewer: fold binding/sampler table items
Makes things easier to read than a long block of text.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-12-28 16:48:43 +00:00
Lionel Landwerlin
7ab8c80625 intel/aub_viewer: fix shader view
Not decoding the shader at the right offset.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-12-28 16:48:40 +00:00
Lionel Landwerlin
f3ed4a058d intel/aub_viewer: print address of missing shader
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-12-28 16:48:21 +00:00
Lionel Landwerlin
0382e11989 intel/aub_viewer: fixup 0x address prefix
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-12-28 16:48:18 +00:00
Lionel Landwerlin
8e2fda411a intel/aub_viewer: fix shader get_bo
Instruction addresses are always in ppgtt space.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-12-28 16:48:08 +00:00
Lionel Landwerlin
e2ae5f2f0a anv: don't do partial resolve on layer > 0
We've made the choice not to use fast clears on layer > 0 with
multilayer images. This is partly because we would need to store
multiple clear colors, one for each layer, making the existing memory
layout (which already includes aux surfaces, the fast clear color,
image state, etc.) even more complex.

Partial resolves are the operations transferring the clear colors into
the auxiliary buffers. This operation is currently implemented in
Blorp by loading the clear color from the image's BO into a shader
that then samples from the auxiliary buffer and writes the color only
if it isn't there already.

The problem here is that we store only one clear color for all
layers, and it is used for partial resolves. If you trigger a partial
resolve on a layer > 0, then you're likely to deal with a color that is
not what you actually want. In the particular issues linked below, we
have multiple layers, each cleared with a different color, but the
partial resolve just writes the wrong color into the auxiliary buffers
for layers > 0.
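Here is a small C model of the partial resolve described above. The structs and helpers are hypothetical. Each block of a layer is either up to date or still marked "clear" in the aux buffer, and the partial resolve fills the marked blocks with the single clear color stored for the whole image. The last helper captures the fix: only layer 0 is ever fast-cleared, so only layer 0 is partially resolved.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS_SKETCH 4

/* Hypothetical per-layer model: each block of the main surface is either
 * up to date or still marked "clear" in the aux buffer. */
struct layer_sketch {
   uint32_t color[NUM_BLOCKS_SKETCH];
   bool     aux_says_clear[NUM_BLOCKS_SKETCH];
};

/* Partial resolve: write the stored clear color into every block the aux
 * buffer still marks as clear. Only ONE clear color is stored per image,
 * whichever layer is being resolved. */
static void
partial_resolve_sketch(struct layer_sketch *layer, uint32_t stored_clear_color)
{
   for (int i = 0; i < NUM_BLOCKS_SKETCH; i++) {
      if (layer->aux_says_clear[i]) {
         layer->color[i] = stored_clear_color;
         layer->aux_says_clear[i] = false;
      }
   }
}

/* The fix, in spirit: layers > 0 are never fast-cleared, and a partial
 * resolve there could only apply the single stored clear color, which
 * may belong to a different layer. */
static bool
should_partial_resolve_sketch(uint32_t layer)
{
   return layer == 0;
}
```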

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108910
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108911
Cc: mesa-stable@lists.freedesktop.org
2018-12-24 09:42:46 +00:00
Iago Toral Quiroga
d6110d4d54 intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs
The former expects to see SSA-only things, but the latter injects registers.

The assertions in the lowering were not seeing this because they asserted
on the bit_size values only, not on the is_ssa field, so add that assertion
too.
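The following simplified sketch shows the extra assertion. It uses a hypothetical `src_sketch` struct in place of NIR's real source type. Checking `bit_size` alone would accept a register source, so `is_ssa` is asserted as well. With the pass moved before nir_lower_locals_to_regs, every source it sees should be SSA.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a NIR source. */
struct src_sketch {
   bool     is_ssa;   /* false once registers have been introduced */
   unsigned bit_size; /* 1 for a Boolean before lowering */
};

/* Lower a 1-bit Boolean source to 32 bits. A register source could also
 * report bit_size == 1, so the SSA-ness is asserted explicitly. */
static void
lower_bool_src_sketch(struct src_sketch *src)
{
   assert(src->is_ssa);
   assert(src->bit_size == 1);
   src->bit_size = 32;
}
```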

Fixes: 11dc130779 "nir: Add a bool to int32 lowering pass"
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-20 08:02:44 +01:00
Tapani Pälli
3627c9efff anv/android: turn on VK_ANDROID_external_memory_android_hardware_buffer
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:42 +02:00
Tapani Pälli
3dc424a4f4 anv: ignore VkSamplerYcbcrConversion on non-yuv formats
This fulfills a requirement for clients that want to use the same
code path for images with external formats (VK_FORMAT_UNDEFINED) and
"regular" RGBA images where format is known. This is similar to how
OES_EGL_image_external works.

To support this, we allow color conversion samplers for non-YUV
formats but skip setting up conversion when the format does not have
the can_ycbcr flag set.
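A sketch of the rule above, assuming a hypothetical `format_sketch` struct whose `can_ycbcr` flag is modeled on the one the message mentions. The conversion object is accepted for any format, but conversion is only set up for formats that support it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of a driver format description. */
struct format_sketch {
   const char *name;
   bool        can_ycbcr; /* true only for YUV formats */
};

/* A conversion request is never rejected; it is simply ignored for
 * formats without can_ycbcr, so RGBA images can share the code path
 * used for external formats. */
static bool
sampler_uses_ycbcr_conversion(const struct format_sketch *fmt,
                              bool conversion_requested)
{
   return conversion_requested && fmt != NULL && fmt->can_ycbcr;
}
```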

v2: add comment and bundle can_ycbcr to the existing break
    condition (Lionel)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli
a7b7772cfb anv: support VkSamplerYcbcrConversionInfo in vkCreateImageView
If a conversion struct was passed, then initialize view using
format from the conversion structure.
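The next sketch shows how a driver can detect an optional struct in the create-info `pNext` chain. It is written against hypothetical mirror types rather than the real Vulkan headers. In real Vulkan, VkSamplerYcbcrConversionInfo carries a conversion handle, and the format is read from the conversion object. Here the format is inlined to keep the sketch self-contained.

```c
#include <stddef.h>

/* Hypothetical mirror of Vulkan's VkBaseInStructure: every extension
 * struct starts with a type tag and a pointer to the next struct. */
enum stype_sketch {
   STYPE_IMAGE_VIEW_CREATE_INFO_SKETCH,
   STYPE_YCBCR_CONVERSION_INFO_SKETCH,
   STYPE_OTHER_SKETCH,
};

struct base_in_sketch {
   enum stype_sketch            sType;
   const struct base_in_sketch *pNext;
};

struct ycbcr_conversion_info_sketch {
   enum stype_sketch            sType;
   const struct base_in_sketch *pNext;
   int                          format; /* simplified: format inlined */
};

/* Walk the pNext chain and return the first struct of the given type. */
static const void *
find_struct_sketch(const void *chain, enum stype_sketch type)
{
   for (const struct base_in_sketch *s = chain; s != NULL; s = s->pNext) {
      if (s->sType == type)
         return s;
   }
   return NULL;
}

/* If a conversion struct was passed, use its format; otherwise keep the
 * format from the view create info. */
static int
view_format_sketch(const void *pNext, int view_format)
{
   const struct ycbcr_conversion_info_sketch *conv =
      find_struct_sketch(pNext, STYPE_YCBCR_CONVERSION_INFO_SKETCH);
   return conv != NULL ? conv->format : view_format;
}
```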

v2: use vk_format directly from the anv_format struct
v3: added some assertions (Lionel)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli
bb0721aea4 anv: add VkFormat field as part of anv_format
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli
c070b0e25f anv: support VkExternalFormatANDROID in vkCreateSamplerYcbcrConversion
If an external format is used, we store the external format identifier in
the conversion to be used later when creating the VkImageView.

v2: rebase to b43f955037 changes
v3: added assert, ignore components when creating external
    format conversion (Lionel)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli
f1654fa7e3 anv/android: support creating images from external format
Since we don't know the exact format at creation time, some initialization
is done only when bound with memory in vkBindImageMemory.

v2: demand dedicated allocation in vkGetImageMemoryRequirements2 if
    image has external format

v3: refactor prepare_ahw_image, support vkBindImageMemory2,
    calculate stride correctly for rgb(x) surfaces, rename as
    'resolve_ahw_image'

v4: rebase to b43f955037 changes
v5: add some assertions to verify input correctness (Lionel)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli
517103abf1 anv/android: add ahardwarebuffer external memory properties
v2: have separate memory properties for android, set usage
    flags for buffers correctly

v3: code cleanup (Jason)
    + limit maxArrayLayers to 1 for AHardwareBuffer based images

v4: rebase to b43f955037 changes

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli
c79a528d2b anv/android: support import/export of AHardwareBuffer objects
v2: add support for non-image buffers (AHARDWAREBUFFER_FORMAT_BLOB)
v3: properly handle usage bits when creating from image
v4: refactor, code cleanup (Jason)
v5: rebase to b43f955037 changes,
    initialize bo flags as ANV_BO_EXTERNAL (Lionel)
v6: add assert that anv_bo_cache_import succeeds, add comment
    about multi-bo support to clarify current implementation (Lionel)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli
5c65c60d6c anv: refactor, remove else block in AllocateMemory
This makes it cleaner to introduce more cases where we import memory
from different types of external memory buffers.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli
884fc90fde anv: add anv_ahw_usage_from_vk_usage helper function
v2: rebase to b43f955037 changes

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli
1e6a44400a anv/android: add GetAndroidHardwareBufferPropertiesANDROID
Use the anv_format address in the formats table as an implementation-defined
external format identifier for now. When adding YUV format support this
might need to change.
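The following sketch uses a table entry's address as the external format identifier, as described above. The table, its values, and the helper names are hypothetical. The identifier is widened to `uint64_t` because that is the type of `externalFormat` in `VkExternalFormatANDROID`. A valid address is never 0, which matters because Vulkan treats an `externalFormat` of 0 as "no external format".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format table entry; the vk_format values are placeholders. */
struct format_entry_sketch {
   const char *name;
   int         vk_format;
};

static const struct format_entry_sketch formats_sketch[] = {
   { "R8G8B8A8_UNORM", 1 },
   { "R8G8B8_UNORM",   2 },
   { "R5G6B5_UNORM",   3 },
};

/* Hand out the entry's address as an opaque 64-bit identifier. */
static uint64_t
external_format_id_sketch(const struct format_entry_sketch *entry)
{
   return (uint64_t)(uintptr_t)entry;
}

/* Recover the entry from an identifier previously handed out. */
static const struct format_entry_sketch *
format_from_external_id_sketch(uint64_t id)
{
   return (const struct format_entry_sketch *)(uintptr_t)id;
}
```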

v2: code cleanup (Jason)
v3: set anv_format address as identifier
v4: setup suggestedYcbcrModel and suggested[X|Y]ChromaOffset
    as expected for HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL
v5: set linear tiling for GPU_DATA_BUFFER usage, add comment
    about multi-bo support to clarify current implementation (Lionel)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli
aa94e01bfe anv: add from/to helpers with android and vulkan formats
v2: handle R8G8B8X8 as R8G8B8_UNORM (Jason)
v3: add HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL; we make it a define
    for now to avoid a direct dependency on minigbm headers
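The from/to helpers can be sketched as two switch statements. The enumerators are hypothetical stand-ins whose values do not match the Android or Vulkan headers, and only a few formats are shown. The R8G8B8X8 case follows the v2 note, and falling back to BLOB for unknown Vulkan formats is a choice made for this sketch.

```c
/* Hypothetical stand-ins; enumerator values are NOT the real header values. */
enum ahb_format_sketch {
   AHB_R8G8B8A8_UNORM_SKETCH,
   AHB_R8G8B8X8_UNORM_SKETCH,
   AHB_R8G8B8_UNORM_SKETCH,
   AHB_BLOB_SKETCH,
};

enum vk_format_sketch {
   VKF_UNDEFINED_SKETCH,
   VKF_R8G8B8A8_UNORM_SKETCH,
   VKF_R8G8B8_UNORM_SKETCH,
};

static enum vk_format_sketch
vk_format_from_ahb_sketch(enum ahb_format_sketch f)
{
   switch (f) {
   case AHB_R8G8B8A8_UNORM_SKETCH: return VKF_R8G8B8A8_UNORM_SKETCH;
   /* X8 is padding, so (per the v2 note) it is handled as R8G8B8_UNORM. */
   case AHB_R8G8B8X8_UNORM_SKETCH: return VKF_R8G8B8_UNORM_SKETCH;
   case AHB_R8G8B8_UNORM_SKETCH:   return VKF_R8G8B8_UNORM_SKETCH;
   default:                        return VKF_UNDEFINED_SKETCH; /* e.g. BLOB */
   }
}

static enum ahb_format_sketch
ahb_format_from_vk_sketch(enum vk_format_sketch f)
{
   switch (f) {
   case VKF_R8G8B8A8_UNORM_SKETCH: return AHB_R8G8B8A8_UNORM_SKETCH;
   case VKF_R8G8B8_UNORM_SKETCH:   return AHB_R8G8B8_UNORM_SKETCH;
   default:                        return AHB_BLOB_SKETCH; /* sketch fallback */
   }
}
```

Because R8G8B8X8 collapses onto R8G8B8_UNORM, the mapping is not a round trip for that format.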

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli
c1f15a0a1a anv: make anv_get_image_format_features public
This will be utilized later by GetAndroidHardwareBufferPropertiesANDROID.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli
8a469fd335 anv: refactor make_surface to use data from anv_image
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Tapani Pälli
2a98e5bbb9 anv: add create_flags as part of anv_image
This will make it possible for the next patch to rip
anv_image_create_info out of the make_surface function.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-19 09:38:41 +02:00
Jason Ekstrand
3feda3cf35 anv: Bump the patch version to 96
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-18 09:40:46 -06:00
Ian Romanick
af07141b33 intel/compiler: More peephole_select for pre-Gen6
No shader-db changes on any Gen6+ platform.

All of the shaders with cycles hurt by more than ~2% are from Master of
Orion.  All of the shaders have instructions helped.  It looks like the
pass enables some control flow to be converted to bcsels, then the
scheduler does dumb things.  These are new shaders (just added before
doing this shader-db run), so there's probably some low-hanging fruit.

Iron Lake
total instructions in shared programs: 8214327 -> 8213684 (<.01%)
instructions in affected programs: 84469 -> 83826 (-0.76%)
helped: 114
HURT: 26
helped stats (abs) min: 2 max: 18 x̄: 7.75 x̃: 9
helped stats (rel) min: 0.17% max: 13.73% x̄: 2.52% x̃: 1.05%
HURT stats (abs)   min: 2 max: 20 x̄: 9.23 x̃: 8
HURT stats (rel)   min: 0.70% max: 2.48% x̄: 1.66% x̃: 1.61%
95% mean confidence interval for instructions value: -5.87 -3.32
95% mean confidence interval for instructions %-change: -2.32% -1.17%
Instructions are helped.

total cycles in shared programs: 187736850 -> 187749314 (<.01%)
cycles in affected programs: 506750 -> 519214 (2.46%)
helped: 104
HURT: 36
helped stats (abs) min: 2 max: 72 x̄: 21.96 x̃: 16
helped stats (rel) min: 0.02% max: 6.16% x̄: 0.97% x̃: 0.63%
HURT stats (abs)   min: 4 max: 1402 x̄: 409.67 x̃: 40
HURT stats (rel)   min: 0.33% max: 23.12% x̄: 5.79% x̃: 1.39%
95% mean confidence interval for cycles value: 28.32 149.74
95% mean confidence interval for cycles %-change: -0.07% 1.61%
Inconclusive result (%-change mean confidence interval includes 0).

GM45
total instructions in shared programs: 5044014 -> 5043652 (<.01%)
instructions in affected programs: 46751 -> 46389 (-0.77%)
helped: 63
HURT: 13
helped stats (abs) min: 2 max: 29 x̄: 7.65 x̃: 9
helped stats (rel) min: 0.17% max: 13.73% x̄: 2.93% x̃: 1.04%
HURT stats (abs)   min: 2 max: 20 x̄: 9.23 x̃: 8
HURT stats (rel)   min: 0.66% max: 2.35% x̄: 1.58% x̃: 1.52%
95% mean confidence interval for instructions value: -6.54 -2.99
95% mean confidence interval for instructions %-change: -3.04% -1.28%
Instructions are helped.

total cycles in shared programs: 128143042 -> 128150188 (<.01%)
cycles in affected programs: 324564 -> 331710 (2.20%)
helped: 57
HURT: 19
helped stats (abs) min: 6 max: 74 x̄: 30.70 x̃: 32
helped stats (rel) min: 0.08% max: 4.74% x̄: 1.22% x̃: 0.81%
HURT stats (abs)   min: 10 max: 1400 x̄: 468.21 x̃: 60
HURT stats (rel)   min: 0.56% max: 19.94% x̄: 5.80% x̃: 1.70%
95% mean confidence interval for cycles value: 6.90 181.15
95% mean confidence interval for cycles %-change: -0.52% 1.59%
Inconclusive result (%-change mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick
378f996771 nir/opt_peephole_select: Don't peephole_select expensive math instructions
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive.  On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.
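The next sketch shows the eligibility check described above. The opcode list, the choice of which ops count as expensive, and the `expensive_alu_ok` parameter name are assumptions for illustration. Flattening executes both sides unconditionally, so a branch containing an expensive op is left as flow control unless the caller allows it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode subset. */
enum op_sketch { OP_FADD, OP_FMUL, OP_FDIV, OP_FSQRT, OP_FRSQ, OP_FEXP2, OP_FLOG2 };

/* Assumed "expensive" set: extended-math style operations. */
static bool
op_is_expensive_sketch(enum op_sketch op)
{
   switch (op) {
   case OP_FDIV: case OP_FSQRT: case OP_FRSQ: case OP_FEXP2: case OP_FLOG2:
      return true;
   default:
      return false;
   }
}

/* A branch may be flattened into a csel only if every instruction in it
 * is acceptable to execute unconditionally. */
static bool
block_can_flatten_sketch(const enum op_sketch *ops, size_t n, bool expensive_alu_ok)
{
   for (size_t i = 0; i < n; i++) {
      if (!expensive_alu_ok && op_is_expensive_sketch(ops[i]))
         return false;
   }
   return true;
}
```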

This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).

v2: Remove stray #if block.  Noticed by Thomas.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick
8fb8ebfbb0 intel/compiler: More peephole select
Shader-db results:

The one shader hurt for instructions is a compute shader that had both
spills and fills hurt.

v2: Fix typo in comment noticed by Caio.

v3: Fix inverted condition in brw_nir.c.  Noticed by Lionel.

Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total instructions in shared programs: 15072761 -> 15047884 (-0.17%)
instructions in affected programs: 895539 -> 870662 (-2.78%)
helped: 3623
HURT: 1
helped stats (abs) min: 1 max: 181 x̄: 6.89 x̃: 4
helped stats (rel) min: 0.10% max: 25.00% x̄: 3.93% x̃: 3.20%
HURT stats (abs)   min: 92 max: 92 x̄: 92.00 x̃: 92
HURT stats (rel)   min: 1.92% max: 1.92% x̄: 1.92% x̃: 1.92%
95% mean confidence interval for instructions value: -7.10 -6.63
95% mean confidence interval for instructions %-change: -4.03% -3.82%
Instructions are helped.

total cycles in shared programs: 369738930 -> 369535732 (-0.05%)
cycles in affected programs: 68027851 -> 67824653 (-0.30%)
helped: 2609
HURT: 1035
helped stats (abs) min: 1 max: 4508 x̄: 181.44 x̃: 77
helped stats (rel) min: <.01% max: 71.31% x̄: 9.14% x̃: 5.47%
HURT stats (abs)   min: 1 max: 33336 x̄: 261.04 x̃: 20
HURT stats (rel)   min: <.01% max: 47.61% x̄: 2.93% x̃: 1.47%
95% mean confidence interval for cycles value: -96.43 -15.09
95% mean confidence interval for cycles %-change: -6.07% -5.36%
Cycles are helped.

total spills in shared programs: 10158 -> 10159 (<.01%)
spills in affected programs: 166 -> 167 (0.60%)
helped: 1
HURT: 1

total fills in shared programs: 22105 -> 22116 (0.05%)
fills in affected programs: 837 -> 848 (1.31%)
helped: 4
HURT: 1

Ivy Bridge
total instructions in shared programs: 12021190 -> 11990256 (-0.26%)
instructions in affected programs: 910561 -> 879627 (-3.40%)
helped: 3344
HURT: 18
helped stats (abs) min: 1 max: 99 x̄: 9.29 x̃: 6
helped stats (rel) min: 0.11% max: 31.18% x̄: 5.19% x̃: 3.31%
HURT stats (abs)   min: 2 max: 20 x̄: 7.89 x̃: 6
HURT stats (rel)   min: 0.70% max: 2.59% x̄: 1.63% x̃: 1.70%
95% mean confidence interval for instructions value: -9.49 -8.91
95% mean confidence interval for instructions %-change: -5.32% -4.98%
Instructions are helped.

total cycles in shared programs: 179077826 -> 178570196 (-0.28%)
cycles in affected programs: 63205667 -> 62698037 (-0.80%)
helped: 2767
HURT: 620
helped stats (abs) min: 1 max: 7531 x̄: 217.58 x̃: 88
helped stats (rel) min: <.01% max: 75.86% x̄: 9.59% x̃: 6.09%
HURT stats (abs)   min: 1 max: 31255 x̄: 152.27 x̃: 11
HURT stats (rel)   min: <.01% max: 36.36% x̄: 2.77% x̃: 0.58%
95% mean confidence interval for cycles value: -173.94 -125.81
95% mean confidence interval for cycles %-change: -7.68% -6.97%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10852569 -> 10843758 (-0.08%)
instructions in affected programs: 235803 -> 226992 (-3.74%)
helped: 800
HURT: 0
helped stats (abs) min: 1 max: 88 x̄: 11.01 x̃: 8
helped stats (rel) min: 0.11% max: 23.08% x̄: 4.69% x̃: 3.36%
95% mean confidence interval for instructions value: -11.93 -10.10
95% mean confidence interval for instructions %-change: -4.99% -4.39%
Instructions are helped.

total cycles in shared programs: 154732047 -> 154608941 (-0.08%)
cycles in affected programs: 4063110 -> 3940004 (-3.03%)
helped: 606
HURT: 253
helped stats (abs) min: 1 max: 2524 x̄: 227.93 x̃: 62
helped stats (rel) min: 0.02% max: 39.24% x̄: 4.36% x̃: 1.81%
HURT stats (abs)   min: 1 max: 1966 x̄: 59.36 x̃: 11
HURT stats (rel)   min: 0.02% max: 67.10% x̄: 3.22% x̃: 0.67%
95% mean confidence interval for cycles value: -170.49 -116.13
95% mean confidence interval for cycles %-change: -2.61% -1.65%
Cycles are helped.

No change on Iron Lake or GM45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick
09b7e1d8e4 nir/opt_peephole_select: Don't try to remove flow control around indirect loads
That flow control may be trying to avoid invalid loads.  On at least
some platforms, those loads can also be expensive.
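The hazard can be seen in plain C. In `if (i < n) x = a[i];` the branch keeps an out-of-bounds index away from the load, and flattening it into a select would execute the load unconditionally. Below is a sketch of the resulting check. The `instr_sketch` description is hypothetical, and the `indirect_load_ok` flag is modeled on the one named in v2.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one instruction in a candidate branch. */
struct instr_sketch {
   bool is_load;
   bool has_indirect_offset; /* offset computed at run time, e.g. a[i] */
};

/* Refuse to flatten a branch containing an indirect load unless the
 * caller says such loads are safe to execute speculatively. */
static bool
branch_can_flatten_sketch(const struct instr_sketch *instrs, size_t count,
                          bool indirect_load_ok)
{
   for (size_t i = 0; i < count; i++) {
      if (instrs[i].is_load && instrs[i].has_indirect_offset && !indirect_load_ok)
         return false;
   }
   return true;
}
```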

No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").

v2: Add an 'indirect_load_ok' flag to nir_opt_peephole_select.  Suggested
by Rob.  See also the big comment in src/intel/compiler/brw_nir.c.

v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).

v4: Fix inverted condition in brw_nir.c.  Noticed by Lionel.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick
4cd1a0be76 i965/vec4: Propagate conditional modifiers from more compares to other compares
If there is a CMP.NZ that compares a single component (via a .zzzz
swizzle, for example) with 0, it can propagate its conditional modifier
back to a previous CMP that writes only that component.  The specific
case that I saw was:

    cmp.l.f0(8)     g42<1>.xF       g61<4>.xF       (abs)g18<4>.zF
    ...
    cmp.nz.f0(8)    null<1>D        g42<4>.xD       0D

In this case we can just delete the second CMP.
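The sketch below covers only the pattern match described above. It uses a hypothetical, simplified instruction struct and assumes that a CMP result channel is nonzero exactly when its condition held. It ignores details the real pass must handle, such as exactly which flag channels each instruction writes.

```c
#include <stdbool.h>

enum opcode_sketch { OPC_CMP, OPC_MOV };
enum cmod_sketch { CMOD_NONE, CMOD_L, CMOD_NZ };

/* Hypothetical, simplified vec4 instruction. */
struct vec4_inst_sketch {
   enum opcode_sketch opcode;
   enum cmod_sketch   cmod;
   int                dst_reg;       /* -1 for the null register */
   unsigned           dst_writemask; /* bit 0 = x, 1 = y, 2 = z, 3 = w */
   int                src0_reg;
   int                src0_swizzle;  /* replicated component, 0 for .xxxx */
   bool               src1_is_zero;
};

/* cur is a CMP.NZ of one component against 0, and prev is a CMP that
 * writes only that component. Under the assumption above, cur's test
 * repeats prev's condition, so cur is a candidate for deletion. */
static bool
cmp_nz_is_redundant_sketch(const struct vec4_inst_sketch *prev,
                           const struct vec4_inst_sketch *cur)
{
   return prev->opcode == OPC_CMP &&
          cur->opcode == OPC_CMP &&
          cur->cmod == CMOD_NZ &&
          cur->src1_is_zero &&
          cur->src0_reg == prev->dst_reg &&
          prev->dst_writemask == (1u << cur->src0_swizzle);
}
```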

No changes on Broadwell or Skylake because they do not use the vec4
backend.  Also no changes on GM45 or Iron Lake.

Sandy Bridge, Ivy Bridge, and Haswell had similar results. (Sandy Bridge shown)
total instructions in shared programs: 10856676 -> 10852569 (-0.04%)
instructions in affected programs: 228322 -> 224215 (-1.80%)
helped: 1331
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 3.09 x̃: 4
helped stats (rel) min: 0.11% max: 6.67% x̄: 1.88% x̃: 1.83%
95% mean confidence interval for instructions value: -3.19 -2.99
95% mean confidence interval for instructions %-change: -1.93% -1.83%
Instructions are helped.

total cycles in shared programs: 154788865 -> 154732047 (-0.04%)
cycles in affected programs: 2485892 -> 2429074 (-2.29%)
helped: 1097
HURT: 59
helped stats (abs) min: 2 max: 168 x̄: 51.96 x̃: 64
helped stats (rel) min: 0.12% max: 12.70% x̄: 3.44% x̃: 2.22%
HURT stats (abs)   min: 2 max: 16 x̄: 3.02 x̃: 2
HURT stats (rel)   min: 0.18% max: 0.83% x̄: 0.64% x̃: 0.71%
95% mean confidence interval for cycles value: -51.04 -47.26
95% mean confidence interval for cycles %-change: -3.40% -3.07%
Cycles are helped.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick
9a83c3d3b3 i965/fs: Eliminate unary op on operand of compare-with-zero
The (-abs(x) >= 0) => (x == 0) optimization is removed from the vec4 and
scalar parts. In the VS part, adding the new pattern was not
helpful. The pattern that is removed is really old, and it has been
handled by NIR for ages.
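The identity behind the removed pattern is easy to check. Since |x| >= 0, -|x| >= 0 holds only when |x| == 0, that is, when x is +0.0 or -0.0. Below is a small C check over some edge-case inputs; the helper names are illustrative.

```c
#include <math.h>
#include <stdbool.h>

/* The two forms of the comparison. */
static bool neg_abs_ge_zero(float x) { return -fabsf(x) >= 0.0f; }
static bool eq_zero(float x)         { return x == 0.0f; }

/* Returns true if the forms agree on every sample, including -0.0,
 * infinities, and NaN (for which both comparisons are false). */
static bool
forms_agree_sketch(void)
{
   const float samples[] = { 0.0f, -0.0f, 1.0f, -1.0f, 1e-38f, -1e-38f,
                             INFINITY, -INFINITY, NAN };
   for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
      if (neg_abs_ge_zero(samples[i]) != eq_zero(samples[i]))
         return false;
   }
   return true;
}
```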

All Gen7+ platforms had similar results. (Broadwell shown)
total instructions in shared programs: 14715715 -> 14715709 (<.01%)
instructions in affected programs: 474 -> 468 (-1.27%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.12% max: 1.35% x̄: 1.28% x̃: 1.35%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.40% -1.15%
Instructions are helped.

total cycles in shared programs: 559569911 -> 559569809 (<.01%)
cycles in affected programs: 5963 -> 5861 (-1.71%)
helped: 6
HURT: 0
helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
helped stats (rel) min: 1.45% max: 1.88% x̄: 1.73% x̃: 1.85%
95% mean confidence interval for cycles value: -18.15 -15.85
95% mean confidence interval for cycles %-change: -1.95% -1.51%
Cycles are helped.

Iron Lake and Sandy Bridge had similar results. (Iron Lake shown)
total instructions in shared programs: 7780915 -> 7780913 (<.01%)
instructions in affected programs: 246 -> 244 (-0.81%)
helped: 2
HURT: 0

total cycles in shared programs: 177876108 -> 177876106 (<.01%)
cycles in affected programs: 3636 -> 3634 (-0.06%)
helped: 1
HURT: 0

GM45
total instructions in shared programs: 4799152 -> 4799151 (<.01%)
instructions in affected programs: 126 -> 125 (-0.79%)
helped: 1
HURT: 0

total cycles in shared programs: 122052654 -> 122052652 (<.01%)
cycles in affected programs: 3640 -> 3638 (-0.05%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick
440c051340 i965/vec4/dce: Don't narrow the write mask if the flags are used
In an instruction sequence like

            cmp(8).ge.f0.0 vgrf17:D, vgrf2.xxxx:D, vgrf9.xxxx:D
    (+f0.0) sel(8) vgrf1:UD, vgrf8.xyzw:UD, vgrf1.xyzw:UD

the other fields of vgrf17 may be unused, but the CMP still needs to
generate the other flag bits.
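The next sketch shows the fixed rule, using hypothetical names and a 4-bit write mask (bit 0 = x). Dead channels may be dropped from the write mask only when the instruction's flag result is not needed, because the CMP still has to generate the flag bits for the other channels.

```c
#include <stdbool.h>

/* live_channels: destination channels read later (bit 0 = x).
 * writes_flag:   the instruction has a conditional modifier.
 * flag_live:     that flag result is read later. */
static unsigned
narrowed_writemask_sketch(unsigned writemask, unsigned live_channels,
                          bool writes_flag, bool flag_live)
{
   /* Enabled channels also produce flag bits, so keep the mask intact
    * whenever the flag is still needed. */
   if (writes_flag && flag_live)
      return writemask;
   return writemask & live_channels;
}
```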

To my surprise, nothing in shader-db or any test suite appears to hit
this.  However, I have a change to brw_vec4_cmod_propagation that
creates cases where this can happen.  This fix prevents a couple dozen
regressions in that patch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 5df88c20 ("i965/vec4: Rewrite dead code elimination to use live in/out.")
2018-12-17 13:47:06 -08:00
Ian Romanick
111bcc8d02 i965/vec4: Silence unused parameter warnings in vec4 compiler tests
src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual brw::dst_reg* copy_propagation_vec4_visitor::make_reg_for_system_value(int)’:
src/intel/compiler/test_vec4_copy_propagation.cpp:57:51: warning: unused parameter ‘location’ [-Wunused-parameter]
    virtual dst_reg *make_reg_for_system_value(int location)
                                                   ^~~~~~~~
src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual void copy_propagation_vec4_visitor::emit_urb_write_header(int)’:
src/intel/compiler/test_vec4_copy_propagation.cpp:77:43: warning: unused parameter ‘mrf’ [-Wunused-parameter]
    virtual void emit_urb_write_header(int mrf)
                                           ^~~
src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual brw::vec4_instruction* copy_propagation_vec4_visitor::emit_urb_write_opcode(bool)’:
src/intel/compiler/test_vec4_copy_propagation.cpp:82:57: warning: unused parameter ‘complete’ [-Wunused-parameter]
    virtual vec4_instruction *emit_urb_write_opcode(bool complete)
                                                         ^~~~~~~~
src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual brw::dst_reg* register_coalesce_vec4_visitor::make_reg_for_system_value(int)’:
src/intel/compiler/test_vec4_register_coalesce.cpp:60:51: warning: unused parameter ‘location’ [-Wunused-parameter]
    virtual dst_reg *make_reg_for_system_value(int location)
                                                   ^~~~~~~~
src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual void register_coalesce_vec4_visitor::emit_urb_write_header(int)’:
src/intel/compiler/test_vec4_register_coalesce.cpp:80:43: warning: unused parameter ‘mrf’ [-Wunused-parameter]
    virtual void emit_urb_write_header(int mrf)
                                           ^~~
src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual brw::vec4_instruction* register_coalesce_vec4_visitor::emit_urb_write_opcode(bool)’:
src/intel/compiler/test_vec4_register_coalesce.cpp:85:57: warning: unused parameter ‘complete’ [-Wunused-parameter]
    virtual vec4_instruction *emit_urb_write_opcode(bool complete)
                                                         ^~~~~~~~
src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual brw::dst_reg* cmod_propagation_vec4_visitor::make_reg_for_system_value(int)’:
src/intel/compiler/test_vec4_cmod_propagation.cpp:60:51: warning: unused parameter ‘location’ [-Wunused-parameter]
    virtual dst_reg *make_reg_for_system_value(int location)
                                                   ^~~~~~~~
src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual void cmod_propagation_vec4_visitor::emit_urb_write_header(int)’:
src/intel/compiler/test_vec4_cmod_propagation.cpp:85:43: warning: unused parameter ‘mrf’ [-Wunused-parameter]
    virtual void emit_urb_write_header(int mrf)
                                           ^~~
src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual brw::vec4_instruction* cmod_propagation_vec4_visitor::emit_urb_write_opcode(bool)’:
src/intel/compiler/test_vec4_cmod_propagation.cpp:90:57: warning: unused parameter ‘complete’ [-Wunused-parameter]
    virtual vec4_instruction *emit_urb_write_opcode(bool complete)
                                                         ^~~~~~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Jason Ekstrand
cae373117c anv,radv: Re-enable VK_EXT_pci_bus_info
Now at version 2 with the fixed header.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 10:42:35 -06:00
Jason Ekstrand
11dc130779 nir: Add a bool to int32 lowering pass
We also enable it in all of the NIR drivers.
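The following sketch shows what the lowered Booleans look like. It assumes that false is 0 and true is all ones (~0u). The helper names are illustrative analogues of `nir_op_flt32` and `nir_op_b32csel`, not NIR code.

```c
#include <stdint.h>

/* Assumed 32-bit Boolean representation after lowering. */
#define TRUE32_SKETCH  (~0u)
#define FALSE32_SKETCH (0u)

/* Rough analogue of a 32-bit "less than" producing a Boolean. */
static uint32_t
flt32_sketch(float a, float b)
{
   return a < b ? TRUE32_SKETCH : FALSE32_SKETCH;
}

/* Rough analogue of a 32-bit Boolean select: any nonzero value is true. */
static uint32_t
b32csel_sketch(uint32_t cond, uint32_t if_true, uint32_t if_false)
{
   return cond != 0 ? if_true : if_false;
}

/* With all-ones/zero Booleans, logical AND is a plain bitwise AND. */
static uint32_t
iand_bool_sketch(uint32_t a, uint32_t b)
{
   return a & b;
}
```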

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand
80e8dfe9de nir: Rename Boolean-related opcodes to include 32 in the name
This is a squash of a bunch of individual changes:

    nir/builder: Generate 32-bit bool opcodes transparently

    nir/algebraic: Remap Boolean opcodes to the 32-bit variant

    Use 32-bit opcodes in the NIR producers and optimizations

        Generated with a little hand-editing and the following sed commands:

        sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
        sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
        sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
        sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
        sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
        sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c

    Use 32-bit opcodes in the NIR back-ends

        Generated with a little hand-editing and the following sed commands:

        sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
        sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
        sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
        sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
        sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
        sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Rafael Antognolli
019a92ffa4 intel/genxml: Add register for object preemption.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-12-14 09:40:27 -08:00
Kenneth Graunke
0b44644ca6 genxml: Consistently use a numeric "MOCS" field
When we first started using genxml, we decided to represent MOCS as an
actual structure, and pack values.  However, in many places, it was more
convenient to use a numeric value rather than treating it as a struct,
so we added secondary setters in a bunch of places as well.

We were not entirely consistent, either.  Some places only had one.
Gen6 had both kinds of setters for STATE_BASE_ADDRESS, but newer gens
only had the struct-based setters.  The names were sometimes "Constant
Buffer Object Control State" instead of "Memory", making it harder to
find.  Many had prefixes like "Vertex Buffer MOCS"...in a vertex buffer
packet...which is a bit redundant.

On modern hardware, MOCS is simply an index into a table, but we were
still carrying around the structure with an "Index to MOCS Table" field,
in addition to the direct numeric setters.  This is clunky - we really
just want a number on new hardware.
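The paragraph above says MOCS is simply a table index, so a packet field can be set from a plain integer instead of a packed struct. The sketch below shows that idea. `pack_field` is a hypothetical helper written in the spirit of genxml's generated packing code, not its actual API. The bit range `HYPOTHETICAL_MOCS_START..END` is invented for illustration; real packets put the MOCS field at packet-specific offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: place value v into bits [start, end] of a dword. */
static uint32_t pack_field(uint32_t v, unsigned start, unsigned end)
{
    assert(start <= end && end < 32);
    unsigned width = end - start + 1;
    if (width < 32)
        assert(v < (1u << width));   /* the value must fit in the field */
    return v << start;
}

/* Invented layout for this sketch only: MOCS in bits 0..6 of a dword. */
#define HYPOTHETICAL_MOCS_START 0
#define HYPOTHETICAL_MOCS_END   6

/* Numeric setter: the caller passes the MOCS table index directly. */
static uint32_t pack_dword_with_mocs(uint32_t mocs, uint32_t other_bits)
{
    return other_bits | pack_field(mocs, HYPOTHETICAL_MOCS_START,
                                   HYPOTHETICAL_MOCS_END);
}
```

With a single numeric "MOCS" field, every packet is set the same way, and callers no longer choose between a struct setter and a numeric one.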

This patch eliminates the struct-based setters, and makes the numeric
setters be consistently called "MOCS".  We leave the struct definition
around on Gen7-8 for reference purposes, but it is unused.

v2: Drop bonus "Depth Buffer MOCS" fields on Gen7.5 and Gen9

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2018-12-14 00:44:54 -08:00
Eric Anholt
4407e688cd nir: Move intel's half-float image store lowering to nir_format.h.
I needed the same function for v3d.  This was originally in d3e046e76c
("nir: Pull some of intel's image load/store format conversion to
nir_format.h") before we made a mistake about simplifying the function.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-13 12:24:26 -08:00
Eric Anholt
3a417a044e Revert "intel: Simplify the half-float packing in image load/store lowering."
This reverts commit 06fbcd2cd5.
nir_pack_half_2x16_split *isn't* vectorizable; it's 1-component only, which
is why we had this split-scalar code in the first place.
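The revert depends on the pack producing one 32-bit word from two *scalar* floats. A vector therefore needs one pack per pair of components, which is why the split-scalar code exists. The CPU-side sketch below models one such pack. It assumes the first source goes in the low 16 bits and uses round-to-nearest-even. Both are assumptions, since the rounding mode isn't stated here. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Convert a float to IEEE 754 half bits, rounding to nearest even. */
static uint16_t float_to_half_bits(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint32_t sign = (x >> 16) & 0x8000u;
    int32_t fexp = (int32_t)((x >> 23) & 0xFFu);
    uint32_t mant = x & 0x7FFFFFu;

    if (fexp == 0xFF)                        /* Inf or NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x200u : 0u));

    int32_t e = fexp - 127 + 15;             /* rebias the exponent */
    if (e >= 0x1F)                           /* too large: Inf */
        return (uint16_t)(sign | 0x7C00u);

    if (e <= 0) {                            /* half subnormal or zero */
        if (e < -10)
            return (uint16_t)sign;           /* underflows to signed zero */
        mant |= 0x800000u;                   /* make the implicit 1 explicit */
        uint32_t shift = (uint32_t)(14 - e); /* 14..24 */
        uint32_t half_mant = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (half_mant & 1u)))
            half_mant++;
        return (uint16_t)(sign | half_mant);
    }

    uint32_t half = sign | ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu;           /* the 13 dropped bits */
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
        half++;                              /* a carry may roll into Inf */
    return (uint16_t)half;
}

/* One 2x16 pack: two scalar floats in, one 32-bit word out. */
static uint32_t pack_half_2x16_split_sketch(float lo, float hi)
{
    return (uint32_t)float_to_half_bits(lo) |
           ((uint32_t)float_to_half_bits(hi) << 16);
}
```

A 4-component store would call the pack twice, once for (x, y) and once for (z, w), rather than once on the whole vector.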

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-13 12:24:24 -08:00
Jason Ekstrand
9ebc00f32e i965: Enable nir_opt_idiv_const for 32 and 64-bit integers
The pass should work for all bit sizes but it's less clear that the
extra instructions are worth it on small integers.  Also, the hardware
doesn't do mul_high on anything other than 32-bit integers and, absent
any decent mechanism for testing the pass on 8 and 16-bit types, it's
probably best to just leave it disabled for now.
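The pass turns division by a constant into cheap ALU ops built around mul_high. The exact sequence it emits isn't shown here. As an illustration of the general multiply-by-reciprocal technique, here is unsigned 32-bit division by 3. The magic constant 0xAAAAAAAB equals ceil(2^33 / 3). The function names are hypothetical.

```c
#include <stdint.h>

/* High 32 bits of a 32x32 -> 64-bit unsigned product ("mul_high"). */
static uint32_t umul_high32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* x / 3 without a divide: multiply by ceil(2^33 / 3), keep the high
 * half (an implicit >> 32), then shift right by 1 more (total >> 33).
 * This is exact for every 32-bit unsigned x. */
static uint32_t udiv3(uint32_t x)
{
    return umul_high32(x, 0xAAAAAAABu) >> 1;
}
```

Other divisors need their own magic constants and shifts, and some also need an add-and-correct step. The sequence only stays cheap where a native 32-bit mul_high exists, which matches the bit-size restriction described above.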

Shader-db results on Sky Lake:

    total instructions in shared programs: 15105795 -> 15111403 (0.04%)
    instructions in affected programs: 72774 -> 78382 (7.71%)
    helped: 0
    HURT: 265

Note that hurt here actually means helped because we're getting rid of
integer quotient operations (which are a send on some platforms!) and
replacing them with fairly cheap ALU ops.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-12-13 17:49:48 +00:00
Jason Ekstrand
455ec7327d i965/vec4: Implement nir_op_uadd_sat
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-12-13 17:49:48 +00:00
Ian Romanick
e639d39faf i965/fs: Implement nir_op_uadd_sat
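nir_op_uadd_sat is an unsigned add that clamps to the largest representable value instead of wrapping. This log doesn't show how the i965 back ends emit it. As a reference for the 32-bit semantics only, here is a portable C model. The function name is hypothetical.

```c
#include <stdint.h>

/* Unsigned saturating add. The wrapped sum is smaller than an operand
 * exactly when the true sum overflowed 32 bits; in that case clamp. */
static uint32_t uadd_sat32(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;              /* wraps modulo 2^32 */
    return sum < a ? UINT32_MAX : sum;
}
```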
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-13 17:49:48 +00:00
Eric Anholt
06fbcd2cd5 intel: Simplify the half-float packing in image load/store lowering.
This was noted by Jason in review when I tried to make a helper for the
old path.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-12 16:09:48 -08:00
Eric Anholt
d3e046e76c nir: Pull some of intel's image load/store format conversion to nir_format.h
I needed the same functions for v3d.  Note that the color value in the
Intel lowering has already been cut down to image.chans num_components.

v2: Drop the half float one, since it was a 1-liner after cleanup.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-12 16:09:43 -08:00
Jason Ekstrand
5749c0ebc4 intel/blorp: Assert that we don't re-layout a compressed surface
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-12 08:32:32 -06:00