Commit graph

4610 commits

Author SHA1 Message Date
Lionel Landwerlin
da2d67fc3b anv: gem-stubs: return a valid fd for anv_gem_userptr()
Fixes invalid close(-1) in the unit tests.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-09-25 22:02:51 +03:00
Andres Gomez
5e87f48f1d i965/fs: set rounding mode when emitting the flrp instruction
flrp was missed when the rounding mode was added for the other
instructions.

Fixes: ba1e25e1aa ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions")
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2019-09-24 12:06:59 +03:00
Andres Gomez
6f1468c371 i965/fs: add a comment about how the rounding mode in fmul is set
After
1711bf6cf2 ("intel/fs: Generate better code for fsign multiplied by a value"),
the conflict resolution for setting the rounding mode after the
fused fmul and fsign optimization is non-obvious.

Basically, the optimization doesn't really result in a MUL, or any
other operation which would need to have the rounding mode set. Hence,
we set it just before the actual MUL in the treatment of fmul.

Fixes: ba1e25e1aa ("i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions")
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2019-09-24 11:24:15 +03:00
Kenneth Graunke
b9e93db208 intel: Increase Gen11 compute shader scratch IDs to 64.
From the MEDIA_VFE_STATE docs:

   "Starting with this configuration, the Maximum Number of Threads must
    be set to (#EU * 8) for GPGPU dispatches.

    Although there are only 7 threads per EU in the configuration, the
    FFTID is calculated as if there are 8 threads per EU, which in turn
    requires a larger amount of Scratch Space to be allocated by the
    driver."

It's pretty clear that we need to increase this for scratch address
calculations, because the FFTID has a certain bit-pattern.  The quote
above seems to indicate that we should increase the actual thread count
programmed in MEDIA_VFE_STATE as well, but we think the intention is to
only bump the scratch space.
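
As a rough sketch of the arithmetic quoted above (the helper and its
parameter names are illustrative assumptions, not the driver's actual
code): because the FFTID is computed as if every EU had 8 threads, the
number of scratch IDs follows from the EU count times 8, not from the 7
threads the hardware really has.

```c
#include <assert.h>

/* Hypothetical helper, not taken from the driver: number of scratch IDs
 * to reserve per subslice when the FFTID is calculated as if each EU had
 * `fftid_threads_per_eu` threads, regardless of the real thread count. */
static unsigned
scratch_ids_per_subslice(unsigned eus_per_subslice,
                         unsigned fftid_threads_per_eu)
{
   assert(eus_per_subslice > 0 && fftid_threads_per_eu > 0);
   return eus_per_subslice * fftid_threads_per_eu;
}
```

Assuming 8 EUs per subslice on an Icelake 8x8, this gives 8 * 8 = 64
scratch IDs, whereas sizing by the 7 real threads would give only 56.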

Fixes GPU hangs in Bioshock Infinite and Synmark's CSDof on Icelake 8x8.

Fixes: 5ac804bd9a ("intel: Add a preliminary device for Ice Lake")
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-09-23 16:59:40 -07:00
Kenneth Graunke
50c0dd8621 Revert "intel/gen11+: Enable Hardware filtering of Semi-Pipelined State in WM"
This reverts commit 729de1488f.

It turns out that, although the register is in the logical context,
it isn't whitelisted, so we can't actually write it from userspace
batch buffers.  The write just becomes a noop, which is why we saw
no performance changes.

I manually whitelisted it, and still observed no performance gains, but
it did regress KHR-GL46.texture_cube_map_array.color_depth_attachments
on the iris driver.  So we might need to fix something before enabling
this.  To prevent it randomly getting turned on should the kernel ever
whitelist this register, we revert the patch for now.
2019-09-23 16:31:23 -07:00
Kenneth Graunke
8489206e9d intel/genxml: Stop manually scrubbing 'α' -> "alpha"
'α' has never appeared in any genxml files, so there's no need to
replace it with the word "alpha".

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-09-23 20:24:54 +00:00
Kenneth Graunke
aa7ac32976 isl: Drop WaDisableSamplerL2BypassForTextureCompressedFormats on Gen11
Gen11 doesn't require us to bypass the L2 cache for BC* images anymore.

The documentation is a bit hard to follow on this point, but the Windows
driver clearly only applies this workaround on Gen9, and their commit
history indicates that this was an intentional change to drop the
workaround for Gen11+.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-09-20 15:35:17 -07:00
Jason Ekstrand
7d861ab812 anv: Advertise VK_KHR_shader_subgroup_extended_types
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-09-20 18:02:15 +00:00
Jason Ekstrand
03255da225 intel/fs: Do 8-bit subgroup scan operations in 16 bits
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-09-20 18:02:15 +00:00
Jason Ekstrand
651725f7a1 intel/fs: Allow CLUSTER_BROADCAST to do type conversion
We can't really handle it in the little-core 64-bit case but it's not
really needed there.  Where we really want this is for when we need to
do 16 -> 8-bit conversions.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-09-20 18:02:15 +00:00
Jason Ekstrand
3515c0e9cf intel/fs: Allow UB, B, and HF types in brw_nir_reduction_op_identity
Because byte immediates aren't a thing on GEN hardware, we return a
signed or unsigned word immediate in the byte case.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-09-20 18:02:15 +00:00
Paulo Zanoni
10532c6831 intel/fs: don't forget the stride at generate_shuffle
During generate_shuffle(), when we use byte sized registers we end up
with a destination stride of 2. We don't take the stride into
consideration when selecting the group offset for the last MOV
operation, which means we end up moving things to the wrong place,
leaving the last few channels untouched. Take the destination stride
into consideration so we don't miss the last channels.

v2: Assert this is not necessary for the IVB special case (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-09-20 10:57:05 -07:00
Jason Ekstrand
dae33052db util/rb_tree: Reverse the order of comparison functions
The new order matches that of the comparison functions accepted by the C
standard library qsort() functions.  Being consistent with qsort will
hopefully help avoid developer confusion.
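
For reference, the qsort() convention is a comparator that returns a
negative, zero, or positive value when the first argument sorts before,
equal to, or after the second. A minimal sketch using a hypothetical
`struct item` (this is not the rb_tree API itself):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type, used only for illustration. */
struct item {
   int key;
};

/* qsort()-style comparator: <0 if a sorts first, 0 if equal, >0 if b
 * sorts first.  Written without subtraction to avoid integer overflow. */
static int
item_cmp(const void *a, const void *b)
{
   const struct item *ia = a;
   const struct item *ib = b;
   return (ia->key > ib->key) - (ia->key < ib->key);
}
```

A comparator written this way can be passed straight to qsort(), which
is the consistency the commit is aiming for.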

The only current user of the red-black tree is aub_mem.c which is pretty
easy to fix up.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-20 17:37:25 +00:00
Eric Engestrom
3c1a24de07 anv: implement ICD interface v4
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-09-20 08:31:58 +00:00
Eric Engestrom
19db95e78e anv: split instance dispatch table
This effectively breaks the instance dispatch table in 2 with entry
points using a physical device as first argument getting their own
dispatch table.

As a result, we now have to check both the instance and physical device
dispatch tables, instead of just the instance dispatch table as before.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-09-20 08:31:58 +00:00
Jason Ekstrand
0c4e89ad5b Move blob from compiler/ to util/
There's nothing whatsoever compiler-specific about it other than that's
currently where it's used.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-19 19:56:22 +00:00
Caio Marcelo de Oliveira Filho
fa080f03d3 intel/fs: Add Fall-through comment
Reviewed-by: Andres Gomez <agomez@igalia.com>
2019-09-19 10:02:16 -07:00
Arcady Goldmints-Orlov
5ec5fecc26 anv: fix descriptor limits on gen8
Later generations support bindless for samplers, images, and buffers and
thus per-stage descriptors are not limited by the binding table size.
However, gen8 doesn't support bindless images and thus needs to report a
lower per-stage limit so that all combinations of descriptors that fit
within the advertised limits are reported as supported by
vkGetDescriptorSetLayoutSupport.

Fixes test dEQP-VK.api.maintenance3_check.descriptor_set
Fixes: 79fb0d27f3 ("anv: Implement SSBOs bindings with GPU addresses in the descriptor BO")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-19 09:10:40 -05:00
Paulo Zanoni
8e614c7a29 intel/fs: fix SHADER_OPCODE_CLUSTER_BROADCAST for SIMD32
The current code can create functions with a width of 32, which is not
supported by our hardware. Add some code to simplify how we express
what we want and prevent such cases.

For some unknown reason, all the tests I could run seem to work even
with these unsupported MOVs.

Fixes: b0858c1cc6 "intel/fs: Add a couple of simple helper opcodes"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-09-19 02:48:27 +00:00
Paulo Zanoni
c99df52873 intel/fs: the maximum supported stride width is 16
There are cases where we try to generate registers with a stride of
32, while the hardware maximum is just 16. This happens, for example,
when using 8 bit integers on SIMD32. This results in a crash because
the variable 'width' has a value of 32:

../../src/intel/compiler/brw_reg.h:550: brw_reg brw_vecn_reg(unsigned
int, brw_reg_file, unsigned int, unsigned int): Assertion `!"Invalid
register width"' failed.

This change prevents the crash and makes the tests pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-09-19 02:48:27 +00:00
Paulo Zanoni
cebf447d16 intel/fs: roll the loop with the <0,1,0> additions in emit_scan()
IMHO the code is easier to understand this way, being explicit that
we're doing exactly the same thing every time.

No functional changes.

v2: Adjust the loop breaking condition (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-09-19 02:47:17 +00:00
Paulo Zanoni
d9ddf5076d intel/fs: make scan/reduce work with SIMD32 when it fits 2 registers
When dealing with uint16_t and uint8_t on SIMD32 we can do all the
operations using just 2 registers, so we don't hit the recursion at
the beginning of emit_scan(). Because of that, we need to actually
compute scan/reduce for channels 31:16.

v2: Still missed instructions (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-09-19 02:47:17 +00:00
Kenneth Graunke
0e4a75f917 intel/compiler: Record whether any pull constant loads occur
I would like for iris to be able to avoid setting up SURFACE_STATE
for UBOs in the common case where all constants are pushed.

Unfortunately, we don't know up front whether everything will be
pushed: the backend is allowed to demote pushed UBOs to pull loads
fairly late in the process.  This is probably desirable though, as
we'd like the backend to be able to re-pull pushed data to break up
long live ranges in response to register pressure.

Here we simply add an "are there any pull loads at all" boolean to
prog_data, which is a bit crude but at least allows us to skip work
in the common "everything pushed" case.  We could skip more work by
tracking exactly which UBO surfaces are pulled in a bitmask, but I
wanted to avoid bringing back the old mark_surface_used() mechanism.

Finer-grained tracking could allow us to skip a bit more work when
multiple UBOs are in use and /some/ are 100% pushed, but others are
accessed via pulls.  However, I'm not sure how common this is and
it would save at most 4 pull descriptors, so we defer that for now.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-18 15:44:22 -07:00
Kenneth Graunke
f76a724e06 intel/compiler: Set "Null Render Target" ex_desc bit on Gen11
When there are no color regions (i.e. a depth only pass), we can set
the "Null Render Target" bit in the Gen11 RT write extended message
descriptor to indicate that it should behave as if it's writing to a
null render target, without the need for a binding table entry.

This lets drivers avoid setting up that null RT binding table entry,
but more importantly means the HW doesn't actually have to bother
looking up the surface state.

Together with the next patch, this improves performance in Car Chase on
an Icelake 8x8 (locked to 700MHz) by 0.0445526% +/- 0.0132736% (n=832).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-17 14:27:51 -07:00
Samuel Iglesias Gonsálvez
f5dd6dfe01 anv: enable VK_KHR_shader_float_controls and SPV_KHR_float_controls
This adds support for
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT_CONTROLS_PROPERTIES_KHR and
enables the Vulkan and SPIR-V extensions.

Also, notice that this includes the updates applied to the
VkPhysicalDeviceFloatControlsPropertiesKHR structure in the extension
VK_KHR_shader_float_controls v4 and Vulkan 1.1.116.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-17 23:39:19 +03:00
Samuel Iglesias Gonsálvez
9b07020a4f i965/fs: add support for shader float control to remove_extra_rounding_modes()
The remove_extra_rounding_modes() optimization will remove duplicated
rounding mode changes.

v2:
- Fix bug in the rounding mode change (Alejandro).

v3:
- Fix rounding modes.

v4:
- Updated to renamed shader info member and enum values (Andres).

v5:
- Simplify flags logic operations (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-17 23:39:19 +03:00
Samuel Iglesias Gonsálvez
9bd88d10d8 i965/fs: set rounding mode when emitting nir_op_f2f32 or nir_op_f2f16
v2:
- Consider nir_op_f2f16 case too (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-17 23:39:19 +03:00
Samuel Iglesias Gonsálvez
ba1e25e1aa i965/fs: set rounding mode when emitting fadd, fmul and ffma instructions
v2:
- Updated to renamed shader info member (Andres).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-17 23:39:19 +03:00
Samuel Iglesias Gonsálvez
9da56ffc52 i965/fs: add emit_shader_float_controls_execution_mode() and aux functions
We need this function to emit code that later sets up the control
register with the defined execution mode for the shader. Therefore, we
emit it as the first instruction.

v2:
- Fix bug in setting the default mode mask in brw_rnd_mode_from_nir().
- Fix support for rounding modes in brw_rnd_mode_from_nir().

v3:
- Updated to renamed shader info member and enum values (Andres).

v4:
- Add actual emission as first instruction of emit_nir_code (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-17 23:39:19 +03:00
Samuel Iglesias Gonsálvez
8a6507b6fe i965/fs/generator: add new opcode to set float controls modes in control register
Before this commit, we had only FPRoundingMode decoration (the per
instruction one) that is applied during the SPIR-V handling. In
vtn_alu we find out the rounding mode, and generate the code
accordingly that later will be used to look for the respective
nir_op_f2f16_{rtz,rtne}.

Per-instruction gets prioritized because we make them explicit
conversions (with RTZ or RTNE nir opcodes) and they will override the
default execution mode defined with float controls. However, we need
to come back to the mode defined by float controls after the execution
of the FP Rounding instruction.

Therefore, the new SHADER_OPCODE_FLOAT_CONTROL_MODE opcode will be
used to set the default rounding mode and denorm treatment for the
whole shader, while the pre-existing SHADER_OPCODE_RND_MODE will be
used as the prioritized rounding mode on a per-instruction basis.

v2:
- Fix bug in defining BRW_CR0_FP_MODE_MASK.

v3:
- Update comment (Caio).

v4:
- Split the patch into the helper and the new opcode (this
  one) (Caio).

v5:
- Add an explanation on the actual purpose and priority of the newly
  introduced opcode in the commit log (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-17 23:39:19 +03:00
Samuel Iglesias Gonsálvez
28da9558f5 i965/fs/generator: refactor rounding mode helper in preparation for float controls
v2:
- Fix bug in defining BRW_CR0_FP_MODE_MASK.

v3:
- Update comment (Caio).

v4:
- Split the patch into the helper (this one) and the new
  opcode (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-17 23:39:19 +03:00
Samuel Iglesias Gonsálvez
cdace5b0c6 i965/fs/nir: add nir_op_unpack_half_2x16_split_*_flush_to_zero
The denorm mode is set in the control register, so nothing else
needs to be done.

v2:
- Add an assert to make sure that we realize if this assumption is
  broken in the future (Caio).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-17 23:39:18 +03:00
Samuel Iglesias Gonsálvez
3c474f8513 intel/nir: do not apply the fsin and fcos trig workarounds for consts
If we have fsin or fcos trigonometric operations with constant values
as inputs, we will multiply the result by 0.99997 in
brw_nir_apply_trig_workarounds, making the result wrong.

By adjusting the rules so they do not apply to constant values, we
let a later constant-folding pass deal with it.
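
A simplified model of the adjusted rule, assuming a hypothetical helper
that receives an already-computed fsin/fcos result (the real change is
made in the NIR rules, not in a runtime function like this):

```c
#include <assert.h>
#include <stdbool.h>

/* Scale factor quoted in the commit message. */
#define TRIG_WORKAROUND_SCALE 0.99997f

/* Hypothetical model: scale the trig result only when the input was not
 * a constant; constant inputs are left alone so that constant folding
 * produces the exact value. */
static float
apply_trig_workaround(float trig_result, bool input_is_constant)
{
   if (input_is_constant)
      return trig_result;
   return trig_result * TRIG_WORKAROUND_SCALE;
}
```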

v2:
- Do not early constant fold but only apply the trig workaround for
  non constants (Caio).
- Add fixes tag to commit log (Caio).

Fixes: bfd17c76c1 "i965: Port INTEL_PRECISE_TRIG=1 to NIR."
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-17 23:39:18 +03:00
Sergii Romantsov
2bfcf04345 nir/large_constants: pass after lowering copy_deref
v2: At J. Ekstrand's suggestion, moved the lowering of large
    constants to after the lowering of copy_deref.

CC: Jason Ekstrand <jason@jlekstrand.net>
CC: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111450
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
2019-09-16 11:23:48 +00:00
Lionel Landwerlin
0616b7ac90 vulkan: add vk_x11_strict_image_count option
This option strictly allocates the minImageCount given by the
application at swapchain creation.

This works around applications that do not deal with the fact that the
implementation allocates more images than the minimum specified.

v2: Add values in default drirc (Bas)

v3: specify engine name/version (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111522
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
2019-09-15 15:37:02 +03:00
Lionel Landwerlin
04dc6074cf driconfig: add a new engine name/version parameter
Vulkan applications can register with the following structure:

typedef struct VkApplicationInfo {
    VkStructureType    sType;
    const void*        pNext;
    const char*        pApplicationName;
    uint32_t           applicationVersion;
    const char*        pEngineName;
    uint32_t           engineVersion;
    uint32_t           apiVersion;
} VkApplicationInfo;

This enables the Vulkan implementations to apply workarounds based off
matching this description.

Here we add a new parameter for matching the driconfig options with
the following:

    <device driver="anv">
        <application engine_name_match="MyOwnEngine.*" engine_versions="10:12,40:42">
            <option name="blaaah" value="true" />
        </application>
    </device>
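
To illustrate how an `engine_versions` value such as "10:12,40:42" could
be interpreted, here is a minimal sketch. It assumes comma-separated
`lo:hi` ranges with inclusive bounds; the function name is hypothetical
and this is not the driconf parser itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical matcher.  Assumption: ranges are "lo:hi", inclusive on
 * both ends, and separated by commas.  Returns false on malformed input. */
static bool
engine_version_matches(const char *ranges, uint32_t version)
{
   const char *p = ranges;

   assert(ranges != NULL);
   while (*p != '\0') {
      unsigned lo, hi;
      int consumed = 0;

      /* %n records how many characters the "lo:hi" pair took up. */
      if (sscanf(p, "%u:%u%n", &lo, &hi, &consumed) != 2)
         return false;
      if (version >= lo && version <= hi)
         return true;

      p += consumed;
      if (*p == ',')
         p++;
      else if (*p != '\0')
         return false;
   }
   return false;
}
```

Under these assumptions, an `engineVersion` of 11 or 41 matches the
example above, while 13 does not.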

v2: switch engine name match to use regexps

v3: Verify that the regexec returns REG_NOMATCH for match failure (Eric)

v4: Add missing bit that went to the following commit (Eric)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
2019-09-15 15:37:02 +03:00
Jason Ekstrand
acfa2340e6 intel/fs: Handle UNDEF in split_virtual_grfs
When the UNDEF instruction was added, we didn't do anything special in
split_virtual_grfs.  This meant that anything with an UNDEF wasn't
getting split which causes problems for the compiler.  Among other
things, it makes RA harder because things are in bigger chunks.  It also
meant that dvec4s weren't getting split which means that they are larger
than the maximum register size.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 14959202 -> 14960035 (<.01%)
    instructions in affected programs: 96197 -> 97030 (0.87%)
    helped: 140
    HURT: 128
    helped stats (abs) min: 1 max: 17 x̄: 1.62 x̃: 1
    helped stats (rel) min: 0.09% max: 6.15% x̄: 0.65% x̃: 0.45%
    HURT stats (abs)   min: 1 max: 825 x̄: 8.28 x̃: 1
    HURT stats (rel)   min: 0.13% max: 139.83% x̄: 1.70% x̃: 0.50%
    95% mean confidence interval for instructions value: -2.96 9.18
    95% mean confidence interval for instructions %-change: -0.56% 1.51%
    Inconclusive result (value mean confidence interval includes 0).

    total loops in shared programs: 4372 -> 4372 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 352646771 -> 352840997 (0.06%)
    cycles in affected programs: 218600800 -> 218795026 (0.09%)
    helped: 21167
    HURT: 21411
    helped stats (abs) min: 1 max: 2924 x̄: 36.89 x̃: 10
    helped stats (rel) min: <.01% max: 41.90% x̄: 2.97% x̃: 0.98%
    HURT stats (abs)   min: 1 max: 26027 x̄: 45.54 x̃: 10
    HURT stats (rel)   min: <.01% max: 324.46% x̄: 3.88% x̃: 1.06%
    95% mean confidence interval for cycles value: 2.87 6.26
    95% mean confidence interval for cycles %-change: 0.40% 0.55%
    Cycles are HURT.

    total spills in shared programs: 8840 -> 8953 (1.28%)
    spills in affected programs: 126 -> 239 (89.68%)
    helped: 1
    HURT: 2

    total fills in shared programs: 21782 -> 21914 (0.61%)
    fills in affected programs: 431 -> 563 (30.63%)
    helped: 1
    HURT: 3

    LOST:   0
    GAINED: 5

Shader-db results on Haswell:

    total instructions in shared programs: 13320918 -> 13320769 (<.01%)
    instructions in affected programs: 40998 -> 40849 (-0.36%)
    helped: 146
    HURT: 56
    helped stats (abs) min: 1 max: 8 x̄: 2.73 x̃: 2
    helped stats (rel) min: 0.16% max: 8.60% x̄: 2.52% x̃: 2.22%
    HURT stats (abs)   min: 2 max: 23 x̄: 4.45 x̃: 4
    HURT stats (rel)   min: 0.21% max: 10.26% x̄: 6.83% x̃: 10.26%
    95% mean confidence interval for instructions value: -1.26 -0.21
    95% mean confidence interval for instructions %-change: -0.62% 0.77%
    Inconclusive result (%-change mean confidence interval includes 0).

    total loops in shared programs: 4373 -> 4373 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 374518258 -> 374384193 (-0.04%)
    cycles in affected programs: 231101954 -> 230967889 (-0.06%)
    helped: 21427
    HURT: 19438
    helped stats (abs) min: 1 max: 2035 x̄: 31.09 x̃: 8
    helped stats (rel) min: <.01% max: 40.95% x̄: 2.42% x̃: 0.86%
    HURT stats (abs)   min: 1 max: 20875 x̄: 27.38 x̃: 8
    HURT stats (rel)   min: <.01% max: 59.09% x̄: 2.49% x̃: 0.80%
    95% mean confidence interval for cycles value: -4.49 -2.07
    95% mean confidence interval for cycles %-change: -0.14% -0.04%
    Cycles are helped.

    total spills in shared programs: 23406 -> 23411 (0.02%)
    spills in affected programs: 3 -> 8 (166.67%)
    helped: 0
    HURT: 2

    total fills in shared programs: 34845 -> 34850 (0.01%)
    fills in affected programs: 3 -> 8 (166.67%)
    helped: 0
    HURT: 2

    LOST:   0
    GAINED: 0

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111566
Fixes: f4ef34f207 "intel/fs: Add an UNDEF instruction to avoid..."
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-09-13 04:12:24 +00:00
Anuj Phogat
729de1488f intel/gen11+: Enable Hardware filtering of Semi-Pipelined State in WM
Initial benchmarking didn't show any performance benefits. But it might eventually.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-09-11 11:29:37 -07:00
Anuj Phogat
ee2bde5232 genxml/gen11+: Add COMMON_SLICE_CHICKEN4 register
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-09-11 11:29:37 -07:00
Mauro Rossi
ae5ac26dfa android: anv: libmesa_vulkan_common: add libmesa_util static dependency
Change needed to fix the following build error:

In file included from external/mesa/src/intel/vulkan/anv_device.c:43:
external/mesa/src/util/xmlpool.h:115:10: fatal error: 'xmlpool/options.h' file not found
         ^~~~~~~~~~~~~~~~~~~
1 error generated.

Fixes: 4dcb1ff ("anv: add support for driconf")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-09-08 20:07:56 +02:00
Jason Ekstrand
34541be7b0 intel/blorp: Use wide formats for nicely aligned stencil clears
In the case where the stencil clear is nicely aligned, we can clear
stencil much more efficiently by mapping it as a wide format (say
RGBA32_UINT) and blasting out the stencil clear value with a repclear.
On Unigine Heaven, this makes one stencil clear go from non-trivial to
unnoticeable when looking at per-draw timings.
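
The core idea of viewing stencil as a wide format can be sketched as
replicating the 8-bit clear value into every byte of a wider channel.
The helper below is a hypothetical illustration, not blorp code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: replicate an 8-bit stencil clear value into all
 * four bytes of one 32-bit channel, as would be needed when the stencil
 * surface is viewed as a wide format such as RGBA32_UINT. */
static uint32_t
replicate_stencil_u32(uint8_t stencil)
{
   return (uint32_t)stencil * 0x01010101u;
}
```

Each RGBA32_UINT texel would then carry four such channels, so a single
written texel covers 16 stencil bytes.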

In order for this change to work properly, ANV needs to do a bit more
flushing around depth and stencil clears.  i965 and iris already have
the cache tracking logic to handle this so no changes are required
there.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-06 23:35:09 +00:00
Jason Ekstrand
d62ca48c31 intel/blorp: Expose surf_fake_interleaved_msaa internally
2019-09-06 23:35:09 +00:00
Jason Ekstrand
caa786e029 intel/blorp: Expose surf_retile_w_to_y internally
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-06 23:35:09 +00:00
Jason Ekstrand
a90b1cbe73 blorp: Memset surface info to zero when initializing it
This isn't known to fix any current bugs but it does prevent a
regression in a subsequent commit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-06 23:35:09 +00:00
Jason Ekstrand
c15b197d74 intel/tools: Decode PS kernels on SNB
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-06 23:35:09 +00:00
Jason Ekstrand
7f5cb5fd6d intel/tools: Decode 3DSTATE_BINDING_TABLE_POINTERS on SNB
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-06 23:35:09 +00:00
Eric Engestrom
037b5b567f anv: add support for vk_x11_override_min_image_count
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-06 23:16:05 +01:00
Eric Engestrom
4dcb1fff19 anv: add support for driconf
No option is supported yet, this is just the boilerplate.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-09-06 23:16:05 +01:00
Jordan Justen
9790cfcefa anv,iris: L3ALLOC register replaces L3CNTLREG for gen12
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-09-06 13:11:25 -07:00
Anuj Phogat
414cae0fd6 intel/gen12: Add L3 configurations
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-09-06 13:11:22 -07:00