Commit graph

593 commits

Author SHA1 Message Date
Kenneth Graunke
d4a4384b31 iris: Implement INTEL_DEBUG=pc for pipe control logging.
This prints a log of every PIPE_CONTROL flush we emit, noting which bits
were set, and also the reason for the flush.  That way we can see which
are caused by hardware workarounds, render-to-texture, buffer updates,
and so on.  It should make it easier to determine whether we're doing
too many flushes and why.
2019-06-20 13:32:15 -05:00
Caio Marcelo de Oliveira Filho
f346b277d1 iris: Create binding table slot for num_work_groups only when needed
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-11 17:57:37 -07:00
Caio Marcelo de Oliveira Filho
045aeccf0e iris: Always reserve binding table space for NIR constants
Don't have a separate mechanism for NIR constants to be removed from
the table.  If unused, we will compact it away.  The use_null_surface
is needed when INTEL_DISABLE_COMPACT_BINDING_TABLE is set.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho
97cd865be2 iris: Compact binding tables
Change the iris_binding_table to keep track of what surfaces are
actually going to be used, then assign binding table indices just for
those.  Reducing unused bytes on those are valuable because we use a
reduced space for those tables in Iris.

The rest of the driver can go from "group indices" (i.e. UBO #2) to
BTI and vice-versa using helper functions.  The value
IRIS_SURFACE_NOT_USED is returned to indicate a certain group index is
not used or a certain BTI is not valid.

The environment variable INTEL_DISABLE_COMPACT_BINDING_TABLE can be
set to skip compacting binding table.

v2: (all from Ken)
    Use BITFIELD64_MASK helper. Improve comments.
    Assert all group is marked as used when we have indirects.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho
79f1529ae0 iris: Create an enum for the surface groups
This will make convenient to handle compacting and printing the
binding table.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho
1c8ea8b300 iris: Handle binding table in the driver
Stop using brw_compiler to lower the final binding table indices for
surface access.  This is done by simply not setting the
'prog_data->binding_table.*_start' fields.  Then make the driver
perform this lowering.

This is a better place to perfom the binding table assignments, since
the driver has more information and will also later consume those
assignments to upload resources.

This also prepares us for two changes: use ibc without having to
implement binding table logic there; and remove unused entries from
the binding table.

Since the `block` field in brw_ubo_range now refers to the final
binding table index, we need to adjust it before using to index
shs->constbuf.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-03 14:14:45 -07:00
Jason Ekstrand
e459d6d6df iris: Enable nir_opt_large_constants
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15306230 -> 15304726 (<.01%)
    instructions in affected programs: 4570 -> 3066 (-32.91%)
    helped: 16
    HURT: 0

    total cycles in shared programs: 361703436 -> 361680041 (<.01%)
    cycles in affected programs: 129388 -> 105993 (-18.08%)
    helped: 16
    HURT: 0

    LOST:   0
    GAINED: 2

The helped programs were in XCom 2, Deus Ex: Mankind Divided, and Kerbal
Space Program

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-29 21:09:16 +00:00
Jason Ekstrand
744f93f5c1 iris: Move upload_ubo_ssbo_surf_state to iris_program.c
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-29 21:09:16 +00:00
Kenneth Graunke
7d2b54e393 iris: Record state sizes for INTEL_DEBUG=bat decoding.
Felix noticed a crash when using INTEL_DEBUG=bat decoding.  It turned
out that we were sometimes placing variable length data near the end
of a buffer, and with the decoder guessing random lengths rather than
having an actual count, it was walking off the end and crashing.  So
this does more than improve the decoder output.

Unfortunately, this is a bit more complicated than i965's handling,
because we don't have a single state buffer.  Various places upload
data via u_upload_mgr, and so there isn't a central place to record
the size.  We don't need to catch every single place, however, since
it's only important to record variable length packets (like viewports
and binding tables).

State data also lives arbitrarily long, rather than being discarded on
every batch like i965, so we don't know when to clear out old entries
either.  (We also don't have a callback when an upload buffer is
released.)  So, this tracking may space leak over time.  That's probably
okay though, as this is only a debugging feature and it's a slow leak.
We may also get lucky and overwrite existing entries as we reuse BOs,
though I find this unlikely to happen.

The fact that the decoder works in terms of offsets from a state base
address is also not ideal, as dynamic state base address and surface
state base address differ for iris.  However, because dynamic state
addresses start from the top of a 4GB region, and binding tables start
from addresses [0, 64K), it's highly unlikely that we'll get overlap.

We can always improve this, but for now it's better than what we had.
2019-05-23 08:07:08 -07:00
Kenneth Graunke
646924cfa1 intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8
Our tessellation control shaders can be dispatched in several modes.

- SINGLE_PATCH (Gen7+) processes a single patch per thread, with each
  channel corresponding to a different patch vertex.  PATCHLIST_N will
  launch (N / 8) threads.  If N is less than 8, some channels will be
  disabled, leaving some untapped hardware capabilities.  Conditionals
  based on gl_InvocationID are non-uniform, which means that they'll
  often have to execute both paths.  However, if there are fewer than
  8 vertices, all invocations will happen within a single thread, so
  barriers can become no-ops, which is nice.  We also burn a maximum
  of 4 registers for ICP handles, so we can compile without regard for
  the value of N.  It also works in all cases.

- DUAL_PATCH mode processes up to two patches at a time, where the first
  four channels come from patch 1, and the second group of four come
  from patch 2.  This tries to provide better EU utilization for small
  patches (N <= 4).  It cannot be used in all cases.

- 8_PATCH mode processes 8 patches at a time, with a thread launched per
  vertex in the patch.  Each channel corresponds to the same vertex, but
  in each of the 8 patches.  This utilizes all channels even for small
  patches.  It also makes conditions on gl_InvocationID uniform, leading
  to proper jumps.  Barriers, unfortunately, become real.  Worse, for
  PATCHLIST_N, the thread payload burns N registers for ICP handles.
  This can burn up to 32 registers, or 1/4 of our register file, for
  URB handles.  For Vulkan (and DX), we know the number of vertices at
  compile time, so we can limit the amount of waste.  In GL, the patch
  dimension is dynamic state, so we either would have to waste all 32
  (not reasonable) or guess (badly) and recompile.  This is unfortunate.
  Because we can only spawn 16 thread instances, we can only use this
  mode for PATCHLIST_16 and smaller.  The rest must use SINGLE_PATCH.

This patch implements the new 8_PATCH TCS mode, but leaves us using
SINGLE_PATCH by default.  A new INTEL_DEBUG=tcs8 flag will switch to
using 8_PATCH mode for testing and benchmarking purposes.  We may
want to consider using 8_PATCH mode in Vulkan in some cases.

The data I've seen shows that 8_PATCH mode can be more efficient in
some cases, but SINGLE_PATCH mode (the one we use today) is faster
in other cases.  Ultimately, the TES matters much more than the TCS
for performance, so the decision may not matter much.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-14 13:16:30 -07:00
Illia Iorin
a35269cf44 iris: Implement ARB_indirect_parameters
iris_draw_vbo is divided into two functions to remove unnecessary
operations from the loop. This implementation of ARB_indirect_parameters
takes into account NV_conditional_render by saving MI_PREDICATE_RESULT
at the start of a draw call and restoring it at the end also the result
of NV_conditional_render is taken into account when computing predicates
that limit draw calls for ARB_indirect_parameters in a similar way
to 1952fd8d in ANV.

v2: Optimize indirect draws (suggested by Kenneth Graunke)
v3: (by Kenneth Graunke)
 - Fix an issue where indirect draws wouldn't set patch information
   before updating the compiled TCS.
 - Move some code back to iris_draw_vbo to avoid duplicating it.
 - Fix minor indentation issues.

Signed-off-by: Illia Iorin <illia.iorin@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-11 23:56:52 -07:00
Kenneth Graunke
72ccefb529 iris: Use full ways for L3 cache setup on Icelake.
Anuj fixed this in i965 and anv, but the fix never landed in iris.
Fixes tessellation corruption on Icelake.  Thanks to Rafael for
bisecting this and tracking it down.

Fixes: d0996d5fab iris: Emit default L3 config for the render pipeline
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-05-10 16:50:14 -07:00
Kenneth Graunke
a232aa5c50 iris: Also handle res->offset for buffer sampler/image views 2019-05-07 13:36:18 -07:00
Mike Blumenkrantz
ddd716e746 iris: support dmabuf imports with offsets
this adds support for imports where the image data begins at an offset
from the start of the buffer, as used in h/x264

fixes kwg/mesa#47

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-07 13:36:08 -07:00
Kenneth Graunke
a032a9665f iris: Enable PIPE_CAP_SURFACE_REINTERPRET_BLOCKS
This makes CompressedTexSubImage from a PBO source do proper GPU
rendering to upload instead of stalling to map the PBO source on
the CPU (then copying it on the CPU).

Thanks Bas Nieuwenhuizen for pointing out that Vulkan includes this
functionality, and to Jason Ekstrand for writing the code I adapted.
Vulkan only supports a single layer, however, and this code tries to
support multiple layers as long as it's miplevel 0.

Improves performance in Sid Meier's Civilization VI:

   Average frame time (ms):         -3.67423% +/- 1.46201% (n=5)
   99th percentile frame time (ms): -5.09910% +/- 3.87874% (n=5)
2019-05-06 09:50:32 -07:00
Kenneth Graunke
5ff5d0a895 iris: Disable dual source blending when shader doesn't handle it
This is a port of Danylo's eca4a6548d
which fixed the hang on i965.  It fixes GPU hangs in his new Piglit
test, arb_blend_func_extended-dual-src-blending-discard-without-src1.

I avoided my own review feedback here, and decided to simply adjust
3DSTATE_PS_BLEND rather than BLEND_STATE_ENTRY[0].  It has never been
clear to me which the hardware uses in every case.  However, whacking
the enable in 3DSTATE_PS_BLEND seems to be sufficient to fix the hang,
and that packet is already dynamic, so it's easy to handle.  I'd rather
avoid making BLEND_STATE_ENTRY[0] dynamic unless I have to.
2019-05-02 21:14:49 -07:00
Rafael Antognolli
cf3cadacdf iris: Update the surface state clear color address when available.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-30 08:31:44 -07:00
Kenneth Graunke
dcfca0af7c iris: Set XY Clipping correctly.
I was setting it based off a pipe_rasterizer_state field that appears
to be entirely dead outside of the draw module respecting it.

I should be setting it when the primitive type reaching the SF is
neither points nor lines.  This is, unfortunately, rather dirty,
as we have to look at the rasterizer state, the geometry shader state,
the tessellation evaluation shader state, and the primitive type...
2019-04-29 10:53:23 -07:00
Kenneth Graunke
6bd4cb920e iris: Fix zeroing of transform feedback offsets in strange cases.
Some of the dEQP.functional.transform_feedback tests end up doing
the following sequence of operations:

   1. BeginTransformFeedback
   2. PauseTransformFeedback
   3. Draw
   4. ResumeTransformFeedback

At step 1, we'd pack 3DSTATE_SO_BUFFER commands saying to zero the
SO_WRITE_OFFSET registers.  At step 2, we disable streamout, so step 3
doesn't bother emitting those commands.  Then, step 4 re-packs new
3DSTATE_SO_BUFFER commands with offset = 0xFFFFFFFF, saying to continue
appending at the existing offset.  This loads the value from the BO as
the offsets - but we never actually zeroed it.

So, just maintain a flag saying "we actually emitted the commands",
and stomp offset back to zero until we emit some.
2019-04-27 01:07:14 -07:00
Kenneth Graunke
529ace7887 iris: Silence unused function warning 2019-04-25 17:33:56 -07:00
Andrii Simiklit
4e9592c5fa iris: make the TFB result visible to others
OpenGL 4.6 Spec:
   "5.3.3 Rules
    .......
    Note: “Updates” via rendering or transform feedback
    are treated consistently with updates via GL commands.
    Once EndTransformFeedback has been issued, any subsequent
    command in the same context that uses the results of the
    transform feedback operation will see the results."

v2: removed a wrong comment
    ( Kenneth Graunke <kenneth@whitecape.org> )

v3: - flush+dirty depends on buffers usage history
    - removed an old hack
    ( Kenneth Graunke <kenneth@whitecape.org> )

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110404
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-25 11:48:04 -07:00
Kenneth Graunke
aa7306b4cf iris: Some tidying for preemption support
Just enable it during init_render_context on Gen10+, and move the
Gen9 state tracking into iris_genx_state so it only exists on Gen9.

Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-04-25 11:26:24 -07:00
Mike Blumenkrantz
7315882023 iris: add preemption support on gen9
this is basically just porting the following two commits to gallium:
d8b50e152a
5c454661c6

resolves kwg/mesa#49

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2019-04-24 14:47:08 -07:00
Mike Blumenkrantz
b53d256db8 iris: add support for INTEL_conservative_rasterization
this hooks up the iris gallium driver to existing mesa bits which handle
the implementation

resolves kwg/mesa#8

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-23 16:36:30 -07:00
Kenneth Graunke
2208d5a683 iris: Fix DrawTransformFeedback math when there's a buffer offset
We need to subtract the starting offset from the final offset before
dividing by the stride.  See src/intel/vulkan/genX_cmd_buffer.c:3142.

Not known to fix anything.
2019-04-23 15:57:07 -07:00
Kenneth Graunke
77449d7c41 iris: Track valid data range and infer unsynchronized mappings.
Applications frequently call glBufferSubData() to consecutive regions
of a VBO to append new vertex data.  If no data exists there yet, we
can promote these to unsynchronized writes, even if the buffer is busy,
since the GPU can't be doing anything useful with undefined content.
This can avoid a bunch of unnecessary blitting on the GPU.

u_threaded_context would do this for us, and in fact prohibits us from
doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED).  But we haven't
hooked that up yet, and it may be useful to disable u_threaded_context
when debugging...at which point we'd still want this optimization.  At
the very least, it would let us measure the benefit of threading
independently from this optimization.  And it's not a lot of code.

Removes most stall avoidance blits in "Total War: WARHAMMER."

On my Skylake GT4e at 1920x1080, this appears to improve performance
in games by the following (but I did not do many runs for proper
statistics gathering):

   ----------------------------------------------
   | DiRT Rally        | +2% (avg) | + 2% (max) |
   | Bioshock Infinite | +3% (avg) | + 9% (max) |
   | Shadow of Mordor  | +7% (avg) | +20% (max) |
   ----------------------------------------------
2019-04-23 00:24:08 -07:00
Kenneth Graunke
5ad0c88dbe iris: Replace buffer backing storage and rebind to update addresses.
This implements PIPE_CAP_INVALIDATE_BUFFER and invalidate_resource(),
as well as the PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag.  When either
of these happen, we swap out the backing storage of the buffer for a
new idle BO, allowing us to write to it immediately without stalling
or queueing a blit.

On my Skylake GT4e at 1920x1080, this improves performance in games:

   -----------------------------------------------
   | DiRT Rally        | +25% (avg) | +17% (max) |
   | Bioshock Infinite | +22% (avg) | +11% (max) |
   | Shadow of Mordor  | +27% (avg) | +83% (max) |
   -----------------------------------------------
2019-04-23 00:24:08 -07:00
Kenneth Graunke
b45dff1da8 iris: Rework image views to store pipe_image_view.
This will be useful when rebinding images.
2019-04-23 00:24:08 -07:00
Kenneth Graunke
2f60850a3f iris: Rework UBOs and SSBOs to use pipe_shader_buffer
This unifies a bunch of the UBO and SSBO code to use common structures.
Beyond iris_state_ref, pipe_shader_buffer also gives us a buffer size,
which can be useful when filling out the surface state.
2019-04-23 00:24:08 -07:00
Kenneth Graunke
00d4019676 iris: Track bound constant buffers
This helps avoid having to iterate over [0, PIPE_MAX_CONSTANT_BUFFERS)
looking to see if any resources are bound.
2019-04-23 00:24:08 -07:00
Kenneth Graunke
1566054459 iris: Track bound and writable SSBOs
Marek recently extended pipe->set_shader_buffers() to take an extra
writable_bitmask parameter, indicating which SSBOs are writable (some
may be bound read-only).  We can use this to decide whether to set
EXEC_OBJECT_WRITE when pinning.  Avoiding the write flag can save us
some cross-batch flushing if the SSBO is used for reading in both the
render and compute engines.
2019-04-22 11:31:14 -07:00
Kenneth Graunke
36478b9f77 iris: Enable the dual_color_blend_by_location driconf option.
This fixes rendering in Unigine Valley 1.0 and Heaven 4.0.
2019-04-22 09:36:36 -07:00
Lionel Landwerlin
d1be67db39 iris: implement WaEnableStateCacheRedirectToCS
This 3d performance workaround was initially put in the kernel but the
media driver requires different settings so the register has been
whitelisted in i915 [1] and userspace drivers are left initializing it as
they wish.

[1] : https://patchwork.freedesktop.org/series/59494/

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
2019-04-18 17:43:08 +01:00
Kenneth Graunke
8bf9b7b5b6 intel: Emit 3DSTATE_VF_STATISTICS dynamically
Pipeline statistics queries should not count BLORP's rectangles.

    (23) How do operations like Clear, TexSubImage, etc. affect the
         results of the newly introduced queries?

      DISCUSSION: Implementations might require "helper" rendering
      commands be issued to implement certain operations like Clear,
      TexSubImage, etc.

      RESOLVED: They don't. Only application submitted rendering
      commands should have an effect on the results of the queries.

Piglit's arb_pipeline_statistics_query-vert_adj exposes this bug when
the driver is hacked to always perform glBufferData via a GPU staging
copy (for debugging purposes).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-04-14 19:58:04 -07:00
Kenneth Graunke
4fcb749044 iris: Actually pin the scratch BO.
We were pinning it for compute shaders, and pinning it when restoring
saved buffers, but we never actually pinned it in the original batch
for VS/TCS/TES/GS/FS.

Fixes rendering in GFXBench5's Tessellation demo and a bunch of Piglit
geometry shader tests.
2019-04-11 15:03:27 -07:00
Marek Olšák
66a82ec6f0 gallium: add writable_bitmask parameter into set_shader_buffers
to indicate write usage per buffer.
This is just a hint (it will be used by radeonsi).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-04-04 19:28:52 -04:00
Danylo Piliaiev
b19494c54e iris: Fix assert when using vertex attrib without buffer binding
The GL 4.5 spec says:
 "If any enabled array’s buffer binding is zero when DrawArrays or
  one of the other drawing commands defined in section 10.4 is called,
  the result is undefined."

The result is undefined but it should not crash.

Fixes: gl-3.1-vao-broken-attrib
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-04 22:57:24 +00:00
Tapani Pälli
41f76dd513 iris: move variable to the scope where it is being used
iris_upload_border_color is passed a pointer which points to
variable that is introduced in a different scope.

CID: 1444296
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-04 04:43:20 +00:00
Rafael Antognolli
7339660e80 iris: Add aux.sampler_usages.
We want to skip some types of aux usages (for instance,
ISL_AUX_USAGE_HIZ when the hardware doesn't support it, or when we have
multisampling) when sampling from the surface.

Instead of checking for those cases while filling the surface state and
leaving it blank, let's have a version of aux.possible_usages for
sampling. This way we can also avoid allocating surface state for the
cases we don't use.

Fixes: a8b5ea8ef0 "iris: Add function to update clear color in surface state."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-02 15:26:45 -07:00
Rafael Antognolli
dfc5620a41 iris: Do not allocate clear_color_bo for gen8.
Since we are not using it for the clear color, there's no need to
allocate it.

Fixes: a8b5ea8ef0 "iris: Add function to update clear color in surface state."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-02 15:26:41 -07:00
Rafael Antognolli
2660667284 iris/gen8: Re-emit the SURFACE_STATE if the clear color changed.
The swizzle for rendering surfaces is always identity. So when we are
doing the fast clear, we don't have enough information to store the
clear color OR'ed with the Shader Channel Select bits for the dword in
the SURFACE_STATE.

Instead of trying to patch up the SURFACE_STATE correctly later, by
reading the color from the clear color state buffer and then doing all
the operations to store it, let's just re-emit the whole SURFACE_STATE.
That should make things way simpler on gen8, and we can still use the
clear color state buffer for gen9+.

Fixes: a8b5ea8ef0 "iris: Add function to update clear color in surface state."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-02 15:26:33 -07:00
Rafael Antognolli
6a02873687 iris: Only update clear color for gens 8 and 9.
Newer gens can read it directly.

Also properly skip updating the ISL_AUX_USAGE_NONE surface.

Fixes: a8b5ea8ef0 "iris: Add function to update clear color in surface state."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-02 15:24:15 -07:00
Rob Clark
b2d651b862 iris: fix set_sampler_view
Update to match docs.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-30 13:05:31 -04:00
Anuj Phogat
9c421d6b47 iris/icl: Add WA_2204188704 to disable pixel shader panic dispatch
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-28 19:59:59 +00:00
Anuj Phogat
e0f4359ec1 iris/icl: Set Enabled Texel Offset Precision Fix bit
h/w specification requires this bit to be always set.
See Mesa commit 5eb173304b.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-28 19:59:59 +00:00
Danylo Piliaiev
c8abe03f3b i965,iris,anv: Make alpha to coverage work with sample mask
From "Alpha Coverage" section of SKL PRM Volume 7:
 "If Pixel Shader outputs oMask, AlphaToCoverage is disabled in
  hardware, regardless of the state setting for this feature."

From OpenGL spec 4.6, "15.2 Shader Execution":
 "The built-in integer array gl_SampleMask can be used to change
 the sample coverage for a fragment from within the shader."

From OpenGL spec 4.6, "17.3.1 Alpha To Coverage":
 "If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value
  is generated where each bit is determined by the alpha value at the
  corresponding sample location. The temporary coverage value is then
  ANDed with the fragment coverage value to generate a new fragment
  coverage value."

Similar wording could be found in Vulkan spec 1.1.100
"25.6. Multisample Coverage"

Thus we need to compute alpha to coverage dithering manually in shader
and replace sample mask store with the bitwise-AND of sample mask and
alpha to coverage dithering.

The following formula is used to compute final sample mask:
  m = int(16.0 * clamp(src0_alpha, 0.0, 1.0))
  dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) |
     0x0808 * (m & 2) | 0x0100 * (m & 1)
  sample_mask = sample_mask & dither_mask
Credits to Francisco Jerez <currojerez@riseup.net> for creating it.

It gives a number of ones proportional to the alpha for 2, 4, 8 or 16
least significant bits of the result.

GEN6 hardware does not have issue with simultaneous usage of sample mask
and alpha to coverage however due to the wrong sending order of oMask
and src0_alpha it is still affected by it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-03-25 13:54:55 -07:00
Chris Wilson
db99d02fce iris: Push heavy memchecker code to DEBUG
Invoking VALGRIND_CHECK_MEM_IS_DEFINED pulls in enough code to convince
gcc to not inline __gen_uint and results in a lot of packing code ending
up out-of-line with lots of stack copying. To ameliorate this, only
insert the check inside the packer if DEBUG is defined and instead
perform the validation checking before submitting the batch to the
kernel. This should give accurate results if --trace-origins=yes is
used, and failing that we can recompile in full debug mode to check on
insertion.

Improve drawoverhead baseline by 25% with a default build with
valgrind-dev installed (with effectively no loss of vg coverage).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-22 10:38:03 -07:00
Kenneth Graunke
66c100a8d6 iris: Skip resolves and flushes altogether if unnecessary
Improves drawoverhead baseline scores by 1.17x.
2019-03-21 20:28:17 -07:00
Rafael Antognolli
93123417dd iris: Track fast clear color.
v2: Update tracked clear color when we update the surface state.
v3: Update all aux surface states when updating the clear color.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 16:46:26 -07:00
Rafael Antognolli
131b42f0aa iris: Implement fast clear color.
If all the restrictions are satisfied, do a fast clear instead of
regular clear.

v2:
 - add perf_debug() when we can't fast clear (Ken)
 - improve comment: s/miptree/resource/ (Ken)
 - use swizzle_color_value from blorp (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-20 16:46:25 -07:00