Commit graph

5243 commits

Author SHA1 Message Date
Rafael Antognolli
5f13996262 intel/gen12+: Disable mid thread preemption.
Fixes a GPU hang in Car Chase.

Cc: mesa-stable@lists.freedesktop.org

v2: Add comment explaining why (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4035>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4035>
2020-03-03 19:52:06 +00:00
Rafael Antognolli
cd40110420 intel/isl: Implement D16_UNORM workarounds.
GEN:BUG:14010455700 (lineage 1808121037):

   "To avoid sporadic corruptions “Set 0x7010[9] when Depth Buffer
   Surface Format is D16_UNORM , surface type is not NULL & 1X_MSAA"

Required for fixing ttps://gitlab.freedesktop.org/mesa/mesa/issues/2501.

GEN:BUG:1806527549:

   "Set HIZ_CHICKEN (7018h) bit 13 = 1 when depth buffer is D16_UNORM."

This one could fix a GPU hang in some workloads.

v2: Implement WA in isl and add another similar WA (Jason).
v3: Add flushes before changing chicken registers (Jason)
v4: Depth flush and stall + end of pipe sync when changing registers
(Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3801>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3801>
2020-03-03 16:25:54 +00:00
Paulo Zanoni
62f7197fb5 anv: multiply the scratch space by 4 on gen9-10 like iris and i965
My understanding is that there's no reason for the scratch space
allocation to be different between iris, i965 and anv. Let's make all
the functions behave the same.

I don't know if this fixes any specific gen9 bugs, it it might since
it increases the scratch space.

v2: Rebase.
v3: Rebase.
v4: Remove redundant gen 11 check (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4006>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4006>
2020-03-03 00:36:10 +00:00
Paulo Zanoni
aa78801f0a intel/device: bdw_gt1 actually has 6 eus per subslice
Found by inspection, I'm not aware of any bugs caused by this typo.

According to Lionel, it seems we only use this to generate masks
of available EUs for perfromance queries, and it's only used when we
can't query the fused parts of the GPU through DRM_IOCTL_I915_QUERY.
So this patch should help for the corner case where the Kernel is too
old to support the query ioctl.

v2: improve commit message, cc stable (Lionel).

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4006>
2020-03-03 00:36:10 +00:00
Paulo Zanoni
9e5ce30da7 intel: fix the gen 12 compute shader scratch IDs
This is the same idea as "intel: fix the gen 11 compute shader scratch
IDs".

The number of EUs on TGL is not the same as ICL, but the
MEDIA_VFE_STATE restrictions stay the same, so adapt the code to it.
Also, consider the base configuration instead of what we read from the
Kernel.

According to Mark, this fixes the following piglit tests on TGL:

    piglit.spec.arb_compute_shader.execution.shared-atomicmax-uint.tglm64
    piglit.spec.arb_compute_shader.execution.shared-atomicmax-int.tglm64
    piglit.spec.intel_shader_atomic_float_minmax.execution.shared-atomicmax-float.tglm64

v2: s/ICL+/Gen11+/ (Jason).

Cc: mesa-stable@lists.freedesktop.org
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4006>
2020-03-03 00:36:10 +00:00
Paulo Zanoni
1efe139cad intel: fix the gen 11 compute shader scratch IDs
Scratch space allocation is based on the number of threads in the base
configuration, and we only have one base configuration for ICL, with 8
subslices.

This fixes an issue with Aztec on Vulkan in a machine with a
configuration that's not the base. The issue looks like a regression
from b9e93db208, but it seems things are broken since forever, just
not easily reproducible.

v2: Reimplement it using the subslices variable. Don't touch TGL.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4006>
2020-03-03 00:36:10 +00:00
Rafael Antognolli
43dc842cb9 anv: Wait for the GPU to be idle before invalidating the aux table.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4005>
2020-03-02 22:28:11 +00:00
Jason Ekstrand
3ca3050de5 anv: Do end-of-pipe sync around MCS/CCS ops instead of CS stall
v2: Do end-of-pipe sync after clear depth stencil too (Jason).
v3: Also do end-of-pipe sync before clear depth stencil too (Jason).

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4005>
2020-03-02 22:28:11 +00:00
Jason Ekstrand
2db471953a anv: Use a proper end-of-pipe sync instead of just CS stall
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4005>
2020-03-02 22:28:11 +00:00
Jason Ekstrand
ac8d412ba3 anv: Use the PIPE_CONTROL instead of bits for the CS stall W/A
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4005>
2020-03-02 22:28:11 +00:00
Lionel Landwerlin
85457e350d intel/tools/dump_gpu: fix getparam values
Don't return the pci_id for all params

Fixes: 76bf38eaf0 ("intel/tools/aub_dump: move aub file initialization to maybe_init()")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3994>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3994>
2020-03-02 08:24:40 +00:00
Jordan Justen
cf12faef61
intel/compiler: Restrict cs_threads to 64
Our current GPGPU_WALKER code only supports up to 64 threads.

On HSW we could use up to 70 and TGL up to 112, but only if the walker
is adjusted so the width does not exceed 64. Work to support this is
in progress.

Previous to this change, we might try to downgrade to SIMD8 if the
SIMD16 shader spilled. Since HSW and TGL have the max number of
threads above 64, we would then try to emit an invalid GPGPU walker
command.

Fixes: 932045061b ("i965/cs: Emit compute shader code and upload programs")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2020-02-28 14:45:43 -08:00
Caio Marcelo de Oliveira Filho
dab7a4d82c anv: Remove unused field urb.total_size
This was used before the URB calculation functions were shared by GL
and Vulkan.  Also drop the substruct for the remaining, `l3_config` is
a good name on its own.

Also-written-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3981>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3981>
2020-02-27 14:45:10 -08:00
Caio Marcelo de Oliveira Filho
811990dc1c anv: Remove unused field xfb_used from anv_pipeline
Since we only use xfb_info for GEN >= 8, make the ifdef cover that
local variable.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3973>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3973>
2020-02-27 10:44:11 -08:00
Jason Ekstrand
349898a967 nir: Drop nir_tex_instr::texture_array_size
It's set by lots of things and we spend a lot of time maintaining it but
no one actually uses the value for anything useful.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3940>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3940>
2020-02-26 18:29:49 +00:00
Matt Turner
cb166aea24 intel/tools: Do not print type/qualifiers/name for c_literal
External tools may wish to choose their own type, qualifiers, and name,
so do not emit our own.

Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3952>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3952>
2020-02-25 22:23:38 +00:00
Sagar Ghuge
5feea40889 intel/tools: Allow i965_disasm to disassemble c_literal input type
Added extra argument named 'type' which can be 'bin' (default if
ommited) or 'c_literal' for input type.

Change 'binary-path' argument name to 'input-path'.

v2:
- Use util_dynarray for assembly (Matt Turner)
- Read data in 8 bytes chunk (Matt Turner)
- Fix help option (Akeem Abodunrin)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3952>
2020-02-25 22:23:38 +00:00
Sagar Ghuge
2f83daedb1 intel/tools: Print c_literals 4 byte wide
We already print hex value a byte wide, instead of printing c_literal
byte wide, we can print it 4 byte wide, which gives us 2 different
combinations.

v2: Fix the aliasing issue (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3952>
2020-02-25 22:23:38 +00:00
Sagar Ghuge
0b0e958f4f intel/tools: Add test for state register as source
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3952>
2020-02-25 22:23:38 +00:00
Sagar Ghuge
31c29f4f55 intel/tools: Add test for address register as source
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3952>
2020-02-25 22:23:38 +00:00
Sagar Ghuge
9526e5c359 intel/tools: Set correct address register file and number in i965_asm
We need to use already created brw_reg and set correct file type,
register number and sub register number.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3952>
2020-02-25 22:23:38 +00:00
Sagar Ghuge
87d9e78f26 intel/tools: Handle STATE_REG in typed source operand
Also stop using brw_sr0_reg function as it return new brw_reg, we
already created register, all we have to is just set file, register
number and subnr.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3952>
2020-02-25 22:23:38 +00:00
Sagar Ghuge
2a75e60365 intel/tools: Handle illegal instruction
Allow assembler to handle illegal instruction even though mesa doesn't
use it but might be required at some point in future.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3952>
2020-02-25 22:23:38 +00:00
Jason Ekstrand
5dfd83d7a1 anv: Always enable the data cache
Because we set the needs_data_cache bit from the NIR during compilation,
any time a shader was pulled out of the pipeline cache, we wouldn't set
the bit and the data cache was disabled.  Fortunately, on Gen8+, this
bit is ignored because we always use the ALL section in the L3$ config
instead of separate DC and RO sections.  On Gen7, however, this meant
that we were basically never running with the data cache enabled and our
compute performance was suffering massively because of it.  This commit
improves Geekbench 5 scores on my Haswell GT3 by roughly 330% (no,
that's not a typo).

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3912>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3912>
2020-02-25 20:12:10 +00:00
Lionel Landwerlin
d4e7a11bc3 intel/aub_dump: stub the waits when overriding the device
We don't actually want to wait on anything, just complete submitting
the commands as fast as possible.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3705>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3705>
2020-02-25 20:56:49 +02:00
Lionel Landwerlin
31461e2379 intel/tools/aub_dump: fix crash when using the default legacy context
When execbuffer->rsvd1 == 0, the legacy context is used. Ensure we
have context created for this.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3705>
2020-02-25 20:56:45 +02:00
Lionel Landwerlin
76bf38eaf0 intel/tools/aub_dump: move aub file initialization to maybe_init()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3705>
2020-02-25 20:56:40 +02:00
Jason Ekstrand
6fbe3f40a9 intel/isl: Add isl_aux_info.c to Makefile.sources
This should fix the Android build.

Fixes: 58d4749e56 "isl: Add a module which manages aux resolves"
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reported-by: Clayton Craft <clayton.a.craft@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3934>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3934>
2020-02-25 00:41:15 +00:00
Rafael Antognolli
9ab0e92cff intel/blorp: Implement GEN:BUG:1605967699.
v2:
 - Update comments and refactor code (Lionel).
 - Only apply workaround to stencil resolves.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3909>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3909>
2020-02-25 00:04:36 +00:00
Caio Marcelo de Oliveira Filho
956e4b2d37 nir, intel: Move use_scoped_memory_barrier to nir_options
This option will be used later by GLSL, so move to a common struct.

Because nir_options is filled in the compiler instead of the Vulkan
driver, fix that up.  GLSL will ignore that for now.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3913>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3913>
2020-02-24 19:12:11 +00:00
Eric Anholt
3e16434acd nir: Move intel's intrinsic_image_coordinate_components() to core nir.
This is a query that both Intel and freedreno need to do.  We can simplify
it a lot with the new glsl_get_sampler_dim_coordinate_components()

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
2020-02-24 18:25:02 +00:00
Nanley Chery
58d4749e56 isl: Add a module which manages aux resolves
Provide a generic interface which manages aux resolves in ISL. The
feature differences between this and what's in iris is:
* Support for media compression. ISL_AUX_USAGE_MC behaves differently
  from many other usages of CCS, so it was useful to implement this
  support upfront, while designing the interfaces.
* Optimizations for full-surface writes. For example, after a
  full-surface write occurs with ISL_AUX_USAGE_CCS_E in the PARTIAL_CLEAR
  state, isl_aux_state_transition_write() returns COMPRESSED_NO_CLEAR
  instead of COMPRESSED_CLEAR.

A performance suggestion for main-surface-invalidating/replacing writes
is given as a comment instead of adding a boolean to
isl_aux_prepare_access(). This avoids extra validation and should be
simple enough for the caller to handle.

v2. Add assertions. (Jason)
v3. Use switches in 2 more functions. (Jason)
    Store aux metadata in a static table. (Jason)
    Change prepare and finish function signatures. (Jason)
    Keep isl_aux_state_transition_* functions separate.
v4. (Jason)
    Assert against resolving in AUX_INVALID.
    Rename aux_info struct to aux_usage_info.
    Drop the justification for each aux_usage_info field.
    Split out the NONE case in write function.
    Restructure tests to more easily confirm coverage.
    Rename access_compressed field to compressed.
    Make write behavior less ambiguous.
v5. (Jason)
    Add more detail above WRITES_RESOLVE_AMBIGUATE.
    Add ISL_AUX_USAGE_MC to WritesResolveAmbiguate.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2957>
2020-02-24 18:00:05 +00:00
Caio Marcelo de Oliveira Filho
89a3856714 anv: Add pipe_state_for_stage() helper
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3911>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3911>
2020-02-21 13:09:44 -08:00
Caio Marcelo de Oliveira Filho
7df5d36078 anv: Use intel_debug_flag_for_shader_stage()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3911>
2020-02-21 13:09:44 -08:00
Ian Romanick
273b8cd1ca intel/fs: Correctly handle multiply of fsign with a source modifier
The other source of the multiply will be interpreted as a uint32_t in an
XOR instruction.  Any source modifiers with either not be interpreted at
all or will be misinterpreted due to the differing types.

If the other operand of the multiplication has a source modifier, just
emit an extra move to resolve the source modifiers.

The negation source modifier problem is difficult to reproduce due to an
algebraic optimization that changes (-a*b) to -(a*b).  However, changes
in MR !1359 push the negations back down.

On Gen7+ it might be possible to do slightly better for an abs() source
modifier by using BFI2 as a glorified copysign().

On Gen8+ it might be possible to do slightly better for a neg() source
modifier by emitting (~a ^ b).

There were no shader-db changes on any Intel platform, so I think we can
deal with that problem when it arises.

See also piglit!224.

Fixes: 06d2c11641 ("intel/fs: Add a scale factor to emit_fsign")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3780>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3780>
2020-02-19 23:51:42 +00:00
Chad Versace
d8fe9e045f anv: Drop anv_image.c:get_surface()
It was called exactly once, and even there it returned the wrong surface
in a corner case.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3882>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3882>
2020-02-19 19:41:05 +00:00
Danylo Piliaiev
d5931f285b intel/compiler: Do not qsort zero sized array
../src/intel/compiler/brw_nir_analyze_ubo_ranges.c:316:4: runtime error: null pointer passed as argument 1, which is declared to never be null
    #0 0x7f78f5916611 in brw_nir_analyze_ubo_ranges ../src/intel/compiler/brw_nir_analyze_ubo_ranges.c:316
    #1 0x7f78f255c189 in brw_codegen_wm_prog ../src/mesa/drivers/dri/i965/brw_wm.c:97
    #2 0x7f78f2565571 in brw_fs_precompile ../src/mesa/drivers/dri/i965/brw_wm.c:608
    #3 0x7f78f24edd2c in brw_shader_precompile ../src/mesa/drivers/dri/i965/brw_link.cpp:56
    #4 0x7f78f24f3af8 in brw_link_shader ../src/mesa/drivers/dri/i965/brw_link.cpp:381
    #5 0x7f78f39a302a in _mesa_glsl_link_shader ../src/mesa/program/ir_to_mesa.cpp:3119
    #6 0x7f78f3a43826 in create_new_program ../src/mesa/main/ff_fragment_shader.cpp:1133
    #7 0x7f78f3a43d00 in _mesa_get_fixed_func_fragment_program ../src/mesa/main/ff_fragment_shader.cpp:1163
    #8 0x7f78f325ddcd in update_program ../src/mesa/main/state.c:134
    #9 0x7f78f325fe64 in _mesa_update_state_locked ../src/mesa/main/state.c:360
    #10 0x7f78f32600f1 in _mesa_update_state ../src/mesa/main/state.c:394
    #11 0x7f78f2b3e587 in clear ../src/mesa/main/clear.c:169
    #12 0x7f78f2b3e587 in _mesa_Clear ../src/mesa/main/clear.c:242

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3825>
2020-02-19 12:07:24 +02:00
Danylo Piliaiev
d596795d4d brw_fs: Avoid zero size vla
../src/intel/compiler/brw_fs.cpp:2247:46: runtime error: variable length array bound evaluates to non-positive value 0
    #0 0x7f78f5697678 in fs_visitor::assign_constant_locations() ../src/intel/compiler/brw_fs.cpp:2247
    #1 0x7f78f571d29e in fs_visitor::optimize() ../src/intel/compiler/brw_fs.cpp:7361
    #2 0x7f78f574eb84 in fs_visitor::run_fs(bool, bool) ../src/intel/compiler/brw_fs.cpp:8022
    #3 0x7f78f575641b in brw_compile_fs ../src/intel/compiler/brw_fs.cpp:8408
    #4 0x7f78f255c8e4 in brw_codegen_wm_prog ../src/mesa/drivers/dri/i965/brw_wm.c:123
    #5 0x7f78f2565571 in brw_fs_precompile ../src/mesa/drivers/dri/i965/brw_wm.c:608
    #6 0x7f78f24edd2c in brw_shader_precompile ../src/mesa/drivers/dri/i965/brw_link.cpp:56
    #7 0x7f78f24f3af8 in brw_link_shader ../src/mesa/drivers/dri/i965/brw_link.cpp:381
    #8 0x7f78f39a302a in _mesa_glsl_link_shader ../src/mesa/program/ir_to_mesa.cpp:3119
    #9 0x7f78f3a43826 in create_new_program ../src/mesa/main/ff_fragment_shader.cpp:1133
    #10 0x7f78f3a43d00 in _mesa_get_fixed_func_fragment_program ../src/mesa/main/ff_fragment_shader.cpp:1163
    #11 0x7f78f325ddcd in update_program ../src/mesa/main/state.c:134
    #12 0x7f78f325fe64 in _mesa_update_state_locked ../src/mesa/main/state.c:360
    #13 0x7f78f32600f1 in _mesa_update_state ../src/mesa/main/state.c:394
    #14 0x7f78f2b3e587 in clear ../src/mesa/main/clear.c:169
    #15 0x7f78f2b3e587 in _mesa_Clear ../src/mesa/main/clear.c:242

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3825>
2020-02-19 12:07:24 +02:00
Danylo Piliaiev
d4e395a27d brw_nir: Cast bitshift to unsigned
../src/intel/compiler/brw_nir.c:979:40: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
    #0 0x7f78f590d10b in brw_nir_apply_sampler_key ../src/intel/compiler/brw_nir.c:979
    #1 0x7f78f590e07b in brw_nir_apply_key ../src/intel/compiler/brw_nir.c:1057
    #2 0x7f78f5754b45 in brw_compile_fs ../src/intel/compiler/brw_fs.cpp:8347
    #3 0x7f78f255c8e4 in brw_codegen_wm_prog ../src/mesa/drivers/dri/i965/brw_wm.c:123
    #4 0x7f78f2565571 in brw_fs_precompile ../src/mesa/drivers/dri/i965/brw_wm.c:608
    #5 0x7f78f24edd2c in brw_shader_precompile ../src/mesa/drivers/dri/i965/brw_link.cpp:56
    #6 0x7f78f24f3af8 in brw_link_shader ../src/mesa/drivers/dri/i965/brw_link.cpp:381
    #7 0x7f78f39a302a in _mesa_glsl_link_shader ../src/mesa/program/ir_to_mesa.cpp:3119
    #8 0x7f78f3a43826 in create_new_program ../src/mesa/main/ff_fragment_shader.cpp:1133
    #9 0x7f78f3a43d00 in _mesa_get_fixed_func_fragment_program ../src/mesa/main/ff_fragment_shader.cpp:1163
    #10 0x7f78f325ddcd in update_program ../src/mesa/main/state.c:134
    #11 0x7f78f325fe64 in _mesa_update_state_locked ../src/mesa/main/state.c:360
    #12 0x7f78f32600f1 in _mesa_update_state ../src/mesa/main/state.c:394
    #13 0x7f78f2b3e587 in clear ../src/mesa/main/clear.c:169
    #14 0x7f78f2b3e587 in _mesa_Clear ../src/mesa/main/clear.c:242

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3825>
2020-02-19 12:07:24 +02:00
Caio Marcelo de Oliveira Filho
79788b8f7f intel/gen12: Take into account opcode when decoding SWSB
The interpretation of the fields is different depending whether the
instruction is a SEND/MATH or not.

This fixes the disassembly output for non-SEND/MATH instructions that
have both in-order and out-of-order dependencies.  Their dependencies
were wrongly represented as `@A $B` when the correct would be `@A
$B.dst`.

Fixes: 6154cdf924 ("intel/eu/gen12: Add auxiliary type to represent SWSB information during codegen.")
Fixes: 83612c0127 ("intel/disasm/gen12: Disassemble software scoreboard information.")
Acked-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3660>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3660>
2020-02-18 09:17:51 -08:00
Caio Marcelo de Oliveira Filho
8004cb256a anv: Advertise VK_KHR_shader_non_semantic_info
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3856>
2020-02-18 09:57:15 -06:00
Francisco Jerez
8d3b86e34a intel/fs/gen7+: Implement discard/demote for SIMD32 programs.
At this point this simply involves fixing the initialization of the
sample mask flag register to take the right dispatch mask from the
thread payload, and fixing sample_mask_reg() to return f1.1 for the
second half of a SIMD32 thread.  This improves Manhattan 3.1
performance by 2.4%±0.31% (N>40) on my ICL with SIMD32 enabled
relative to falling back to SIMD16 for the shaders that use discard.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-02-14 14:31:49 -08:00
Francisco Jerez
04c7d3d4b1 intel/fs: Return consistent UW types from sample_mask_reg() in fragment shaders.
In SIMD32 programs that don't use discard, the upper 16 bits of the UD
result of sample_mask_reg() don't contain the sample mask of the upper
16 channels as one would expect.  Stop pretending we are returning a
valid 32-bit mask.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-02-14 14:31:49 -08:00
Francisco Jerez
1c6853a9be intel/fs: Refactor predication on sample mask into helper function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-02-14 14:31:48 -08:00
Francisco Jerez
a792e11f5c intel/fs/gen7+: Swap sample mask flag register and FIND_LIVE_CHANNEL temporary.
FIND_LIVE_CHANNEL was using f1.0-f1.1 as temporary flag register on
Gen7, instead use f0.0-f0.1.  In order to avoid collision with the
discard sample mask, move the latter to f1.0-f1.1.  This makes room
for keeping track of the sample mask of the second half of SIMD32
programs that use discard.

Note that some MOVs of the sample mask into f1.0 become redundant now
in lower_surface_logical_send() and lower_a64_logical_send().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>x
2020-02-14 14:31:48 -08:00
Francisco Jerez
083fd96a97 intel/fs: Use helper for discard sample mask flag subregister number.
Use it instead of hard-coding f0.1 for the sample mask of programs
that use discard.  This will make the task easier when we replace f0.1
with another flag register location in order to support discard with
SIMD32 shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-02-14 14:31:48 -08:00
Francisco Jerez
a6bc11a789 intel/fs: Make sample_mask_reg() local to brw_fs.cpp and use it in more places.
It's only really useful there.  This will avoid confusion with another
helper with a similar purpose I'm about to introduce that will be
useful in multiple files from the FS back-end.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-02-14 14:31:48 -08:00
Francisco Jerez
b84fa0b31e intel/fs/gen11: Work around dual-source blending hangs in combination with SIMD32.
The SIMD8 dual-source blending framebuffer write messages seem to have
trouble releasing the pixel scoreboard dependency in SIMD32 dispatch
mode, which leads to hangs.  I have a better workaround for this which
doesn't involve disabling SIMD32 when dual-source blending is enabled,
but I'm still investigating some issues with it.  Limit the dispatch
width to SIMD16 in such cases for the moment in order to make the CI
happy on ICL with SIMD32 fragment shaders enabled.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-02-14 14:31:48 -08:00
Francisco Jerez
57dee58c82 intel/fs: Set src0 alpha present bit in header when provided in message payload.
Currently the "Source0 Alpha Present to RenderTarget" bit of the RT
write message header is derived from brw_wm_prog_data::replicate_alpha.
However the src0_alpha payload is provided anytime it's specified to
the logical message.  This could theoretically lead to an
inconsistency if somebody provided a src0_alpha value while
brw_wm_prog_data::replicate_alpha was false, as I'm planning to do in
a future commit in order to implement a hardware workaround.

Instead calculate the header bit based on whether a src0_alpha value
was provided to the logical message, which guarantees the same
behavior on pre-ICL and ICL+ (the latter used an extended descriptor
bit for this which didn't suffer from the same issue).  Remove the
brw_wm_prog_data::replicate_alpha flag.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-02-14 14:31:48 -08:00
Francisco Jerez
e14529ff32 intel/fs/gen12: Workaround data coherency issues due to broken NoMask control flow.
Together with the fixup_nomask_control_flow() pass introduced in a
previous patch, this implements a less invasive alternative to the
workaround documented in the hardware spec for GEN:BUG:1407528679,
which doesn't involve disabling structured control flow.

Under some conditions Gen12 hardware can end up executing a BB with
all channels disabled, which will lead to the execution of any NoMask
instructions in it, even though any execution-masked instructions will
be correctly shot down.  This could break assumptions of the SWSB pass
if the data computed by a NoMask instruction is synchronized against
by using an SWSB annotation baked into a regular execution-masked
instruction, since the first (NoMask) instruction may be executed
redundantly by the hardware, even though the second will correctly be
shot down, potentially leading to a RaW or WaW hazard if a third
instruction subsequently accesses the destination register of the
first instruction.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
2020-02-14 14:31:48 -08:00