Commit graph

91 commits

Author SHA1 Message Date
Caio Oliveira
5b6987daee intel/compiler: Create and use struct for GS thread payload
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
0ca65b3c4c intel/compiler: Create and use struct for VS thread payload
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
19c6e1b447 intel/compiler: Create and use struct for TES thread payload
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Caio Oliveira
73920b7e2f intel/compiler: Use FS thread payload only for FS
Move the setup into the FS thread payload constructor.  Consolidate
payload setup for that in brw_fs_thread_payload.cpp file.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00
Tapani Pälli
40c2e0a317 intel/compiler: fix assert from ver to verx10
Fixes: 027b8b4249 ("intel/compiler: Add helper for barrier message payload setup for gfx >= 125")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18546>
2022-09-12 19:03:17 +00:00
Caio Oliveira
027b8b4249 intel/compiler: Add helper for barrier message payload setup for gfx >= 125
CS-like and TCS control barriers converged in gfx >= 125, so use a
common helper for the message payload setup.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18362>
2022-09-09 09:35:08 -07:00
Caio Oliveira
55db3aaa3a intel/compiler: Create fs_visitor::emit_tcs_barrier()
Allow us to implement this in brw_fs_visitor.cpp, which then will
let us deduplicate code between the CS-like barrier and the TCS
barrier in a later patch.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18362>
2022-09-09 09:35:08 -07:00
Marcin Ślusarz
7ebae85955 intel/compiler: insert URB fence before task/mesh termination
Bspec 53421 says:
"A URB fence memory is typically performed prior the thread
exit message, so that the next thread dispatch that reads
that URB memory will see it."

Cc: 22.1 <mesa-stable>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16665>
2022-08-02 09:31:24 +00:00
Ian Romanick
f7f232385f intel/fs: Use canonical form for "work around" tags
Trivial.  Also clean up some weird whitespace.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
2022-07-26 17:25:19 +00:00
Ian Romanick
377246318a intel/fs: Eliminate "masked" and "per slot offset" URB messages
All of this information can be inferred from the sources.

v2: Fix "error: unused variable 'opcode'" detected by marge-bot.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
2022-07-26 17:25:19 +00:00
Ian Romanick
349a040f68 intel/fs: Make logical URB write instructions more like other logical instructions
The changes to fs_visitor::validate() helped track down a place where I
initially forgot to convert a message to the new sources layout.  This
had caused a different validation failure in
dEQP-GLES31.functional.tessellation.tesscoord.triangles_equal_spacing,
but this were not detected until after SENDs were lowered.

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 19951145 -> 19951133 (<.01%)
instructions in affected programs: 2429 -> 2417 (-0.49%)
helped: 8 / HURT: 0

total cycles in shared programs: 858904152 -> 858862331 (<.01%)
cycles in affected programs: 5702652 -> 5660831 (-0.73%)
helped: 2138 / HURT: 1255

Broadwell
total cycles in shared programs: 904869459 -> 904835501 (<.01%)
cycles in affected programs: 7686744 -> 7652786 (-0.44%)
helped: 2861 / HURT: 2050

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 141442369 -> 141442032 (-0.0%)
Instructions helped: 337

Cycles in all programs: 9099270231 -> 9099036492 (-0.0%)
Cycles helped: 40661
Cycles hurt: 28606

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
2022-07-26 17:25:18 +00:00
Ian Romanick
a477587b4a intel/fs: Add _LOGICAL versions of URB messages
The lowering is currently fake.  It just changes the opcode from the
_LOGICAL version to the non-_LOGICAL version.

v2: Remove some rebase cruft.  's/gfx8_//;s/simd8_/' in
brw_instruction_name.  Both suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
2022-07-08 19:45:34 +00:00
Kenneth Graunke
9886615958 intel/compiler: Move spill/fill tracking to the register allocator
Originally, we had virtual opcodes for scratch access, and let the
generator count spills/fills separately from other sends.  Later, we
started using the generic SHADER_OPCODE_SEND for spills/fills on some
generations of hardware, and simply detected stateless messages there.

But then we started using stateless messages for other things:
- anv uses stateless messages for the buffer device address feature.
- nir_opt_large_constants generates stateless messages.
- XeHP curbe setup can generate stateless messages.

So counting stateless messages is not accurate.  Instead, we move the
spill/fill accounting to the register allocator, as it generates such
things, as well as the load/store_scratch intrinsic handling, as those
are basically spill/fills, just at a higher level.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16691>
2022-05-25 06:56:01 +00:00
Lionel Landwerlin
0cd93c59ef intel/compiler: add primitive rate output support
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13739>
2022-02-02 17:09:46 +00:00
Jason Ekstrand
a1de102479 intel/fs: Use compare_func for wm_prog_key::alpha_test_func
Because 0 is no longer a recognizable value (it's NEVER, which isn't a
good default), we add an emit_alpha_test bool to tell the back-end when
to bother alpha testing.  This lets us only touch crocus with the
change.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14157>
2022-01-14 15:08:09 +00:00
Dave Airlie
e12b0d0d60 intel/compiler: remove gfx6 gather wa from backend.
Crocus lowers this in the frontend, they key member is still used
but reset prior to backend.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14202>
2021-12-22 21:37:55 +00:00
Jason Ekstrand
4fa58d27a5 intel/fs,vec4: Drop support for shader time
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14056>
2021-12-10 21:20:47 +00:00
Jason Ekstrand
8f3c100d61 intel/fs,vec4: Drop uniform compaction and pull constant support
The only driver using these was i965 and it's gone now.  This is all
dead code.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14056>
2021-12-10 21:20:47 +00:00
Caio Oliveira
be89ea3231 intel/compiler: Handle per-primitive inputs in FS
In Fragment Shader, regular inputs are laid out in the thread payload
in a one dword per each half-GRF, that gives room for having the two
delta dwords needed for interpolation.

Per-primitive inputs are laid out before the regular inputs, and since
there's no need to have delta information, they are packed.  So
half-GRF will be fully filled with 4 dwords of input.

When num_per_primitive_inputs is zero (the default case), behavior
should be the same as before.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
2021-12-04 00:41:46 +00:00
Caio Oliveira
858424bd2e intel/compiler: Use gl_shader_stage_uses_workgroup() helpers
Instead of checking for MESA_SHADER_COMPUTE (and KERNEL).  Where
appropriate, also use gl_shader_stage_is_compute().

This allows most of the workgroup-related lowering to be applied to
Task and Mesh shaders.  These will be added later and "inherit" from
cs_prog_data structure.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13629>
2021-11-03 11:09:48 -07:00
Jason Ekstrand
31fdd26d01 intel/compiler: Add unified barrier support for CS
Program CS barrier message fields for producers/consumers.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11963>
2021-08-24 01:31:48 +00:00
Jason Ekstrand
f5e58838c2 intel/fs: Handle non-perspective-correct interpolation on gen4-5
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11125>
2021-06-03 02:36:17 +00:00
Jason Ekstrand
3f36e027d3 intel/fs: Don't use pixel_z for Gen4-5 source_depth_to_render_target
The source_depth_to_render_target flag can get set on old gen4-5 HW in a
few cases which are independent of the app writing gl_FragDepth.  It
should be safe to just use fetch_payload_reg in that case instead of
depending in interpolation setup.  This fixes a bug with certain very
simple shaders where we might end up not including the depth when we
should have.

While we're here, rework the logic around setting src_depth and add a
comment so it's more clear what's going on.

Fixes: 6d4070f3dd "intel/compiler: add support for fragment coordinate..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10596>
2021-05-03 23:51:51 +00:00
Lionel Landwerlin
81f369c93b intel/compiler: add coarse pixel offset on Gfx12.5+
Gfx12.5 has a slightly different code path.

v2: Document the oddness

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
2021-05-02 20:20:06 +00:00
Lionel Landwerlin
6d4070f3dd intel/compiler: add support for fragment coordinate with coarse pixels
v2: Drop new internal opcodes (Jason)
    Simplify code (Jason)

v3: Add Z computation for coarse pixels

v4: Document things a little

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
2021-05-02 20:20:06 +00:00
Francisco Jerez
a2572a9da4 intel/fs: Add more efficient fragment coordinate calculation.
The PIXEL_X/Y opcodes used by the current implementation are broken on
XeHP due to the new regioning restrictions of the floating-point pipe.
We could have the regioning lowering pass fix it in theory by lowering
the conversions into separate MOV instructions, but that would be more
costly than this implementation that only needs a pair of pipelined
ADDs and a pair of pipelined MOVs.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>
2021-04-16 08:27:35 +00:00
Yevhenii Kharchenko
edd12acbec intel/compiler: remove unused member 'input_vue_map'
v2: Instead of fixing unitialized member 'fs_visitor::input_vue_map'
(as reported by Coverity Scan in defect CID 1474559),
remove unused members 'vec4_tcs_visitor::input_vue_map' and
'fs_visitor::input_vue_map'.

Also fixed 'debug_enabled' argument skipped in a fs_visitor constructor
call from brw_compile_tes().

Signed-off-by: Yevhenii Kharchenko <yevhenii.kharchenko@globallogic.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10040>
2021-04-08 18:20:10 +00:00
Anuj Phogat
1d296484b4 intel: Rename Genx keyword to Gfxx
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "Gen[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/Gen\([[:digit:]]\+\)/Gfx\1/g"

Exclude changes in src/intel/perf/oa-*.xml:
find src/intel/perf -type f \( -name "*.xml" \) | xargs sed -ie "s/Gfx/Gen/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat
b75f095bc7 intel: Rename genx keyword to gfxx in source files
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "gen[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/gen\([[:digit:]]\+\)/gfx\1/g"

Exclude pack.h and xml changes in this patch:
grep -E "gfx[[:digit:]]+_pack\.h" -rIl $SEARCH_PATH | xargs sed -ie "s/gfx\([[:digit:]]\+_pack\.h\)/gen\1/g"
grep -E "gfx[[:digit:]]+\.xml" -rIl $SEARCH_PATH | xargs sed -ie "s/gfx\([[:digit:]]\+\.xml\)/gen\1/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat
c1f3a778de intel: Rename GENx prefix in macros to GFXx in source files
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "GEN" -rIl src/intel/genxml | grep -E ".*py" |  xargs sed -ie "s/GEN\([%{]\)/GFX\1/g"
grep -E "[^_]GEN[[:digit:]]+" -rIl $SEARCH_PATH | grep -E ".*(\.c|\.h|\.y|\.l)" | xargs sed -ie "s/\([^_]\)GEN\([[:digit:]]\+\)/\1GFX\2/g"

Leave out renaming GFX12_CCS_E macros. They fall under renaming pattern like "_GEN[[:digit:]]+":
grep -E "GFX12_CCS_E" -rIl $SEARCH_PATH | xargs sed -ie "s/GFX12_CCS_E/GEN12_CCS_E/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat
abe9a71a09 intel: Rename gen field in gen_device_info struct to ver
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "info\)*(.|->)gen" -rIl $SEARCH_PATH | xargs sed -ie "s/info\()*\)\(\.\|->\)gen/info\1\2ver/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Caio Marcelo de Oliveira Filho
7fb1e58651 intel/compiler: Make visitors take debug_enabled as a parameter
The callers already have this value, and we would like to make it
follow different rules other than stage that might not be visible to
the helper function, so just pass explicitly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9779>
2021-03-24 23:18:46 +00:00
Jason Ekstrand
7280b0911d intel/compiler: Add support for bindless shaders
The Intel bindless thread dispatch model is very simple.  When a compute
shader is to be used for bindless dispatch, it can request a set of
stack IDs.  These are allocated per-dual-subslice by the hardware and
recycled automatically when the stack ID is returned.  Passed to the
bindless dispatch are a global argument address, a stack ID, and an
address of the BINDLESS_SHADER_RECORD to invoke.  When the bindless
shader is dispatched, it is passed its stack ID as well as the global
and local argument pointers.  The local argument pointer is the address
of the BINDLESS_SHADER_RECORD plus some offset which is specified as
part of the BINDLESS_SHADER_RECORD.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
2020-11-25 05:37:09 +00:00
Marcin Ślusarz
e3f6a9ea36 intel: remove dead code
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7353>
2020-11-02 19:58:56 +00:00
Marcin Ślusarz
5ea0b6a9c6 intel/compiler: initialize remaining fields of various classes
These variables seem to be initialized before being used, so this
patch is not fixing any bug, but leaving them unitialized may become
a bug after some refactoring.

These classes were affected: fs_reg_alloc, fs_visitor, fs_generator,
instruction_scheduler.

Found by Coverity.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6667>
2020-09-10 12:16:58 +00:00
Jason Ekstrand
003b04e266 intel/compiler: Allow MESA_SHADER_KERNEL
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6280>
2020-08-12 10:11:06 +00:00
Matt Turner
8da810a7fb intel/compiler: Don't emit no-op cr0 changes
If mask is 0, we're asking for no changes to cr0.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5566>
2020-07-02 01:24:06 +00:00
Matt Turner
fe14dc98bf intel/compiler: Add assert that set bits are within mask
We generate bitfields of bits that we want to retain (mask) and bits
that we want to set (brw_mode) in the cr0 register, so the bits we want
to set should be in the set of bits we want to retain.

Also, remove the initialization of mask from
fs_visitor::emit_shader_float_controls_execution_mode since
brw_rnd_mode_from_nir initializes the mask parameter unconditionally.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5566>
2020-07-02 01:24:06 +00:00
Danylo Piliaiev
296c04d78c intel/fs: Work around dual-source blending hangs in combination with SIMD16
It was found that dual-source blending hangs with SIMD16 dispatch in some
specific but unknown situation. Which in the wild happen when rgba
anti-aliasing is enabled for fonts.

Cc: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2183
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5037>
2020-05-27 14:35:13 +03:00
Francisco Jerez
188a3659ae intel/ir: Import shader performance analysis pass.
This introduces an analysis pass intended to estimate several
performance statistics of the shader, including cycle count latency
and throughput values, based on static modeling.  It has instruction
performance information more comprehensive than the current scheduling
pass for all platforms between Gen4-11, and works on both the FS and
VEC4 back-end.

The most immediate purpose of this pass is to implement a heuristic
meant to determine whether using SIMD32 dispatch for a fragment shader
can be expected to help more than it hurts.  In addition this will
allow the effect of passes run after scheduling (e.g. the TGL software
scoreboard pass and the VEC4 dependency control pass) to be visible in
shader-db statistics.

But that isn't the end of the story, other potential applications of
this pass (not part of this MR) I've been playing around with are:

 - Implement a similar SIMD16 heuristic allowing the identification of
   inefficient SIMD16 fragment shaders.

 - Implement similar SIMD16 and SIMD32 heuristics for the compute
   shader stage -- Currently compute shader builds always use the
   SIMD16 shader if available and never use the SIMD32 shader unless
   strictly necessary, which is suboptimal under certain conditions.

 - Hook up to the instruction scheduler in order to improve the
   accuracy of its timing information.

 - Use as heuristic in order to drive the selection of scheduling
   modes (Matt was experimenting with that).

 - Plug to the TGL software scoreboard pass in order to implement a
   more effective SBID token allocation algorithm, since in general
   the optimal token allocation depends on the timings of all
   instructions in the program.

 - Use its bottleneck detection functionality in order to implement a
   heuristic computing a more optimal bound for the number of fragment
   shader threads executed in parallel (by adjusting the
   MaximumNumberofThreadsPerPSD control of 3DSTATE_PS).

As a follow-up I'm planning to submit updated timing information for
Gen12 platforms -- Everything else required to support Gen12 like SWSB
handling is already included in this patch, but there were some IP
concerns regarding the TGL timing parameters since they cannot
currently be obtained with the documentation and hardware which is
publicly available.  The timing parameters for any previous Gen7-11
platforms can be obtained by anyone by sampling the timestamp register
using e.g. shader_time, though I have some more convenient
instrumentation coming up.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-04-28 23:01:03 -07:00
Francisco Jerez
6310a05f68 intel/fs: Rename half() helpers to quarter(), allow index up to 3.
Makes more sense considering SIMD32.  Relaxing the assertion in
brw_ir_fs.h will be required in order to avoid assertion failures on
SNB with SIMD32 fragment shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-04-28 23:00:29 -07:00
Francisco Jerez
72324035fb intel/fs/gen12: Work around dual-source blending hangs in combination with SIMD32.
This applies the same work-around I commited as b84fa0b31e
"intel/fs/gen11: Work around dual-source blending hangs in combination
with SIMD32." to Gen12, which seems to suffer from the same hardware
bug found empirically.  The failure mode seems to be identical.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-04-28 23:00:28 -07:00
Caio Marcelo de Oliveira Filho
395de69b1f intel/fs: Allow multiple slots for position
Change brw_compute_vue_map() to also take the number of pos slots.  If
more than one slot is used, the VARYING_SLOT_POS is treated as an
array.

When using Primitive Replication, instead of a single position, the
VUE must contain an array of positions.  Padding might be
necessary (after clip distance) to ensure rest of attributes start
aligned.

v2: Add note about array in the commit message and assert that
    pos_slots >= 1 to make clear 0 is invalid. (Jason)
    Move padding to be after the clip distance.

v3: Apply the correct offset when gathering the sources from outputs.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>
2020-04-07 17:16:09 +00:00
Mathias Fröhlich
630154e77b i965: Move down genX_upload_sbe in profiles.
Avoid looping over all VARYING_SLOT_MAX urb_setup array
entries from genX_upload_sbe. Prepare an array indirection
to the active entries of urb_setup already in the compile
step. On upload only walk the active arrays.

v2: Use uint8_t to store the attribute numbers.
v3: Change loop to build up the array indirection.
v4: Rebase.
v5: Style fix.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/308>
2020-03-10 14:28:36 +00:00
Francisco Jerez
e5e4d016b9 intel/compiler: Move register pressure calculation into IR analysis object
This defines a new BRW_ANALYSIS object which wraps the register
pressure computation code along with its result.  For the rationale
see the previous commits converting the liveness and dominance
analysis passes to the IR analysis framework.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:21:10 -08:00
Francisco Jerez
ea44de6d8c intel/compiler/fs: Switch liveness analysis to IR analysis framework
This involves wrapping fs_live_variables in a BRW_ANALYSIS object and
hooking it up to invalidate_analysis() so it's properly invalidated.
Seems like a lot of churn but it's fairly straightforward.  The
fs_visitor invalidate_ and calculate_live_intervals() methods are no
longer necessary after this change.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:57 -08:00
Francisco Jerez
ba73e606f6 intel/compiler: Move all live interval analysis results into fs_live_variables
This moves the following methods that are currently defined in
fs_visitor (even though they are side products of the liveness
analysis computation) and are already implemented in
brw_fs_live_variables.cpp:

> bool virtual_grf_interferes(int a, int b) const;
> int *virtual_grf_start;
> int *virtual_grf_end;

It makes sense for them to be part of the fs_live_variables object,
because they have the same lifetime as other liveness analysis results
and because this will allow some extra validation to happen wherever
they are accessed in order to make sure that we only ever use
up-to-date liveness analysis results.

This shortens the virtual_grf prefix in order to compensate for the
slightly increased lexical overhead from the live_intervals pointer
dereference.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:43 -08:00
Francisco Jerez
083fd96a97 intel/fs: Use helper for discard sample mask flag subregister number.
Use it instead of hard-coding f0.1 for the sample mask of programs
that use discard.  This will make the task easier when we replace f0.1
with another flag register location in order to support discard with
SIMD32 shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-02-14 14:31:48 -08:00
Francisco Jerez
b84fa0b31e intel/fs/gen11: Work around dual-source blending hangs in combination with SIMD32.
The SIMD8 dual-source blending framebuffer write messages seem to have
trouble releasing the pixel scoreboard dependency in SIMD32 dispatch
mode, which leads to hangs.  I have a better workaround for this which
doesn't involve disabling SIMD32 when dual-source blending is enabled,
but I'm still investigating some issues with it.  Limit the dispatch
width to SIMD16 in such cases for the moment in order to make the CI
happy on ICL with SIMD32 fragment shaders enabled.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-02-14 14:31:48 -08:00
Francisco Jerez
57dee58c82 intel/fs: Set src0 alpha present bit in header when provided in message payload.
Currently the "Source0 Alpha Present to RenderTarget" bit of the RT
write message header is derived from brw_wm_prog_data::replicate_alpha.
However the src0_alpha payload is provided anytime it's specified to
the logical message.  This could theoretically lead to an
inconsistency if somebody provided a src0_alpha value while
brw_wm_prog_data::replicate_alpha was false, as I'm planning to do in
a future commit in order to implement a hardware workaround.

Instead calculate the header bit based on whether a src0_alpha value
was provided to the logical message, which guarantees the same
behavior on pre-ICL and ICL+ (the latter used an extended descriptor
bit for this which didn't suffer from the same issue).  Remove the
brw_wm_prog_data::replicate_alpha flag.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-02-14 14:31:48 -08:00