Commit graph

105203 commits

Author SHA1 Message Date
Rob Clark
b23fc4cacb freedreno/a6xx: move VBO state to stateobj
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:49 -04:00
Rob Clark
e194056832 freedreno/a6xx: move ZSA state to stateobj
Step towards single cmdstream, where we need different state-group-id's
for binning vs draw ZSA state.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
a50a9a44e8 freedreno/a6xx: remove vismode param
We don't need to keep this IGNORE_VISIBILITY in binning pass.  Prep work
for using single cmdstream for both draw and binning passes.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
d9dbc9c21f freedreno/ir3: move binning-pass fixup for a6xx+
Move this to after ir3_cp (which can add lowered immediates to the const
state) for a6xx+, to ensure the uniform state matches between binning
and vertex shaders.  This way we can emit just a single VS_CONST state-
group when we re-use single cmdstream for both binning and draw passes.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
1a51c4a87e freedreno/a6xx: a bit more state emit cleanup
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
2ffc79c7d1 freedreno/a6xx: move framebuffer state emit to emit_mrt()
No point in checking this per-draw, since framebuffer change means new
batch.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
5894f37b85 freedreno/a6xx: small emit_mrt() cleanup
On a6xx, this is only used for pfb->cbufs so we can just directly pass
the pfb state.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
b4e94af37d freedreno/a6xx: use program cache
Use the in-memory cache to construct shader program state and re-use it
on subsequent draws, to lower driver overhead.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
1d7fbe2cd1 freedreno/ir3: shader variant cache
Cache that maps gallium hwcso (in this case, 'struct ir3_shader') plus
shader variant key to a generation specific state object.

This could eventually replace the linked list of shader variants, but
for now it lets us re-use the work currently done in fdN_program_emit()

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
2e9c08c0bc freedreno/ir3: move binning_pass out of shader variant key
Prep work for a following patch, that introduces a cache to map from
program state (all shader stages) plus variant key to pre-baked hw
state (which could be emit'd via CP_SET_DRAW_STATE, for example).
To do that, we really want the variant key to be immutable, and to
treat the binning pass shader as an extra shader stage, rather than
as a VS variant.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
8b1a3b5dde freedreno/ir3: track # of samplers used by shader
This is useful for a6xx to avoid program state from depending on bound
tex/samp state.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
1b9d69410c freedreno/a6xx: texture state obj
Unfortunately gallium doesn't match what the hw wants perfectly here, in
using a separate CSO for each texture/sampler.  So we have to use a hash
table to map the collection of texture/samplers to hw state object.

We probably could use separate hw state objects for texture and sampler
state, but mesa/st tends to update the tex and samp state together.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
e8606b11dd freedreno: add resource seqno
Intended to be something more compact than a 64b pointer, which could be
used as a key into hashtables.  Prep work for texture state objects.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
abcdf5627a freedreno/a6xx: move const emit to state group
Eventually we want to move nearly everything, but no other state depends
on const state, so this is the easiest one to move first.

For webgl aquarium, this reduces GPU load by about 10%, since for each
fish it does a uniform upload plus draw.. fish frequently are visible in
only a single tile, so this skips the uniform uploads for other tiles.

The additional step of avoiding WFI's when using CP_SET_DRAW_STATE seems
to be work an additional 10% gain for aquarium.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
a398d26fd2 freedreno/a6xx: add infrastructure for CP_DRAW_STATE
Add helper to add state-groups to emit, and code to emit CP_DRAW_STATE
packet if we have any state-groups.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
ec717fc629 freedreno: reduce resource dependency tracking overhead
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Neil Roberts
ee61790daf freedreno: Remove the Emacs mode lines
These are not necessary because the corresponding settings are set via
the .dir-locals.el file anyway. Most of them were missing a ‘:’ after
“tab-width” which was making Emacs display an annoying warning
whenever you open the file.

This patch was made with:

sed -ri '/-\*- mode:/,/^$/d' \
    $(find src/gallium/{drivers,winsys} -name \*.\[ch\] \
               -exec grep -l -- '-\*- mode:' {} \+)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Neil Roberts
afe640b360 freedreno: Fix the Emacs indentation configuration file
The .dir-locals.el had the wrong name for the truthy value so it
wasn’t setting indent-tabs-mode.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Hyunjun Ko
8e798e28f7 freedreno: allocate batches from the cache in launch_grid
Needs to allocate batches from the cache so that it could
get a valid index and make resource dependancy tracking right.

In addition this fixes assertion on debug build since the commit
1a40faa8 landed.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Hyunjun Ko
2385d7b066 freedreno: adds nondraw param to fd_bc_alloc_batch
Needs to specify nondraw when creating a batch through
fd_bc_alloc_batch since it'd better create a batch through
it rather than fd_batch_create.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
9e6019bd46 freedreno/a6xx: remove fd6_emit_render_cntl()
It was dead code carried over from a5xx

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
835cb06965 freedreno/ir3: fix broken texcoord inputs
TODO not sure if this is best solution, but current logic is broken for
texcoord inputs.  It is definitely the simplest solution.

Fixes: 1a24f51966 freedreno/ir3: ignore unused inputs
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Rob Clark
cbf9fe50b5 freedreno: fix off-by-one error in BEGIN_RING()
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-10-17 12:44:48 -04:00
Marek Olšák
669dd22983 util: document a limitation of util_fast_udiv32
trivial
2018-10-17 12:27:58 -04:00
Matt Turner
58a51d0a67 i965/fs: Add 64-bit int immediate support to dump_instructions()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-10-16 17:48:17 -07:00
Marek Olšák
fcc70e4855 radeonsi: track context rolls better for the Vega scissor bug workaround
We should get fewer context rolls with the SET_CONTEXT_REG optimization,
but it would have been for nothing if the scissor state rolled the context
anyway. Don't emit the scissor state if there is no context roll.
2018-10-16 17:23:25 -04:00
Marek Olšák
25ddb15cfe radeonsi: emit sample locations for 1xAA only when the hw bug is present 2018-10-16 17:23:25 -04:00
Marek Olšák
9b331e462e radeonsi: use compute shaders for clear_buffer & copy_buffer
Fast color clears should be much faster. Also, fast color clears on
evicted buffers should be 200x faster on GFX8 and older.
2018-10-16 17:23:25 -04:00
Marek Olšák
5030adcbe0 radeonsi: use copy_buffer in buffer_do_flush_region directly 2018-10-16 17:23:25 -04:00
Marek Olšák
0b40fbc879 radeonsi: use faster integer division for instance divisors
We know the divisors when we upload them, so instead we can precompute
and upload division factors derived from each divisor.

This fast division consists of add, mul_hi, and two shifts,
and we have to load 4 dwords intead of 1.

This probably won't affect any apps.
2018-10-16 17:23:25 -04:00
Marek Olšák
bfc795670e ac: add helpers for fast integer division by a constant 2018-10-16 17:23:25 -04:00
Marek Olšák
ea039f789d radeonsi: use higher subpixel precision (QUANT_MODE) for smaller viewports 2018-10-16 15:28:22 -04:00
Marek Olšák
4fd8d2df9c radeonsi: move emission of PA_SU_VTX_CNTL into emit_guardband
We'll modify the quant mode there, which also affects the guarband
computation.
2018-10-16 15:28:22 -04:00
Marek Olšák
41a6c3de1f radeonsi: don't re-upload the sample position constant buffer repeatedly 2018-10-16 15:28:22 -04:00
Marek Olšák
b94824c787 radeonsi: set PA_SU_PRIM_FILTER_CNTL optimally 2018-10-16 15:28:22 -04:00
Marek Olšák
9e182b8313 radeonsi: center viewport to improve guardband clipping for high resolutions
This will be more useful when we change the quant mode to increase subpixel
precision and decrease the viewport range (which might not be possible
if the viewport is not centered in the viewport range).
2018-10-16 15:28:22 -04:00
Marek Olšák
fedc1fda30 radeonsi: save raster config in screen, add se_tile_repeat 2018-10-16 15:28:22 -04:00
Marek Olšák
ac76aeef20 radeonsi: switch back to standard DX sample positions
Apps may rely on them.
2018-10-16 15:28:22 -04:00
Marek Olšák
67f02cf810 radeonsi: add GDS support to CP DMA 2018-10-16 15:28:22 -04:00
Marek Olšák
0d05581578 radeonsi: rename si_gfx_* functions to si_cp_*
and write_event_eop -> release_mem
2018-10-16 15:28:22 -04:00
Marek Olšák
6e1cf6532d radeonsi: make si_gfx_write_event_eop more configurable 2018-10-16 15:28:22 -04:00
Sergii Romantsov
0fa9e6d7b3 anv/skylake: disable ForceThreadDispatchEnable
On Skylake enabling of ForceThreadDispatchEnable causes gpu-hang.

-v2: enabling of  ForceThreadDispatchEnable is only for gen8, for
     gen9 and higher reverted enabling of PixelShaderHasUAV.

-v3 (Jason Ekstrand): Rework the comments a bit.

CC: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760
Fixes: 79270d2140 (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-10-16 13:20:51 -05:00
Lionel Landwerlin
322a919a41 anv: Implement VK_EXT_pci_bus_info
Even though the Intel GPU are always at the same PCI location, all the
info we need is already provided by libdrm. Let's be future proof.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-10-16 12:47:55 +01:00
Jose Fonseca
8550be7a2f appveyor: Cache pip's cache files.
It should speed up the Python packages installation.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-10-16 09:41:14 +01:00
Jose Fonseca
bfb8afb14d appveyor: Update to newer Mako/winflexbison versions.
As that's what most people are bound to use.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-10-16 09:41:12 +01:00
Jose Fonseca
b94f9cd8f9 appveyor: Update to MSVC 2017.
That's what we (and I suppose most people out there) are using now.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-10-16 09:41:07 +01:00
Samuel Pitoiset
647c2b90e9 radv: disable VK_SUBGROUP_FEATURE_VOTE_BIT
This feature isn't used for now, so disable it until
wwm is fixed in LLVM.

Fixes dEQP-VK.subgroups.vote.graphics.subgroupallequal*

https://bugs.freedesktop.org/show_bug.cgi?id=108115
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-16 10:24:19 +02:00
Samuel Pitoiset
593996bc02 radv: implement buffer to image operations for R32G32B32
This should fix rendering issues with Batman Arkham City.
We will probably need to implement itob and itoi at some
point, but currently nothing hits these paths.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107765
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-16 09:22:38 +02:00
Alex Smith
ca83d51cfb ac/nir: Use context-specific LLVM types
LLVMInt*Type() return types from the global context and therefore are
not safe for use in other contexts. Use types from our own context
instead.

Fixes frequent crashes seen when doing multithreaded pipeline creation.

Fixes: 4d0b02bb5a "ac: add support for 16bit load_push_constant"
Fixes: 7e7ee82698 "ac: add support for 16bit buffer loads"
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-10-16 08:18:24 +01:00
Vadym Shovkoplias
ad558408ff glsl: Check the subroutine associated functions names
Adding compile time check for subroutine functions with
the same names. Similar check for intrastage linking was already
landed in commit 5f0567a4f6.

From Section 6.1.2 (Subroutines) of the GLSL 4.00 specification

    "A program will fail to compile or link if any shader
     or stage contains two or more functions with the same
     name if the name is associated with a subroutine type."

Fixes:
    * no-overloads.vert

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108109
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-10-16 08:15:21 +03:00