We don't need to keep this IGNORE_VISIBILITY in binning pass. Prep work
for using single cmdstream for both draw and binning passes.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Move this to after ir3_cp (which can add lowered immediates to the const
state) for a6xx+, to ensure the uniform state matches between binning
and vertex shaders. This way we can emit just a single VS_CONST state-
group when we re-use single cmdstream for both binning and draw passes.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Use the in-memory cache to construct shader program state and re-use it
on subsequent draws, to lower driver overhead.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cache that maps gallium hwcso (in this case, 'struct ir3_shader') plus
shader variant key to a generation specific state object.
This could eventually replace the linked list of shader variants, but
for now it lets us re-use the work currently done in fdN_program_emit()
Signed-off-by: Rob Clark <robdclark@gmail.com>
Prep work for a following patch, that introduces a cache to map from
program state (all shader stages) plus variant key to pre-baked hw
state (which could be emit'd via CP_SET_DRAW_STATE, for example).
To do that, we really want the variant key to be immutable, and to
treat the binning pass shader as an extra shader stage, rather than
as a VS variant.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Unfortunately gallium doesn't match what the hw wants perfectly here, in
using a separate CSO for each texture/sampler. So we have to use a hash
table to map the collection of texture/samplers to hw state object.
We probably could use separate hw state objects for texture and sampler
state, but mesa/st tends to update the tex and samp state together.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Intended to be something more compact than a 64b pointer, which could be
used as a key into hashtables. Prep work for texture state objects.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Eventually we want to move nearly everything, but no other state depends
on const state, so this is the easiest one to move first.
For webgl aquarium, this reduces GPU load by about 10%, since for each
fish it does a uniform upload plus draw.. fish frequently are visible in
only a single tile, so this skips the uniform uploads for other tiles.
The additional step of avoiding WFI's when using CP_SET_DRAW_STATE seems
to be work an additional 10% gain for aquarium.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add helper to add state-groups to emit, and code to emit CP_DRAW_STATE
packet if we have any state-groups.
Signed-off-by: Rob Clark <robdclark@gmail.com>
These are not necessary because the corresponding settings are set via
the .dir-locals.el file anyway. Most of them were missing a ‘:’ after
“tab-width” which was making Emacs display an annoying warning
whenever you open the file.
This patch was made with:
sed -ri '/-\*- mode:/,/^$/d' \
$(find src/gallium/{drivers,winsys} -name \*.\[ch\] \
-exec grep -l -- '-\*- mode:' {} \+)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Needs to allocate batches from the cache so that it could
get a valid index and make resource dependancy tracking right.
In addition this fixes assertion on debug build since the commit
1a40faa8 landed.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Needs to specify nondraw when creating a batch through
fd_bc_alloc_batch since it'd better create a batch through
it rather than fd_batch_create.
Signed-off-by: Rob Clark <robdclark@gmail.com>
TODO not sure if this is best solution, but current logic is broken for
texcoord inputs. It is definitely the simplest solution.
Fixes: 1a24f51966 freedreno/ir3: ignore unused inputs
Signed-off-by: Rob Clark <robdclark@gmail.com>
We should get fewer context rolls with the SET_CONTEXT_REG optimization,
but it would have been for nothing if the scissor state rolled the context
anyway. Don't emit the scissor state if there is no context roll.
We know the divisors when we upload them, so instead we can precompute
and upload division factors derived from each divisor.
This fast division consists of add, mul_hi, and two shifts,
and we have to load 4 dwords intead of 1.
This probably won't affect any apps.
This will be more useful when we change the quant mode to increase subpixel
precision and decrease the viewport range (which might not be possible
if the viewport is not centered in the viewport range).
On Skylake enabling of ForceThreadDispatchEnable causes gpu-hang.
-v2: enabling of ForceThreadDispatchEnable is only for gen8, for
gen9 and higher reverted enabling of PixelShaderHasUAV.
-v3 (Jason Ekstrand): Rework the comments a bit.
CC: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760
Fixes: 79270d2140 (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Even though the Intel GPU are always at the same PCI location, all the
info we need is already provided by libdrm. Let's be future proof.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This feature isn't used for now, so disable it until
wwm is fixed in LLVM.
Fixes dEQP-VK.subgroups.vote.graphics.subgroupallequal*
https://bugs.freedesktop.org/show_bug.cgi?id=108115
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This should fix rendering issues with Batman Arkham City.
We will probably need to implement itob and itoi at some
point, but currently nothing hits these paths.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107765
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
LLVMInt*Type() return types from the global context and therefore are
not safe for use in other contexts. Use types from our own context
instead.
Fixes frequent crashes seen when doing multithreaded pipeline creation.
Fixes: 4d0b02bb5a "ac: add support for 16bit load_push_constant"
Fixes: 7e7ee82698 "ac: add support for 16bit buffer loads"
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Adding compile time check for subroutine functions with
the same names. Similar check for intrastage linking was already
landed in commit 5f0567a4f6.
From Section 6.1.2 (Subroutines) of the GLSL 4.00 specification
"A program will fail to compile or link if any shader
or stage contains two or more functions with the same
name if the name is associated with a subroutine type."
Fixes:
* no-overloads.vert
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108109
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>