v4: - Ensure inc_amd_common defined when radeonsi is disabled (needed by
r600)
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Tested-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(other uses of USE_VC4_SIMULATOR are already correct)
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fix a bunch of labels indicating when registers were added/removed
and normalize the SI-class GRBM_GFX_INDEX.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The next commit will reduce the size even more.
v2: typecast to uint64_t manually
v3: add more typecasts, add asserts
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff!
V2:
- Use environmental variable (Karol Herbst)
V3:
- Use the already populated nv50_ir_prog_info to forward information to the
print pass (Pierre Moreau)
V4:
- get rid of default value in PrintPass constructor
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Under certain conditions, waiting on a GL sync objects should act like
a flush, regardless of the timeout.
Portal 2, CS:GO, and presumably other Source engine games rely on this
behavior and hang during loading without this fix.
Fixes: bc65dcab3b ("radeonsi: avoid syncing the driver thread in si_fence_finish")
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103902
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103904
Memory loads can take offsets, but the SHLADD will often attempt to
consume the offsets too. As there may be multiple memory loads with the
same base but different offsets, those would end up in a SHLADD instead
of the offset of the memory operation.
This moves the pass after we've had a chance to attempt to propagate
immediate adds into the indirect offset.
total instructions in shared programs : 6580681 -> 6567716 (-0.20%)
total gprs used in shared programs : 944261 -> 943375 (-0.09%)
total shared used in shared programs : 0 -> 0 (0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)
total bytes used in shared programs : 60339896 -> 60221504 (-0.20%)
local shared gpr inst bytes
helped 0 0 555 2698 2698
hurt 0 0 138 336 336
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
When a MERGE operation gets its constraint moves added, it
susbstantially extends live ranges to be reusing an immediate from
earlier in the program (not to mention the silliness of loading an
immediate into a register, and then moving into another register).
We detect these scenarios and insert moves that take the immediate or
constbuf load directly into the register. If it's the last use, then we
can just move that operation to the closer location.
With SM35 (255 regs) we get these results:
total instructions in shared programs : 6583670 -> 6580681 (-0.05%)
total gprs used in shared programs : 950818 -> 944261 (-0.69%)
total shared used in shared programs : 0 -> 0 (0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)
total bytes used in shared programs : 60367456 -> 60339896 (-0.05%)
local shared gpr inst bytes
helped 0 0 4584 3186 3186
hurt 0 0 55 968 968
I suspect they will be better for SM20 and SM30.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
We can still use the optimized division methods which make use of
multiplication with overflow.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
It's common to use signed int modulo in GLSL. As it happens, the GLSL
specs allow the result to be undefined, but that seems fairly
surprising. It's not that much more effort to get it right, at least for
positive modulo operators.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is a copy of the a5xx logic. Fails a few tests, but basic
functionality is there.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Unfortunately Adreno A4xx hardware returns incorrect results with the
GATHER4 opcodes. As a result, we have to lower to 4 individual texture
calls (txl since we have to force lod to 0). We achieve this using
offsets, including on cube maps which normally never have offsets.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
This is needed for profiling multi-context applications like Chrome.
One context can record queries and another context can draw the HUD.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Core Mesa already handles flushing based on ContextReleaseBehavior,
so the comment is wrong.
Also, old_st is always NULL, because unbind_context always precedes
make_current.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
now you can hack the driver to enable DCC for displayable textures and
Glamor that doesn't enable that by default won't crash anymore.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>