This function is going to get a lot messier with tessellation
so I'm going to use some macros to try and clean some bits
of common code up.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves to using an array of hw stages for the atoms.
Note this drops the 23 from the vertex shader, this value
is calculated internally when shaders are bound, so not
required here.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This changes the r600 specific GPR adjustment code
to use the stage defines, and arrays.
This is prep work for the tess changes later.
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add a list of defines for the HW stages.
We will use this for GPR calculations amongst other things.
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Use NULL tests of the form `if (ptr)' or `if (!ptr)'.
They do not depend on the definition of the symbol NULL.
Further, they provide the opportunity for the accidental
assignment, are clear and succinct.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Use NULL tests of the form `if (ptr)' or `if (!ptr)'.
They do not depend on the definition of the symbol NULL.
Further, they provide the opportunity for the accidental
assignment, are clear and succinct.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
These are unnecessary and are likely just left overs from prior
work.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
On SM20 this gives:
total instructions in shared programs : 6299222 -> 6294240 (-0.08%)
total gprs used in shared programs : 944139 -> 944068 (-0.01%)
total local used in shared programs : 54116 -> 54116 (0.00%)
local gpr inst bytes
helped 0 126 2781 2781
hurt 0 55 11 11
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This way $r1 = $r0 + 4; c1[$r1] becomes c1[$r0+4].
On SM35:
total instructions in shared programs : 6206257 -> 6185058 (-0.34%)
total gprs used in shared programs : 911045 -> 910722 (-0.04%)
total local used in shared programs : 39072 -> 39072 (0.00%)
local gpr inst bytes
helped 0 417 4195 4195
hurt 0 280 0 0
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This works when the add also has an immediate. This often happens in
address calculations. These addresses can then be inlined as well.
On code targeted to SM35:
total instructions in shared programs : 6223346 -> 6206257 (-0.27%)
total gprs used in shared programs : 911075 -> 911045 (-0.00%)
total local used in shared programs : 39072 -> 39072 (0.00%)
local gpr inst bytes
helped 0 119 3664 3664
hurt 0 74 15 15
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Even if the rasterizer has scissor disabled, we'll have whatever
vc4->scissor bounds were last set when someone set up a scissor, so we
shouldn't clip to them in that case.
Fixes piglit fbo-blit-rect, and a lot of MSAA tests once they're enabled.
We could potentially handle scissored blits when they're tile aligned, but
it doesn't seem worth it. If you're doing a scissored blit, you're
probably a testcase.
Fixes piglit's fbo-scissor-blit fbo
This implements more performance metrics than the previous support,
but some other metrics still need to be figured out.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
These performance metrics will be re-introduced in an upcoming
patch that will follow the same design as Fermi.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
inst_issued is performance metric not a hardware event on Kepler (SM30).
It will be re-introduced in an upcoming patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
SM30 is the compute capability version for GK104/GK106/GK107.
This also introduces a new signal group selection called UNK0F.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Move these to 'disasm' instead of the more verbose 'optmsgs' since, like
the tgsi dumps, it is useful without the more verbose compiler logging
enabled.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Fixes assertion in vkCreateImage when VkFormat is combined depthstencil.
Fixed many vulkancts tests that use combined depthstencil. For example,
fixes dEQP-VK.pipeline.depth.format.d16_unorm_s8_uint.compare_ops.\
not_equal_less_or_equal_not_equal_greater.
We're required to expose a host-visible, coherent memory type. On big
core GPUs that share, LLC, we can expose one such memory type that's
also cached. However, on non-LLC GPUs we can't both be cached and
coherent. Thus, we expose both the required coherent type and the cached
but non-coherent combination.
Regular objects are created I915_CACHING_CACHED on LLC platforms and
I915_CACHING_NONE on non-LLC platforms. However, userptr objects are
always created as I915_CACHING_CACHED, which on non-LLC means
snooped. That can be useful but comes with a bit of overheard. Since
we're eplicitly clflushing and don't want the overhead we need to turn
it off.