The D3D10 spec is very explicit about treatment of denorm floats and
the behavior is exactly the same for them as it would be for -0 or
+0. This makes our shading code match that behavior, since OpenGL
doesn't care and on a few CPUs it's faster (worst case the same).
Float16 conversions will likely break but we'll fix them in a
follow-up commit.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
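As a rough illustration of the denorm rule described above, the sketch below treats a denormal float as a zero of the same sign. It is a plain-C analogy only: the helper name flush_denorm_to_zero is hypothetical, and the actual change may instead rely on CPU control bits for generated shader code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the commit): return f unchanged unless it
 * is a denormal, in which case return a zero with the same sign. */
static float flush_denorm_to_zero(float f)
{
   uint32_t bits;

   memcpy(&bits, &f, sizeof bits);
   /* An all-zero exponent field means the value is +/-0 or a denormal. */
   if ((bits & 0x7f800000u) == 0)
      bits &= 0x80000000u;   /* keep only the sign bit */
   memcpy(&f, &bits, sizeof f);
   return f;
}
```

With this helper a denormal such as 1e-40f compares equal to 0.0f, while normal values pass through untouched.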
It's done automatically for vertex buffers, but not for constant buffers,
textures, and colorbuffers.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This should increase performance if constant uploads are done with the CP DMA,
because only the cache that needs to be flushed is flushed.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Also, flushing any cache in evergreen_emit_cs_shader seems to be superfluous
(we don't flush caches when changing the other shaders either)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
1. flush SH with read caches
2. add flag for DB flushes
3. add flag for CB flushes
v2: flush all CBs, remove redundant emit_state variable.
v3: Marek: also set the new flags in r600_context_flush, the CP dma functions,
and texture_barrier, and rename them
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
The winsys should do this, because it measures how much time we spend
in buffer_map doing synchronization, which can be viewed with the gallium
HUD.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
It was wrong, because the offset shouldn't be applied to MSAA depth buffers.
This small cleanup should prevent such issues in the future.
This fixes a lockup in "piglit/fbo-depthstencil default_fb -samples=n".
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
The logic for choosing number of lods was bogus.
(The code should ultimately handle the case of only one lod even with multiple
quads but currently can't.)
It is perfectly valid for the swizzle to be bigger than 2. For example the
texel offsets could be
SAMPLE ..., IMM[0].zzz
What is not correct is for chan_index to be bigger than 2.
Trivial.
Shaders need a lot of work still. Basic stuff generally works, so this
is basically just fine for gnome-shell, OA etc at this point.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The assertion was always broken, but the code was unused until the
per-element lod code was enabled. Fixes piglit texelFetch vs isampler1D and
similar tests (only run with GL 3.0 version override).
d3d10 requires per-pixel lod calculations for explicit lod, lod bias and
explicit derivatives, and we should probably do it for OpenGL too, at least
if they are used from vertex or geometry shaders (so this doesn't apply to
lod bias), where this doesn't just affect neighboring pixels.
Some code was already there to handle this so fix it up and enable it.
There will no doubt be a performance hit, unfortunately. We could do better
if we knew we had a real vector shift instruction (with variable shift
count), but this requires AVX2 on x86 (or an AMD Bulldozer family CPU).
Don't do anything for lod bias and explicit derivatives yet, though
no special magic should be needed for them either.
Likewise, the size query is still broken just the same.
v2: Use information if lod is a (broadcast) scalar or not. The idea would be
to base this on the actual value, for now just pretend it's a scalar in fs
and not a scalar otherwise (so, per-pixel lod is only used in gs/vs but same
code is generated for fs as before).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The semantics for overflow detection are a bit tricky with
indexed rendering. If the base index in the elements array
overflows, then the index of the first element should be used;
if the index with bias overflows, then it should be treated
like a normal overflow. Overflows also need to be checked for
in all paths that use either the bias or the starting index location.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The comparison, incorrectly, was greater-than-or-equal to
elt max.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
With this patch we will only assert that the second temporary is allocated,
when there are more than two active filters.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66423
Signed-off-by: Brian Paul <brianp@vmware.com>
If reg->Register.Indirect is true then the immediate is not truly a
constant LLVM expression.
There is no performance regression in using LLVMBuildBitCast, as it will
fall back to LLVMConstBitCast internally when the argument is a constant.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
We were incorrectly computing the buffer offset when using the
instances. The buffer offset is always equal to:
start_instance * stride + (instance_num / instance_divisor) *
stride
We were completely ignoring the start instance, quite
often producing instances that were completely wrong, e.g. if
start instance = 5, instance divisor = 2, then on the first
iteration it should be:
5 * stride, not (5/2) * stride as we'd have currently, and if
start instance = 1, instance divisor = 3, then on the first
iteration it should be:
1 * stride, not 0 as we'd have.
This fixes it and adjusts all the code to the changes.
Signed-off-by: Zack Rusin <zackr@vmware.com>
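The offset formula above can be sketched in plain C as follows. This is only an illustration of the arithmetic, not the code generated by the driver: the helper name instance_buffer_offset is hypothetical, the divisor is assumed to be nonzero, and integer overflow is deliberately ignored here.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the formula from the message:
 *   start_instance * stride + (instance_num / instance_divisor) * stride
 * Assumes instance_divisor != 0; overflow is not checked in this sketch. */
static uint32_t instance_buffer_offset(uint32_t start_instance,
                                       uint32_t instance_num,
                                       uint32_t instance_divisor,
                                       uint32_t stride)
{
   return start_instance * stride +
          (instance_num / instance_divisor) * stride;
}
```

For the two cases in the message (first iteration, so instance_num = 0), start instance 5 with divisor 2 gives 5 * stride, and start instance 1 with divisor 3 gives 1 * stride.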
Clipper invocations are computed earlier (of course
before the emission) so this code was adding bogus
numbers to already computed clipper invocations.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Integers could easily overflow if the starting instance
was large enough. Instead of letting bogus counts through,
set the instance to max if it overflowed and let our
regular buffer overflow computation handle it.
Signed-off-by: Zack Rusin <zackr@vmware.com>
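The "set the instance to max on overflow" idea above amounts to a saturating unsigned add. A minimal sketch, assuming 32-bit unsigned instance numbers; the helper name add_saturate_u32 is hypothetical and not taken from the commit:

```c
#include <stdint.h>

/* Hypothetical helper: add two unsigned values, clamping to UINT32_MAX
 * instead of wrapping around. */
static uint32_t add_saturate_u32(uint32_t a, uint32_t b)
{
   uint32_t sum = a + b;   /* unsigned wraparound is well defined in C */

   /* If the sum wrapped, it is smaller than either operand. */
   return (sum < a) ? UINT32_MAX : sum;
}
```

A clamped value of UINT32_MAX is then large enough that a later bounds check against the buffer size can be expected to reject it.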
Our buffer overflow arithmetic was susceptible to integer
overflows, which caused the buffer overflow logic to break.
Let's use the LLVM overflow intrinsics to check for integer
overflows while computing the stride/needed buffer size.
Signed-off-by: Zack Rusin <zackr@vmware.com>
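The commit uses LLVM overflow intrinsics in the generated code; the plain-C sketch below only illustrates the same kind of check on the CPU side. The function name checked_fetch_offset and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute index * stride + offset.
 * Returns false if either the multiply or the add would wrap around;
 * otherwise stores the result in *out and returns true. */
static bool checked_fetch_offset(uint32_t index, uint32_t stride,
                                 uint32_t offset, uint32_t *out)
{
   uint32_t scaled;

   if (stride != 0 && index > UINT32_MAX / stride)
      return false;                 /* index * stride would overflow */
   scaled = index * stride;
   if (scaled > UINT32_MAX - offset)
      return false;                 /* scaled + offset would overflow */
   *out = scaled + offset;
   return true;
}
```

A caller would treat a false return as a buffer overflow instead of comparing a wrapped value against the buffer size.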
We weren't taking into account the size of the element
that is to be fetched, which meant that it was possible
to overflow the buffer reads if the stride was very
close to the end of the buffer, e.g. stride = 3, buffer
size = 4, and the size of the element to be read = 4. This
should be properly detected as an overflow.
Signed-off-by: Zack Rusin <zackr@vmware.com>
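A minimal sketch of a bounds check that accounts for the element size, assuming byte offsets and sizes; the helper name fetch_in_bounds is hypothetical. The subtraction form avoids overflowing offset + elem_size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if elem_size bytes starting at offset lie
 * entirely inside a buffer of buffer_size bytes. */
static bool fetch_in_bounds(uint32_t offset, uint32_t elem_size,
                            uint32_t buffer_size)
{
   if (elem_size > buffer_size)
      return false;
   /* Safe: buffer_size - elem_size cannot wrap after the check above. */
   return offset <= buffer_size - elem_size;
}
```

Reading the example in the message as a 4-byte element at offset 3 in a 4-byte buffer, the check reports an overflow, whereas a check on the offset alone would have accepted it.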
The only reason the checks existed was paranoia: when I first
wrote the code I wasn't sure it was correct. Now I am sure, and
the asserts triggered when XBMC was dropping frames, so remove them.
NOTE: This is a candidate for the 9.1 branch.
The assembly parser can be used to load r300 assembly dumps
and run them through any of the r300 compiler passes.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Allows MSAA colorbuffers, which have a CMASK automatically and don't
need any further special handling, to be fast cleared. Instead
of clearing the buffer, set the clear color and the CMASK to the
cleared state.
Fast clear is used only when all bound colorbuffers fulfill certain
conditions: a CMASK is required, we have to be able to create a clear
color value for the format and the texture mustn't contain multiple
images. Technically, it should be possible to support array textures
and cubemaps if all images are attached to the framebuffer,
but this does not appear to be common.
v2: fix fast clear check
v3: Marek: - disable fast clear with 128-bit formats, which are unsupported
- set tex->dirty_level_mask in r600_clear, so that the driver knows
the resource must be decompressed/expanded
- return early from r600_clear if there's nothing else to do
Signed-off-by: Marek Olšák <maraeo@gmail.com>
b04a295a4a removed seemingly unnecessary
code in get_query. Turns out this code could in fact be reached - while
timestamps are always binned, if there are no bins (which happens if fb
size is 0) then the rasterization query code filling this in is still
never executed.
So fix this up by filling in some timestamp, but do it at EndQuery time
rather than GetQuery time, which should be more appropriate.
Makes piglit arb_timer_query-timestamp-get happy again.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>