This evolved over several commits, and I also wanted to document some
new information about how we handle formats.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Now that all RBs have miptrees and miptree mapping covers these last
two code paths, use miptree mapping consistently.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Right now the fake packed d/s RBs are creating two sub-renderbuffers
with their own storage, and the hardware setup and the mapping code
have been explicitly referencing them. By setting miptrees on them,
we'll be able to make our renderbuffer code for fake packed
depth/stencil more consistent with all our other renderbuffers.
The interesting new behavior here is that there is now a mt with a
non-depthstencil format (X8Z24) that has a stencil_mt field
associated. This looks like it should be safe, and we'll need to be
able to do this for floating point depth/stencil as well.
Before, we had an uncached read of S8 to untile, then a RMW (so
uncached penalty) of the packed S8Z24 to store the value, then the
consumer would uncached read that once per pixel. If data was written
to the map, we would then have to uncached read the written data back
out and do the scatter to the tiled S8 buffer (also uncached access
penalties, since WC couldn't actually combine). So 3 or 5 uncached
accesses per pixel in the ROI (and we were ignoring the ROI, so it
was the whole image).
Now we get an uncached read of S8 to untile, and an uncached read of
Z. The consumer gets to do cached accesses. Then if data was
written, we do streaming Z writes (WC success), and scattered S8
tiling writes (uncached penalty). So 2 or 3 uncached accesses per
pixel in the ROI.
This should be a performance win, to the extent that anybody is doing
software accesses of packed depth/stencil buffers.
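
As a rough illustration of the gather on map and scatter on unmap
described above, here is a minimal sketch. It assumes linear (untiled)
buffers, a packed layout with stencil in the top 8 bits and depth in
the low 24, and hypothetical helper names gather_packed() and
scatter_packed(); the driver's real code also has to handle S8 tiling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build a packed depth/stencil temporary from a
 * separate X8Z24 depth buffer and an S8 stencil buffer.
 * Assumed layout: stencil in bits 31:24, depth in bits 23:0. */
void gather_packed(uint32_t *packed, const uint32_t *z,
                   const uint8_t *s, size_t count)
{
   assert(packed && z && s);
   for (size_t i = 0; i < count; i++)
      packed[i] = ((uint32_t)s[i] << 24) | (z[i] & 0x00ffffffu);
}

/* Hypothetical helper: write a modified packed temporary back out to
 * the separate depth and stencil buffers. */
void scatter_packed(const uint32_t *packed, uint32_t *z,
                    uint8_t *s, size_t count)
{
   assert(packed && z && s);
   for (size_t i = 0; i < count; i++) {
      z[i] = packed[i] & 0x00ffffffu;   /* sequential depth writes */
      s[i] = (uint8_t)(packed[i] >> 24); /* stencil; scattered if tiled */
   }
}
```

The consumer only ever touches the packed temporary, which is why its
accesses can be cached in this model.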
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
We don't gripe about void * arithmetic for our driver, and this
prevents silly casting when assigning the result of mapping to
non-byte types.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
We're going to want to reuse this logic in mapping of fake packed
miptrees wrapping separate depth/stencil miptrees.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This code will be incrementally moving to a model like intel_fbo.c's
renderbuffer mapping with helper functions, as I move that code here.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This will be used for things like packed depth/stencil temporaries and
making LLC-cached temporary mappings using blits.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This will let us share teximage mapping logic with renderbuffer
mapping, which has an intel_mipmap_tree but not a gl_texture_image.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This required is_hiz_depth_format to start returning true on S8_Z24 as
well, since that's the format we have here. The two previous callers
only call it on non-depthstencil formats.
This avoids us needing to have HiZ working on a new Z format
immediately upon exposing the format (particularly painful for
Z32_FLOAT_X24S8, which means all the fake packed depth/stencil paths).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Some hardware can't reinterpret the format of hardware buffers and thus
the X server needs to know the format when the buffer is created.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Michel Daenzer <michel@daenzer.net>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
See intel_vertical_texture_alignment_unit() in intel_tex_layout.c;
certain surface types require setting this to VALIGN_4.
Analogous to commit dd0e46c410 on Gen6.
Fixes piglit test fbo-generatemipmap-formats with the
GL_ARB_depth_texture and GL_EXT_packed_depth_stencil arguments.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This patch should prevent the crashes when some shaders are absent,
see https://bugs.freedesktop.org/show_bug.cgi?id=43341
Note this is a candidate for the stable branch.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The code forces single program flow to be enabled on Ironlake, or
equivalently, disables multiple program flow. The comment was reversed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This moves the detiling to the fbo mapping: r200 depth is always tiled,
and we can't detile it with the blitter.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This could have been split up better, but the driver is just broken now,
so bisecting the brokenness is going to be painful no matter what.
This adds renderbuffer mapping/unmapping along with texture image allocation.
It drops all the old texture upload paths, some of which could possibly be
reimplemented with the blitter later.
It also redoes the span code paths to use their own set of image mapping handlers,
along with removing the tiling decode paths for the color buffers, since
we now hope to use the blitter for this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
I think there is a missing state update or flush somewhere, and every
so often PP_CNTL goes to the kernel with a texture enabled but no texture.
Signed-off-by: Dave Airlie <airlied@redhat.com>
For validating ARB program inputs, replace the hard-coded bitfield
and attribute number with the appropriate VERT_{ATTRIB,BIT}* variant.
This should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=43407
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously a zero writemask would result in dst_chan == -1, meaning an
unnecessary MOV with the destination register dictated by undefined
memory contents would be emitted before returning. This caused
intermittent GPU hangs, e.g. with glean/texCombine.
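
A minimal sketch of the guard, assuming a 4-bit XYZW writemask. The
names first_written_channel() and emit_result_mov() are hypothetical
and only stand in for the real emit path:

```c
#include <assert.h>

#define WRITEMASK_XYZW 0xfu

/* Returns the index (0..3) of the first enabled channel, or -1 when
 * the writemask is zero. */
int first_written_channel(unsigned writemask)
{
   assert((writemask & ~WRITEMASK_XYZW) == 0);
   for (int chan = 0; chan < 4; chan++) {
      if (writemask & (1u << chan))
         return chan;
   }
   return -1;
}

/* Hypothetical stand-in for the emit path: returns how many MOVs
 * would be emitted and reports the destination channel. */
int emit_result_mov(unsigned writemask, int *dst_chan_out)
{
   int dst_chan = first_written_channel(writemask);

   if (dst_chan < 0)
      return 0; /* nothing is written, so skip the MOV entirely */

   *dst_chan_out = dst_chan;
   return 1;
}
```

Returning early on a zero writemask means dst_chan is never used as an
index while it is -1.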
Reviewed-by: Eric Anholt <eric@anholt.net>
Anything of less than (bw, bh) size is possible when you consider
rectangular textures, and this code is (now) safe for those. Even for
power-of-two textures, width could be 4 for FXT1 while not being
aligned to block size.
Fixes piglit compressedteximage GL_COMPRESSED_RGB_FXT1_3DFX.
Reviewed-by: Brian Paul <brianp@vmware.com>
Generally this code works with width and height aligned to compressed
blocks, but at the 2x2 and 1x1 levels of a square texture (or height <
bh in general), we were skipping uploading our single row of blocks.
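
The row count has to round up rather than truncate. A minimal sketch,
with block_rows() as a hypothetical helper name:

```c
#include <assert.h>

/* Number of rows of compressed blocks needed to cover `height`
 * texels when each block is `bh` texels tall. Rounds up, so a 2x2 or
 * 1x1 level with bh == 4 still gets one row of blocks. */
unsigned block_rows(unsigned height, unsigned bh)
{
   assert(bh > 0);
   return (height + bh - 1) / bh;
}
```

A plain height / bh gives 0 whenever height < bh, which matches the
skipped upload described above.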
Fixes piglit compressedteximage GL_COMPRESSED_RGBA_S3TC_DXT5_EXT.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since the MapTextureImage changes on Intel, nwn had corruption in the
scrollbar at the load game menu, and corrupted ground textures in the
starting zone. Heroes of Newerth's intro screen was also thoroughly
garbled. A new piglit test "compressedteximage" was created to
regression test this.
The issue was this code now seeing dstRowStride aligned to hardware
requirements instead of a temporary buffer that gets uploaded to
hardware later. The existing code was just trying to memcpy
srcRowStride * height / bh, while the glCompressedTexSubImage2D()
storage code nearby did the correct walking by blockheight rows at a
time. Just reuse the subimage upload instead of duplicating that
logic.
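
For illustration, walking one row of blocks at a time with separate
source and destination strides looks roughly like this. It is a
sketch under assumptions, not the code in the tree:
copy_compressed_rows() and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a compressed image one row of blocks at a time. The strides
 * are in bytes per row of blocks and may differ, for example when
 * the destination is padded to a hardware alignment. */
void copy_compressed_rows(unsigned char *dst, size_t dst_row_stride,
                          const unsigned char *src, size_t src_row_stride,
                          unsigned width, unsigned height,
                          unsigned bw, unsigned bh,
                          size_t bytes_per_block)
{
   assert(bw > 0 && bh > 0);

   size_t blocks_wide = (width + bw - 1) / bw;
   size_t rows = (height + bh - 1) / bh;
   size_t bytes_per_row = blocks_wide * bytes_per_block;

   assert(bytes_per_row <= dst_row_stride);
   assert(bytes_per_row <= src_row_stride);

   for (size_t row = 0; row < rows; row++) {
      memcpy(dst + row * dst_row_stride,
             src + row * src_row_stride,
             bytes_per_row);
   }
}
```

A single memcpy of srcRowStride * height / bh bytes is only equivalent
when both strides match, which stopped being true once the destination
was the mapped hardware buffer.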
v2: Update comment at the top of the function (suggestion by Joel
Forsberg)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41451
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
We checked if srcType == GL_UNSIGNED_BYTE earlier so there was no
way to reach this code. This was left-over code from the GLchan
removal work.
Reviewed-by: José Fonseca <jfonseca@vmware.com>