fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-07 22:18:13 +02:00

Author	SHA1	Message	Date
Kenneth Graunke	3114f5acd3	i965/blorp: Support overriding destination alpha to 1.0. Currently, Blorp requires the source and destination formats to be equal. However, we'd really like to be able to blit between XRGB and ARGB formats; our BLT engine paths have supported this for a long time. For ARGB -> XRGB, nothing needs to occur: the missing alpha is already interpreted as 1.0. For XRGB -> ARGB, we need to smash the alpha channel to 1.0 when writing the destination colors. This is fairly straightforward with blending. For now, this code is never used, as the source and destination formats still must be equal. The next patch will relax that restriction. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Martin Steigerwald <martin@lichtvoll.de> (cherry picked from commit `c0554141a9`)	2013-02-07 22:31:29 -08:00
Kenneth Graunke	332c50b666	i965: Implement CopyTexSubImage2D via BLORP (and use it by default). The BLT engine has many limitations. Currently, it can only blit X-tiled buffers (since we don't have a kernel API to whack the BLT tiling mode register), which means all depth/stencil operations get punted to meta code, which can be very CPU-intensive. Even if we used the BLT engine, it can't blit between buffers with different tiling modes, such as an X-tiled non-MSAA ARGB8888 texture and a Y-tiled CMS ARGB8888 renderbuffer. This is a fundamental limitation, and the only way around that is to use BLORP. Previously, BLORP only handled BlitFramebuffer. This patch adds an additional frontend for doing CopyTexSubImage. It also makes it the default. This is partly to increase testing and avoid hiding bugs, and partly because the BLORP path can already handle more cases. With trivial extensions, it should be able to handle everything the BLT can. This helps PlaneShift massively, which tries to CopyTexSubImage2D between depth buffers whenever a player casts a spell. Since these are Y-tiled, we hit meta and software ReadPixels paths, eating 99% CPU while delivering ~1 FPS. This is particularly bad in an MMO setting because people cast spells all the time. It also helps Xonotic in 4X MSAA mode. At default power management settings, I measured a 6.35138% +/- 0.672548% performance boost (n=5). (This data is from v1 of the patch.) No Piglit regressions on Ivybridge (v3) or Sandybridge (v2). v2: Create a fake intel_renderbuffer to wrap the destination texture image and then reuse do_blorp_blit rather than reimplementing most of it. Remove unnecessary clipping code and conditional rendering check. v3: Reuse formats_match() to centralize checks; delete temporary renderbuffers. Reorganize the code. v4: Actually copy stencil when dealing with separate stencil buffers but packed depth/stencil formats. Tested by a new Piglit test. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com> [v4] Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v3] Reviewed-and-tested-by: Carl Worth <cworth@cworth.org> [v2] Tested-by: Martin Steigerwald <martin@lichtvoll.de> [v3] (cherry picked from commit `0b3bebbaac`)	2013-02-07 22:31:29 -08:00
Kenneth Graunke	55e3f79d55	mesa: Put extern "C" guards in renderbuffer.h. I need to use this from C++ code. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (cherry picked from commit `29aef6cce8`)	2013-02-07 22:31:29 -08:00
Kenneth Graunke	1d2ef43032	i965: Fix the SF Vertex URB Read Length calculation for Gen7 platforms. Ivybridge doesn't appear to have the same errata as Sandybridge; no corruption was observed by setting it to more than the minimal correct value. It's possible that we were simply lucky, since the URB entries are 1024-bit on Ivybridge vs. 512-bit Sandybridge. Or perhaps the underlying hardware issue is fixed. Either way, we may as well program the minimum value since it's now readily available, likely to be more efficient, and possibly more correct. v2: Use GEN7_SBE_* defines rather than GEN6_SF_*. (A copy and paste mistake.) They're the same, but using the right names is better. NOTE: This is a candidate for all stable branches. Reviewed-by: Paul Berry <stereotype441@gmail.com> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> (cherry picked from commit `44aa2e15f6`)	2013-02-07 22:31:28 -08:00
Kenneth Graunke	3acd5ed75b	i965: Fix the SF Vertex URB Read Length calculation for Sandybridge. (This commit message was primarily written by Paul Berry, who explained what's going on far better than I would have.) Previous to this patch, we thought that the only restrictions on 3DSTATE_SF's URB read length were (a) it needs to be large enough to read all the VUE data that the SF needs, and (b) it can't be so large that it tries to read VUE data that doesn't exist. Since the VUE map already tells us how much VUE data exists, we didn't bother worrying about restriction (a); we just did the easy thing and programmed the read length to satisfy restriction (b). However, we didn't notice this erratum in the hardware docs: "[errata] Corruption/Hang possible if length programmed larger than recommended". Judging by the context surrounding this erratum, it's pretty clear that it means "URB read length must be exactly the size necessary to read all the VUE data that the SF needs, and no larger". Which means that we can't program the read length based on restriction (b)--we have to program it based on restriction (a). The URB read size needs to precisely match the amount of data that the SF consumes; it doesn't work to simply base it on the size of the VUE. Thankfully, the PRM contains the precise formula the hardware expects. Fixes random UI corruption in Steam's "Big Picture Mode", random terrain corruption in PlaneShift, and Piglit's fbo-5-varyings test. NOTE: This is a candidate for all stable branches. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56920 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60172 Tested-by: Jordan Justen <jordan.l.justen@intel.com> (v1/Piglit) Tested-by: Martin Steigerwald <martin@lichtvoll.de> (PlaneShift) Reviewed-by: Paul Berry <stereotype441@gmail.com> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> (cherry picked from commit `09fbc29828`)	2013-02-07 22:31:28 -08:00
Kenneth Graunke	697f8e56dc	i965: Compute the maximum SF source attribute. The maximum SF source attribute is necessary to compute the Vertex URB read length properly, which will be done in the next commit. NOTE: This is a candidate for all stable branches. Reviewed-by: Paul Berry <stereotype441@gmail.com> Tested-by: Martin Steigerwald <martin@lichtvoll.de> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> (cherry picked from commit `5e9bc7bd12`)	2013-02-07 22:31:28 -08:00
Kenneth Graunke	45ae093e5c	i965: Refactor Gen6+ SF attribute override code. The next patch will benefit from easy access to the source attribute number and whether or not we're swizzling. It doesn't want the final attr_override DWord form, however. NOTE: This is a candidate for all stable branches. Reviewed-by: Paul Berry <stereotype441@gmail.com> Tested-by: Martin Steigerwald <martin@lichtvoll.de> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> (cherry picked from commit `b3efc5bea8`)	2013-02-07 22:31:28 -08:00
Kenneth Graunke	535e95299a	i965: Add chipset limits for Haswell GT1/GT2. The maximum number of URB entries come from the 3DSTATE_URB_VS and 3DSTATE_URB_GS state packet documentation; the thread count information comes from the 3DSTATE_VS and 3DSTATE_PS state packet documentation. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com> (cherry picked from commit `9add4e8038`)	2013-02-07 22:31:28 -08:00
Vinson Lee	a7e2c615f1	i965: Fix assignment instead of comparison in asserts. Fixes side effect in assertion defects reported by Coverity. Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Chad Versace <chad.versace@linux.intel.com> (cherry picked from commit `1559994cba`)	2013-02-07 22:31:28 -08:00
Paul Berry	5611a5a387	mesa: Don't check (offset + size <= bufObj->Size) in BindBufferRange. In the documentation for BindBufferRange, OpenGL specs from 3.0 through 4.1 contain this language: "The error INVALID_VALUE is generated if size is less than or equal to zero or if offset + size is greater than the value of BUFFER_SIZE." This text was dropped from OpenGL 4.2, and it does not appear in the GLES 3.0 spec. Presumably the reason for the change is because some clients change the size of the buffer after calling BindBufferRange. We don't want to generate an error at the time of the BindBufferRange call just because the old size of the buffer was too small, when the buffer is about to be resized. Since this is a deliberate relaxation of error conditions in order to allow clients to work, it seems sensible to apply it to all versions of GL, not just GL 4.2 and above. (Note that there is no danger of this change allowing a client to access data beyond the end of a buffer. We already have code to ensure that that doesn't happen in the case where the client shrinks the buffer after calling BindBufferRange). Eliminates a spurious error message in the gles3 conformance test "transform_feedback_offset_size". Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (cherry picked from commit `04f0d6cc22`)	2013-02-07 21:20:32 -08:00
Ian Romanick	a48e5526c2	i965: Set UniformBufferOffsetAlignment to sizeof(vec4). This matches the behavior of the Windows driver, but a bspec reference would be nice. NOTE: This is a candidate for the 9.0 and 9.1 branches. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (cherry picked from commit `f29ab4ece5`)	2013-02-07 21:20:16 -08:00
Matt Turner	c59808c700	mesa: Allow glGet* queries of MAX_VARYING_COMPONENTS in ES 3 Should have been done in `d9948e49` but I missed it because MAX_VARYING_FLOATS doesn't appear in the ES 3 spec, but is the same value as MAX_VARYING_COMPONENTS. NOTE: Candidate for the 9.1 branch Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2013-02-07 17:54:16 -08:00
Michel Dänzer	ad62f424b3	radeonsi: Handle scaled and integer formats for samplers and vertex elements. Also, add assertions to stress that render targets don't support scaled formats. 20 more little piglits. (cherry picked from commit 46dd16bca8b4526e46badc9cb1d7c058a8e6173e)	2013-02-07 19:11:30 +01:00
Michel Dänzer	fc04455533	radeonsi: Don't advertise PIPE_FORMAT_L8A8_SRGB support. The hardware can't do it. (cherry picked from commit f6e9430da2d3510f84baefa0fdf26ec5c457f146)	2013-02-07 19:11:19 +01:00
Michel Dänzer	6799bddf6b	radeonsi: Remove incorrect (and dead) assignment in tex_fetch_args(). The proper return type is assigned at the end of the function. (cherry picked from commit 180db2bcb28e94bb1ce18d76b2b3a5818d76262c)	2013-02-07 19:11:09 +01:00
Michel Dänzer	93f61addb5	radeonsi: Use unique names for referring to texture sampling intrinsics. Append the overloaded vector type used for passing in the addressing parameters. Without this, LLVM uses the same function signature for all those types, which cannot work. Fixes problems e.g. with FlightGear and Red Eclipse. (cherry picked from commit 1b3afea30de757815555d9eb1d6e72e2586d6a0c)	2013-02-07 19:10:17 +01:00
Jerome Glisse	d04b50b4de	r600g: fix slice tile max for compressed texture and async dma. The slice tile max computation was using the pixel size instead of the number of blocks, which resulted in dma writing at the wrong address. Signed-off-by: Jerome Glisse <jglisse@redhat.com>	2013-02-07 10:43:37 -05:00
Marek Olšák	f1c46c8418	r300g: fix blending with blend color and RGBA formats NOTE: This is a candidate for the stable branches. (cherry picked from commit `f40a7fc34a`)	2013-02-06 22:24:04 +01:00
Michel Dänzer	4bc85f9aac	Require libdrm_radeon 2.4.42 for radeonsi. It has new PCI IDs and an important tiled surface layout fix. (cherry picked from commit `02a423b239`)	2013-02-05 15:15:49 +01:00
Alex Deucher	e1d798a901	radeonsi: add Oland pci ids Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Note: this is a candidate for the 9.1 branch. (cherry picked from commit `4161d70bba`)	2013-02-04 17:20:22 -05:00
Alex Deucher	6b0fa537a9	radeonsi: default PA_SC_RASTER_CONFIG to 0 That should work in all cases. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Note: this is a candidate for the 9.1 branch. (cherry picked from commit `af0af75881`)	2013-02-04 17:20:03 -05:00
Alex Deucher	0cc0097bb0	radeonsi: add support for Oland chips Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Note: this is a candidate for the 9.1 branch (cherry picked from commit `83e4407f44`)	2013-02-04 17:19:43 -05:00
Michel Dänzer	7f90de5414	radeonsi: Fix draws using user index buffer. Was broken since commit `bf469f4edc` ('gallium: add void *user_buffer in pipe_index_buffer'). Fixes 11 piglit tests and lots of missing geometry e.g. in TORCS. NOTE: This is a candidate for the 9.1 branch. (cherry picked from commit `a8a5055f2d`)	2013-02-04 17:54:03 +01:00
Michel Dänzer	8cd237bcbe	radeonsi: Remove spurious traces of R16G16B16 support. The hardware can't do it, and these were causing warnings in some piglit tests. NOTE: This is a candidate for the 9.1 branch. (cherry picked from commit `6455d40b7e`)	2013-02-04 17:28:18 +01:00
Michel Dänzer	5ca77c27a6	radeonsi: Enable texture arrays. 28/30 piglit tests pass. NOTE: This is a candidate for the 9.1 branch. (cherry picked from commit `6bcb823844`)	2013-02-04 17:28:14 +01:00
Michel Dänzer	b104d151f1	radeonsi: Improve packing of texture address parameters. In particular, the LOD bias and depth comparison values are packed before the 'normal' texture coordinates, and the array slice and LOD values are appended. NOTE: This is a candidate for the 9.1 branch. (cherry picked from commit `120efeef8b`)	2013-02-04 17:27:43 +01:00
Michel Dänzer	5f9f3f381f	radeonsi: Adapt to sample intrinsics changes. Fix up intrinsic names, and bitcast texture address parameters to integers. NOTE: This is a candidate for the 9.1 branch. (cherry picked from commit `e5fb7347a7`)	2013-02-04 17:27:34 +01:00
Marek Olšák	b127ad3489	mesa: don't expose IBM_rasterpos_clip in a core context glRasterPos doesn't exist in the core profile. NOTE: This is a candidate for the stable branches (9.0 and 9.1). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (cherry picked from commit `cc5fdaf2dc`)	2013-02-01 16:35:24 +01:00
Marek Olšák	1003652a7f	r300g: always put MSAA resources in VRAM This along with the latest drm-fixes branch should help with bad performance of MSAA. Remember: Nx MSAA can't be more than N times slower (where N=2,4,6). Anyway, I recommend at least 512 MB of VRAM for Full HD 6x MSAA. NOTE: This is a candidate for the 9.1 branch. (cherry picked from commit `a06f03d795`)	2013-02-01 16:35:18 +01:00
Jerome Glisse	9d8a866db3	r600g: add cs memory usage accounting and limit it v3. We are now seeing cs that can go over the vram+gtt size. To avoid failing, flush early any cs that goes over 70% (gtt+vram) usage; 70% is used to allow for some fragmentation. The idea is to compute a gross estimate of the memory requirement of each draw call. After each draw call, memory will be precisely accounted, so the uncertainty is only on the current draw call. In practice this gave a very good estimate (+/- 10% of the target memory limit). v2: Remove leftovers from the testing version, remove useless NULL checking. Improve commit message. v3: Add comment to code on memory accounting precision. Signed-off-by: Jerome Glisse <jglisse@redhat.com> Reviewed-by: Marek Olšák <maraeo@gmail.com>	2013-01-31 14:25:30 -05:00
Marek Olšák	3b8d4f941f	r600g: fix htile buffer leak NOTE: This is a candidate for the 9.1 branch.	2013-01-31 14:25:10 -05:00
Matt Turner	ff515c4e7c	build: Add missing comma in AS_IF Reported-by: Lauri Kasanen <curaga@operamail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47248#c15	2013-01-29 15:06:47 -08:00
Marek Olšák	d7ca04a7c3	docs/relnotes-9.1: document new features in radeon drivers (cherry picked from commit `845130951f`)	2013-01-29 17:38:14 +01:00
Matt Turner	48af880f81	docs: List new extensions added in Mesa 9.1 I did not list the *_get_program_binary extensions since they're not useful to anyone with their current implementation (that supports 0 binary formats).	2013-01-28 16:49:24 -08:00
Jerome Glisse	af2d8f8072	r600g: use uint64_t instead of unsigned long for proper 32bits cpu support Signed-off-by: Jerome Glisse <jglisse@redhat.com>	2013-01-28 19:10:29 -05:00
Jerome Glisse	d8d17441e2	r600g: real fix for non 3.8 kernel Signed-off-by: Jerome Glisse <jglisse@redhat.com>	2013-01-28 17:44:49 -05:00
Jerome Glisse	72916698b0	r600g: fix segfault with old kernel. Old kernels do not have dma support, and the patches that were pushed were missing some of the checks needed to avoid using dma. Signed-off-by: Jerome Glisse <jglisse@redhat.com>	2013-01-28 14:51:40 -05:00
Zack Rusin	dbb2d192de	glx: only advertise GLX_INTEL_swap_event if it's supported. Only drivers supporting DRI2 version >=4 support GLX_INTEL_swap_event. So let's mark it as such, otherwise applications which use this extension (i.e. everything based on Clutter, e.g. gnome-shell) break horribly on drivers supporting DRI2 versions only up to 3. Note: This is a candidate for the 9.0 branch. Reviewed-by: Brian Paul <brianp@vmware.com>	2013-01-24 19:13:05 -08:00
Vadim Girlin	c9343047cf	r600g: improve inputs/interpolation handling with llvm backend Get rid of special handling for reserved regs. Use one intrinsic for all kinds of interpolation. v2[Vincent Lejeune]: Rebased against current master Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>	2013-01-28 18:30:38 +00:00
Tom Stellard	33dc412b89	r600g: Add ar_chan member to struct r600_bytecode r600_bytecode::ar_chan stores the register channel for the value that will be loaded into the AR register. At the moment, this field is only used by the LLVM backend. The default backend always sets ar_chan = 0.	2013-01-28 18:30:38 +00:00
Tom Stellard	0ba0926861	r600g: More robust checks for MOVA_INT instructions	2013-01-28 18:30:37 +00:00
Vincent Lejeune	a871e01174	r600g/llvm: Add dummy export for vs output Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=59588 Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2013-01-28 18:30:37 +00:00
Tom Stellard	91a160b19f	r600g: Fix building with --enable-r600-llvm-compiler https://bugs.freedesktop.org/show_bug.cgi?id=59877	2013-01-28 18:30:37 +00:00
Alex Deucher	e110c98cae	r600g: don't emit WAIT_UNTIL on cayman/TN (v2) It shouldn't be needed and older kernels don't support it. v2: Replace with PS partial flush as before. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=59945 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Marek Olšák <maraeo@gmail.com>	2013-01-28 12:11:27 -05:00
Jerome Glisse	325422c494	r600g: add async for staging buffer upload v2 v2: Add virtual address to dma src/dst offset for cayman Signed-off-by: Jerome Glisse <jglisse@redhat.com>	2013-01-28 11:30:35 -05:00
Jerome Glisse	bff07638a8	r600g: add multi ring support with dma as first second ring v4. We keep track of ring emission order in a stack; whenever we need to flush, we empty the stack in FIFO order. There are a few helper functions for bo mapping and other ring activities that make sure the ring stack is properly flushed and submitted. v2: fix st flush path and other flush paths to properly flush all rings if necessary. v3: - improve names of ring helpers - make sure that each time a cs is going to be written it ends up at the top of the stack, to avoid issues such as: STACK[0] = dma (with bo A,B) STACK[1] = gfx (with bo C,D). Now if the code tries to emit a dma command relative to bo C or D, it will start writing the cmd stream into the cs, and once it reaches the point where it adds a relocation it will flush. At that point the cs will have cmds that don't have proper relocations in the relocation buffer, and the kernel will just refuse to run it. v4: - Drop the stack idea, as it turns out there is no way to use it or benefit from it. Any time the driver starts a command on another ring, it always needs to flush the previous ring. So make the code simpler by not using a stack. Signed-off-by: Jerome Glisse <jglisse@redhat.com>	2013-01-28 11:30:35 -05:00
Jerome Glisse	6c064fd749	radeon/winsys: add dma ring support to winsys v3. Add ring support; you can create a cs for each ring. The DMA ring is a bit special regarding relocations, as you must emit as many relocations as there are uses of the buffer. v2: - Improved comment on relocation changes - Use a single thread to queue cs submission; this simplifies driver code without impacting performance. The rationale is that you have to wait for all previous submissions to have completed, so there was never a case where two different threads could be submitting a command stream at the same time. This code just consolidates submission into a single thread per winsys. v3: - Do not use a semaphore for empty-queue signaling; use a cond var instead. This is because it's tricky to maintain an even number of calls to semaphore wait and semaphore signal (the number of cs in the stack would, for instance, make that number vary). Signed-off-by: Jerome Glisse <jglisse@redhat.com>	2013-01-28 11:30:35 -05:00
Roland Scheidegger	cbf0f66631	gallivm,draw,llvmpipe: mass rename of unit->texture_unit/sampler_unit Make it obvious what "unit" this is (no change in functionality). draw still uses "unit" in places where it changes the shader by adding texture sampling itself - it seems like this can't work with shaders using dx10-style sample opcodes (can't mix gl-style and dx10-style sample instructions in a shader). Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-01-28 06:58:06 -08:00
Roland Scheidegger	c789b981b2	gallivm: split sampler and texture state Split the sampler interface to use separate sampler and texture (sampler_view) state. This is needed to support dx10-style sampling instructions. This is not quite complete since both draw/llvmpipe don't really track textures/samplers independently yet, as well as the gallivm code not quite using the right sampler or texture index respectively (but it should work for the sampling codes used by opengl). We are however losing some optimizations in the process, apply_max_lod will no longer work, and we potentially could end up with more (unnecessary) recompiles (if switching textures with/without mipmaps only so it shouldn't be too bad). v2: don't use different callback structs for sampler/sampler view functions (which just complicates things), fix up sampling code to actually use the right texture or sampler index, and similar for llvmpipe/draw actually distinguish between samplers and sampler views. v3: fix more of PIPE_MAX_SAMPLER / PIPE_MAX_SHADER_SAMPLER_VIEWS mismatches (both in draw and llvmpipe), based on feedback from José get rid of unneeded static sampler derived state.(which also fixes the only 2 piglit regressions due to a forgotten assignment), fix comments based on Brian's feedback. v4: remove some accidental unrelated whitespace changes Reviewed-by: José Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-01-28 06:50:36 -08:00
Marek Olšák	87592cff57	gallium/u_upload_mgr: fix a serious memory leak It can eat all memory and crash in a matter of minutes with r600g.	2013-01-28 02:51:52 +01:00
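
The BindBufferRange relaxation described in commit 5611a5a387 above can be sketched roughly as follows. This is a minimal illustration, not Mesa's code: the `sketch_*` names, the struct, and the `strict_pre_4_2` flag are all hypothetical, and offsets are assumed to be non-negative. It contrasts the bind-time check quoted from the GL 3.0 through 4.1 specs with the relaxed behavior, where only `size <= 0` is rejected at bind time and the usable range is clamped against the buffer's current size when it is actually accessed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer object; this is not Mesa's struct. */
struct sketch_buffer {
    int64_t size; /* current BUFFER_SIZE in bytes */
};

/*
 * Bind-time validation. With strict_pre_4_2 set, this follows the wording
 * quoted from the GL 3.0 to 4.1 specs; without it, only size <= 0 is an
 * error. Assumes offset >= 0 (an assumption of this sketch).
 */
bool
sketch_bind_range_is_valid(const struct sketch_buffer *buf,
                           int64_t offset, int64_t size,
                           bool strict_pre_4_2)
{
    if (size <= 0)
        return false;
    /* Written as a subtraction so that offset + size cannot overflow. */
    if (strict_pre_4_2 && size > buf->size - offset)
        return false;
    return true;
}

/*
 * Access-time clamp: how many bytes of the bound range are backed by the
 * buffer at its current size. This is what keeps the relaxed bind-time
 * check from exposing data past the end of the buffer.
 */
int64_t
sketch_usable_bytes(const struct sketch_buffer *buf,
                    int64_t offset, int64_t size)
{
    if (offset >= buf->size)
        return 0;
    int64_t avail = buf->size - offset;
    return size < avail ? size : avail;
}
```

With a 64-byte buffer, binding offset 32 and size 64 fails under the strict rule but succeeds under the relaxed one, and the clamp then reports 32 usable bytes until the client resizes the buffer.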


54954 commits
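
The early-flush policy described in commit 9d8a866db3 above could look something like the sketch below. It is an illustration under stated assumptions rather than the r600g implementation: the `sketch_*` names and the struct layout are hypothetical, and only the 70% of (gtt+vram) threshold and the estimate-then-account order come from the commit message. Sizes are kept in `uint64_t`, in the spirit of commit af2d8f8072, so that sums of 4 GiB or more do not wrap on 32-bit CPUs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-command-stream accounting state; not the r600g struct. */
struct sketch_cs_mem {
    uint64_t vram_size; /* total VRAM in bytes */
    uint64_t gtt_size;  /* total GTT in bytes */
    uint64_t used;      /* bytes precisely accounted for draws already in the cs */
};

/*
 * Before emitting a draw: decide whether the cs should be flushed first.
 * draw_estimate is a gross estimate of the memory the next draw needs.
 * The limit is 70% of vram+gtt, leaving headroom for fragmentation.
 */
bool
sketch_cs_needs_flush(const struct sketch_cs_mem *m, uint64_t draw_estimate)
{
    uint64_t limit = (m->vram_size + m->gtt_size) / 10 * 7;
    return m->used + draw_estimate > limit;
}

/* After emitting a draw: replace the estimate with the exact usage. */
void
sketch_cs_account(struct sketch_cs_mem *m, uint64_t exact_bytes)
{
    m->used += exact_bytes;
}

/* After a flush the cs is empty again, so accounting restarts from zero. */
void
sketch_cs_flushed(struct sketch_cs_mem *m)
{
    m->used = 0;
}
```

A caller would check `sketch_cs_needs_flush()` before each draw, flush and call `sketch_cs_flushed()` if it returns true, then emit the draw and call `sketch_cs_account()` with the exact figure, so only the current draw is ever estimated.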