fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-22 10:58:08 +02:00

Author	SHA1	Message	Date
Ilia Mirkin	0e9232dbcc	nv50,nvc0: enable TEX_LZ and TXF_LZ There should be minimal gain, if any, for nvc0, but nv50 may end up noticing more often that the lod argument is uniform. This, in turn, will remove the need for some unnecessary transformations, which were being hit due to the checks being done pre-ssa. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-03-18 20:37:52 -04:00
Karol Herbst	09f16de7e6	nvc0/ir: treat FMA like MAD for operand propagation Helps mainly Feral-ported games, due to their use of fma() shader-db changes: total instructions in shared programs : 3901147 -> 3842505 (-1.50%) total gprs used in shared programs : 471258 -> 467359 (-0.83%) total local used in shared programs : 27405 -> 27361 (-0.16%) total bytes used in shared programs : 35749888 -> 35214176 (-1.50%) local gpr inst bytes helped 17 1829 4091 4091 hurt 4 44 3 3 Signed-off-by: Karol Herbst <karolherbst@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2017-03-18 20:15:45 -04:00
Timothy Arceri	9e42b93f33	st/dri: wait for thread to finish before unbinding context Fixes a bunch of piglit crashes that hit an assert() when trying to delete the framebuffer. The assert() was triggered because WinSysDrawBuffer was set to NULL before glDeleteFramebuffers() was called. Tested-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-18 14:15:52 +11:00
Marek Olšák	4b064d16e5	gallium/radeon: formalize that create_batch_query doesn't need pipe_context Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2017-03-17 18:30:21 +01:00
Marek Olšák	be6173e7d6	gallium/radeon: formalize that create_query doesn't need pipe_context for threaded gallium Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2017-03-17 18:30:21 +01:00
Marek Olšák	04e6977e5d	gallium/radeon: reference pipe_resource in pipe_transfer for threaded gallium Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2017-03-17 18:30:21 +01:00
Marek Olšák	03127bb6d5	radeonsi: compile all TGSI compute shaders asynchronously required by threaded gallium Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2017-03-17 18:30:21 +01:00
Marek Olšák	e9c6953ddb	radeonsi: require that compiler threads are enabled threaded gallium can't use pipe_context's LLVM target machine, because create_shader_selector can be called from a non-driver thread. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2017-03-17 18:30:21 +01:00
Marek Olšák	080f322f06	trace: remove leftover assertions after pipe_resource wrapping removal	2017-03-17 18:30:21 +01:00
Marek Olšák	6c0a28084d	gallium/u_upload: make the first persistent mapping unsynchronized This is simpler for drivers.	2017-03-17 18:30:21 +01:00
Marek Olšák	c83562ccaa	gallium: implement the backend of threaded GL dispatch Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Tested-by: Mike Lothian <mike@fireburn.co.uk>	2017-03-16 14:14:19 +11:00
Roland Scheidegger	e1f9e9bafd	gallivm: (trivial) remove duplicated line pointed out by clang (stored value never read)	2017-03-16 04:03:29 +01:00
Roland Scheidegger	9d104dfd55	draw: (trivial) remove an unnecessary lp_build_alloca() pointed out by clang (stored value never read)	2017-03-16 04:03:29 +01:00
Ilia Mirkin	e893b3a367	swr: support layer output in geometry shaders This makes bin/gl-3.2-layered-rendering-gl-layer-render fail only with 2DMS_ARRAY, which is expected given the lackluster MSAA support. However all the regular types pass. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-03-15 21:03:11 -04:00
Francisco Jerez	e6469ec43b	gallium/tgsi: Treat UCMP sources as floats to match the GLSL-to-TGSI pass expectations. Currently the GLSL-to-TGSI translation pass assumes it can use floating point source modifiers on the UCMP instruction. See the bug report linked below for an example where an unrelated change in the GLSL built-in lowering code for atan2 (`e9ffd12827`) caused the generation of floating-point ir_unop_neg instructions followed by ir_triop_csel, which is translated into UCMP with a negate modifier on back-ends with native integer support. Allowing floating-point source modifiers on an integer instruction seems like rather dubious design for a transport IR, since the same semantics could be represented as a sequence of MOV+UCMP instructions instead, but supposedly this matches the expectations of TGSI back-ends other than tgsi_exec, and the expectations of the DX10 API. I take no responsibility for future headaches caused by this inconsistency. Fixes a regression of piglit glsl-fs-tan-1 on softpipe introduced by the above-mentioned glsl front-end commit. Even though the commit that triggered the regression doesn't seem to have made it to any stable branches yet, this might be worth back-porting since I don't see any reason why the bug couldn't have been reproduced before that point. Suggested-by: Roland Scheidegger <sroland@vmware.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99817 Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2017-03-15 15:47:14 -07:00
Tim Rowley	a7ce0490e4	swr: validate backend state numAttributes General protection and prevents us from smashing the stack on the first clear state validation (`a7b8d50bcb`). Fixes crash using icc. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-03-15 15:08:59 -05:00
Marek Olšák	0550f3d631	radeonsi: implement TGSI opcodes TEX_LZ and TXF_LZ This massively decreases VGPR spilling for DiRT Showdown, because we no longer have to use v4i32 for 2D fetches when level == 0. We now use v2i32 for those cases. DiRT Showdown - Spilled VGPRs: -26 (-81%) This surprisingly doesn't have any useful effect on performance (+ 0.05%).	2017-03-15 18:17:41 +01:00
Marek Olšák	cca0389c72	gallium: add TGSI opcodes TEX_LZ and TXF_LZ for better code generation in radeonsi	2017-03-15 18:17:41 +01:00
Marek Olšák	bf3cdf0fd3	gallium: add PIPE_CAP_TGSI_TEX_TXF_LZ	2017-03-15 18:17:41 +01:00
Samuel Pitoiset	7751ed39e4	radeonsi: disable sinking common instructions down to the end block Initially this was a workaround for a bug introduced in LLVM 4.0 in the SimplifyCFG pass that caused image intrinsics to disappear (because they were badly sunk). Finally, this is a win because it decreases SGPR spilling and increases the number of waves a bit. Although shader-db results are good, I think we might want to remove it in the future once the issue is fixed. For now, enable it for LLVM >= 4.0. This also fixes a rendering issue with the speedometer in Dirt Rally. More information can be found here https://reviews.llvm.org/D26348. Thanks to Dave Airlie for the patch. v2: - add a FIXME comment - use if (HAVE_LLVM >= 0x0400) instead Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99484 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97988 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: 17.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-15 14:24:40 +01:00
Samuel Pitoiset	74265fd03c	tgsi: add missing compute shader entry in tgsi_get_processor_name() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-15 14:16:29 +01:00
Samuel Pitoiset	38ee3246d2	radeonsi: clean up tex_fetch_ptrs() Will also help when the src sampler register will be TGSI_FILE_CONSTANT for bindless. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-15 14:16:26 +01:00
Emil Velikov	858170e8a4	winsys/amdgpu: use drmGetDevice2 API Analogous to previous commit Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98502 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> Tested-by: Mike Lothian <mike@fireburn.co.uk>	2017-03-15 11:37:58 +00:00
Dave Airlie	686d060458	r600: refactor binding code for attach buffer to CB. This refactors out the code and fixes it up to be used for images later. It uses the code in the current RAT binding for compute. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-03-15 14:33:26 +10:00
Dave Airlie	222e42e45f	r600: refactor out CB setup. This moves the code to create CB info out into a separate function so it can be reused in images code to create RATs. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-03-15 14:33:23 +10:00
Dave Airlie	0cf717821e	r600: refactor texture resource words setup code. This refactors out the code to setup a texture resource so we can reuse it later from the images code. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-03-15 14:33:06 +10:00
Dave Airlie	95a976b651	r600: factor out the code to initialise a buffer resource. This takes the code required to initialise a buffer resource out of the texture buffer code, into its own function. This is going to be used for the image support later. Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-03-15 14:32:48 +10:00
Dave Airlie	cf2af021b9	r600g: make framebuffer atom rely on dual src blend state. In order to make ARB_shader_image_load_store, we have to share the CB space with RATs, so we should only steal the dual src space if we have dual src enabled. Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-03-15 14:32:44 +10:00
Jason Ekstrand	762a6333f2	nir: Rework conversion opcodes The NIR story on conversion opcodes is a mess. We've had way too many of them, naming is inconsistent, and which ones have explicit sizes was sort-of random. This commit re-organizes things and makes them all consistent: - All non-bool conversion opcodes now have the explicit size in the destination and are named <src_type>2<dst_type><size>. - Integer <-> integer conversion opcodes now only come in i2i and u2u forms (i2u and u2i have been removed) since the only difference between the different integer conversions is whether or not they sign-extend when up-converting. - Boolean conversion opcodes all have the explicit size on the bool and are named <src_type>2<dst_type>. Making things consistent also allows nir_type_conversion_op to be moved to nir_opcodes.c and auto-generated using mako. This will make adding int8, int16, and float16 versions much easier when the time comes. Reviewed-by: Eric Anholt <eric@anholt.net>	2017-03-14 07:36:40 -07:00
Marek Olšák	cdbe4990cd	gallium/radeon: disable the shader cache if dumping shaders otherwise, cached shaders aren't dumped. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2017-03-13 23:34:52 +01:00
Marek Olšák	71a2e4e945	radeonsi: mark all bound shader buffer ranges as initialized This should prevent cases when a buffer was incorrectly mapped without synchronization just because this wasn't done. Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-03-13 23:34:52 +01:00
Julien Isorce	9df3f28a8b	gallium/hud: check NULL return from u_upload_alloc Fixes the following segmentation fault: signal SIGSEGV: invalid address (fault address: 0x0) frame #0: 0x00007fffe718e117 radeonsi_dri.so hud_draw_background_quad hud_context.c:170 167 168 assert(hud->bg.num_vertices + 4 <= hud->bg.max_num_vertices); 169 -> 170 vertices[num++] = (float) x1; 171 vertices[num++] = (float) y1; 172 173 vertices[num++] = (float) x1; (lldb) bt * frame #0: 0x00007fffe718e117 radeonsi_dri.so`hud_draw_background_quad frame #1: 0x00007fffe718f458 radeonsi_dri.so`hud_draw frame #2: 0x00007fffe712967f radeonsi_dri.so`dri_flush Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2017-03-13 17:20:21 +01:00
Julien Isorce	d08c0930af	winsys/radeon: check null return from radeon_cs_create_fence in cs_flush Follow-up of patch: "radeon_cs_create_fence: check null return from radeon_winsys_bo_create" radeon_drm_cs_flush radeon_cs_create_fence radeon_winsys_bo_create Signed-off-by: Julien Isorce <jisorce@oblong.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2017-03-13 17:19:29 +01:00
Julien Isorce	d09edb0146	winsys/radeon: check null in radeon_cs_create_fence Fixes the following segmentation fault: radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c -> if (!bo->handle) (gdb) bt 0 radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c 1 0x00007fffe73575de in radeon_cs_create_fence radeon_drm_cs.c 2 0x00007fffe7358c48 in radeon_drm_cs_flush radeon_drm_cs.c Signed-off-by: Julien Isorce <jisorce@oblong.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2017-03-13 17:17:30 +01:00
Rob Clark	f805593b12	freedreno/ir3: fragz cannot be half precision Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-03-13 10:33:07 -04:00
Rob Clark	b1df639db6	freedreno/ir3: optimize less in glsl Rely on nir for optimization, to reduce compile times. Very minimal impact on shader-db: total instructions in shared programs: 104170 -> 104199 (0.03%) total dwords in shared programs: 209664 -> 209728 (0.03%) total full registers used in shared programs: 7156 -> 7161 (0.07%) total half registers used in shader programs: 109 -> 109 (0.00%) total const registers used in shared programs: 24222 -> 24224 (0.01%) half full const instr dwords helped 12 107 103 112 98 hurt 11 104 105 115 102 But shader db runtime dropped from ~29.3s user to ~20.4s user. Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-03-13 10:33:07 -04:00
Christian König	8dee325752	svga: handle P016 format as well Fixes: `62cff79378` ("gallium: add P016 format") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100180 Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2017-03-13 12:49:41 +01:00
Christian König	5369b5a91d	st/va: add config support for 10bit decoding v2 Advertise 10bpp support if the driver supports decoding to a P016 surface. v2: Advertise 10bpp for the decoder as well. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Mark Thompson <sw@jkqxz.net>	2017-03-13 08:51:44 +01:00
Christian König	e9d3e29bb3	st/va: add support for allocating 10bpp surfaces We support P010 and P016 as targets for 10bpp video decoding. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mark Thompson <sw@jkqxz.net>	2017-03-13 08:51:41 +01:00
Christian König	e58a1e8f68	st/va: add support for P010 and P016 formats v3 No hardware I know of can actually support P010 natively. But we can easily support P016 and as long as nobody decodes anything into the lower 6bits it doesn't make any difference to P010. v2: allow P0160 for post processing as well v3: fix post processing once more Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mark Thompson <sw@jkqxz.net>	2017-03-13 08:51:38 +01:00
Christian König	f1d1deb015	st/va: clear the video surface on allocation This makes debugging of decoding problems quite a bit easier. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mark Thompson <sw@jkqxz.net>	2017-03-13 08:51:35 +01:00
Christian König	1ce68af07b	st/va: cleanup error handling in vlVaCreateSurfaces2 No need to have that twice. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mark Thompson <sw@jkqxz.net>	2017-03-13 08:51:32 +01:00
Christian König	88f3451083	radeon/uvd: enable 10bit HEVC decode v2 Just use whatever the state tracker allocated. v2: fix msb mode Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mark Thompson <sw@jkqxz.net>	2017-03-13 08:51:29 +01:00
Christian König	3e1e441aa0	radeon/UVD: fix the decoding target pitch calculation The firmware expects the value in pixels, not bytes. Didn't make a difference so far because we only used 8bpp surfaces. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mark Thompson <sw@jkqxz.net>	2017-03-13 08:51:25 +01:00
Christian König	cee591a224	vl/video_buffer: add support for P016 Just simply the description of the planes. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mark Thompson <sw@jkqxz.net>	2017-03-13 08:51:22 +01:00
Christian König	62cff79378	gallium: add P016 format Same layout as NV12, but 16bit per channel instead of 8. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mark Thompson <sw@jkqxz.net>	2017-03-13 08:51:07 +01:00
Timothy Arceri	ca76a2ba1b	gallium/util: replace pipe_thread_setname() with u_thread_setname() They do the same thing; we just moved the function to be accessible to all of Mesa. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-12 17:49:04 +11:00
Timothy Arceri	14e6b86952	gallium/util: replace pipe_thread_get_time_nano() with u_thread_get_time_nano() They do the same thing; we just moved the function to be accessible to all of Mesa. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-12 17:49:04 +11:00
Timothy Arceri	f8cc4c25b8	gallium/util: replace pipe_thread_create() with u_thread_create() They do the same thing; we just moved the function to be accessible to all of Mesa. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-12 17:49:04 +11:00
Timothy Arceri	b822d9dd67	gallium/util: move u_queue.{c,h} to src/util This will allow us to use it outside of gallium for things like compressing shader cache entries. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-03-12 17:49:03 +11:00


30386 commits