fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-07 13:38:06 +02:00

Author	SHA1	Message	Date
Eric Anholt	aa76ba6f2f	vc4: DCE instructions with a NULL destination. I'm going to add an optimization for redundant SF update removal, which will just remove the SF and leave us (in many cases) with an instruction with a NULL destination and no side effects. Rather than teaching that pass whether the whole instruction can be removed, leave that responsibility to this pass.	2016-07-04 16:33:22 -07:00
Eric Anholt	2a8973fb78	vc4: Mark texturing setup instructions as having side effects. We need to not DCE them even though they don't have a destination in QIR. We also shouldn't relocate them in vc4_opt_vpm. Neither of these things happen, but I'm about to make DCE consider instructions with a NULL destination.	2016-07-04 16:33:22 -07:00
Eric Anholt	44df374a9c	vc4: Fix a pasteo in scheduling condition flag usage. Noticed by code inspection. This hasn't been too big of a deal, because our cond usages all start out as adder ops, either MOVs or the FTOI for Z writes. MOVs can get converted to mul ops during scheduling, but apparently we hadn't hit this.	2016-07-04 16:33:22 -07:00
Eric Anholt	eaa53f80d9	vc4: Drop the dead QIR_PACK() macro. This isn't used since we switched to using the dst.pack field instead of custom instructions.	2016-07-04 16:33:18 -07:00
Marek Olšák	5c92c21369	radeonsi: do compilation from si_create_shader_selector asynchronously. Main shader parts and geometry shaders are compiled asynchronously by util_queue. si_create_shader_selector doesn't wait and returns. si_draw_vbo(si_shader_select) waits for completion. This has the best effect when shaders are compiled at app-loading time. It doesn't help much for shaders compiled on demand, even though VS+PS compilation should take as much time as the bigger one of the two. If an app creates more shaders, at most 4 threads will be used to compile them. Debug output disables this for shader stats to be printed in the correct order. (We could go even further and build variants asynchronously too, then emit draw calls without waiting and emit incomplete shader states, then force IB chaining to give the compiler more time, then sync the compilation at the IB flush and patch the IB with correct shader states. This is great for compilation before draw calls, but there are some difficulties such as scratch and tess states requiring the compiler output, and an on-disk shader cache will likely be a much better and simpler solution.) Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:13 +02:00
Marek Olšák	84824935cf	radeonsi: don't lock shader cache mutex during compilation to allow multiple shaders to be compiled simultaneously. Also, shader-db can again use all 4 cores. v2: Remove the pipe_mutex_unlock call in the error path. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)	2016-07-05 00:47:13 +02:00
Marek Olšák	850cd953b1	radeonsi: separate the compilation chunk of si_create_shader_selector The function interface is ready to be used by util_queue. Also, si_shader_select_with_key can no longer accept si_context. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:13 +02:00
Marek Olšák	6781a2a994	radeonsi: move LLVMTargetMachineRef creation to a separate function Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:13 +02:00
Marek Olšák	8a4ace4a47	gallium/radeon: add and use radeon_info::max_alloc_size (v2) v2: - squashed the patches - use INT_MAX - clamp max_const_buffer_size - check the DRM version in radeon Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Vedran Miletić <vedran@miletic.net>	2016-07-05 00:47:13 +02:00
Marek Olšák	027ad71b57	radeonsi: print LLVM IRs to ddebug logs. Getting LLVM IRs of hanging shaders has never been easier. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:13 +02:00
Marek Olšák	28a03be06b	radeonsi: enable string markers and record apitrace call numbers Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:13 +02:00
Marek Olšák	642cf400aa	ddebug: add an option to dump info about a specific apitrace call Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	1daec2b795	ddebug: implement pipe_context::generate_mipmap Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	50b2235478	ddebug: record and dump apitrace call numbers Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	861ecf1ca9	ddebug: implement emit_string_marker and remove some obsolete comments Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	a446c40e0a	gallium/radeon: remove unused code - radeon_llvm_util.* Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	eaccc4e8c8	radeonsi: keep using v_rcp_f32 for division in future LLVM (v2) This will be needed after some LLVM changes that haven't landed yet. v2: - use LLVMIsConstant to fix an LLVM assertion failure. LLVMSetMetadata doesn't work with constants. - don't set float metadata as string Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	1c00086746	radeonsi: remove an obsolete comment It's not true. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	4d1f32376d	radeonsi: don't interpolate colors if flatshading is enabled use v_interp_mov for those Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	4accb02d7a	radeonsi: enable the barycentric optimization in all cases Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled. This should increase the PS launch rate for big primitives with MSAA. Based on discussion with SPI guys. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	476e9cee1d	radeonsi: compute only one set of interpolation (i,j) when MSAA is disabled This should increase the PS launch rate for shaders using at least 2 pairs of perspective (i,j) and same for linear. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	a675c6a000	radeonsi: split ps.prolog.force_persample_interp into persp and linear bits This reduces the number of v_mov's in the prolog. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Marek Olšák	61010cfac0	radeonsi: don't dump the shader key for non-monolithic shaders early It's always zero. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-05 00:47:12 +02:00
Jan Vesely	015e2e0fce	r600g: Add double precision FMA ops Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96782 Fixes: `54c4d525da` ("r600g: Enable FMA on chips that support it") Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Tested-by: James Harvey <lothmordor@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2016-07-05 00:47:12 +02:00
Francesco Ansanelli	9827fc3f03	r600: fix duplicate 'const' declaration Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-07-04 21:26:31 +02:00
Topi Pohjolainen	2a60654f56	i965/urb: Allow blorp to record current settings. This makes it possible to skip urb re-configuration if the subsequent renders agree with the settings. Also allows blorp to allocate the maximum amount of vs entries available. Core upload logic already knows how to calculate this. Helps one synthetic benchmark. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-07-04 20:43:11 +03:00
Topi Pohjolainen	39fdee6b2d	i965/blorp/gen7+: Do not trigger push constant space reconfig Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 20:43:11 +03:00
Topi Pohjolainen	cc2d0e64c0	i965/blorp/gen7+: Stop trashing push constant allocation Packet 3DSTATE_CONSTANT_PS is still emitted explicitly as ps stage itself is enabled and hardware may try to prefetch constants from the buffer. From the BSpec: 3D Pipeline - Windower - 3DSTATE_PUSH_CONSTANT_ALLOC_PS "Specifies the size of the PS constant buffer. This value will determine the amount of data the command stream can pre-fetch before the buffer is full." This is not possible on gen6. From the BSpec about 3DSTATE_CONSTANT_PS: "This packet must be followed by WM_STATE." Binding table emissions for stages other than PS can be now dropped, they were only needed for the 3DSTATE_CONSTANT_XS to be effective: From the BSpec: "The 3DSTATE_CONSTANT_* command is not committed to the shader unit until the corresponding (same shader) 3DSTATE_BINDING_TABLE_POINTER_* command is parsed." Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 20:43:11 +03:00
Topi Pohjolainen	175e095744	i965/blorp: Remove support for push constants Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 20:43:11 +03:00
Topi Pohjolainen	46e1132b80	i965/blorp: Use flat inputs instead of uniforms v2 (Jason): Use LOAD_INPUT() macro Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 20:43:11 +03:00
Topi Pohjolainen	07db95c24d	i965/blorp: Fix the size requirement for vertex elements v2: Rebased as this is needed before flat inputs are enabled Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-07-04 20:43:11 +03:00
Topi Pohjolainen	741a245ae4	i965/blorp: Load transformation coordinates as vec4. In preparation for loading as flat vertex input. v2: Use LOAD_INPUT() macro Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 20:43:11 +03:00
Topi Pohjolainen	01f2f364d4	i965/blorp: Rename LOAD_UNIFORM to LOAD_INPUT Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 20:43:11 +03:00
Topi Pohjolainen	641868103c	i965/blorp: Organize pixel kill and blend/scaled inputs into vec4s In addition, as these are never used in parallel, add a few assertions. v2 (Jason): Skip some complexity by putting them into a union but pad rectangle grid into a vec4 instead. Also keep the LOAD_UNIFORM macro. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 20:43:11 +03:00
Lionel Landwerlin	dbbc4fb4cc	anv/wsi: create swapchain images using specified image usage The image usage specified by the caller of vkCreateSwapchainKHR should be passed onto the internal image creation. Otherwise the driver might later crash when the user tries to use the image as a combined sampler even though the creation was explicitly created with VK_IMAGE_USAGE_TRANSFER_SRC_BIT. Leaving the previous VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT as this might be expected even if the swapchain is created without any flag. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96791 Cc: "12.0" <mesa-stable@lists.freedesktop.org>	2016-07-04 10:15:48 -07:00
Indrajit Das	51227b41c6	radeon/uvd: fix overflow error while calculating bit stream buffer size Reviewed-by: Christian König <christian.koenig@amd.com>	2016-07-04 11:38:05 +02:00
Topi Pohjolainen	9e3774a460	i965/blorp: Prepare for more than two vertex attributes Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 09:05:02 +03:00
Topi Pohjolainen	e762354309	i965/blorp: Tell vertex fetcher about flat inputs Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 09:04:38 +03:00
Topi Pohjolainen	89e6b4ef5d	i965/blorp: Add support for flat input buffer Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 09:04:00 +03:00
Topi Pohjolainen	9b2fa17e97	i965/blorp: Store input read mask Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 09:03:41 +03:00
Topi Pohjolainen	73f78ab44b	i965/blorp: Rename push constants to inputs Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 08:37:51 +03:00
Topi Pohjolainen	f2c472fcb3	i965/blorp: Use core vertex buffer state setup Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 08:37:44 +03:00
Topi Pohjolainen	4f7e68799f	i965/blorp: Split vertex data and element setup Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 08:33:41 +03:00
Topi Pohjolainen	575c8cbb54	i965: Unify vertex buffer setup On gen >= 8 one doesn't provide ending address but number of bytes available. This is relative to the given offset. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 08:33:41 +03:00
Topi Pohjolainen	bdab945edd	i965/draw: Expose vertex buffer state setup Also change the interface to use start and end offsets. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-07-04 08:33:41 +03:00
Rob Clark	7295428e41	freedreno: fix crash on smaller gpus and higher resolutions. Devices with smaller GMEM size need more tiles. On db410c at 2048x1152, glmark2 shadow needed ~330 tiles for fullscreen. Let's bump it up to 512. (Maybe with MRT you could end up needing more, but at that point things are probably going to be painfully slow.) Signed-off-by: Rob Clark <robdclark@gmail.com>	2016-07-03 11:16:28 -04:00
Rob Clark	01ccb0d91e	i965: don't drop const initializers in vector splitting Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-07-02 09:00:19 -04:00
Rob Clark	f78a6b1ce3	glsl: add driconf to zero-init uninitialized vars. Some games are sloppy, perhaps because it is defined behavior for DX or perhaps because nv blob driver defaults things to zero. So add driconf param to force uninitialized variables to default to zero. This issue was observed with rust, from steam store. But has surfaced elsewhere in the past. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-07-02 09:00:19 -04:00
Rob Clark	202710d110	freedreno/ir3: support glsl linking for cmdline compiler For .vert/.frag, now multiple can be specified on the cmdline for purposes of linking, and the last one specified is the one that is fed into the ir3 backend (and dumped along the way if --verbose is specified) Without this, varyings in frag shaders would appear as undefined. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-07-02 09:00:19 -04:00
Rob Clark	07cfe4e6aa	glsl/standalone: initialize MaxUserAssignableUniformLocations Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-07-02 09:00:19 -04:00


82942 commits