fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-03-17 21:10:35 +01:00

Author	SHA1	Message	Date
Rob Herring	f87330dbce	virgl: reuse screen when fd is already open It is necessary to share the screen between mesa and gralloc to properly ref count resources. This implements a hash lookup on the file description to re-use an already created screen. This is a similar implementation as freedreno and radeon. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2016-02-02 09:58:29 +10:00
Mauro Rossi	6711592c2f	nouveau/video: wrap assertion within #ifndef NDEBUG The change is necessary to avoid the following building error in android: external/mesa/src/gallium/drivers/nouveau/nouveau_vp3_video_bsp.c: In function 'nouveau_vp3_bsp_next': external/mesa/src/gallium/drivers/nouveau/nouveau_vp3_video_bsp.c:269:14: error: 'bsp_bo' undeclared (first use in this function) assert(bsp_bo->size >= str_bsp->w0[0] + num_bytes[i]); ^ This matches the declaration of the variables in question. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-02-01 17:45:19 -05:00
Ilia Mirkin	047b917718	st/mesa: treat a write as a read for range purposes We use this logic to detect live ranges and then do plain renaming across the whole codebase. As such, to prevent WaW hazards, we have to treat a write as if it were also a read. For example, the following sequence was observed before this patch: 13: UIF TEMP[6].xxxx :0 14: ADD TEMP[6].x, CONST[6].xxxx, -IN[3].yyyy 15: RCP TEMP[7].x, TEMP[3].xxxx 16: MUL TEMP[3].x, TEMP[6].xxxx, TEMP[7].xxxx 17: ADD TEMP[6].x, CONST[7].xxxx, -IN[3].yyyy 18: RCP TEMP[7].x, TEMP[3].xxxx 19: MUL TEMP[4].x, TEMP[6].xxxx, TEMP[7].xxxx While after this patch it becomes: 13: UIF TEMP[7].xxxx :0 14: ADD TEMP[7].x, CONST[6].xxxx, -IN[3].yyyy 15: RCP TEMP[8].x, TEMP[3].xxxx 16: MUL TEMP[4].x, TEMP[7].xxxx, TEMP[8].xxxx 17: ADD TEMP[7].x, CONST[7].xxxx, -IN[3].yyyy 18: RCP TEMP[8].x, TEMP[3].xxxx 19: MUL TEMP[5].x, TEMP[7].xxxx, TEMP[8].xxxx Most importantly note that in the first example, the second RCP is done on the result of the MUL while in the second, the second RCP should have the same value as the first. Looking at the GLSL source, it is apparent that both of the RCP's should have had the same source. Looking at what's going on, the GLSL looks something like float tmin_8; float tmin_10; tmin_10 = tmin_8; ... lots of code ... tmin_8 = tmpvar_17; ... more code that never looks at tmin_8 ... And so we end up with a last_read somewhere at the beginning, and a first_write somewhere at the bottom. For some reason DCE doesn't remove it, but even if that were fixed, DCE doesn't handle 100% of cases, esp including loops. With the last_read somewhere high up, we overwrite the previously correct (and large) last_read with a low one, and then proceed to decide to merge all kinds of junk onto this temp. Even if that weren't the case, and there were just some writes after the last read, then we might still overwrite a merged value with one of those. As a result, we should treat a write as a last_read for the purpose of determining the live range. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Dave Airlie <airlied@redhat.com> Cc: mesa-stable@lists.freedesktop.org	2016-02-01 17:40:18 -05:00
Matt Turner	75c9def8ee	i965/gen7+: Use NIR for lowering of pack/unpack opcodes. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-02-01 10:43:57 -08:00
Matt Turner	f4952421cd	i965/vec4: Implement nir_op_pack_uvec2_to_uint. And mark nir_op_pack_uvec4_to_uint unreachable, since it's only produced by lowering pack[SU]norm4x8 which the vec4 backend does not need. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-02-01 10:43:57 -08:00
Matt Turner	955d052058	nir: Add lowering support for unpacking opcodes. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-02-01 10:43:57 -08:00
Matt Turner	9b8786eba9	nir: Add lowering support for packing opcodes. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-02-01 10:43:57 -08:00
Matt Turner	1dc312e295	i965/fs: Implement support for extract_word. The vec4 backend will lower it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-02-01 10:43:57 -08:00
Matt Turner	68f8c5730b	nir: Add opcodes to extract bytes or words. The uint versions zero extend while the int versions sign extend. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-02-01 10:43:57 -08:00
Matt Turner	8709dc0713	glsl: Remove 2x16 half-precision pack/unpack opcodes. i965/fs was the only consumer, and we're now doing the lowering in NIR. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-02-01 10:43:57 -08:00
Matt Turner	1a53a4fc7a	i965/fs: Switch from GLSL IR to NIR for un/packHalf2x16 scalarizing. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-02-01 10:43:57 -08:00
Matt Turner	9ce901058f	nir: Add lowering of nir_op_unpack_half_2x16. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-02-01 10:43:57 -08:00
Matt Turner	e4278a847e	i965: Make separate nir_options for scalar/vector stages. We'll want to have different lowering options set for scalar/vector stages. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-02-01 10:43:57 -08:00
Matt Turner	252d497d4c	i965: Move brw_compiler_create() to new brw_compiler.c. A future patch will want to use designated initializers, which aren't available in C++, but this is C. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-02-01 10:43:57 -08:00
Matt Turner	140a886c41	nir: Make argument order of unop_convert match binop_convert. Strangely the return and parameter types were reversed. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-02-01 10:43:57 -08:00
Marta Lofstedt	77a60ab5dc	mesa: enable enums for OES_geometry_shader Enable GL_OES_geometry_shader enums for OpenGL ES 3.1. V4: EXTRA tokens updated according to comments from Ilia Mirkin. V5: Account for check_extra does not evaluate "or" lazy. Fix issues with EXTRA_EXT_FB_NO_ATTACH_CS. Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-02-01 09:30:50 +01:00
François Tigeot	a48afb92ff	gallium: Add DragonFly support Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>	2016-01-31 11:56:09 +00:00
Ilia Mirkin	7f19e29305	nv50/ir: get rid of memory stores with nop values This happens especially with exports and varying packing, where the last bits aren't always filled in. We end up trying to do quad-wide stores, which ends up being a lot of register moves that carefully preserve the nop value. Instead don't do the stores. total instructions in shared programs : 6131375 -> 6125267 (-0.10%) total gprs used in shared programs : 910139 -> 895501 (-1.61%) total local used in shared programs : 15328 -> 15328 (0.00%) local gpr inst helped 0 7442 4693 hurt 0 90 2687 Most of the helped/hurt instruction changes are by one or two ops because can no longer do quad-wide stores in all cases. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-30 17:18:41 -05:00
Ilia Mirkin	3ca941d60e	nv50/ir: fix false global CSE on instructions with multiple defs If an instruction has multiple defs, we have to do a lot more checks to make sure that we can move it forward. Among other things, various code likes to do a, b = tex() if () c = a else c = b which means that a single phi node will have results pointing at the same instruction. We obviously can't propagate the tex in this case, but properly accounting for this situation is tricky. Just don't try for instructions with multiple defs. This fixes about 20 shaders in shader-db, including the dolphin efb2ram shader. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2016-01-30 17:18:41 -05:00
Ilia Mirkin	3ca2001b53	nv50,nvc0: fix buffer clearing to respect engine alignment requirements It appears that the nvidia render engine is quite picky when it comes to linear surfaces. It doesn't like non-256-byte aligned offsets, and apparently doesn't even do non-256-byte strides. This makes arb_clear_buffer_object-unaligned pass on both nv50 and nvc0. As a side-effect this also allows RGB32 clears to work via GPU data upload instead of synchronizing the buffer to the CPU (nvc0 only). Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> # tested on GF108, GT215 Tested-by: Nick Sarnie <commendsarnex@gmail.com> # GK208 Cc: mesa-stable@lists.freedesktop.org	2016-01-30 16:01:41 -05:00
Rob Clark	f15447e7c9	freedreno/ir3: ignore clip-vertex varying Since we emulate clip-planes, the clip-vertex is used within the VS itself (thanks to nir_lower_clip). So just ignore it as a VS output. Fixes a boatload of piglit tests that were asserting on unknown varying slot. (Also unrelated spelling/typo fix.) Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-30 12:29:21 -05:00
Rob Clark	f20cf22b54	freedreno/ir3: don't ignore local vars With glsl_to_nir we end up with local variables, instead of global, for arrays. Note that we'll eventually have to do something more clever, I think, when we support multiple functions, but that will probably take some work in a few places. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-30 12:27:57 -05:00
Rob Clark	8039a2a6b3	freedreno/ir3: handle tex instrs w/ const offset Something we start to see with glsl_to_nir. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-30 12:27:27 -05:00
Rob Clark	f212d7dc50	freedreno/ir3: support load_front_face intrinsic With tgsi_to_nir we get this as a normal input with VARYING_SLOT_FACE. But glsl_to_nir plus nir_lower_system_values this becomes an intrinsic. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-30 12:11:54 -05:00
Rob Clark	9e05e8cb75	freedreno: limit string marker to max packet size Experimentally derived max size. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-01-30 12:10:13 -05:00
Ilia Mirkin	438d421f8b	nvc0: avoid crashing when there are holes in vertex array bindings When using the "shared" vertex array configuration strategy, we bind each of the buffers as a separate array. However there can be holes in such vertex buffer lists, so just emit a disable for those. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2016-01-29 22:10:42 -05:00
Ilia Mirkin	899b1b98a4	nvc0: enable atomic counters and ssbo Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-29 22:10:42 -05:00
Ilia Mirkin	48cf392c0e	nv50/ir: handle new TGSI MEMBAR opcode Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-29 21:22:48 -05:00
Ilia Mirkin	df043f0764	nvc0/ir: fix atomic compare-and-swap arguments Teach the emitter that the two registers are sequential, and drop the second arg entirely, in favor of a double-wide first argument. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-29 21:22:48 -05:00
Ilia Mirkin	7b9a77b905	nv50/ir: add support for indirect buffer loading Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-29 21:22:48 -05:00
Ilia Mirkin	2c4eeb0b5c	nv50/ir: add SUQ op by reading the info from driver constbuf Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-29 21:22:47 -05:00
Ilia Mirkin	c3083c7082	nv50/ir: add support for BUFFER accesses This largely leaves the existing image logic alone. When image support is added this will have to be harmonized somehow. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-29 21:22:47 -05:00
Ilia Mirkin	abe427ebd2	nvc0: handle shader buffer memory barrier Issue a MEM_BARRIER. No idea if this is sufficient. As there are no tests for this, it'll have to do for now. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-29 21:22:38 -05:00
Ilia Mirkin	fe01be4ad5	nvc0: add state management for shader buffers (address, length) pairs are uploaded to the driver constbuf as well to make these values available to the shaders. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-29 21:06:07 -05:00
Ilia Mirkin	b4688c4615	nvc0: double per-shader stage driver constants area We need to store a lot more info now with per-buffer address/size. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-29 21:06:06 -05:00
Ilia Mirkin	ae725d5746	trace: add support for set_shader_buffers Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1) v1 -> v2: add arg_begin/arg_end around buffer array Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-01-29 21:05:47 -05:00
Ilia Mirkin	fea25db925	st/mesa: enable ARB_shader_storage_buffer_object when supported Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-29 21:05:47 -05:00
Ilia Mirkin	6fb8fac853	st/mesa: add shader buffer barrier bit Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-29 21:05:47 -05:00
Ilia Mirkin	792bab24ac	st/mesa: add support for memory barrier intrinsics Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2) v1 -> v2: use TGSI_MEMBAR defines	2016-01-29 21:05:47 -05:00
Ilia Mirkin	c0e1c54a4f	st/mesa: use RESQ to find buffer size Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>	2016-01-29 21:05:47 -05:00
Ilia Mirkin	6880036694	st/mesa: add support for SSBO binding and GLSL intrinsics Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> v1 -> v2: some 80 char reformatting	2016-01-29 21:05:46 -05:00
Ilia Mirkin	9d6f9ccf6b	st/mesa: add atomic counter support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-29 21:05:46 -05:00
Ilia Mirkin	0fddb677e6	mesa: add PROGRAM_IMMEDIATE, PROGRAM_BUFFER This makes PROGRAM_IMMEDIATE a first-class gl_register_file type, and adds PROGRAM_BUFFER to the list. These are used purely inside glsl_to_tgsi conversion. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-29 21:05:35 -05:00
Ilia Mirkin	35f8488668	glsl: keep track of ssbo variable being accessed, add access params Currently any access params (coherent/volatile/restrict) are being lost when lowering to the ssbo load/store intrinsics. Keep track of the variable being used, and bake its access params in as the last arg of the load/store intrinsics. If the variable is accessed via an instance block, then 'variable' points to the instance block variable and not the field inside the instance block that we are accessing. In order to check access parameters for the field itself we need to detect this case and keep track of the corresponding field struct so we can extract the specific field access information from there instead. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1) v1 -> v2: add tracking of struct field v2 -> v3: minor adjustments based on Iago's feedback Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2016-01-29 21:05:08 -05:00
Ilia Mirkin	2b089c7ffe	glsl: always initialize image_* fields, copy them on interface init Interfaces can have image properties set in case they are buffer interfaces. Make sure not to lose this information. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-01-29 21:04:56 -05:00
Ilia Mirkin	2ccc42fd2c	tgsi: add MEMBAR opcode to handle memoryBarrier* GLSL intrinsics Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1) v1 -> v2: add defines for the various bits Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-01-29 21:04:36 -05:00
Michel Dänzer	30fcf241e1	winsys/amdgpu: Process RADEON_FLAG_* independently from RADEON_DOMAIN_* In particular, AMDGPU_GEM_CREATE_CPU_GTT_USWC can affect even BOs created in VRAM if they get evicted to GTT. In general there's no need to restrict any of the flags to any particular domains. Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2016-01-29 16:06:06 +09:00
Michel Dänzer	62f837e2ea	winsys/amdgpu: Handle RADEON_FLAG_NO_CPU_ACCESS Failing to do this was resulting in the kernel driver unnecessarily leaving open the possibility of CPU access to tiled BOs. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93862 (This change shouldn't be backported to stable branches, because released versions of xf86-video-amdgpu unnecessarily try to map the front buffer) Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2016-01-29 16:06:06 +09:00
Karol Herbst	29d09f8747	nv50/ir: optimize mad/fma with third argument 0 to mul Very modest effect, but it's clearly the right thing to do. total instructions in shared programs : 6131491 -> 6131398 (-0.00%) total gprs used in shared programs : 910157 -> 910131 (-0.00%) total local used in shared programs : 15328 -> 15328 (0.00%) local gpr inst bytes helped 0 55 85 85 hurt 0 26 20 20 Signed-off-by: Karol Herbst <nouveau@karolherbst.de> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-28 15:59:41 -05:00
Karol Herbst	3aa681449e	nv50/ir: run DCE backwards Reduces calls up to 50% Signed-off-by: Karol Herbst <nouveau@karolherbst.de> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2016-01-28 15:34:29 -05:00
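
Commit 68f8c5730b above ("nir: Add opcodes to extract bytes or words") states that the uint versions zero extend while the int versions sign extend. The following is a minimal C sketch of that distinction on a 32-bit value. The helper names (extract_u8, extract_i8, extract_u16, extract_i16) and the index convention (0 is the least significant byte or word) are assumptions made for illustration; they are not taken from the NIR source and this is not the code from the commit.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not NIR's implementation. */

/* Zero-extending byte extract: index 0..3, where 0 is the lowest byte. */
static uint32_t extract_u8(uint32_t value, unsigned index)
{
    return (value >> (index * 8)) & 0xffu;
}

/* Sign-extending byte extract: bit 7 of the selected byte is copied
 * into the upper bits. The xor/subtract form avoids relying on
 * implementation-defined narrowing conversions. */
static int32_t extract_i8(uint32_t value, unsigned index)
{
    uint32_t byte = (value >> (index * 8)) & 0xffu;
    return (int32_t)(byte ^ 0x80u) - 0x80;
}

/* Zero-extending word extract: index 0..1, where 0 is the low 16 bits. */
static uint32_t extract_u16(uint32_t value, unsigned index)
{
    return (value >> (index * 16)) & 0xffffu;
}

/* Sign-extending word extract: bit 15 of the selected word is copied
 * into the upper bits. */
static int32_t extract_i16(uint32_t value, unsigned index)
{
    uint32_t word = (value >> (index * 16)) & 0xffffu;
    return (int32_t)(word ^ 0x8000u) - 0x8000;
}
```

With the input 0x80FF7F01, byte 2 is 0xFF, so the unsigned helper yields 255 and the signed helper yields -1; byte 1 is 0x7F, so both yield 127 because the sign bit of that byte is clear.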

69132 commits