fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-06 07:18:17 +02:00

Author	SHA1	Message	Date
Kenneth Graunke	2fd79ebe8f	i965: Fix JIP to skip over sibling do...while loops. We've apparently always been botching JIP for sequences such as: do cmp.f0.0 ... (+f0.0) break ... do ... while ... while Because the "do" instruction doesn't actually exist, the inner "while" is at the same depth as the "break". brw_find_next_block_end() thus mistook the inner "while" as the end of the loop containing the "break", and set the "break" to point to the wrong place. Only "while" instructions that jump before our instruction are relevant. We need to ignore the rest, as they're sibling control flow nodes (or children, but this was already handled by the depth == 0 check). See also commit `1ac1581f38`. This prevents channel masks from being screwed up, and fixes GPU hangs(*) in dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.centroid_qualified.multisample_texture_16. The test ended up executing code with no channels enabled, and that code contained FIND_LIVE_CHANNEL, which returned 8 (out of range for a SIMD8 program), which then was used in indirect GRF addressing, which randomly got a boolean value (0xFFFFFFFF), interpreted it as a sample ID, OR'd it into an indirect send message descriptor, which corrupted the message length, sending a pixel interpolator message with mlen 15, which is illegal. Whew :) (*) Technically, the test doesn't GPU hang currently, but only because another bug prevents it from issuing pixel interpolator messages entirely...with that fixed, it hangs. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-05-16 00:20:07 -07:00
Kenneth Graunke	2f02fad6b3	i965: Make a "does this while jump before our instruction?" helper. I need to use this in an additional place. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-05-16 00:19:53 -07:00
Kenneth Graunke	b6f250d7f2	i965: Send the minimal number of STATE_BASE_ADDRESS packets. STATE_BASE_ADDRESS stalls the whole pipeline, and the documentation cautions us to emit it as little as possible for better performance. We recently put some hacks in BLORP to try and avoid emitting it if it was already set correctly. However, this wasn't quite minimal: if BLORP is the first operation (i.e. glClear()), then it would emit it, and subsequent draw calls would emit it again. This caused a small drop in performance in GPUTest Triangle when switching from Meta to BLORP. Unlike most packets, STATE_BASE_ADDRESS isn't influenced by GL state: it needs to be emitted once per batch, before most other commands, or whenever we change the program cache BO. It's also valid in both the 3D and compute pipelines, which makes it even more unique. This patch removes it from the atom mechanism and instead directly calls it as part of every draw, compute dispatch, or BLORP operation. We introduce a new flag indicating that STATE_BASE_ADDRESS has already been emitted this batch, and if so, skip doing it again. When we make a new program cache BO, we simply reset the flag, so the next operation will emit it again. When we flush/reset the batch, we reset the flag. This guarantees that we'll emit STATE_BASE_ADDRESS only when we have to. It's also less code than the old atom mechanism. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-16 00:11:51 -07:00
Kenneth Graunke	97179c606c	i965: Combine Gen4-7 and Gen8+ state base address emitters. We're about to start calling it directly, and this means the callers won't have to think about generations. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-16 00:11:50 -07:00
Kenneth Graunke	7b70a12e1c	i965: Move Gen4-5 programs to brw_upload_programs() too. This way all the programs are in one place again, and it also should make some future STATE_BASE_ADDRESS related changes possible. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-16 00:11:49 -07:00
Kenneth Graunke	b23b099a0b	i965: Mark brw const in brw_state_dirty and callers. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-16 00:11:43 -07:00
Kenneth Graunke	8e71ac731b	glsl: Don't do constant propagation in opt_constant_folding. opt_constant_folding is supposed to fold trees of constants into a single constant. Surprisingly, it was also propagating constant values from variables into expression trees - even when the result couldn't be folded together. This is opt_constant_propagation's job. The ir_dereference_variable::constant_expression_value() method returns a clone of var->constant_value. So we would replace the dereference with a constant, propagating it into the tree. Skip over ir_dereference_variable to avoid this surprising behavior. However, add code to explicitly continue doing it in the constant propagation pass, as it's useful to do so. shader-db statistics on Broadwell: total instructions in shared programs: 8905349 -> 8905126 (-0.00%) instructions in affected programs: 30100 -> 29877 (-0.74%) helped: 93 HURT: 20 total cycles in shared programs: 71017030 -> 71015944 (-0.00%) cycles in affected programs: 132456 -> 131370 (-0.82%) helped: 54 HURT: 45 The only hurt programs are by a single instruction, while the helped ones are helped by 1-4 instructions. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-05-15 23:59:39 -07:00
Kenneth Graunke	db8fcbbaf9	glsl: Avoid excess tree walking when folding ir_dereference_arrays. If an ir_dereference_array has non-constant components, there's no point in trying to evaluate its value (which involves walking down the tree and possibly allocating memory for portions of the subtree which are constant). This also removes convoluted tree walking in opt_constant_folding(), which tries to fold constants while walking up the tree. No need to walk down, then up, then down again. We did this for swizzles and expressions already, but I was lazy back in the day and didn't do this for ir_dereference_array. No change in shader-db. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-05-15 23:59:33 -07:00
Kenneth Graunke	329fe93210	glsl: Consolidate duplicate copies of constant folding. We could probably clean this up more (maybe make it a method), but at least there's only one copy of this code now, and that's a start. No change in shader-db. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-05-15 23:59:20 -07:00
Kenneth Graunke	3bf27a9a00	glsl: Remove bonus tree walking in opt_constant_folding(). It looks like this was missed when converting opt_constant_folding() from a hierarchical visitor to an rvalue visitor in `6606fde3`. ir_rvalue_visitor already processes values on the way back up the tree, so we will have already visited every child node. There's no point in doing it again. No change in shader-db. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-05-15 23:59:10 -07:00
Kenneth Graunke	8e59670bcf	glsl: Make opt_constant_variable() bail in useless cases. The pass ultimately skips over any entries with assignment_count != 1, so there's no need to do further work once we've determined that there are multiple assignments. The constant value could be a large array (i.e. uvec4[327]), at which point skipping the constant_expression_value() call (and the clone() call within) can save us piles of memory. No change in shader-db. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2016-05-15 23:59:05 -07:00
Kenneth Graunke	c907ca6c8d	i965: Flip interpolateAtOffset's y offset when necessary. Fixes 4 dEQP-GLES31.functional.shaders.multisample_interpolation tests: - interpolate_at_offset.no_qualifiers.default_framebuffer - interpolate_at_offset.centroid_qualifier.default_framebuffer - interpolate_at_offset.sample_qualifier.default_framebuffer - interpolate_at_offset.array_element.default_framebuffer Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-15 23:50:52 -07:00
Kenneth Graunke	6d65b0c6dc	nir: Add a nir->info.uses_interp_var_at_offset flag. I've added this to nir_gather_info(), but also to glsl_to_nir() as a temporary measure, since the i965 GL driver today doesn't use nir_gather_info() yet. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-15 23:50:28 -07:00
Kenneth Graunke	d4d7e1516b	glsl: Drop bad ASSERT_TRUE in gl_CullDistance link_varyings test. I don't know what the intention was here, but this function returns void. We can't assert anything about its return value. Fixes "make check" failures. v2: Also fix prototype for the function (caught by Jordan). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2016-05-15 23:49:19 -07:00
Jan Vesely	9525f33164	clover: Handle PIPE_SHADER_IR_NIR in switch Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2016-05-15 20:05:10 -04:00
Rob Clark	277818ecfb	freedreno/ir3: small standalone compiler cleanup Don't hard-code the gpu-id anymore. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-05-15 17:25:48 -04:00
Rob Clark	f06343d6ea	nir: forward-declare 'struct gl_shader_program' Drop extra #include which is otherwise unneeded (and makes this header difficult to include from outside of src/mesa). Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2016-05-15 17:25:48 -04:00
Rob Clark	79d6409a14	nir: return progress from lower_idiv With algebraic-opt support for lowering div to shift, the driver would like to be able to run this pass after the main opt-loop, and then conditionally re-run the opt-loop if this pass actually lowered something. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-15 17:25:48 -04:00
Rob Clark	f8840f471d	freedreno/ir3: lower fdiv Not sure how we didn't hit this already, but since we want fdiv converted into mul + rcp, we should set this. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-05-15 17:25:48 -04:00
Rob Clark	53cde5e295	freedreno/ir3: handle VARYING_SLOT_PNTC In the glsl->tgsi path, this already gets translated to VAR8, which matches up with rasterizer->sprite_coord_enable. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-05-15 17:25:48 -04:00
Rob Clark	2f1581059b	freedreno/ir3: disable TGSI specific hacks in nir case When we got NIR directly from state tracker (vs using tgsi_to_nir) we need to realize this and skip some TGSI specific hacks. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-05-15 17:25:48 -04:00
Rob Clark	784086f3c1	freedreno/ir3: add support for NIR as preferred IR For now under debug flag, since only suitable for debugging/testing. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2016-05-15 17:25:47 -04:00
Rob Clark	8b24f7b440	nir: fix comment typo about f2d/d2f Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2016-05-15 17:25:47 -04:00
Ilia Mirkin	be2b13e3bf	nv50/ir: avoid asserts when the state tracker feeds us bogus inputs INTERP is defined (by me) to have to have an INPUT source. However the state tracker does not always obey this. This happens due to varying packing logic introducing additional mov's which can't always be undone. Instead of just giving up, we instead try harder to find the original input. This won't always be possible, for example with indirect accesses. There's not much we can (easily) do about that though. This fixes the remaining interpolateAt* failures in dEQP: dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at* some of which were asserting due to INTERP_* being passed a non-input. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-05-15 14:12:56 -04:00
Ilia Mirkin	9323d084ac	nvc0: don't try to go through the push path for indirect draws This fixes dEQP-GLES31.functional.draw_indirect.draw_elements_indirect.*.default_attribute These tests were causing a const vbo to be set up, and were small enough draws that the logic was trying to go via the push path (which emits data directly into the cmd stream rather than uploading a user vbo). Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-05-15 10:48:39 -04:00
Ilia Mirkin	2ef3cdb07e	nvc0/ir: make sure to align the second arg of TXD to 4, as we do for TEX This was handled in handleTEX(), however the way the logic works, those extra arguments aren't added on by then, so it did nothing. Instead we must duplicate that bit here. GK110 appears to complain about MISALIGNED_GPR, however it's reasonable to believe that GK104 has the same requirements. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95403 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2016-05-15 10:48:39 -04:00
Tobias Klausmann	8c02939794	nv50,nvc0: add support for cull distances Cull distances are just a special case of clip distances as far as the hardware is concerned. Make sure that the relevant "planes" are enabled, and flip the clip mode to cull for those. Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> [imirkin: add enables on nvc0, add nv50 support] Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>	2016-05-15 10:48:39 -04:00
Ilia Mirkin	2ad970ecf4	st/mesa: disable cull distance for now The pass that st/mesa relies on to combine clip and cull distances has been reverted, so we can't expose ARB_cull_distance until that is resolved. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-05-15 10:48:38 -04:00
Jason Ekstrand	09e041d61d	i965: Use blorp for all clears We used to use a meta path on gen8 but we haven't since `c7cf17ae75`. We might as well delete the meta path since blorp works on all gens. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 14:18:21 -07:00
Jason Ekstrand	1cfb4bc890	i965: Use blorp for all stencil blits We used to use a meta path because blorp didn't support 16x MSAA. Now it does, so we don't need the meta paths anymore. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 14:18:21 -07:00
Jason Ekstrand	64f2907030	i965: Use blorp for all updownsample blits We used to use a meta path because blorp didn't support 16x MSAA. Now it does, so we don't need the meta paths anymore. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 14:18:21 -07:00
Jason Ekstrand	f5febc83a7	i965/blorp: Add support for 16x MSAA Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 14:18:21 -07:00
Jason Ekstrand	a32315bd19	i965: move brw_meta_set_fast_clear_color to brw_meta_util.c Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 14:18:21 -07:00
Jason Ekstrand	36529f670f	i965: Move brw_meta_get_*_rect to brw_meta_util.c Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 14:18:21 -07:00
Jason Ekstrand	21034f1b08	i965: Move brw_is_color_fast_clear_compatible to brw_meta_util Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 14:18:21 -07:00
Jason Ekstrand	b05c68fc8a	i965: Move brw_get_rb_for_slice to brw_meta_util Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 14:18:21 -07:00
Jason Ekstrand	672cffee0f	i965/blorp: Get rid of the blorp_prog_data_int() helper The helper was initially created to allow us to set reasonable defaults as we mutated the brw_blorp_prog_data structure in preparation for NIR. Now that everything is going through brw_blorp_compile_nir_shader() which fully fills out the brw_blorp_prog_data structure, we don't need the helper. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 13:34:54 -07:00
Jason Ekstrand	c228ea8345	i965/blorp: Delete the old blorp shader emit code Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 13:34:54 -07:00
Jason Ekstrand	c18da26abf	i965/blorp: Stop doing f2i(i2f(sample_id)) NIR gets kind of awkward when you have a 3-component vector with two floats and one int. This led to us accidentally going through float for the sample index. It doesn't hurt anything but it also isn't needed. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 13:34:53 -07:00
Jason Ekstrand	e503da61c6	i965/blorp: Refactor coordinate munging The original code-flow tried to map original blorp. This puts things more where they belong and simplifies some of the logic. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 13:34:53 -07:00
Jason Ekstrand	8636937dd6	i965/blorp: Add bilinear blending support to the NIR path Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 13:34:53 -07:00
Jason Ekstrand	6bd7bd6633	i965/blorp: Add support for averaging resolves to the NIR path Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 13:34:53 -07:00
Jason Ekstrand	c7269c1551	i965/blorp: Add MSAA encode/decode support to the NIR path Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 13:34:53 -07:00
Jason Ekstrand	df8c2936cd	i965/blorp: Add support for W-[de]tiling to the NIR path Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 13:34:53 -07:00
Jason Ekstrand	6adb8d6d3a	i965/blorp: Add support for discard-based bounds checks to the NIR path Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 13:34:53 -07:00
Jason Ekstrand	4bdace0791	i965/blorp: Add initial support for NIR-based blit shaders Many of the more complex cases still fall back to the old shader builder. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 13:34:53 -07:00
Jason Ekstrand	b0275ad0c9	i965/blorp: Refactor getting the blit kernel into a helper Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 13:34:53 -07:00
Jason Ekstrand	6df3d75206	i965/blorp: Use NIR for clear shaders Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95373 Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 13:34:53 -07:00
Jason Ekstrand	bb45f42f55	i965/blorp: Create the program key in get_clear_kernel There's no reason to be passing a whole struct around just for a single boolean. We can create it later when we actually need to use it as a key. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 13:34:53 -07:00
Jason Ekstrand	c1fe8859d3	i965/blorp: Add a helper for compiling NIR shaders Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2016-05-14 13:34:53 -07:00
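
The block-end search described in commit `2fd79ebe8f` can be illustrated with a small sketch. This is not the Mesa code: the `Insn` type, the string opcodes, the index-based `target` field, and the `skip_sibling_loops` switch are all hypothetical stand-ins chosen for illustration. Only the rule itself (a "while" counts as the end of our block only if it jumps back to at or before our instruction) is taken from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Insn:
    # Hypothetical toy instruction, not a real i965 encoding.
    op: str                       # "if", "else", "endif", "while", "break", "other"
    target: Optional[int] = None  # "while" only: index it jumps back to


def while_jumps_before(insns: List[Insn], while_idx: int, start: int) -> bool:
    """True if the "while" at while_idx jumps back to at or before start."""
    return insns[while_idx].target <= start


def find_next_block_end(insns: List[Insn], start: int,
                        skip_sibling_loops: bool = True) -> Optional[int]:
    """Return the index of the instruction ending the block that contains start."""
    depth = 0  # nesting depth of if/endif pairs opened after start
    for i in range(start + 1, len(insns)):
        op = insns[i].op
        if op == "if":
            depth += 1
        elif op == "endif":
            if depth == 0:
                return i
            depth -= 1
        elif op == "while":
            # There is no real "do" instruction, so a sibling loop's "while"
            # sits at the same depth as start. Ignore it unless it jumps
            # back to a point at or before start.
            if skip_sibling_loops and not while_jumps_before(insns, i, start):
                continue
            if depth == 0:
                return i
        elif op == "else":
            if depth == 0:
                return i
    return None


# The sequence from the commit message, with "do" omitted since it emits nothing.
PROGRAM = [
    Insn("other"),            # 0: cmp.f0.0 (first instruction of the outer loop)
    Insn("break"),            # 1: (+f0.0) break
    Insn("other"),            # 2: ...
    Insn("other"),            # 3: first instruction of the inner loop
    Insn("while", target=3),  # 4: inner while
    Insn("while", target=0),  # 5: outer while
]
```

With the sibling check enabled, the search starting at the "break" (index 1) skips the inner "while" and stops at the outer one; with the check disabled, it stops at the inner "while", which is the misbehaviour the commit message describes.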


81353 commits