fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2025-12-22 15:40:11 +01:00

Author	SHA1	Message	Date
Jason Ekstrand	ab378734f5	intel/eu: Explicitly set EXECUTE_1 where needed Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	8280560705	intel/eu: Make automatic exec sizes a configurable option We have had a feature in codegen for some time that tries to automatically infer the execution size of an instruction from the width of its destination. For things such as fixed function GS, clipper, and SF programs, this is very useful because they tend to have lots of hand-rolled register setup and trying to specify the exec size all the time would be prohibitive. For things that come from a higher-level IR, however, it's easier to just set the right size all the time and the automatic exec sizes can, in fact, cause problems. This commit makes it optional while enabling it by default. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	7a82ad54bb	intel/fs: Rework zero-length URB write handling Originally we tried to handle this case based on slots_valid. However, there are a number of ways that this can go wrong. For one, we throw away any trailing slots which either aren't written or are set to VARYING_SLOT_PAD. Second, even if PSIZ is a valid slot, we may not actually write anything there. Between the lot of these, it was possible to end up in a case where we tried to do a regular URB write but ended up with a length of 1 which is invalid. This commit moves it to the end and makes it based on a new boolean flag urb_written. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	6132992cdb	intel/compiler/fs: Set up subgroup invocation as a system value Subgroup invocation is computed using a vector immediate and some dispatch-aware arithmetic. Unfortunately, due to the vector arithmetic, and the fact that it's frequently read 16-wide, it's not something that can easily be CSEd by the back-end compiler. There are a few different possible approaches to this problem: 1) Emit the code to calculate the subgroup invocation on-the-fly and trust NIR to do the CSE. This is what we were doing. 2) Add a back-end instruction for the subgroup ID. This has the advantage of helping the back-end compiler with CSE but has the downside of very poor scheduling for the calculation because it has to be emitted in the back-end. 3) Emit the calculation at the top of the program and re-use the result. This gets rid of the CSE problem but comes at the cost of an extra live register. This commit switches us from 1) to 3). We choose to store the subgroup invocation values as a W type to reduce the impact of the extra live register. Trusting NIR and using 1) was fine but we're soon going to want to use the subgroup invocation value for other things in the back-end compiler and this makes it much easier to do without having to worry about CSE problems. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	295605c930	intel/cs: Push subgroup ID instead of base thread ID We're going to want subgroup ID for SPIR-V subgroups eventually anyway. We really only want to push one and calculate the other from it. It makes a bit more sense to push the subgroup ID because it's simpler to calculate and because it's a real API thing. The only advantage to pushing the base thread ID is to avoid a single SHL in the shader. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	6411defdcd	intel/cs: Re-run final NIR optimizations for each SIMD size With the advent of SPIR-V subgroup operations, compute shaders will have to be slightly different depending on the SIMD size at which they execute. In order to allow us to do dispatch-width specific things in NIR, we re-run the final NIR stages for each sIMD width. One side-effect of this change is that we start rallocing fs_visitors which means we need DECLARE_RALLOC_CXX_OPERATORS. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	4e79a77cdc	intel/compiler: Move the destructor from vec4_visitor to backend_shader Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	16ada419d7	i965/fs: Get rid of the early return in brw_compile_cs Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	80ddfab2f5	intel/cs: Rework the way thread local ID is handled Previously, brw_nir_lower_intrinsics added the param and then emitted a load_uniform intrinsic to load it directly. This commit switches things over to use a specific NIR intrinsic for the thread id. The one thing I don't like about this approach is that we have to copy thread_local_id over to the new visitor in import_uniforms. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	25f7453c9e	intel/fs: Mark 64-bit values as being contiguous This isn't often a problem , when we're in a compute shader, we must push the thread local ID so we decrement the amount of available push space by 1 and it's no longer even and 64-bit data can, in theory, span it. By marking those uniforms contiguous, we ensure that they never get split in half between push and pull constants. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	c4c8cba705	intel/cs: Ignore runtime_check_aads_emit for CS It's only set on gen4-5 which clearly don't support compute shaders. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	d4de813d86	intel/cs: Stop setting dispatch_grf_start_reg Nothing ever reads it for compute shaders because it's always 1. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	b1a9cdede4	intel/cs: Drop max_dispatch_width checks from compile_cs The only things that adjust fs_visitor::max_dispatch_width are render target writes which don't happen in compute shaders so they're pointless. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	1077981eb5	intel/fs: Remove min_dispatch_width from fs_visitor It's 8 for everything except compute shaders. For compute shaders, there's no need to duplicate the computation and it's just a possible source of error. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	b299ded02e	intel/fs: use pull constant locations to check for first compile of a shader Before, we bailing in assign_constant_locations based on the minimum dispatch size. The more direct thing to do is simply to check for whether or not we have constant locations and bail if we do. For nir_setup_uniforms, it's completely safe to do it multiple times because we just copy a value from the NIR shader. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	103081c9a9	intel/fs: Retype dest to match value in read[First]Invocation This is what we really wanted all along. Always retyping to D works because that's what get_nir_src() always gives us, at least for 32-bit types. The SPIR-V variants of these operations accept arbitrary types and we need this if we're going to handle 64 or 16-bit values. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	ebaee9da4a	intel/fs: Uniformize the index in readInvocation The index is any value provided by the shader and this can be called in non-uniform control flow so we can't just take component 0. Found by inspection. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	b67230de63	intel/fs: Protect opt_algebraic from OOB BROADCAST indices Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	aa4ff4b98c	i965/fs/nir: Don't stomp 64-bit values to D in get_nir_src Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	ec8c6649f1	i965/fs/nir: Minor refactor of store_output Stop retyping the output of shuffle_64bit_data_for_32bit_write. It's always BRW_REGISTER_TYPE_D which is perfectly fine for writing out. Also, when we change get_nir_src to return something with a 64-bit type for 64-bit values, the retyping will not be at all what we want. Also, retyping the output based on src.type before we whack it back to 32 bits is a problem because the output is always 32 bits. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	030d2b5016	i965/fs: Return a fs_reg from shuffle_64bit_data_for_32bit_write All callers of this function allocate a fs_reg expressly to pass into it. It's much easier if we just let the helper allocate the register. While we're here, we switch it to doing the MOVs with an integer type so that we don't accidentally canonicalize floats on half of a double. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	6197a6b7ac	i965/fs/nir: Simplify 64-bit store_output The swizzles weren't doing any good because swiz is just XYZW. Also, we were emitting an extra set of MOVs because shuffle_64bit_data_for_32bit already does a MOV for us. Finally, the temporary was only ever used inside the inner loop so there's no need for it to actually be an array. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	18fde36ced	intel/fs: Use the original destination region for int MUL lowering Some hardware (CHV, BXT) have special restrictions on register regions when doing integer multiplication. We want to respect those when we lower to DxW multiplication. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	d54f8ec744	intel/fs: Fix integer multiplication lowering for src/dst hazards Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	fd1bcccc2d	intel/fs: Fix MOV_INDIRECT for 64-bit values on little-core The same workaround we need for 64-bit values on little core also takes care of the Ivy Bridge problem and does so a bit more efficiently so we can drop that code while we're here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	6041a31e77	intel/eu: Fix broadcast instruction for 64-bit values on little-core We're not using broadcast for any 32-bit types right now since we mostly use it for emit_uniformize on 32-bit buffer indices. However, SPIR-V subgroups are going to need it for 64-bit so let's make it work. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	10e4feed39	intel/eu/reg: Add a subscript() helper This is similar to the identically named fs_reg helper. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	068beb41d8	intel/eu: Just modify the offset in brw_broadcast This means we have to drop const from a variable but it also means that 100% of the code which deals with the offset limit is in one place. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	e3bcc86133	intel/compiler: Add some restrictions to MOV_INDIRECT and BROADCAST These restrictions effectively already existed due to the way we use indirect sources but weren't being directly enforced. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-11-07 10:37:52 -08:00
Jason Ekstrand	1b8ef49f48	intel/fs: Use a pair of 1-wide MOVs instead of SEL for any/all For some reason, the any/all predicates don't work properly with SIMD32. In particular, it appears that a SEL with a QtrCtrl of 2H doesn't read the correct subset of the flag register and you end up getting garbage in the second half. Work around this by using a pair of 1-wide MOVs and scattering the result. This fixes the any/all instructions for SIMD32. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	1f41663007	intel/fs: Use an explicit D type for vote any/all/eq intrinsics The any/all intrinsics return a boolean value so D or UD is the correct type. Unfortunately, get_nir_dest has the annoying behavior of returnning a float type by default. This causes format conversion which gives us -1.0f or 0.0f in the register. If the consumer of the result does an integer comparison to zero, it will give you the right boolean value but if we do something more clever based on the 0/~0 assumption for booleans, this will give the wrong value. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	6c00240bc6	intel/fs: Don't stomp f0.1 in SIMD16 ballot In fragment shaders f0.1 is used for discards so doing ballot after a discard can potentially cause the discard to not happen. However, we don't support SIMD32 fragment shaders yet so this isn't a problem. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	def013a863	intel/fs: Use ANY/ALL32 predicates in SIMD32 We have ANY/ALL32 predicates and, for the most part, they work just fine. (See the next commit for more details.) Also, due to the way that flag registers are handled in hardware, instruction splitting is able to split the CMP correctly. Specifically, that hardware looks at the execution group and knows to shift it's flag usage up correctly so a 2H instruction will write to f0.1 instead of f0.0. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	0d905597fe	intel/fs: Be more explicit about our placement of [un]zip Before, we were careful to place the zip after the last of the split instructions but did unzip on-demand. This changes things so that the unzips go before all of the split instructions and the unzip comes explicitly after all the split instructions. As a side-effect of this change, we now emit the split instruction from highest SIMD group to lowest instead of low to high. We could have kept the old behavior, but it shouldn't matter and this made the code easier. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	fcd4adb9d0	intel/fs: Pass builders instead of blocks into emit_[un]zip This makes it far more explicit where we're inserting the instructions rather than the magic "before and after" stuff that the emit_[un]zip helpers did based on block and inst. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	e8c9e65185	intel/fs: Use a pure vertical stride for large register strides Register strides higher than 4 are uncommon but they can happen. For instance, if you have a 64-bit extract_u8 operation, we turn that into UB -> UQ MOV with a source stride of 8. Our previous calculation would try to generate a stride of <32;8,8>:ub which is invalid because the maximum horizontal stride is 4. To solve this problem, we instead use a stride of <8;1,0>. As noted in the comment, this does not work as a destination but that's ok as very few things actually generate that stride. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-07 10:37:52 -08:00
Jason Ekstrand	172e8e42c4	intel/fs: Don't allocate a param array for zero push constants Thanks to the ralloc invariant of "any pointer returned from ralloc can be used as a context", calling ralloc_size with a size of zero will cause it to allocate at least a header. If we don't have any push constants, then NULL is perfectly acceptable (and even preferred). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2017-11-02 09:55:21 -07:00
Jason Ekstrand	7b4387519c	intel/fs: Alloc pull constants off mem_ctx It doesn't actually matter since the only user of push constants, i965, ralloc_steals it back to NULL but it's more consistent and probably fixes memory leaks in some error cases. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Cc: mesa-stable@lists.freedesktop.org	2017-11-02 09:55:21 -07:00
Jordan Justen	f082d7f64f	intel/compiler: Add functions to get prog_data and prog_key sizes for a stage v2: * Return unsigned instead of size_t. (Ken) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-31 23:36:54 -07:00
Jordan Justen	05b1193361	intel/compiler: Add union types for prog_data and prog_key stages Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-31 23:36:54 -07:00
Jordan Justen	3dcbc5cdaa	intel/compiler: Remove final_program_size from brw_compile_* The caller can now use brw_stage_prog_data::program_size which is set by the brw_compile_* functions. Cc: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-31 23:36:54 -07:00
Carl Worth	540636045f	intel/compiler: add new field for storing program size This will be used by the on disk shader cache. v2: * Set in brw_compile_* rather than brw_codegen_*. (Jason) Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com> [jordan.l.justen@intel.com: Only add to brw_stage_prog_data] Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-31 23:36:54 -07:00
Eric Engestrom	ceaad79f85	i965: remove unused variable Fixes: `2c873060d3` "i965: Delete unused brw_vs_prog_data::nr_attributes field." Cc: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>	2017-10-30 16:32:05 +00:00
Ian Romanick	6403efbe74	glsl: Remove ir_binop_greater and ir_binop_lequal expressions NIR does not have these instructions. TGSI and Mesa IR both implement them using < and >=, repsectively. Removing them deletes a bunch of code and means I don't have to add code to the SPIR-V generator for them. v2: Rebase on 2+ years of change... and fix a major bug added in the rebase. text data bss dec hex filename 8255291 268856 294072 8818219 868e2b 32-bit i965_dri.so before 8254235 268856 294072 8817163 868a0b 32-bit i965_dri.so after 7815339 345592 420592 8581523 82f193 64-bit i965_dri.so before 7813995 345560 420592 8580147 82ec33 64-bit i965_dri.so after Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-30 09:27:09 -07:00
Tapani Pälli	446c5726ec	i965: fix blorp stage_prog_data->param leak Patch uses mem_ctx for allocation to ensure param array gets freed later. ==6164== 48 bytes in 1 blocks are definitely lost in loss record 61 of 193 ==6164== at 0x4C2EB6B: malloc (vg_replace_malloc.c:299) ==6164== by 0x12E31C6C: ralloc_size (ralloc.c:121) ==6164== by 0x130189F1: fs_visitor::assign_constant_locations() (brw_fs.cpp:2095) ==6164== by 0x13022D32: fs_visitor::optimize() (brw_fs.cpp:5715) ==6164== by 0x13024D5A: fs_visitor::run_fs(bool, bool) (brw_fs.cpp:6229) ==6164== by 0x1302549A: brw_compile_fs (brw_fs.cpp:6570) ==6164== by 0x130C4B07: blorp_compile_fs (blorp.c:194) ==6164== by 0x130D384B: blorp_params_get_clear_kernel (blorp_clear.c:79) ==6164== by 0x130D3C56: blorp_fast_clear (blorp_clear.c:332) ==6164== by 0x12EFA439: do_single_blorp_clear (brw_blorp.c:1261) ==6164== by 0x12EFC4AF: brw_blorp_clear_color (brw_blorp.c:1326) ==6164== by 0x12EFF72B: brw_clear (brw_clear.c:297) Fixes: `8d90e28839` ("intel/compiler: Allocate pull_param in assign_constant_locations") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable@lists.freedesktop.org	2017-10-30 08:19:37 +02:00
Kenneth Graunke	d1b392d060	i965: Delete brw_wm_prog_key::drawable_height. This has been unused since we switched to nir_lower_wpos_ytransform. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-29 20:52:02 -07:00
Topi Pohjolainen	97e01adfd5	intel/compiler/gen9: Pixel shader header only workaround Fixes intermittent GPU hangs on Broxton with an Intel internal test case. There are plenty of similar fragment shaders in piglit that do not use any varyings and any uniforms. According to the documentation special timing is needed between pipeline stages. Apparently we just don't hit that with piglit. Even with the failing test case one doesn't always get the hang. Moreover, according to the error states the hang happens significantly later than the execution of the problematic shader. There are multiple render cycles (primitive submissions) in between. I've also seen error states where the ACTHD points outside the batch. Almost as if the hardware writes somewhere that gets used later on. That would also explain why piglit doesn't suffer from this - most tests kick off one render cycle and any corruption is left unseen. v2 (Ken): Instead of enabling push constants, enable one of the inputs (PSIZ). v3 (Ken, Jason): Use LAYER instead making vulkan emit_3dstate_sbe() happy. Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2017-10-28 10:07:29 +03:00
Kenneth Graunke	2c873060d3	i965: Delete unused brw_vs_prog_data::nr_attributes field. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-10-27 02:53:38 -07:00
Kevin Rogovin	75d10e4c84	intel/compiler: brw_validate_instructions to take const void* instead of void* The disassembler does not (and should not) be modifying the data. Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-26 10:43:48 -07:00
Jason Ekstrand	d24311b7b5	intel/compiler: Call nir_lower_system_values in brw_preprocess_nir Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-10-25 16:14:09 -07:00

1 2 3 4 5 ...

258 commits