In order to implement the ballot intrinsic, we do a MOV from the flag
register to some GRF. If that GRF is used in a SEL, cmod propagation
helpfully changes it into a MOV from the flag register with a cmod.
This is perfectly valid, but when lower_simd_width comes along, it simply
splits the instruction into two halves which both have conditional
modifiers. This is a problem because the instruction also reads the flag
register through its source, so the halves end up clobbering flag bits
that still need to be read. This commit makes us check whether
flags_written() overlaps with the flag values we read via the instruction
source and, if there is any interference, forces us to emit a copy of the
source.
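The shape of the check is roughly the following (flag_bits_of() is a
made-up placeholder for however the flag bits read through a source are
computed; flags_written() is the existing helper named above):

   static bool
   src_interferes_with_flags(const fs_inst *inst, unsigned i)
   {
      /* If the flag bits written by the instruction (via its cmod)
       * overlap the flag bits read through src[i], the SIMD-lowered
       * halves must read a copy of the source instead. */
      return (inst->flags_written() & flag_bits_of(inst->src[i])) != 0;
   }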
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Also needed in freedreno/ir3.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
gcc is throwing this warning in my meson build:
../src/intel/compiler/brw_eu_validate.c:50:11: warning: argument 1 null where non-null expected [-Wnonnull]
return memmem(haystack.str, haystack.len,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
needle.str, needle.len) != NULL;
~~~~~~~~~~~~~~~~~~~~~~~
The first check for CONTAINS has a NULL error_msg.str and 0 len. The
glibc implementation will exit without looking at any haystack bytes if
haystack.len < needle.len, so this was safe, but silence the warning
anyway by guarding against implementation variability.
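The guard can take roughly this shape (a sketch against the contains()
helper the warning points at, not the exact patch):

   static bool
   contains(const struct string haystack, const struct string needle)
   {
      /* Bail out before calling memmem() when the haystack is empty, so
       * we never pass a null pointer where non-null is expected. */
      if (haystack.str == NULL || haystack.len == 0)
         return false;

      return memmem(haystack.str, haystack.len,
                    needle.str, needle.len) != NULL;
   }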
Fixes: 122ef3799d ("i965: Only insert error message if not already present")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Align1 mode offers some nice features over align16, like access to more
data types and the ability to use a 16-bit immediate. This patch does
not start using any new features. It just emits ternary instructions in
align1 mode.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Put hw_ in the name so that it's clear these are the hardware encodings.
Similar to commit 9fb8323328 ("i965: Rename brw_inst's functions that
access the register type")
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
The instruction word contains SubRegNum[4:2] so it's in units of dwords
(hence the * 4 to get it in terms of bytes). Before this patch, the
subreg would have been wrong for DF arguments.
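In code terms the decode is just (accessor name assumed for
illustration):

   /* The field stores SubRegNum[4:2], i.e. a dword index, so scale by 4
    * to get a byte offset. */
   unsigned subreg_bytes = brw_inst_3src_src0_subreg_nr(devinfo, inst) * 4;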
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
I'm going to call this from brw_inst.h, and I don't want to have to
include all of brw_reg.h.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
It is already done in NIR.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It is already done in NIR.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Commit a73116ecc6 tried to make add_barrier_deps()
walk to the next barrier, and stop. To accomplish that, it added an
is_barrier flag. Unfortunately, this only works half of the time.
The issue is that add_barrier_deps() walks both backward (to the
previous barrier), and forward (to the next barrier). It also sets
is_barrier. Assuming that we're processing instructions in forward
order, this means that is_barrier will be set for previous instructions,
but not future ones. So the forward walk never sees one, and goes further
than it needs to.
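For reference, the forward half of the walk looks roughly like this
(member and helper names assumed, not the exact code):

   /* Walk forward to the next barrier.  is_barrier has only been set on
    * nodes we already processed, so this check never fires for future
    * instructions and the loop runs all the way to the end. */
   for (schedule_node *next = (schedule_node *)n->next;
        !next->is_tail_sentinel();
        next = (schedule_node *)next->next) {
      add_dep(n, next, 0);
      if (next->is_barrier)
         break;
   }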
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
now compiles its shaders in 3.6 seconds instead of 3.3 minutes.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Pallavi G <pallavi.g@intel.com>
This eliminates a layer of wrapping, and makes a backend_instruction
sufficient. The downside is that it exposes 'eot' to the vec4 backend,
which doesn't need it but can simply ignore it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Pallavi G <pallavi.g@intel.com>
This is a lot more natural than special casing it all over the place.
We still have to do a bit of special-casing in assign_constant_locations,
but it's not nearly as bad as it was before.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that everything is nicely ralloc'd, we can allocate the pull_param
array in assign_constant_locations instead of higher up. We can also
re-allocate the param array so that it's exactly the needed size. This
should save us some memory because we're not allocating the total needed
param space for both push and pull.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we're always growing the param array as-needed, we can
allocate the param array in common code and stop repeating the
allocation everywhere. In order to keep things sane, we ralloc the
[pull_]param array off of the compile context and then steal it back
to a NULL context later. This doesn't get us all the way to where
prog_data::[pull_]param is purely an out parameter of the back-end
compiler but it gets us a lot closer.
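The ownership dance looks roughly like this (a sketch; the context and
element types vary by stage and are assumed here):

   /* Allocate off the compile context so nothing leaks if we bail
    * mid-compile... */
   prog_data->param = rzalloc_array(mem_ctx, const gl_constant_value *,
                                    nr_params);
   /* ...then hand ownership to the caller by stealing the array back to
    * the NULL ralloc context once compilation is done. */
   ralloc_steal(NULL, prog_data->param);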
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of requiring the caller of brw_compile_vs to figure it out, just
grow the param array on-demand.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of making the caller of brw_compile_cs add something to the
param array for thread_local_id_index, just add it on-demand in
brw_nir_intrinsics and grow the array. This is now safe to do because
everyone uses ralloc for prog_data::param.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's already only ever called from brw_compile_cs and only handles
compute intrinsics. Let's just make it CS-specific. We can always
make it handle other stages again later if we want.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The Vulkan driver does not support pull constants. It simply limits
things such that we can always push everything. Previously, we were
determining whether or not to push things based on whether or not the
prog_data::pull_param array is non-null. This is rather hackish and
about to stop working.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This burns an extra 10k of memory or so in the case where you don't have
any images. However, if you have several shaders which use images, this
should be much less memory. It also gets rid of a part of prog_data
that really has nothing to do with the compiler.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This moves us away from the array-of-pointers model and onto a model where
each param is represented by a generic uint32_t handle. We reserve 2^16
of these handles for builtins that get generated somewhere inside the
compiler and have well-defined meanings. Generic params have handles
whose meanings are defined by the driver.
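Purely as an illustration of the split (macro names invented for this
sketch, not mesa's actual definitions):

   /* Handles below 2^16 are compiler-generated builtins with fixed,
    * well-defined meanings; anything above that range is a generic,
    * driver-defined param. */
   #define PARAM_BUILTIN_MAX   0xffffu
   #define PARAM_IS_BUILTIN(handle)  ((handle) <= PARAM_BUILTIN_MAX)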
The primary downside to this new approach is that it moves a little bit
of the work that we would normally do at compile time to draw time. On
my laptop this hurts OglBatch6 by no more than 1% and doesn't seem to
have any measurable effect on OglBatch7. So, while this may come back
to bite us, it doesn't look too bad.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
ARB_enhanced_layouts allows multiple output variables to share the same
location - and these variables may not have the same sizes. For
example, consider these output variables:
// consume X/Y/Z components of 6 vectors
layout(location = 0) out vec3 a[6];
// consumes W component of the first vector
layout(location = 0, component = 3) out float b;
Looking at the first declaration, we see that VARYING_SLOT_VAR0 needs 24
components worth of space (vec3 padded out to a vec4, 4 * 6 = 24). But
looking at the second declaration, we would think that VARYING_SLOT_VAR0
needs only 4 components of space (a single float padded out to a vec4).
nir_setup_outputs() only considered the space requirements of the first
declaration it happened to see, so if 'float b' came first, it would
underallocate the output register space, causing brw_fs_validator.cpp
to hit an assertion failure about inst->dst.offset exceeding the register
size.
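Conceptually, the fix is a sizing pre-pass that takes the maximum space
needed by every variable landing on a location, roughly (simplified
sketch; helper names as in the fs backend):

   unsigned vec4s[VARYING_SLOT_TESS_MAX] = { 0 };
   nir_foreach_variable(var, &nir->outputs) {
      const int loc = var->data.driver_location;
      /* Consider every declaration sharing this location, not just the
       * first one we happen to see. */
      vec4s[loc] = MAX2(vec4s[loc], type_size_vec4(var->type));
   }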
Fixes Piglit's tests/spec/arb_enhanced_layouts/execution/component-layout/
vs-to-fs-array-interleave-single-location.shader_test.
Thanks to Tim Arceri for finding this bug and writing a test!
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks has a MOV that hits
this validation path. MOVs don't have a src1 file, but calling
brw_inst_src1_type() was tripping on src1.file being BRW_IMMEDIATE_VALUE
and the hw_type being something invalid for immediates.
To work around this, just pretend src1 is src0 if there isn't a src1.
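Concretely, that amounts to a fallback along these lines (sketch;
variable names assumed):

   /* MOVs have no src1, so decoding its type would read garbage; fall
    * back to src0's type in that case. */
   enum brw_reg_type src1_type =
      num_sources > 1 ? brw_inst_src1_type(devinfo, inst)
                      : brw_inst_src0_type(devinfo, inst);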
Fixes: 2572c2771d (i965: Validate "Special
Requirements for Handling Double Precision Data Types")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102680
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
When computing the total size of the URB for tessellation evaluation
inputs we were not accounting for the fact that a 64-bit input may need
two vec4 slots; instead we always assumed that each input would take a
single vec4 slot, which could lead to computing a smaller read size than
required. Specifically, this is a problem when the last input is a
dvec3/4 such that its XY components are stored in the second half of a
payload register (which can happen
if the offset for the input in the URB is not 64-bit aligned because
there are 32-bit inputs mixed in) and the ZW components in the
first half of the next, as in this case we would fail to account for the
extra slot required for the ZW components.
Fixes (requires another fix in CTS currently in review):
KHR-GL45.enhanced_layouts.varying_locations
KHR-GL45.enhanced_layouts.varying_array_locations
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I did not implement:
 - CNL's restriction on 64-bit int + align16, because I don't think
   we'll ever use this combination regardless of hardware generation.
 - The restriction on immediate DF -> F conversions, because there's no
   reason to ever generate that, and I don't even know how DF -> F
   conversions are supposed to work in Align16, since (1) the dst stride
   must be 1, but (2) the dst stride would have to be 2 for src and dst
   strides to be aligned.
Some restrictions require something like strides to match between src
and dest. For multi-source instructions, I'd rather encapsulate the
logic for not inserting already-present errors in ERROR_IF than open-code
it in multiple places.
The type suffixes were wrong, and the 16 was missing the 0 prefix.
Fixes: 92f787ff86 ("i965: Add support for disassembling 64-bit integer immediates")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
... without the float -> double conversion. Low power parts have
additional restrictions when it comes to operating on 64-bit types, and
the instruction used to do the conversion violates one of them:
specifically, the restriction that "Source and Destination horizontal
stride must be aligned to the same qword".
Previously we generated a float and then converted, but we can avoid the
conversion by using the same extract-the-sign-bit + or-in-1.0 algorithm,
operating directly on the high four bytes of each double-precision
component in the result.
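The core of the idea in fs-builder terms (a simplified sketch; the
x == 0.0 handling is omitted and 'result' is assumed to be zeroed):

   /* Work on the high dword of each double: bit 31 there is the sign
    * bit, and 0x3ff00000 is the high dword of 1.0. */
   fs_reg hi = subscript(result, BRW_REGISTER_TYPE_UD, 1);
   bld.AND(hi, subscript(op[0], BRW_REGISTER_TYPE_UD, 1),
           brw_imm_ud(0x80000000u));           /* extract the sign bit */
   bld.OR(hi, hi, brw_imm_ud(0x3ff00000u));    /* or in 1.0 */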
In SIMD8 and SIMD16 this cuts one instruction from the implementation,
and more importantly that instruction is the one which violated the
regioning restriction.
Along the way I removed some comments that I did not think helped, and
some code about double comparisons which does not seem to be necessary
today.
This prevents validation failures caught by the new EU validation code
added in later patches.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
64-bit operations on Atom parts have additional restrictions over their
big-core counterparts (validated by later patches).
Specifically, the restriction that "Source and Destination horizontal
stride must be aligned to the same qword" is violated by most shift
operations since NIR uses a 32-bit value as the shift count argument,
and this causes instructions like
shl(8) g19<1>Q g5<4,4,1>Q g23<4,4,1>UD
where src1 has a 32-bit stride, but the dest and src0 have a 64-bit
stride.
This caused ~4 pixels in the ARB_shader_ballot piglit test
fs-readInvocation-uint.shader_test to be incorrect. Unfortunately no
ARB_gpu_shader_int64 test hit this case because they operate on
uniforms, and their scalar regions are an exception to the restriction.
We work around this by effectively unpacking the shift count, so that we
can read it with a 64-bit stride in the shift instruction. Unfortunately
the unpack (a MOV with a dst stride of 2) is a partial write, and cannot
be copy-propagated or CSE'd.
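In fs-builder terms the workaround looks roughly like this (a sketch;
'shift_count' stands for the original 32-bit src1):

   /* The unpack is the MOV with a dst stride of 2 mentioned above; it
    * lets the SHL read the count back with a qword (64-bit) stride. */
   fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_UQ);
   bld.MOV(subscript(tmp, BRW_REGISTER_TYPE_UD, 0), shift_count);
   bld.SHL(dst, src0, subscript(tmp, BRW_REGISTER_TYPE_UD, 0));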
Bugzilla: https://bugs.freedesktop.org/101984
A typo caused us to copy src0's reg file to src1 rather than reading
src1's as intended. This caused us to fail to compact instructions like
mov(8) g4<1>D 0D { align1 1Q };
because src1 was set to immediate rather than architecture file. Fixing
this reenables compaction (after the precompact() pass changes the data
types):
mov(8) g4<1>UD 0x00000000UD { align1 1Q compacted };
Fixes: 1cb0a7941b ("i965: Switch to using the logical register types")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We set a similar default value for LOD in the fs backend for TXS/TXL.
Without this we end up generating an invalid MOV with a null src.
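The change amounts to a default along these lines (sketch; variable
names assumed):

   /* TXS/TXL always need an LOD; default it to 0 instead of emitting a
    * MOV from a null register when the shader didn't provide one. */
   if (lod.file == BAD_FILE)
      lod = brw_imm_ud(0u);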
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In truth gtest is an external dependency that upstream expects you to
"vendor" into your own tree. As such, it makes sense to treat it more
like a dependency than an internal library, and collect its
requirements together in a dependency object.
v2: - include with -isystem instead of setting compiler args (Eric)
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>