fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-05-17 13:58:05 +02:00

Author	SHA1	Message	Date
Jason Ekstrand	490d80fd1a	anv/pipeline: Add a mem_ctx parameter to anv_pipeline_compile This lets us avoid some of the manual ralloc stealing and prepares for future commits in which we will want to ralloc prog_data::param. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-12 22:39:30 -07:00
Jason Ekstrand	cfc7ed75eb	i965: Store image_param in brw_context instead of prog_data This burns an extra 10k of memory or so in the case where you don't have any images. However, if you have several shaders which use images, this should be much less memory. It also gets rid of a part of prog_data that really has nothing to do with the compiler. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-12 22:39:30 -07:00
Jason Ekstrand	2975e4c56a	intel: Rewrite the world of push/pull params This moves us away from the array of pointers model and onto a model where each param is represented by a generic uint32_t handle. We reserve 2^16 of these handles for builtins that get generated somewhere inside the compiler and have well-defined meanings. Generic params have handles whose meanings are defined by the driver. The primary downside to this new approach is that it moves a little bit of the work that we would normally do at compile time to draw time. On my laptop this hurts OglBatch6 by no more than 1% and doesn't seem to have any measurable effect on OglBatch7. So, while this may come back to bite us, it doesn't look too bad. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-12 22:39:29 -07:00
Jason Ekstrand	b8ab78d1af	anv/pipeline_cache: Rework to use multialloc and blob This gets rid of all of our hand-rolled size calculation and serialization code and replaces it with safe "standards" that are used elsewhere in anv and mesa. This should be significantly safer than rolling our own. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-10-12 21:47:06 -07:00
Jason Ekstrand	2d29dd9ee4	anv/pipeline: Declare bind maps closer to their use This is just a trivial cleanup. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-10-12 21:47:06 -07:00
Jason Ekstrand	ba4b7e9c44	anv/multialloc: Add new add_size helper Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-10-12 21:47:06 -07:00
tournier.elie	1233d32d2a	meson: fix typo in isl Signed-off-by: Elie Tournier <elie.tournier@collabora.com> Reviewed-by: Antia Puentes <apuentes@igalia.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2017-10-12 09:39:07 -07:00
Lionel Landwerlin	e568d2bd1f	anv: intel: use anv_image's computed size for importing a BO Rather than relying on size = stride * height, we can rely on anv_image's total size. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Daniel Stone <daniels@collabora.com>	2017-10-11 22:29:55 +01:00
Lionel Landwerlin	c0a4f56fb9	anv: bo_cache: allow importing a BO larger than needed It's not a problem if a BO has been allocated larger than we need it to be. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102940 Fixes: `818b857914` ("anv: Use the BO cache for DeviceMemory allocations") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Daniel Stone <daniels@collabora.com> Cc: mesa-stable@lists.freedesktop.org	2017-10-11 22:29:55 +01:00
Kenneth Graunke	6f5abf3146	i965: Fix output register sizes when multiple variables share a slot. ARB_enhanced_layouts allows multiple output variables to share the same location - and these variables may not have the same sizes. For example, consider these output variables: // consume X/Y/Z components of 6 vectors layout(location = 0) out vec3 a[6]; // consumes W component of the first vector layout(location = 0, component = 3) out float b; Looking at the first declaration, we see that VARYING_SLOT_VAR0 needs 24 components worth of space (vec3 padded out to a vec4, 4 * 6 = 24). But looking at the second declaration, we would think that VARYING_SLOT_VAR0 needs only 4 components of space (a single float padded out to a vec4). nir_setup_outputs() only considered the space requirements of the first declaration it happened to see, so if 'float b' came first, it would underallocate the output register space, causing brw_fs_validator.cpp to assert fail about inst->dst.offset exceeding the register size. Fixes Piglit's tests/spec/arb_enhanced_layouts/execution/component-layout/ vs-to-fs-array-interleave-single-location.shader_test. Thanks to Tim Arceri for finding this bug and writing a test! Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2017-10-10 17:29:37 -07:00
Dave Airlie	5be3fdfa32	anv: fix assert in wsi image code. This assert was firing just running demos. Jason said it should be this. Fixes: `6c7720ed78` (anv/wsi: Allocate enough memory for the entire image) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-10-11 09:52:57 +10:00
Kenneth Graunke	03087686ff	i965: Don't try to decode types for non-existent src1. KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks has a MOV that hits this validation path. MOVs don't have a src1 file, but calling brw_inst_src1_type() was tripping on src1.file being BRW_IMMEDIATE_VALUE and the hw_type being something invalid for immediates. To work around this, just pretend src1 is src0 if there isn't a src1. Fixes: `2572c2771d` (i965: Validate "Special Requirements for Handling Double Precision Data Types") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102680 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2017-10-10 15:11:35 -07:00
Iago Toral Quiroga	5ec21eb1a0	i965/tes: account for the fact that dvec3/4 inputs take two slots When computing the total size of the URB for tessellation evaluation inputs we were not accounting for this, and instead we were always assuming that each input would take a single vec4 slot, which could lead to computing a smaller read size than required. Specifically, this is a problem when the last input is a dvec3/4 such that its XY components are stored in the second half of a payload register (which can happen if the offset for the input in the URB is not 64-bit aligned because there are 32-bit inputs mixed in) and the ZW components in the first half of the next, as in this case we would fail to account for the extra slot required for the ZW components. Fixes (requires another fix in CTS currently in review): KHR-GL45.enhanced_layouts.varying_locations KHR-GL45.enhanced_layouts.varying_array_locations Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-10 08:59:54 +02:00
Tapani Pälli	63e6db18c5	anv: fix null pointer dereference CID: 1419033 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-10 08:17:44 +03:00
Józef Kucia	91ba331ef4	anv: Do not assert() on VK_ATTACHMENT_UNUSED Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: mesa-stable@lists.freedesktop.org	2017-10-09 16:28:43 -07:00
Jason Ekstrand	6c7720ed78	anv/wsi: Allocate enough memory for the entire image Previously, we allocated memory for image->plane[0].surface.isl.size which is great if there is no compression. However, on BDW, we can do CCS_D on X-tiled images so we also have to allocate space for the auxiliary buffer. This fixes hangs in some of the WSI CTS tests and should also reduce hangs in real applications. In particular, it fixes the dEQP-VK.wsi..incremental_present. test group. When we hand the image off to X11 or Wayland, it will ignore the CCS entirely which is ok because we do a resolve when it's transitioned to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable@lists.freedesktop.org	2017-10-07 17:12:38 -07:00
Lionel Landwerlin	e262845e37	anv: fix nir.h include All over mesa we include "nir/nir.h", we should probably do the same here. This fixes the meson build that was broken by the ycbcr series. Thanks to Dylan for finding the issue. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `f3e91e78a3` ("anv: add nir lowering pass for ycbcr textures") Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-07 22:57:50 +01:00
Lionel Landwerlin	0763f814d7	anv/cmd_buffer: Reset state in cmd_buffer_destroy This ensures that everything gets cleaned up properly. In particular, it fixes a memory leak where we were leaking the push constants structs. Valgrind stats on dEQP-VK.pipeline.push_constant.graphics_pipeline.range_size_128 : Before: HEAP SUMMARY: in use at exit: 2,467,513 bytes in 1,305 blocks total heap usage: 697,853 allocs, 696,530 frees, 138,466,600 bytes allocated LEAK SUMMARY: definitely lost: 1,068 bytes in 11 blocks indirectly lost: 24,669 bytes in 412 blocks possibly lost: 0 bytes in 0 blocks still reachable: 2,441,776 bytes in 882 blocks suppressed: 0 bytes in 0 blocks After: HEAP SUMMARY: in use at exit: 2,467,381 bytes in 1,304 blocks total heap usage: 697,853 allocs, 696,531 frees, 138,466,600 bytes allocated LEAK SUMMARY: definitely lost: 936 bytes in 10 blocks indirectly lost: 24,669 bytes in 412 blocks possibly lost: 0 bytes in 0 blocks still reachable: 2,441,776 bytes in 882 blocks suppressed: 0 bytes in 0 blocks Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>	2017-10-06 17:32:34 +01:00
Lionel Landwerlin	d296dea54e	anv/cmd_buffer: fix push descriptors with set > 0 When writing to set > 0, we were just wrongly writing to set 0. This commit fixes this by lazily allocating each set as we write to them. We didn't go for having them directly into the command buffer as this would require an additional ~45Kb per command buffer. v2: Allocate push descriptors from system memory rather than in BO streams. (Lionel) Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org> Fixes: `9f60ed98e5` ("anv: add VK_KHR_push_descriptor support") Reported-by: Daniel Ribeiro Maciel <daniel.maciel@gmail.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-06 17:32:13 +01:00
Lionel Landwerlin	b24b93d584	anv: enable VK_KHR_sampler_ycbcr_conversion v2: Make GetImageMemoryRequirements2KHR() iterate over all pInfo structs (Lionel) Handle VkSamplerYcbcrConversionImageFormatPropertiesKHR (Andrew/Jason) Iterate over BindImageMemory2KHR's pNext structs correctly (Jason) v3: Revert GetImageMemoryRequirements2KHR() change from v2 (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-06 16:34:04 +01:00
Lionel Landwerlin	a62a979335	anv: enable multiple planes per image/imageView This change introduces the concept of planes for images & views. It matches the planes available in new formats. We also refactor depth & stencil support through the usage of planes for the sake of uniformity. In the backend (genX_cmd_buffer.c) we have to take some care though with regard to auxiliary surfaces. Multiplanar color buffers can have multiple auxiliary surfaces but depth & stencil share the same HiZ one (only stored in the depth plane). v2: by Jason Remove unused aspect parameters from anv_blorp.c Assert when attempting to resolve YUV images Drop redundant logic for plane offset in make_surface() Rework anv_foreach_plane_aspect_bit() Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-06 16:32:20 +01:00
Jason Ekstrand	185e719090	anv: Take an image in can_sample_with_hiz Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-10-06 16:32:19 +01:00
Jason Ekstrand	558d8a3979	anv: Take a single aspect in anv_layout_to_aux_usage Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-10-06 16:32:19 +01:00
Jason Ekstrand	3735af0415	anv/cmd_buffer: Make get_fast_clear_state return an address Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-10-06 16:32:19 +01:00
Jason Ekstrand	fd146e4f3f	anv/blorp: Add a concept of default aux usage A good chunk of anv_blorp just wants the aux usage from the image. This magic aux_usage value means just that. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-10-06 16:32:19 +01:00
Lionel Landwerlin	f3e91e78a3	anv: add nir lowering pass for ycbcr textures This pass implements all the implicit conversions required by the VK_KHR_sampler_ycbcr_conversion specification. It also inserts plane sources onto sampling instructions that we then let the pipeline layout pass deal with, when mapping things correctly to descriptors. v2: Add new file to meson build (Lionel) Use nir_frcp() rather than (1.0f / x) (Jason) Reuse nir_tex_instr_dest_size() rather than handwritten one (Jason) Return progress (Jason) Account for array of samplers (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-06 16:32:19 +01:00
Lionel Landwerlin	3492d56067	anv: prepare sampler emission code for multiplanar images New settings from the KHR_sampler_ycbcr_conversion specifications might require different sampler settings for luma and chroma planes. This change makes the sampler table emission ready to handle multiple planes. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-06 16:32:19 +01:00
Lionel Landwerlin	a2a7846d37	anv/apply_pipeline_layout: Prepare for multi-planar images Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-06 16:32:19 +01:00
Lionel Landwerlin	72aec2060f	anv: add new formats KHR_sampler_ycbcr_conversion Adding new downsampling factors for each plane. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-06 11:46:08 +01:00
Lionel Landwerlin	bbc3700798	anv: modify the internal concept of format to express multiple planes A given Vulkan format can now be decomposed into a set of planes. We now use 'struct anv_format_plane' to represent the format of those planes. v2: by Jason Rename anv_get_plane_format() to anv_get_format_plane() Don't rename anv_get_isl_format() Replace ds_fmt() by fmt2() Introduce fmt_unsupported() Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-06 11:46:03 +01:00
Lionel Landwerlin	18914715d1	anv: prepare formats to handle disjoints sets Newer format enums start at offset 1000000000, making it impossible to have them all in one table. This change splits the formats into sets that we then access through indirection. v2: rename format_extract to vk_to_anv_format (Chad/Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-06 11:45:56 +01:00
Lionel Landwerlin	42a8fd1670	isl: fill out layout descriptions for yuv formats Some description was missing. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-06 11:45:52 +01:00
Lionel Landwerlin	f86c1b1595	isl: check whether a format is rgb if colorspace is yuv Suggested by Chad. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-06 11:45:49 +01:00
Lionel Landwerlin	5e9f52ff4d	isl: make format layout channels accessible by index Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chad Versace <chadversary@chromium.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2017-10-06 11:45:44 +01:00
Jason Ekstrand	7463d50580	intel/compiler: Don't propagate cmod into integer multiplies No shader-db change on Sky Lake. Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2017-10-05 11:54:49 -07:00
Jason Ekstrand	b91ecee04a	intel/compiler: Don't cmod propagate into a saturated operation Shader-db results on Sky Lake: total instructions in shared programs: 12954445 -> 12955125 (0.01%) instructions in affected programs: 141862 -> 142542 (0.48%) helped: 0 HURT: 626 Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2017-10-05 11:54:49 -07:00
Matt Turner	2572c2771d	i965: Validate "Special Requirements for Handling Double Precision Data Types" I did not implement: CNL's restriction on 64-bit int + align16, because I don't think we'll ever use this combination regardless of hardware generation. The restriction on immediate DF -> F conversions, because there's no reason to ever generate that, and I don't even know how DF -> F conversions are supposed to work in Align16 since (1) the dst stride must be 1, but (2) the dst stride would have to be 2 for src and dst strides to be aligned.	2017-10-04 14:08:54 -07:00
Matt Turner	98298c7e3d	i965: Fix and enable forgotten validation test I seem to have forgotten I still had work to do.	2017-10-04 14:08:54 -07:00
Matt Turner	122ef3799d	i965: Only insert error message if not already present Some restrictions require something like strides to match between src and dest. For multi-source instructions, I'd rather encapsulate the logic for not inserting already present errors in ERROR_IF than open-coding it multiple places.	2017-10-04 14:08:54 -07:00
Matt Turner	5e76cf153c	i965: Avoid validation error when src1 is not present There can be no violation of the restriction that source offsets are aligned if there is only one source offset.	2017-10-04 14:08:54 -07:00
Matt Turner	cacc229ba0	i965: Remove validate_reg() Replaced by the assembly validator, and in fact gets in the way of writing tests for the assembly validator.	2017-10-04 14:08:54 -07:00
Matt Turner	678d88bcee	i965: Add and use STRIDE and WIDTH macros You'll notice there were bugs in some of the code being replaced. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-10-04 14:08:54 -07:00
Matt Turner	4c961a5e79	i965: Add parentheses around usage of macro arguments Otherwise I cannot use this macro in test_eu_validate.cpp Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-10-04 14:08:54 -07:00
Matt Turner	1fcdb1cbea	i965: Add GLK, CFL, CNL to test_eu_validate.c	2017-10-04 14:08:54 -07:00
Matt Turner	6db5ec7deb	i965: Fix support for disassembling 64-bit integer immediates The type suffixes were wrong, and the 16 was missing the 0 prefix. Fixes: `92f787ff86` ("i965: Add support for disassembling 64-bit integer immediates") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-10-04 14:08:54 -07:00
Matt Turner	7e88f93469	i965/fs: Rewrite fsign64 to skip the float -> double conversion ... without the float -> double conversion. Low power parts have additional restrictions when it comes to operating on 64-bit types, and the instruction used to do the conversion violates one of them: specifically, the restriction that "Source and Destination horizontal stride must be aligned to the same qword". Previously we generated a float and then converted, but we can avoid the conversion by using the same extract-the-sign-bit + or-in-1.0 algorithm by directly operating on the high four bytes of each double-precision component in the result. In SIMD8 and SIMD16 this cuts one instruction from the implementation, and more importantly that instruction is the one which violated the regioning restriction. Along the way I removed some comments that I did not think helped, and some code about double comparisons which does not seem to be necessary today. This prevents validation failures caught by the new EU validation code added in later patches. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2017-10-04 14:08:54 -07:00
Matt Turner	b541945c20	i965/fs: Unpack count argument to 64-bit shift ops on Atom 64-bit operations on Atom parts have additional restrictions over their big-core counterparts (validated by later patches). Specifically, the restriction that "Source and Destination horizontal stride must be aligned to the same qword" is violated by most shift operations since NIR uses a 32-bit value as the shift count argument, and this causes instructions like shl(8) g19<1>Q g5<4,4,1>Q g23<4,4,1>UD where src1 has a 32-bit stride, but the dest and src0 have a 64-bit stride. This caused ~4 pixels in the ARB_shader_ballot piglit test fs-readInvocation-uint.shader_test to be incorrect. Unfortunately no ARB_gpu_shader_int64 test hit this case because they operate on uniforms, and their scalar regions are an exception to the restriction. We work around this by effectively unpacking the shift count, so that we can read it with a 64-bit stride in the shift instruction. Unfortunately the unpack (a MOV with a dst stride of 2) is a partial write, and cannot be copy-propagated or CSE'd. Bugzilla: https://bugs.freedesktop.org/101984	2017-10-04 14:08:54 -07:00
Matt Turner	2082c32950	i965/fs: Don't apply POW/FDIV workaround on Gen10+ The documentation says it applies only to Gens 8 and 9. Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>	2017-10-04 14:08:37 -07:00
Matt Turner	d407935327	i965: Fix src0 vs src1 typo A typo caused us to copy src0's reg file to src1 rather than reading src1's as intended. This caused us to fail to compact instructions like mov(8) g4<1>D 0D { align1 1Q }; because src1 was set to immediate rather than architecture file. Fixing this reenables compaction (after the precompact() pass changes the data types): mov(8) g4<1>UD 0x00000000UD { align1 1Q compacted }; Fixes: `1cb0a7941b` ("i965: Switch to using the logical register types") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-10-04 14:08:24 -07:00
Tapani Pälli	b2dce27373	android: fix build issues with brw_nir_trig_workarounds.c Fixes: `848da66222` ("intel: use a flag instead of setting PYTHONPATH") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>	2017-10-04 07:39:05 +03:00
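
Commit 6f5abf3146 describes two output variables that share location 0 but need different amounts of register space (24 components for `vec3 a[6]`, 4 for `float b`). One way to stop the result depending on declaration order is to take the maximum over all variables that start at a slot. The following is a minimal sketch of that idea only; `struct out_var`, `MAX_SLOTS` and `compute_slot_sizes` are hypothetical names, not the code in nir_setup_outputs().

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 32 /* hypothetical limit for this sketch */

/* Hypothetical description of one output variable. */
struct out_var {
    unsigned location;   /* first varying slot the variable occupies */
    unsigned vec4_slots; /* number of vec4 slots it spans (array length) */
};

/* For each starting slot, record the largest space (in 32-bit components)
 * requested by any variable that begins there, so the order in which the
 * declarations are visited does not matter. */
static void compute_slot_sizes(const struct out_var *vars, size_t count,
                               unsigned sizes[MAX_SLOTS])
{
    for (size_t i = 0; i < MAX_SLOTS; i++)
        sizes[i] = 0;

    for (size_t i = 0; i < count; i++) {
        assert(vars[i].location < MAX_SLOTS);
        /* Every slot is padded out to a full vec4, hence 4 components. */
        unsigned components = 4 * vars[i].vec4_slots;
        if (components > sizes[vars[i].location])
            sizes[vars[i].location] = components;
    }
}
```

With `float b` (1 slot) listed before `vec3 a[6]` (6 slots), both at location 0, the sketch still reserves 24 components for that slot.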
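
Commit 18914715d1 notes that newer format enums start at offset 1000000000, so they cannot live in one dense table. Vulkan assigns extension enum values as 1000000000 + (extension number - 1) * 1000 + offset, which suggests splitting a value into a set index and an index within that set. The sketch below illustrates that decomposition; the helper names are hypothetical and are not claimed to match the names used in anv.

```c
#include <stdint.h>

/* Base value at which Vulkan extension enums start. */
#define EXT_ENUM_BASE 1000000000u

/* Hypothetical helper: which table (set) a format value belongs to.
 * Core formats map to set 0; extension formats map to their
 * extension number. */
static unsigned format_set_index(uint32_t vk_format)
{
    if (vk_format < EXT_ENUM_BASE)
        return 0;
    return (vk_format - EXT_ENUM_BASE) / 1000u + 1u;
}

/* Hypothetical helper: position of the format inside its set. */
static unsigned format_set_offset(uint32_t vk_format)
{
    if (vk_format < EXT_ENUM_BASE)
        return vk_format;
    return vk_format % 1000u;
}
```

A lookup would then index an array of per-set tables with `format_set_index()` and the chosen table with `format_set_offset()`, after bounds-checking both.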
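
Commit 7e88f93469 computes the sign of a double by working on the high four bytes of each component: keep the sign bit and OR in the high dword of 1.0. The scalar C sketch below shows the bit manipulation only. It is not the generated EU code, `fsign64_sketch` is a hypothetical name, and returning 0.0 for both +0.0 and -0.0 is an assumption of this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of sign(x) for IEEE 754 binary64 values. */
static double fsign64_sketch(double x)
{
    uint64_t bits;
    uint32_t hi;
    double result;

    if (x == 0.0) /* true for both +0.0 and -0.0 */
        return 0.0;

    memcpy(&bits, &x, sizeof bits);
    hi = (uint32_t)(bits >> 32); /* high dword: sign, exponent, top mantissa */
    hi &= 0x80000000u;           /* extract the sign bit */
    hi |= 0x3ff00000u;           /* OR in the high dword of 1.0 */

    bits = (uint64_t)hi << 32;   /* low dword of +/-1.0 is all zeros */
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

Because only the high dword is touched, no float -> double conversion instruction is needed, which is the point the commit message makes about the regioning restriction.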


2224 commits