fdo-mirrors/mesa

mirror of https://gitlab.freedesktop.org/mesa/mesa.git synced 2026-02-25 11:30:29 +01:00

Author	SHA1	Message	Date
Lionel Landwerlin	aab21cedc6	intel: devinfo: add simulator id Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-07-05 11:57:45 +01:00
Scott D Phillips	0f53948c59	intel: tools: dump-gpu: dump 48-bit addresses For gen8+, write out PPGTT tables in aub files so that full 48-bit addresses can be serialized. v2: Fix handling of `end` index in map_ppgtt v3: Correctly mark GGTT entry as present (Rafael) Signed-off-by: Scott D Phillips <scott.d.phillips@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-07-05 11:57:45 +01:00
Lionel Landwerlin	6e37b949d5	intel: tools: import intel_aubdump Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-07-05 11:57:45 +01:00
Lionel Landwerlin	fa00b9c1c9	intel: tools: update intel_aub.h Scott added new stuff in IGT. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-07-05 11:57:45 +01:00
Lionel Landwerlin	5ffa35b64d	intel: batch-decoder: add missing return line Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-07-05 11:57:45 +01:00
Lionel Landwerlin	28476c9d81	intel: batch-decoder: don't asks for constant BO until decoding With PPGTT mappings, our aubinator implementation can be quite slow if we request a buffer that doesn't exist. Instead of doing a PPGTT walk for invalid addresses (0 lengths), wait until we're sure we want to decode the data. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-07-05 11:57:45 +01:00
Scott D Phillips	c262ec19d0	intel/batch-decoder: handle non-contiguous binding table / surface state Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-07-05 11:57:45 +01:00
Scott D Phillips	3ebee627cb	intel/tools/aubinator: aubinate ppgtt aubs v2: by Lionel Fix memfd_create compilation issue Fix pml4 address stored on 32 instead of 64bits Return no buffer if first ppgtt page is not mapped v3: Drop additional memfd_create() (Rafael) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-07-05 11:57:45 +01:00
Lionel Landwerlin	3228335b55	intel: aubinator: handle GGTT mappings We use memfd to store physical pages as they get read/written to and the GGTT entries translating virtual address to physical pages. Based on a commit by Scott Phillips. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-07-05 11:57:45 +01:00
Lionel Landwerlin	144b40db54	intel: aubinator: drop the 1Tb GTT mapping Now that we're softpinning the address of our BOs in anv & i965, the addresses selected start at the top of the addressing space. This is a problem for the current implementation of aubinator which uses only a 40bit mmapped address space. This change keeps track of all the memory writes from the aub file and fetch them on request by the batch decoder. As a result we can get rid of the 1<<40 mmapped address space and only rely on the mmap aub file \o/ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-07-05 11:57:45 +01:00
Lionel Landwerlin	9d08ef6335	intel: aubinator: rework register writes handling Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-07-05 11:57:45 +01:00
Lionel Landwerlin	86cb05a6d3	intel: aubinator: remove standard input processing option On a follow up commit in this series, we stop copying the data from the mmap'ed file into our big gtt mmap, and start referencing data in it directly. So reallocating the read buffer and adding more data from stdin wouldn't work. For that reason, let's stop supporting stdin process. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-07-05 11:57:45 +01:00
Lionel Landwerlin	08d85a8301	intel: aubinator: remove unused variables These memory offsets are stored in the gen_batch_decode_ctx. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-07-05 11:57:45 +01:00
Neil Roberts	2d5ddbe960	i965: Fix output register sizes when variable ranges are interleaved In `6f5abf3146` this code was fixed to calculate the maximum size of an attribute in a seperate pass and then allocate the registers to that size. However this wasn’t taking into account ranges that overlap but don’t have the same starting location. For example: layout(location = 0, component = 0) out float a[4]; layout(location = 2, component = 1) out float b[4]; Previously, if ‘a’ was processed first then it would allocate a register of size 4 for location 0 and it wouldn’t allocate another register for location 2 because it would already be covered by the range of 0. Then if something tries to write to b[2] it would try to write past the end of the register allocated for ‘a’ and it would hit an assert. This patch changes it to scan for any overlapping ranges that start within each range to calculate the maximum extent and allocate that instead. Fixed Piglit’s arb_enhanced_layouts/execution/component-layout/ vs-fs-array-interleave-range.shader_test Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Fixes: `6f5abf3146` "i965: Fix output register sizes when multiple variables share a slot."	2018-07-04 10:57:51 +02:00
Ian Romanick	995d993710	i965/vec4: Don't cmod propagate from CMP to ADD if the writemask isn't compatible Otherwise we can incorrectly cmod propagate in situations like add(8) g10<1>.xD g2<0>.xD -16D ... cmp.ge.f0(8) null<1>D g2<0>.xD 16D ... (+f0) sel(8) g21<1>.xyUD g14<4>.xyyyUD g18<4>.xyyyUD Sadly, this change hurts quite a few shaders. v2: Refactor writemask compatibility check into a separate function. Suggested by Caio. Ivy Bridge and Haswell had similar results. (Haswell shown) total instructions in shared programs: 12968489 -> 12968738 (<.01%) instructions in affected programs: 60679 -> 60928 (0.41%) helped: 0 HURT: 249 HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.22% max: 0.81% x̄: 0.46% x̃: 0.44% 95% mean confidence interval for instructions value: 1.00 1.00 95% mean confidence interval for instructions %-change: 0.44% 0.48% Instructions are HURT. total cycles in shared programs: 409171965 -> 409172317 (<.01%) cycles in affected programs: 260056 -> 260408 (0.14%) helped: 0 HURT: 176 HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.04% max: 0.34% x̄: 0.17% x̃: 0.17% 95% mean confidence interval for cycles value: 2.00 2.00 95% mean confidence interval for cycles %-change: 0.16% 0.18% Cycles are HURT. Sandy Bridge total instructions in shared programs: 10423577 -> 10423753 (<.01%) instructions in affected programs: 40667 -> 40843 (0.43%) helped: 0 HURT: 176 HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.29% max: 0.79% x̄: 0.48% x̃: 0.42% 95% mean confidence interval for instructions value: 1.00 1.00 95% mean confidence interval for instructions %-change: 0.46% 0.51% Instructions are HURT. total cycles in shared programs: 146097503 -> 146097855 (<.01%) cycles in affected programs: 503990 -> 504342 (0.07%) helped: 0 HURT: 176 HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.02% max: 0.36% x̄: 0.12% x̃: 0.11% 95% mean confidence interval for cycles value: 2.00 2.00 95% mean confidence interval for cycles %-change: 0.11% 0.13% Cycles are HURT. No changes on any other platforms. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Fixes: `cd635d149b` i965/vec4: Propagate conditional modifiers from compares to adds Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-07-02 19:19:16 -07:00
Ian Romanick	fb6dc8e894	intel/compiler: Silence unused parameter warnings brw_nir.c src/intel/compiler/brw_nir.c: In function ‘brw_nir_lower_vue_outputs’: src/intel/compiler/brw_nir.c:464:32: warning: unused parameter ‘is_scalar’ [-Wunused-parameter] bool is_scalar) ^~~~~~~~~ src/intel/compiler/brw_nir.c: In function ‘lower_bit_size_callback’: src/intel/compiler/brw_nir.c:610:57: warning: unused parameter ‘data’ [-Wunused-parameter] lower_bit_size_callback(const nir_alu_instr alu, void data) ^~~~ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-07-02 16:17:19 -07:00
Jason Ekstrand	afa8f58921	anv: Add support for the on-disk shader cache The Vulkan API provides a mechanism for applications to cache their own shaders and manage on-disk pipeline caching themselves. Generally, this is what I would recommend to application developers and I've resisted implementing driver-side transparent caching in the Vulkan driver for a long time. However, not all applications do this and, for some use-cases, it's just not practical. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-07-02 14:52:05 -07:00
Jason Ekstrand	e0f7a3aa5b	anv/pipeline_cache: Add a _locked suffix to a function Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-07-02 13:07:06 -07:00
Jason Ekstrand	f5c38f4a30	anv: Add device-level helpers for searching for and uploading kernels Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-07-02 13:07:06 -07:00
Jason Ekstrand	eae192bf5f	anv/pipeline: Stop optimizing for not having a cache Before, we were only hashing the shader if we had a shader cache to cache things in. This means that if we ever get it wrong, we could end up trying to cache a shader with an undefined hash. Since not having a shader cache is an extremely uncommon case, let's optimize for code clarity and obvious correctness over avoiding a hash operation. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-07-02 13:07:06 -07:00
Jason Ekstrand	76fdc8a85c	anv: Use a default pipeline cache if none is specified If a client is dumb enough to not specify a pipeline cache, give it a default. We have to create one anyway for blorp so we may as well let the client cache shaders in it. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-07-02 13:07:06 -07:00
Jason Ekstrand	d1c778b362	anv: Be more careful about hashing pipeline layouts Previously, we just hashed the entire descriptor set layout verbatim. This meant that a bunch of extra stuff such as pointers and reference counts made its way into the cache. It also meant that we weren't properly hashing in the Y'CbCr conversion information information from bound immutable samplers. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-07-02 13:07:06 -07:00
Jason Ekstrand	06412bfc98	anv,intel: Enable nir_opt_large_constants for Vulkan According to RenderDoc, this shaves 99.6% of the run time off of the ambient occlusion pass in Skyrim Special Edition when running under DXVK and shaves 92% off the runtime for a reasonably representative frame. When running the actual game, Skyrim goes from being a slide-show to a very stable and playable framerate on my SKL GT4e machine. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-02 12:09:50 -07:00
Jason Ekstrand	70ce880434	anv: Add state setup support for shader constants Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-02 12:09:49 -07:00
Jason Ekstrand	3a5ed18c51	anv: Add support for shader constant data to the pipeline cache Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-02 12:09:47 -07:00
Iago Toral Quiroga	1b54824687	anv/cmd_buffer: make descriptors dirty when emitting base state address Every time we emit a new state base address we will need to re-emit our binding tables, since they might have been emitted with a different base state adress. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> CC: <mesa-stable@lists.freedesktop.org>	2018-07-02 08:31:20 +02:00
Iago Toral Quiroga	6a1d8350c9	anv/cmd_buffer: clean dirty push constants flag after emitting push constants Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> CC: <mesa-stable@lists.freedesktop.org>	2018-07-02 08:31:02 +02:00
Iago Toral Quiroga	198a72220b	anv/cmd_buffer: never shrink the push constant buffer size If we have to re-emit push constant data, we need to re-emit all of it. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> CC: <mesa-stable@lists.freedesktop.org>	2018-07-02 08:30:40 +02:00
Jose Maria Casanova Crespo	a99c9e63a0	anv: finish the binding_table_pool on destroyDevice when use_softpin Running VK-CTS in batch execution mode was raising the VK_ERROR_INITIALIZATION_FAILED error in multiple tests. But when the same failing tests were run isolated they always passed. createDevice and destroyDevice were called before and after every tests. Because the binding_table_pool was never closed, we reached the maximum number of open file descriptors (ulimit -n) and when that happened every call to createDevice implied a VK_ERROR_INITIALIZATION_FAILED error. Fixes: `c7db0ed4e9` ("anv: Use a separate pool for binding tables when soft pinning") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-29 21:49:31 +02:00
Francisco Jerez	c2c803be7b	intel/fs: Build 32-wide FS shaders. Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>	2018-06-28 13:25:21 -07:00
Jason Ekstrand	b95b0e2918	intel/anv,blorp,i965: Implement the SKL 16x MSAA SIMD32 workaround Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-28 13:25:18 -07:00
Jason Ekstrand	d5e028a57b	intel/fs: Add fields to wm_prog_data for SIMD32 dispatch Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	bcbc7d3a17	intel/fs: Fix nir_intrinsic_load_helper_invocation for SIMD32. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	7144247c2c	intel/fs: Fix fs_builder::sample_mask_reg() for 32-wide FS dispatch. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	37c1df28c9	intel/fs: Fix Gen6+ interpolation setup for SIMD32 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	e208bc3bb7	intel/fs: Get rid of MOV_DISPATCH_TO_FLAGS We can just emit the MOV in the two places where we use this. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	5e3028d826	intel/fs: Emit MOV_DISPATCH_TO_FLAGS once for the centroid workaround There's no reason for us to emit it a pile of times and then have a whole pass to clean it up. Just emit it once like we really want. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	40fe108e2b	intel/fs: Generalize the unlit centroid workaround This generalizes the unlit centroid workaround so it's less code and now supports SIMD32. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	1d381731e0	intel/fs: Fix sample id setup for SIMD32. v2 (Jason Ekstrand): - Disallow gl_SampleId in SIMD32 on gen7 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	2fd0aed89a	intel/fs: Fix Gen7 compressed source region alignment restriction for SIMD32 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	6909aed90e	intel/fs: Implement 32-wide FS payload setup on Gen6+ Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	f6c4aace22	intel/fs: Extend thread payload layout to SIMD32 And handle 32-wide payload register reads in fetch_payload_reg(). v2 (Jason Ekstrand); - Fix some whitespace and brace placement Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	8f143f70d6	intel/fs: Wrap FS payload register look-up in a helper function. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	d996e5b812	intel/fs: Use fs_regs instead of brw_regs in the unlit centroid workaround While we're here, we change to using horiz_offset() instead of abusing half(). v2 (Jason Ekstrand): - Use horiz_offset() instead of half() Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	38aee1a06d	intel/fs: Simplify fs_visitor::emit_samplepos_setup The original code manually handled splitting the MOVs to 8-wide to handle various regioning restrictions. Now that we have a SIMD width splitting pass that handles these things, we can just emit everything at the full width and let the SIMD splitting pass handle it. We also now have a useful "subscript" helper which is designed exactly for the case where you want to take a W type and read it as a vector of Bs so we may as well use that too. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	244a0ff3a8	i965: Add plumbing for shader time in 32-wide FS dispatch mode. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	2d7d652d5c	intel/fs: Disable opt_sampler_eot() in 32-wide dispatch. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	db6ca13efc	intel/fs: Emit LINE+MAC for LINTERP with unaligned coordinates On g4x through Sandy Bridge, src1 (the coordinates) of the PLN instruction is required to be an even register number. When it's odd (which can happen with SIMD32), we have to emit a LINE+MAC combination instead. Unfortunately, we can't just fall through to the gen4 case because the input registers are still set up for PLN which lays out the four src1 registers differently in SIMD16 than LINE. v2 (Jason Ekstrand): - Take advantage of both accumulators and emit LINE LINE MAC MAC (Based on a patch from Francisco Jerez) - Unify the gen4 and gen4x-6 cases using a loop v3 (Jason Ekstrand): - Don't unify gen4 with gen4x-6 as this turns out to be more fragile than first thought without reworking the gen4 barycentric coordinate layout. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Jason Ekstrand	566e6abd6d	intel/fs: Mark LINTERP opcode as writing accumulator on platforms without PLN When we don't have PLN (gen4 and gen11+), we implement LINTERP as either LINE+MAC or a pair of MADs. In both cases, the accumulator is written by the first of the two instructions and read by the second. Even though the accumulator value isn't actually ever used from a logical instruction perspective, it is trashed so we need to make the scheduler aware. Otherwise, the scheduler could end up re-ordering instructions and putting a LINTERP between another an instruction which writes the accumulator and another which tries to use that result. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00
Francisco Jerez	73d60455e9	intel/fs: Rework INTERPOLATE_AT_PER_SLOT_OFFSET This reworks INTERPOLATE_AT_PER_SLOT_OFFSET to work more like an ALU operation and less like a send. This is less code over-all and, as a side-effect, it now properly handles execution groups and lowering so SIMD32 support just falls out. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-06-28 13:19:38 -07:00

1 2 3 4 5 ...

3188 commits