Some commands like MI_BATCH_BUFFER_START have this indicator.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Fixes leaks from anv_device_upload_nir:
==7345== 8,192 bytes in 2 blocks are definitely lost in loss record 24 of 24
==7345== at 0x4C2ED78: malloc (vg_replace_malloc.c:308)
==7345== by 0x4C31393: realloc (vg_replace_malloc.c:836)
==7345== by 0x54E0848: grow_to_fit (blob.c:67)
==7345== by 0x54E0BE5: blob_reserve_bytes (blob.c:166)
==7345== by 0x54E0C7C: blob_reserve_intptr (blob.c:186)
==7345== by 0x54704A7: nir_serialize (nir_serialize.c:1091)
==7345== by 0x512F97D: anv_device_upload_nir (anv_pipeline_cache.c:756)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Use anv_gem_munmap for unmap when softpin in use, this corresponds to
anv_gem_mmap used in anv_block_pool_expand_range. This fixes valgrind
errors seen for each pool when softpin is in use:
==25581== 262,144 bytes in 1 blocks are definitely lost in loss record 31 of 31
==25581== at 0x50E77E8: anv_gem_mmap (anv_gem.c:96)
==25581== by 0x50EEE2B: anv_block_pool_expand_range (anv_allocator.c:543)
==25581== by 0x50EEB51: anv_block_pool_init (anv_allocator.c:477)
==25581== by 0x50EF7EF: anv_state_pool_init (anv_allocator.c:920)
==25581== by 0x510B8EB: anv_CreateDevice (anv_device.c:2031)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Now that we have a loop unrolling cost function and loop unrolling isn't
going to kill us the moment we have a 64-bit op in a loop, we can go
ahead and move 64-bit lowering later. This gives us the opportunity to
do more optimizations and actually let the full optimizer run even on
64-bit ops rather than hoping one round of opt_algebraic will fix
everything. This substantially reduces both fp64 shader compile times
and the resulting code size. On the vs-isnan-dvec test from piglit:
Before this commit:
1684.63s user 17.29s system 99% cpu 28:28.24 total
101479 instructions. 0 loops. 802452 cycles. 79:369 spills:fills.
Peak memory usage (according to massif): 1.435 GB
After this commit:
179.64s user 7.75s system 99% cpu 3:07.92 total
57316 instructions. 0 loops. 459287 cycles. 0:0 spills:fills.
Peak memory usage (according to massif): 531.0 MB
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of trusting the caller to already have created a softfp64
function shader and added all its functions to our shader, we simply
take the softfp64 shader as an argument and do the function inlining
ouselves. This means that there's no more nasty functions lying around
that the caller needs to worry about cleaning up.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even though this is technically a step in the function inlining process
as laid out in nir_inline_functions.c, it's not really needed. We
already have constant initializers lowered here and no new ones are
added by appending the softfp64 functions.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In the old code, we would generate the exact same instruction for
extract_u8(some_u64, 0) and extract_u8(some_u64, 1). The mask-a-word
trick only works for even numbered bytes.
This fixes the (new) piglit test
tests/spec/arb_gpu_shader_int64/execution/fs-ushr-and-mask.shader_test.
v2: Use a SHR instead of an AND. This saves an instruction compared to
using two moves. Suggested by Jason.
Fixes: 6ac2d16901 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The parameter is never used, and it's not part of a common interface
idiom. Remove it.
src/intel/compiler/brw_interpolation_map.c: In function ‘brw_setup_vue_interpolation’:
src/intel/compiler/brw_interpolation_map.c:62:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
const struct gen_device_info *devinfo)
^~~~~~~
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We need more space than just a 32-bit scalar and we have to burn all
that space anyway so we may as well expose it to the driver. This also
fixes a subtle bug when UBOs and SSBOs have different pointer types.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This buffer goes along side the CPU data structure and may contain
pointers, bindless handles, or any other descriptor information.
Currently, all descriptors are size zero and nothing goes in the buffer
but this commit sets up the framework we will need later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Technically, descriptor set layouts aren't required to survive past the
function they're passed into so we need to reference them.
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Pull the common code out of the two entrypoints into the helper which
fetches the push descriptor set for us. Now that it does more than just
get a thing, call it anv_cmd_buffer_push_descriptor_set.
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The descriptor set layout code in our driver has undergone many changes
over the years. Some of the fields which were once essential are now
useless or nearly so. The has_dynamic_offsets field was completely
unused accept for the code to set and hash it. The per-stage indices
were only being used to determine if a particular binding had images,
samplers, etc. The fact that it's per-stage also doesn't matter because
that binding should never be accessed by a shader of the wrong stage.
This commit deletes a pile of cruft and replaces it all with a
descriptive bitfield which states what a particular descriptor contains.
This merely describes the data available and doesn't necessarily dictate
how it will be lowered in anv_nir_apply_pipeline_layout.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is what we're actually storing in the descriptor set and consuming
when we bind surface states. This commit renames image_count to
image_param_count a few places and moves the decision to not count image
params on gen9+ into anv_descriptor_set.c when we build the layout.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Make them all take a device followed by a set. This is consistent
with how the actual Vulkan entrypoint parameters are laid out.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This commit just puts the free list code together as part of the pool
instead of having it inlined into the descriptor set create code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We already propagate coord_components correctly and did not have
layer restrictions for ycbcr formats.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This does not seem to fix anything ATM but is the right thing todo.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Fixes: f3e91e78a3 ("anv: add nir lowering pass for ycbcr textures")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We zero out the prog data anyway and, now that bias is always zero, this
function is accomplishing nothing.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This commit moves our handling of gl_NumWorkgroups over to work like our
handling of other special bindings in the Vulkan driver. We give it a
magic descriptor set number and teach emit_binding_tables to handle it.
This is better than the bias mechanism we were using because it allows
us to do proper accounting through the bind map mechanism.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
When we have a larger sampler index, we get into the "high sampler"
scenario and need an instruction header. Even in SIMD8, this pushes the
instruction over the sampler message size maximum of 11 registers.
Instead, we have to lower TXD to TXL.
Fixes: cb98e0755f "intel/fs: Support min_lod parameters on texture..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were accidentally not counting those surfaces
Fixes: ddc4069122 "anv: Implement VK_KHR_maintenance3"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We
can reduce our 64x64 int multiplication from 4 instructions to 3.
Also instead of emitting two mul instructions, we can emit single mul
instuction and extract low/high 32 bits from 64 bit result for
[i/u]mulExtended
v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand)
2) Add lower_mul_2x32_64 flag (Matt Turner)
3) Remove associative property as bit size is different (Connor
Abbott)
v3: Fix indentation and variable naming convention (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fix anv_extrypoints.{c,h} and anv_extensions.{c,h} missing dependencies
Rename the variable labels according to targets and python scripts
Align the building rules as per Automake for simplification
Fixes building errors during rebuils due to missing dependencies
(v2) Fixed a missing $(VULKAN_API_XML) reference
Fixes: 9a508b7 ("android: anv/extensions: fix generated sources build")
Fixes: dd088d4bec ("anv/extensions: Generate a header file with extension tables")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Instead of calculating the int64 and doubles lowering options each
time a shader is preprocessed, save and use the values in
nir_shader_compiler_options.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This ran afoul of Iris's use of nir_lower_clamp_color_outputs which
applies fsat() before writes to vertex shader color outpus.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 7725d60938 ("intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))")
As requested by Ken ;)
v2: Also decode simple batches (Caio)
Fix u_vector usage issues (Lionel)
v3: Make binding/instruction/state/surface available (Lionel)
v4: Going through device pools for simple batches (Lionel)
Centralize search BO callbacks into anv_device.c (Lionel)
v5: Clear decoded batch buffer var after use (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Other places will need to do this soon to properly handle source
swizzles. The patch looks a little odd, but the change is pretty
straight forward. All of the swizzle and mask handling is moved out,
but the code for handling move instructions and vecN instructions
remains in nir_emit_alu.
I'm not terribly pleased with the "need_dest" parameter, but
get_nir_dest is (somewhat surprisingly) destructive. I am open to
suggestions of alternatives.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To allow cmod propagation from a MOV in a sequence like:
and(16) g31<1>UD g20<8,8,1>UD g22<8,8,1>UD
mov.nz.f0(16) null<1>F g31<8,8,1>D
A similar change to the vec4 backend had no effect.
Somewhere between c1ec582059 and 40fc4b5acd (1,094 commits) the
effectiveness of this patch diminished, and as of commit d7e0d47b9d
(nir: Add a bunch of b2[if] optimizations) this optimization no longer
has any effect on any platform.
A later patch "intel/fs: Use De Morgan's laws to avoid logical-not of a
logic result on Gen8+," generates some instruction sequences that
require this change in order for cmod propagation to make progress.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
pylint complains:
> C0326: No space allowed around keyword argument assignment
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
I'm guessing a previous version of this script used an index-based map
of entrypoints, but that's not the case anymore.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>