We'll start doing slow depth clears more often on HIZ_CCS buffers in a
future commit. Reduce the performance impact by making them use less
bandwidth.
From the Depth Test section of the BSpec:
This function is enabled by the Depth Test Enable state variable. If
enabled, the pixel's ("source") depth value is first computed. After
computation the pixel's depth value is clamped to the range defined
by Minimum Depth and Maximum Depth in the selected CC_VIEWPORT state.
Then the current ("destination") depth buffer value for this pixel is
read.
and from the Depth Buffer Updates section of the BSpec:
If depth testing is disabled or the depth test passed, the incoming
pixel's depth value is written to the Depth Buffer.
Taken together, it's clear that depth testing isn't necessary to perform
a depth buffer clear. Mark Janes and I analyzed this patch with
frameretrace and a depthrange piglit test. I disabled HiZ to ensure we'd
get slow depth clears. We've observed the bandwidth consumption by the
depth buffer access to be cut ~50% on BDW and SKL during depth clears.
On a more graphically intensive workload, the Shadowmapping Sascha
benchmark, I took the average of 3 runs on a BDW with a display
resolution of about 1920x1200 (minus some desktop environment
decorations). I measured a 22.61% FPS improvement when HiZ is disabled.
v2. The BSpec doesn't mandate this behavior, update comment accordingly.
(Ken)
Fixes: bc4bb5a7e3 ("intel/blorp: Emit more complete DEPTH_STENCIL state")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In ISL:
Update the format table to add CCS_E support for some 8BPP formats,
some 16BPP formats, and R10G10B10A2_UNORM_SRGB.
In the helper for determining CCS_E support, we return false for some
16BPP formats because they aren't properly handled in blorp_copy().
In BLORP:
Allow the new and non-problematic formats for CCS_E-enabled copies.
v2. Update other fields for A1B5G5R5_UNORM and A4B4G4R4_UNORM in table.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
The CCS could be described in a number of ways, but this format was
chosen to minimize churn in the drivers. We may decide on an different
direction in the future.
v2. Increase alignment for display surfaces. (Nanley)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Use a helper that will automatically handle Gen12's CCS tiling when
creating a CCS isl_surf.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In the function which translates ISL tilings to i915 tilings, map ISL's
HiZ and CCS tilings to Y instead of NONE (linear). The HW docs describe
HiZ and pre-Gen12 CCS surfaces as being Y-tiled in memory.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The isl_surf structs for Gen12's CCS won't describe how many slices in
the main surface can be compressed. All slices will be compressable if
CCS is enabled, so lookup the main surface's logical dimension.
v2. Add a space before a `?`. (Jordan)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This isn't accurate enough for HiZ which can have a discontiguous range
of supported aux slices. This also won't work with the plan to represent
Gen12 CCS as a single slice surface.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Update their dimensions according to the Bspec.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
From "Render Target Fast Clear" description for Gen12:
"SW must store clear color using MI_STORE_DATA_IMM with
ForceWriteCompletionCheck bit set."
From Instruction_MI_STORE_DATA_IMM, bitfield 10 (when set to 1):
"Following the last write from this command, Command Streamer
will wait for all previous writes are completed and in global
observable domain before moving to next command."
We use 4 SDIs to store the clear color (one per channel). From the
description, it looks to me that setting that flag only on the last SDI
should be enough.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Gen12's CCS requires that the main surface have a pitch aligned to 512B.
v2. Provide a BSpec citation. (Ken)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
add_aux_state_tracking_buffer() actually checks the aux usage when
determining how many dwords to allocate for state tracking. Move the
function call to the point after the CCS_E aux usage is assigned.
Fixes: de3be61801 ("anv/cmd_buffer: Rework aux tracking")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Avoid failing the `info->use_clear_address` assertion in ISL on Gen12+.
Fixes: 6c9f9a82d7 ("intel/genxml,isl: Add gen12 render surface state changes")
Reported-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In gen12 we use the 3DSTATE_DEPTH_BOUNDS instruction
to enable depth bounds testing.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In gen12 we add the 3DSTATE_DEPTH_BOUNDS instruction
which enables support for depth bounds testing.
Signed-off-by: Plamena Manolova <plamena.manolova@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When building with "-flto" brw::block_data definitions
were colliding.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This interface allows the aux-map code in the intel/common library to
allocate and free buffers.
Reworks:
* free gen_buffer in gen_aux_map_buffer_free. (Rafael)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8d43e2b2de ("meson: add -Werror=empty-body to disallow `if(x);`")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Otherwise a smaller type may be promoted to int, which can hit undefined
behaviour:
../src/intel/compiler/brw_packed_float.c:66:17: runtime error: left shift of 128 by 24 places cannot be represented in type 'int'
#0 0x5604a03969aa in brw_vf_to_float ../src/intel/compiler/brw_packed_float.c:66
#1 0x5604a0391305 in vf_float_conversion_test_test_vf_to_float_Test::TestBody() ../src/intel/compiler/test_vf_float_conversions.cpp:70
#2 0x5604a041a323 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#3 0x5604a0405c31 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#4 0x5604a03ab03b in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
#5 0x5604a03ad714 in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
#6 0x5604a03afea2 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
#7 0x5604a03cb87c in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
#8 0x5604a041df3c in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#9 0x5604a0409609 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#10 0x5604a03c2e9e in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
#11 0x5604a0442d57 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
#12 0x5604a0442c17 in main ../src/gtest/src/gtest_main.cc:37
#13 0x7f9a1983dbba in __libc_start_main ../csu/libc-start.c:308
#14 0x5604a0390d89 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/vf_float_conversions+0x8dd89)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
To avoid it, use the modulo of the number of bits in the value being
shifted, which is presumably what ended up happening on x86.
Flagged by UBSan:
../src/intel/compiler/brw_eu_validate.c:974:33: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int'
#0 0x561abb612ab3 in general_restrictions_on_region_parameters ../src/intel/compiler/brw_eu_validate.c:974
#1 0x561abb617574 in brw_validate_instructions ../src/intel/compiler/brw_eu_validate.c:1851
#2 0x561abb53bd31 in validate ../src/intel/compiler/test_eu_validate.cpp:106
#3 0x561abb555369 in validation_test_source_cannot_span_more_than_2_registers_Test::TestBody() ../src/intel/compiler/test_eu_validate.cpp:486
#4 0x561abb742651 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#5 0x561abb72e64d in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#6 0x561abb6d5451 in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
#7 0x561abb6d7b2a in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
#8 0x561abb6da2b8 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
#9 0x561abb6f5c92 in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
#10 0x561abb74626a in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#11 0x561abb732025 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#12 0x561abb6ed2b4 in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
#13 0x561abb768b3b in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
#14 0x561abb7689fb in main ../src/gtest/src/gtest_main.cc:37
#15 0x7f525e5a9bba in __libc_start_main ../csu/libc-start.c:308
#16 0x561abb538ed9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/eu_validate+0x1b8ed9)
Reviewed-by: Adam Jackson <ajax@redhat.com>
`strerror()` takes an `errno`, not the negative value returned by the
`ioctl()`.
Instead of fixing this as `"%s", strerror(errno)`, let's just use the
`"%m"` shortcut for it.
Fixes: 2b5f30b1d9 ("anv: implement VK_INTEL_performance_query")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2: Introduce the appropriate pipe controls
Properly deal with changes in metric sets (using execbuf parameter)
Record marker at query end
v3: Fill out PerfCntr1&2
v4: Introduce vkUninitializePerformanceApiINTEL
v5: Use new execbuf extension mechanism
v6: Fix comments in genX_query.c (Rafael)
Use PIPE_CONTROL workarounds (Rafael)
Refactor on the last kernel series update (Lionel)
v7: Only I915_PERF_IOCTL_CONFIG when perf stream is already opened (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Those are not part of the OA reports.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We have 2 of those we can configure to source programmable events.
Those are not part of the OA reports. Configuration happens in i915
through the metric set selected by the application. On the Mesa side
we'll just sample those and do a diff.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We use this as a communication mechanism between MDAPI & Anv.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Will conflict with the genxml RPSTAT register.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We want to query the content of register configurations from the
kernel. Let's pull this out of the query.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The Vulkan performance query extension is a bit lower level than the
GL one. Expose some of the functions to do the result accumulation
directly in the Anv driver.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
A simple utility to put the marker at the right location.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>