- Overview
- Appendix A: Understand what PGO can do
- Step 0 - Build a PGO instrumented LLVM
- Step 1 - Build "Hello, World!" RPM
- Step 2 - Automatically add subpackage
- Step 3 - Build unmodified packages on Copr
- Step 4 - Merge PGO Profiles
- Step 5 - Build PGO optimized LLVM
- Step 6 - Evaluation
- Conclusion
- Frequently Asked Questions
- Resources
In this experiment we generate PGO profile data by compiling unmodified RPM packages with a PGO instrumented LLVM, and feed those profiles into a PGO optimized rebuild of LLVM. Please note that this experiment has been discontinued.
Overview
We create an instrumented LLVM toolchain in a Copr project called kkleine/llvm-pgo-instrumented. In another Copr project called kkleine/profile-data-collection we install a new package called llvm-pgo-instrumentation-macros into the chroot buildroot. Every package that gets built in that buildroot will automatically produce a subpackage <PACKAGE>-llvm-pgo-profdata with LLVM PGO profile data. We demonstrate this with a simple "Hello, World!" application called myapp. We then collect all generated subpackages through BuildRequires: tags in another package called llvm-pgo-profdata. During the build of this llvm-pgo-profdata package, all profiles are merged into an indexed profile data file. The final llvm-pgo-profdata RPM then installs the indexed profile data file into a location from which a PGO optimized build of LLVM can read it. This PGO optimized build of the LLVM toolchain is done in a third Copr project called kkleine/llvm-pgo-optimized.
Non-goals
It is not a goal to get a perfectly tuned PGO optimized build of LLVM. Instead, we want to explore how to set up a pipeline in Copr for further tweaking and experimentation.
It is also not a goal to change from the standalone build mode to a mono-repo build with a single buildroot. But rest assured, we’re looking into merging the llvm and clang RPM packages into one so that we can apply PGO and LTO in a more holistic way.
The only operating system that we build for in this experiment is Fedora rawhide on x86_64.
Appendix A: Understand what PGO can do
PGO (Profile-Guided Optimization) allows your compiler to better optimize code for how it actually runs. Users report that applying this to Clang and LLVM can decrease overall compile time by 20%. (Source)
Profile information enables better optimization. For example, knowing that a branch is taken very frequently helps the compiler make better decisions when ordering basic blocks. Knowing that a function foo is called more frequently than another function bar helps the inliner. Optimization levels -O2 and above are recommended for use of profile guided optimization. […] [Be] careful to collect profiles by running your code with inputs that are representative of the typical behavior. Code that is not exercised in the profile will be optimized as if it is unimportant, and the compiler may make poor optimization choices for code that is disproportionately used while profiling. (Source)
For the Fedora Linux distribution we build a ton of packages with LLVM. The aforementioned inputs are these packages themselves. The programs to optimize are those under the LLVM umbrella (e.g. clang).
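For a single program, the PGO cycle that this experiment scales up to whole RPM builds looks roughly like this. It is a minimal sketch based on the instrumentation workflow described in the clang user’s manual (-fprofile-instr-generate, llvm-profdata merge, -fprofile-instr-use); the helper names run and pgo_cycle are made up for illustration, and setting DRY_RUN=1 only prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Minimal sketch of clang's instrument -> run -> merge -> rebuild PGO cycle.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

pgo_cycle() {
  local src=$1 bin=$2
  # 1. Build an instrumented binary.
  run clang++ -O2 -fprofile-instr-generate "$src" -o "$bin.instrumented"
  # 2. Run it; each process writes its own raw profile (%p = process ID).
  run env LLVM_PROFILE_FILE="$bin.%p.profraw" "./$bin.instrumented"
  # 3. Merge all raw profiles into one indexed profile.
  run llvm-profdata merge -output="$bin.profdata" "$bin".*.profraw
  # 4. Rebuild, letting the profile guide the optimizer.
  run clang++ -O2 -fprofile-instr-use="$bin.profdata" "$src" -o "$bin"
}
```

For example, DRY_RUN=1 pgo_cycle hello.cpp hello prints the four commands. Without DRY_RUN you need clang++ and llvm-profdata on your PATH, and step 2 should exercise the binary with representative input, which is the role the RPM builds play for clang in this experiment.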
The question is: How can we tap into the RPM build pipeline using Copr and build RPM packages without modifying their *.spec files manually?
I’ve created a multi-step experiment that shows how this can be achieved. For educational purposes I’ve written many of the steps using Containerfiles. This allows for a good level of isolation when you want to build the steps on your own. To run any of the steps on your own, navigate to the step directory and run make all, for example:
$ cd step1-myapp
$ make all
But make sure you first read the description for each step below. Some steps only serve a documentation purpose, and you don’t need to build them yourself.
Note
The Containerfiles run as root. After all, the resulting images are not meant for anything but demonstration purposes and MUST NOT be used in production.
How to follow along?
I’ve been writing and testing everything on a Fedora Linux 37 laptop.
Here’s a good starting point for preparing your system if you want to follow along. Don’t worry, we don’t install any custom RPM on your system. Everything is built either in a podman container or on Copr.
$ sudo dnf install -y git make fedpkg podman fedora-packager krb5-workstation asciidoctor pandoc # (1)
$ gem install pygments.rb asciimath # (2)
$ git clone --recurse-submodules https://github.com/kwk/pgo-experiment.git # (3)
$ cd pgo-experiment # (4)
$ kinit <FAS_USER>@FEDORAPROJECT.ORG # (5)
(1) Install the packages that you need in order to build the steps. This list may not capture everything you need, but it covers most of it. asciidoctor is optional and only needed for building this documentation using make docs.
(2) OPTIONAL: Installing these Ruby gems is only needed for building these docs using make docs.
(3) Clone the project including its submodules.
(4) Navigate to the project’s root directory.
(5) OPTIONAL: If you want to build the steps that involve the copr CLI, you need to have a valid Kerberos ticket. Replace <FAS_USER> with your own Fedora FAS user name.
Step 0 - Build a PGO instrumented LLVM
Note
This step mainly exists for documentation purposes. If you do build it on your own, make sure to go through the files that reference kkleine/llvm-pgo-instrumented and change them to point to your own project. I don’t consider building it yourself a necessary part of this exercise. All we really have to do is pass a few CMake flags when building these LLVM RPM packages: llvm, clang, and lld.
In this step, we’re essentially following the official documentation for how to build a PGO instrumented clang. We’re going to create PGO instrumented LLVM packages and host them for later consumption on a Copr project. The resulting clang will generate profile data upon execution, and we’re trying to collect, bundle, and merge it for optimizing a rebuild of the LLVM toolchain later (Step 5 - Build PGO optimized LLVM). But rest assured, you don’t need to run this on your own. A build takes a couple of hours. The kkleine/llvm-pgo-instrumented project is ready for you to consume in the next steps. So you’re free to continue with Step 1 - Build "Hello, World!" RPM.
Spec file modifications
I’ve set up pgo-experiment branches in each of the following package repositories in Fedora’s package sources: llvm, clang, and lld.
In all of these repositories I’ve made essentially the same changes. First, I’ve added two build conditionals that are off by default:
%bcond_with pgo_instrumented_build
%bcond_with pgo_optimized_build
As you can see, one is for building an instrumented package and one is for building an optimized package. In Step 5 - Build PGO optimized LLVM we’re using pgo_optimized_build, but here we’re only turning on pgo_instrumented_build in our Makefile:
.PHONY: create-copr-project
create-copr-project:
	-copr create --chroot fedora-rawhide-x86_64 --unlisted-on-hp on $(copr_project)
	-copr modify --chroot fedora-rawhide-x86_64 --unlisted-on-hp on $(copr_project)
	copr edit-chroot --rpmbuild-with pgo_instrumented_build $(fas_user)/$(copr_project)/fedora-rawhide-x86_64
Another change I had to make was adding a build dependency on compiler-rt:
%if %{with pgo_instrumented_build}
BuildRequires: compiler-rt
%endif
Then we’re modifying the CMake arguments according to the official documentation.
\
%if %{without compat_build}
-DLLVM_VERSION_SUFFIX='' \
%endif
%if %{with pgo_instrumented_build}
Tip
There were a couple of errors that I ran into. One basically said:
As a solution I’ve added the
The solution was to modify
Building Step 0
To build this step, run cd step0-instrumented-llvm && make all.
Step 1 - Build "Hello, World!" RPM
In this step we set the foundation for our experiment.
We have a simple "Hello, World!" application that we build and package as an RPM file.
Tip
This step does NOT depend on Step 0 - Build a PGO instrumented LLVM, so you should be good to just run make build-step1.
Let’s have a look at the specfile first:
# See https://docs.fedoraproject.org/en-US/packaging-guidelines/#_compiler_macros
%global toolchain clang
Name: myapp
Version: 1.0.0
Release: 1%{?dist}
Summary: A simple "Hello, World!" application.
License: Apache-2.0
URL: https://github.com/kwk/hello-world
Source0: myapp-%{version}.tar.bz2
BuildRequires: clang
BuildRequires: cmake
BuildRequires: git
%description
A simple "Hello, World!" application.
%prep
%autosetup -S git
%build
env
TMPDIR=%{_builddir}/raw-pgo-profdata2
export TMPDIR
mkdir -pv $TMPDIR
LLVM_PROFILE_FILE=%t/%{name}.llvm.%m.%p.profraw
export LLVM_PROFILE_FILE
env
%cmake -DCMAKE_BUILD_TYPE=Release
%cmake_build
llvm-profdata merge \
--compress-all-sections \
--sparse \
$(find "$TMPDIR" -type f -name '*.profraw') \
-o %{name}.llvm.profdata --text
%install
%cmake_install
%check
test "`%{buildroot}/%{_bindir}/myapp`" = "Hello, World!"
%files
%license LICENSE
%{_bindir}/myapp
%changelog
* Wed Mar 1 2023 Konrad Kleine <kkleine@redhat.com> - 1.0.0-1
- Building step1
This is the simplest specfile I could come up with for a "Hello, World!" application built with clang.
The application code itself is similarly short and throughout this experiment we never change it:
#include <iostream>
int main(int argc, char *argv[]) {
  std::cout << "Hello, World!" << std::endl;
  return 0;
}
To build the RPM we use standard tools like fedpkg, driven by step1-myapp/myapp/Makefile:
# Prepare variables
TMP = $(CURDIR)/tmp
VERSION = $(shell grep ^Version myapp.spec | sed 's/.* //')
PACKAGE = myapp-$(VERSION)
FILES = LICENSE myapp.cpp \
myapp.spec CMakeLists.txt
.PHONY: source tarball rpm srpm clean
source:
	mkdir -p $(TMP)/SOURCES
	mkdir -p $(TMP)/$(PACKAGE)
	cp -a $(FILES) $(TMP)/$(PACKAGE)
tarball: source
	cd $(TMP) && tar vcfj ../$(PACKAGE).tar.bz2 $(PACKAGE)
rpm: tarball
	fedpkg --release f37 --name myapp local -- --noclean
srpm: tarball
	fedpkg --release f37 --name myapp srpm
clean:
	rm -rf $(TMP) $(PACKAGE)* x86_64 .build-*.log
Within a Containerfile we’re calling make rpm to build the myapp-1.0.0-1.fc37.x86_64.rpm RPM:
FROM fedora:rawhide
LABEL description="A basic specfile-to-RPM process demo"
# Install packages to build and package "myapp"
RUN dnf install -y cmake fedora-packager git clang
WORKDIR /root
COPY entrypoint.sh /root/entrypoint.sh
COPY ./myapp /root/myapp
USER root
ENTRYPOINT [ "/root/entrypoint.sh" ]
Once the build is done, we stay in the container (see bash in the following shell script) and you have to exit it manually (e.g. using Ctrl+d). We do this to allow you to look around in the build directories etc.
#!/bin/bash
set -x
cd /root/myapp
make rpm || true
bash
Building Step 1
To build this step, run cd step1-myapp && make all.
When you build this step, the output should look like this:
[...]
Wrote: /root/myapp/myapp-1.0.0-1.fc37.src.rpm
Wrote: /root/myapp/x86_64/myapp-debugsource-1.0.0-1.fc37.x86_64.rpm
Wrote: /root/myapp/x86_64/myapp-1.0.0-1.fc37.x86_64.rpm
Wrote: /root/myapp/x86_64/myapp-debuginfo-1.0.0-1.fc37.x86_64.rpm
+ bash
[root@7cf29caa0097 myapp]#
Step 2 - Automatically add subpackage
In this step we use the myapp directory from step1, which doesn’t contain any information about the subpackage at all. And yet we’re still gonna get our subpackage with profile data. Let’s dive right in…
Building Step 2
To build this step, run cd step2-myapp-llvm-pgo-profdata && make all.
When you build this step, the output should look like this:
[...]
Wrote: /root/myapp/myapp-1.0.0-1.fc37.src.rpm
Wrote: /root/myapp/x86_64/myapp-1.0.0-1.fc37.x86_64.rpm
Wrote: /root/myapp/x86_64/myapp-debugsource-1.0.0-1.fc37.x86_64.rpm
Wrote: /root/myapp/x86_64/myapp-debuginfo-1.0.0-1.fc37.x86_64.rpm
Wrote: /root/myapp/x86_64/myapp-llvm-pgo-profdata-1.0.0-1.fc37.x86_64.rpm
+ bash
[root@7cf29caa0097 myapp]#
How is it possible that we got an additional myapp-llvm-pgo-profdata-1.0.0-1.fc37.x86_64.rpm without changing the spec file?
We do this by installing a special macros package: llvm-pgo-instrumentation-macros. This package is the home of many useful build flags and macros, but it also allows us to tap into the build process:
# Install the PGO instrumented (not PGO optimized!) LLVM Toolchain
# https://llvm.org/docs/HowToBuildWithPGO.html#building-clang-with-pgo We have
# to specify the version we want because rawhide could have moved on by now.
RUN dnf install -y 'dnf-command(copr)'
RUN dnf copr enable -y kkleine/llvm-pgo-instrumented
RUN dnf install -y \
clang-16.0.2-2.fc39 \
clang-libs-16.0.2-2.fc39 \
clang-resource-filesystem-16.0.2-2.fc39 \
llvm-16.0.2-2.fc39 \
llvm-libs-16.0.2-2.fc39 \
llvm-pgo-instrumentation-macros-16.0.2-2.fc39
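We store our RPM macros for PGO in the /etc/rpm/ directory, which is usually dedicated to per-host overrides. RPM reads the files in the macro path from left to right, and a later definition of a macro replaces an earlier one. The macros we need to override, such as those from redhat-rpm-config in /usr/lib/rpm/redhat/macros, come fairly late in that list, so our overrides have to come even later: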
$ rpm --showrc|grep -i "macro path"
Macro path: /usr/lib/rpm/macros:/usr/lib/rpm/macros.d/macros.*:/usr/lib/rpm/platform/%{_target}/macros:/usr/lib/rpm/fileattrs/*.attr:/usr/lib/rpm/redhat/macros:/etc/rpm/macros.*:/etc/rpm/macros:/etc/rpm/%{_target}/macros:~/.rpmmacros
A natural choice would have been %{_rpmmacrodir}, which expands to /usr/lib/rpm/macros.d, but then we wouldn’t be able to override macros from the redhat-rpm-config package (see this PR for more information).
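To see why the position in the macro path matters, here is a deliberately simplified model of it. It assumes only single-line definitions of the form %name value (no multi-line bodies, parametric macros, or %if blocks) and reports which file, in the given order, supplies the definition that wins. The function name effective_macro is hypothetical; on a real system, rpm --eval '%{__spec_build_post}' shows the actual expansion.

```shell
#!/usr/bin/env bash
# Simplified model of RPM macro file precedence: files are read left to right
# and a later definition of the same macro replaces an earlier one.
# Assumption: single-line definitions of the form "%name<whitespace>value".
# Prints "<winning file><TAB><value>", or returns 1 if the macro is undefined.
effective_macro() {
  local name=$1
  shift
  local re="^%${name}[[:space:]]+(.*)$"
  local file line value="" origin=""
  for file in "$@"; do
    [ -r "$file" ] || continue   # missing files in the path are skipped
    while IFS= read -r line; do
      if [[ $line =~ $re ]]; then
        value=${BASH_REMATCH[1]}
        origin=$file
      fi
    done < "$file"
  done
  [ -n "$origin" ] || return 1
  printf '%s\t%s\n' "$origin" "$value"
}
```

Passing the files in macro-path order shows that a definition in /etc/rpm/macros.* beats one in /usr/lib/rpm/redhat/macros, while the same file placed in /usr/lib/rpm/macros.d would lose to it.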
Summary
It is important to note that in order to achieve the additional subpackage, we only had to modify the LLVM package and no other packages.
Step 3 - Build unmodified packages on Copr
Note
You don’t need to run this step manually. It has already been run and the results are in the Copr project kkleine/profile-data-collection.
Up until this point all of our experiments look promising, but how can we use Copr to build packages and produce <PACKAGE>-llvm-pgo-profdata packages automatically for us?
Copr will become the storage for our profile data subpackages along with all the regular packages.
After running this step using cd step3-myapp-on-copr && make all, we’re gonna have a project called kkleine/profile-data-collection. In that project, there will be the myapp package with the additional subpackage (myapp-llvm-pgo-profdata 1.0.0) inside.
In order for the Copr project to use our PGO instrumented LLVM, we’ve made the repo available in step3-myapp-on-copr/Makefile using the --repo option:
.PHONY: create-copr-project
create-copr-project:
	-copr create --chroot fedora-rawhide-x86_64 --unlisted-on-hp on --repo copr://$(fas_user)/llvm-pgo-instrumented $(copr_project)
	copr modify --chroot fedora-rawhide-x86_64 --unlisted-on-hp on --repo copr://$(fas_user)/llvm-pgo-instrumented $(copr_project)
	copr edit-chroot --packages llvm-pgo-instrumentation-macros $(fas_user)/profile-data-collection/fedora-rawhide-x86_64
Any package that is built in the kkleine/profile-data-collection Copr project will automatically have a <package>-llvm-pgo-profdata subpackage that we can download in a later step, merge, and feed into the final, optimized build of LLVM.
Optional: Build from distgit
If you want, you can build any project from Fedora’s dist-git by doing:
$ cd step3-myapp-on-copr/
$ make distgit-<PACKAGE> # (1)
(1) Replace <PACKAGE> with a real package name, e.g. retsnoop or chromium.
Step 4 - Merge PGO Profiles
In order to optimize LLVM with the raw profile data that we’ve collected before, we need to make it available to the Copr build of LLVM and merge it using llvm-profdata merge.
[Merging] takes several profile data files generated by PGO instrumentation and merges them together into a single indexed profile data file. (Source)
The <PACKAGE>-llvm-pgo-profdata packages that we’ve built so far are installable standalone. When we build a PGO optimized version of LLVM, we add a BuildRequires: myapp-llvm-pgo-profdata to the spec file of a new package called llvm-pgo-profdata.
BuildRequires: myapp-llvm-pgo-profdata
BuildRequires: retsnoop-llvm-pgo-profdata
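Keeping this list in sync by hand gets tedious as more packages are built in kkleine/profile-data-collection. A hypothetical helper (the name profdata_buildrequires is made up) could generate the lines from a list of source package names, following the <PACKAGE>-llvm-pgo-profdata naming convention and sorting and de-duplicating them so that diffs of the spec file stay small:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one BuildRequires line per source package built in
# the profile-data-collection Copr project, using the
# "<PACKAGE>-llvm-pgo-profdata" naming convention for the profile subpackage.
profdata_buildrequires() {
  [ $# -gt 0 ] || return 0
  printf '%s\n' "$@" | LC_ALL=C sort -u | while IFS= read -r pkg; do
    printf 'BuildRequires: %s-llvm-pgo-profdata\n' "$pkg"
  done
}
```

For example, profdata_buildrequires retsnoop myapp prints exactly the two BuildRequires: lines shown above, ready to paste into the preamble of llvm-pgo-profdata.spec.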
The %build section of our llvm-pgo-profdata spec file merges the profiles provided by the above <PACKAGE>-llvm-pgo-profdata packages to create a single PGO profile data file that we can later use for building a PGO optimized LLVM toolchain.
llvm-profdata merge \
--compress-all-sections \
--sparse \
%{_libdir}/llvm-pgo-profdata/myapp/* \
%{_libdir}/llvm-pgo-profdata/retsnoop/* \
-output llvm-pgo.profdata
%files
%license LICENSE
%{_libdir}/llvm-pgo-profdata/llvm-pgo.profdata
Caution
The Listing 14. step4-merge-profiles/llvm-pgo-profdata/llvm-pgo-profdata.spec
In Fedora as well as RHEL and CentOS Stream we use a build mode called "standalone-build". That means we’re building each sub-project of LLVM (e.g. clang, llvm, lld) with its own specfile. To avoid merging the PGO profile data into an indexed profile data file more than once, we’re offloading the merge process into its own RPM. We call it llvm-pgo-profdata.
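Because each <PACKAGE>-llvm-pgo-profdata subpackage installs its profile into its own directory below %{_libdir}/llvm-pgo-profdata (see the %install section of the subpackage in the FAQ), the merge step does not strictly have to name every package. A minimal sketch of that idea, with a hypothetical function name and assuming the <package>/<package>.llvm.profdata layout; PROFDATA is an assumption that lets you substitute a different llvm-profdata binary. The BuildRequires: lines are still what gets those files into the buildroot.

```shell
#!/usr/bin/env bash
# Sketch: merge every per-package profile found below a profile root
# (e.g. /usr/lib64/llvm-pgo-profdata/<package>/<package>.llvm.profdata)
# into one indexed profile, instead of listing each package by hand.
merge_all_profiles() {
  local root=$1 output=$2
  local -a inputs=()
  local f
  for f in "$root"/*/*.llvm.profdata; do
    [ -e "$f" ] && inputs+=("$f")   # the glob stays literal when nothing matches
  done
  if [ ${#inputs[@]} -eq 0 ]; then
    echo "no per-package profiles found below $root" >&2
    return 1
  fi
  "${PROFDATA:-llvm-profdata}" merge --sparse "${inputs[@]}" -o "$output"
}
```

In the spec file this would correspond to something like merge_all_profiles %{_libdir}/llvm-pgo-profdata llvm-pgo.profdata. The top-level llvm-pgo.profdata is not picked up as an input because the glob only looks one directory level down.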
Step 5 - Build PGO optimized LLVM
This step is similar to Step 0 - Build a PGO instrumented LLVM, in which we’ve built the PGO instrumented LLVM. Here we’re adding a build requirement for llvm-pgo-profdata:
%if %{with pgo_optimized_build}
BuildRequires: llvm-pgo-profdata
%endif
We then use the file %{_libdir}/llvm-pgo-profdata/llvm-pgo.profdata provided by our llvm-pgo-profdata package as input to LLVM_PROFDATA_FILE:
%if %{with pgo_optimized_build}
-DLLVM_PROFDATA_FILE=%{_libdir}/llvm-pgo-profdata/llvm-pgo.profdata \
%endif
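Since the two build conditionals select mutually exclusive sets of CMake flags, one way to keep them honest is a small guard that refuses to configure an optimized build when the profile is missing or empty. If that file is absent, it is easy to end up with a build that was never actually profile-guided; a guard makes that failure loud. The sketch below is hypothetical: the flag names are the ones documented in LLVM’s "How To Build Clang and LLVM with Profile-Guided Optimizations" guide (LLVM_BUILD_INSTRUMENTED, LLVM_BUILD_RUNTIME, LLVM_PROFDATA_FILE), not necessarily the exact set used in the Fedora spec files.

```shell
#!/usr/bin/env bash
# Hypothetical guard: print the CMake flags for one PGO build mode.
#   instrumented -> flags from LLVM's HowToBuildWithPGO guide
#   optimized    -> LLVM_PROFDATA_FILE, only if the profile exists and is non-empty
pgo_cmake_args() {
  local mode=$1 profdata=${2:-}
  case $mode in
    instrumented)
      echo "-DLLVM_BUILD_INSTRUMENTED=IR -DLLVM_BUILD_RUNTIME=No"
      ;;
    optimized)
      if [ ! -s "$profdata" ]; then
        echo "error: profile '$profdata' is missing or empty" >&2
        return 1
      fi
      echo "-DLLVM_PROFDATA_FILE=$profdata"
      ;;
    *)
      echo "error: unknown PGO mode '$mode'" >&2
      return 2
      ;;
  esac
}
```

For example, pgo_cmake_args optimized /usr/lib64/llvm-pgo-profdata/llvm-pgo.profdata either prints the single -DLLVM_PROFDATA_FILE=... flag or stops the build with an error.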
Together with the proper --with pgo_optimized_build build conditional, we’re building the optimized llvm, clang and lld packages:
.PHONY: create-copr-project
create-copr-project:
	-copr create --chroot fedora-rawhide-x86_64 --unlisted-on-hp on --repo copr://$(fas_user)/profile-data-collection $(copr_project)
	copr modify --chroot fedora-rawhide-x86_64 --unlisted-on-hp on --repo copr://$(fas_user)/profile-data-collection $(copr_project)
	copr edit-chroot --rpmbuild-with pgo_optimized_build $(copr_project)/fedora-rawhide-x86_64
The resulting PGO optimized packages are available on kkleine/llvm-pgo-optimized.
Step 6 - Evaluation
Here we compare the LLVM shipped with rawhide at the time (16.0.3) against the PGO optimized LLVM 16.0.2 that we’ve built in the previous steps.
We test this using the LLVM test suite:
The test-suite contains benchmark and test programs. The programs come with reference outputs so that their correctness can be checked. The suite comes with tools to collect metrics such as benchmark runtime, compilation time and code size.
In the evaluation we keep an eye on the:
- execution time
- compile time
- link time
$ /root/test-suite/utils/compare.py --metric exec_time --metric compile_time --metric link_time --lhs-name 16.0.3 --rhs-name 16.0.2-pgo /root/rawhide/results.json vs /root/pgo/results.json
Warning: 'test-suite :: SingleSource/UnitTests/X86/x86-dyn_stack_alloc_realign.test' has no metrics, skipping!
Warning: 'test-suite :: SingleSource/UnitTests/X86/x86-dyn_stack_alloc_realign2.test' has no metrics, skipping!
Warning: 'test-suite :: SingleSource/UnitTests/X86/x86-dyn_stack_alloc_realign.test' has no metrics, skipping!
Warning: 'test-suite :: SingleSource/UnitTests/X86/x86-dyn_stack_alloc_realign2.test' has no metrics, skipping!
Tests: 3052
Metric: exec_time,compile_time,link_time
Program exec_time compile_time link_time
16.0.3 16.0.2-pgo diff 16.0.3 16.0.2-pgo diff 16.0.3 16.0.2-pgo diff
920428-1.t 0.00 0.00 inf% 0.00 0.00 0.03 0.02 -27.8%
pr17078-1.t 0.00 0.00 inf% 0.00 0.00 0.03 0.03 -4.2%
enum-2.t 0.00 0.00 inf% 0.00 0.00 0.03 0.04 36.4%
doloop-1.t 0.00 0.00 inf% 0.00 0.00 0.03 0.04 30.0%
divconst-3.t 0.00 0.00 inf% 0.00 0.00 0.02 0.02 -17.9%
pr81556.t 0.00 0.00 inf% 0.00 0.00 0.03 0.03 24.6%
divcmp-4.t 0.00 0.00 inf% 0.00 0.00 0.03 0.04 13.9%
20020307-1.t 0.00 0.00 inf% 0.00 0.00 0.03 0.02 -26.5%
20020314-1.t 0.00 0.00 inf% 0.00 0.00 0.02 0.03 23.7%
divcmp-3.t 0.00 0.00 inf% 0.00 0.00 0.03 0.03 -20.3%
20020328-1.t 0.00 0.00 inf% 0.00 0.00 0.03 0.03 6.0%
20020406-1.t 0.00 0.00 inf% 0.00 0.00 0.03 0.03 27.0%
20020411-1.t 0.00 0.00 inf% 0.00 0.00 0.04 0.03 -20.1%
complex-4.t 0.00 0.00 inf% 0.00 0.00 0.03 0.03 1.4%
20020508-1.t 0.00 0.00 inf% 0.00 0.00 0.04 0.03 -14.0%
Geomean difference -100.0% -9.7% -1.2%
exec_time compile_time link_time
l/r 16.0.3 16.0.2-pgo diff 16.0.3 16.0.2-pgo diff 16.0.3 16.0.2-pgo diff
count 3034.000000 3034.000000 2401.000000 2505.000000 2505.000000 440.000000 2505.000000 2505.000000 2505.000000
mean 1091.690748 1074.387911 inf 0.259116 0.225875 -0.077137 0.049104 0.048398 0.014828
std 21120.154138 20962.649384 NaN 2.214408 1.988421 0.199779 0.032997 0.032546 0.237169
min 0.000000 0.000000 -1.000000 0.000000 0.000000 -0.494005 0.017100 0.017500 -0.551422
25% 0.000000 0.000000 -0.227273 0.000000 0.000000 -0.195129 0.029100 0.029100 -0.161290
50% 0.001100 0.001100 0.000000 0.000000 0.000000 -0.110612 0.034300 0.033600 -0.010672
75% 0.126725 0.123600 0.212121 0.000000 0.000000 0.011439 0.045700 0.044400 0.161049
max 817849.818925 828252.719527 inf 74.697400 69.996700 0.844595 0.206500 0.227000 0.980296
The most important line to look at is this:
Geomean difference -100.0% -9.7% -1.2%
To interpret the results, note that many of the test programs run too quickly for their execution time to be measured (0.00 in the table above). This produces the inf% entries and makes the exec_time geomean of -100.0% meaningless.
The compile time, on the other hand, shows a 9.7% improvement when going from LLVM 16.0.3 to the PGO optimized LLVM 16.0.2. Link time also improved, by 1.2%.
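"Geomean difference" condenses thousands of per-test ratios into one number. The sketch below shows one common way to compute such a figure: the geometric mean of the per-test ratios rhs/lhs, minus one. It is not necessarily compare.py’s exact implementation, and the function name geomean_diff is made up. It also makes visible how a single zero measurement collapses a geometric mean to 0 (that is, -100%), one plausible reason why the exec_time column reports -100.0%.

```shell
#!/usr/bin/env bash
# Sketch: geometric-mean difference of paired measurements.
# Reads "lhs rhs" pairs from stdin and prints (geomean(rhs/lhs) - 1) in percent.
# Pairs with lhs == 0 are skipped because their ratio is undefined; a pair with
# lhs > 0 and rhs == 0 has ratio 0, which collapses the geometric mean to 0.
geomean_diff() {
  awk '
    $1 > 0 {
      if ($2 == 0) { zero = 1; next }
      sum += log($2 / $1)
      n++
    }
    END {
      if (zero) { print "-100.0%"; exit }
      if (n == 0) { print "n/a"; exit }
      printf "%.1f%%\n", (exp(sum / n) - 1) * 100
    }'
}
```

For example, printf '10 9\n20 18\n' | geomean_diff prints -10.0%, because both tests got 10% faster on the right-hand side.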
Conclusion
We’ve seen how we can gather PGO profile data from building unmodified RPM packages and feed this data into a PGO-optimized recompilation of LLVM.
The trickiest part for me was the background merge script. Building the instrumented and optimized steps was the most straightforward part.
But looking at the almost 10% improvement in compile time, I really like how this experiment turned out. And I wonder how far we can take this if we build llvm and clang in one buildroot.
I hope you liked this article and will follow us as we explore the possibilities ahead! Don’t forget to leave a comment ;)
Frequently Asked Questions
How can I view the top 10 functions?
To view the top 10 functions in a profile file, you can use llvm-profdata as shown below:
$ podman run -it --rm fedora:rawhide bash # (1)
# dnf install -y 'dnf-command(copr)' # (2)
# dnf copr -y enable kkleine/profile-data-collection # (3)
# dnf install -y llvm llvm-pgo-profdata # (4)
# llvm-profdata show --topn=10 /usr/lib64/llvm-pgo-profdata/llvm-pgo.profdata | llvm-cxxfilt # (5)
Instrumentation level: IR entry_first = 0
Total functions: 36265
Maximum function count: 4301163594
Maximum internal block count: 321869494
Top 10 functions with the largest internal block counts:
llvm::hashing::detail::hash_combine_recursive_helper::hash_combine_recursive_helper(), max count = 4301163594
llvm::SmallPtrSetImplBase::insert_imp(void const*), max count = 606844728
llvm::SmallPtrSetImplBase::find_imp(void const*) const, max count = 337050642
llvm::MDNode::classof(llvm::Metadata const*), max count = 321592832
llvm::SmallVectorTemplateBase<unsigned int, true>::push_back(unsigned int), max count = 308883764
llvm::AttributeList::hasFnAttr(llvm::Attribute::AttrKind) const, max count = 292092119
llvm::APInt::APInt(unsigned int, unsigned long, bool), max count = 250279393
llvm::StringMapImpl::LookupBucketFor(llvm::StringRef), max count = 166379572
llvm::AttributeSetNode::findEnumAttribute(llvm::Attribute::AttrKind) const, max count = 164408905
llvm::AttributeSet::getMemoryEffects() const, max count = 161737452
(1) Fire up a rawhide container.
(2) Install the dnf plugin to enable Copr repos.
(3) Enable the repository that contains the llvm-pgo-profdata package.
(4) Install llvm to get the llvm-profdata and llvm-cxxfilt binaries, and install the profile package llvm-pgo-profdata, which we use for optimizing LLVM.
(5) Show the top 10 hottest functions, demangled by llvm-cxxfilt.
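If you want to post-process that list, for example to compare the hot functions of two profiles, the "<function>, max count = <N>" lines printed above are easy to parse. A hypothetical helper (the name hot_functions_tsv is made up) that turns them into tab-separated count/name pairs and ignores the header lines:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the "Top N functions" lines printed by
# "llvm-profdata show --topn=N" (after llvm-cxxfilt) into tab-separated
# "<max count><TAB><function>" lines; header lines are ignored.
hot_functions_tsv() {
  awk -F ', max count = ' 'NF == 2 { print $2 "\t" $1 }'
}
```

For example: llvm-profdata show --topn=10 /usr/lib64/llvm-pgo-profdata/llvm-pgo.profdata | llvm-cxxfilt | hot_functions_tsv | sort -rn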
What is the LLVM_PROFILE_FILE environment variable?
By specifying export LLVM_PROFILE_FILE="%t/myapp.llvm.%m.%p.profraw" we instruct the instrumented clang to create a raw profile file for each invocation under TMPDIR (see %t in the docs). %p expands to the process ID and %m to the signature of the instrumented binary.
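To make the naming scheme concrete, here is a hypothetical helper that mimics how the profiling runtime expands three of the documented specifiers: %t (TMPDIR), %p (process ID) and %m (binary signature; a plain %m means a merge pool of size one, hence the _0 pool suffix). The real runtime computes the signature itself; here it is passed in by hand, and the sketch assumes TMPDIR contains no |, & or backslash characters.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of LLVM_PROFILE_FILE expansion for three specifiers:
#   %t -> TMPDIR, %p -> process ID, %m -> <signature>_<pool index>
# A plain %m means a merge pool of size 1, so the pool index is always 0 here.
expand_profile_pattern() {
  local pattern=$1 tmpdir=$2 pid=$3 signature=$4
  printf '%s\n' "$pattern" | sed \
    -e "s|%t|${tmpdir}|g" \
    -e "s|%p|${pid}|g" \
    -e "s|%m|${signature}_0|g"
}
```

For example, expand_profile_pattern '%t/chromium.llvm.%m.%p.profraw' /builddir/build/BUILD/raw-pgo-profdata/ 24617 1970228969820616430 prints the same file name that appears in the chromium error message quoted later in this FAQ.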
Caution
When experimenting with different templates I noticed that
After all, how can a function call be counted in a thread-safe manner? Let’s suppose you have four threads that all call a specific function
Why do we merge profiles and not keep the raw ones?
Short answer: because size matters!
In the %install section of the specfile we then find all raw profiles and merge them into the final %{buildroot}%{_libdir}/llvm-pgo-profdata/%{name}/%{name}.llvm.profdata under the buildroot, to be picked up by the %files section of the %{name}-llvm-pgo-profdata subpackage:
%__pgo_merge_profdata %[ 0%{__llvm_pgo_subpackage} > 0 ? "\\\
mkdir -pv %{buildroot}%{_libdir}/llvm-pgo-profdata/%{name} \\\
&& %{__pgo_env} \\\
&& llvm-profdata merge \\\
--compress-all-sections \\\
--sparse \\\
%{__pgo_background_merge_target} \\\
$(find %{_builddir}/raw-pgo-profdata -type f -name '*.profraw') \\\
-o %{buildroot}%{_libdir}/llvm-pgo-profdata/%{name}/%{name}.llvm.profdata \\\
" : "%{nil}" ]
The bigger a package gets, the more of a problem disk space becomes. For example, when compiling the chromium project with an instrumented LLVM toolchain, I ran into these error messages after 1 hour:
LLVM Profile Error: Failed to write file "/builddir/build/BUILD/raw-pgo-profdata//chromium.llvm.1970228969820616430_0.24617.profraw": No space left on device
As a consequence, we cannot let the build process run until it is done (until the end of the %build section) and only then pick up the pieces and merge the raw profiles. We have to do this continuously in order to avoid disk space issues.
To improve this situation, we’re starting a background merge script right before we enter the %build section:
%__llvm_pgo_instrumented_spec_build_pre \
[ 0%{__llvm_pgo_subpackage} > 0 ] \\\
&& %{__pgo_env} \\\
&& /usr/lib/rpm/redhat/pgo-background-merge.sh \\\
-d %{__pgo_profdir} \\\
-f %{__pgo_background_merge_target} \\\
-p %{__pgo_pid_file} & \
In order to stop the background job before it gets killed by the __spec_build_post macro, we’ve got this macro:
%__llvm_pgo_instrumented_spec_build_post \
if [ 0%{__llvm_pgo_subpackage} > 0 ]\
then\
echo 'please exit' > %{__pgo_shutdown_file};\
[ -e %{__pgo_pid_file} ] && inotifywait -e delete_self %{__pgo_pid_file} || true;\
fi\
We ask the background job to shut down gracefully by writing to a shutdown file. Then we use inotifywait to wait until the background job’s PID (process ID) file is deleted.
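The macro above only shows the build side of this handshake. A hypothetical sketch of what the other side, the background job, has to do for it to work: publish a PID file, poll for the shutdown file (%{__pgo_shutdown_file}), and delete the PID file (%{__pgo_pid_file}) as its very last action, since that deletion is the event inotifywait -e delete_self waits for. The polling loop and the function name are assumptions; the real pgo-background-merge.sh may be structured differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the background job's side of the shutdown handshake.
background_job() {
  local shutdown_file=$1 pid_file=$2 interval=${3:-1}
  echo "$BASHPID" > "$pid_file"        # announce that the job is running
  while [ ! -e "$shutdown_file" ]; do
    # A real job would merge full batches of *.profraw files here.
    sleep "$interval"
  done
  # A real job would merge the remaining partial batch here.
  rm -f -- "$shutdown_file"
  rm -f -- "$pid_file"                 # last step: unblocks "inotifywait -e delete_self"
}
```

Deleting the PID file only after the final merge means the build cannot continue into %install while profiles are still being merged in the background.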
The %__llvm_pgo_instrumented_spec_build_post macro is used in the override of %__spec_build_post, which, among other places, is called at the end of each %build section:
# Overriding __spec_build_post macro from /usr/lib/rpm/macros
%__spec_build_post \
%{?__llvm_pgo_instrumented_spec_build_post} \
%{___build_post}
Important
But why not store the raw profiles? In the first incarnation of this experiment I did store the raw profiles and I noticed that the final
Tip
You can call llvm-profdata merge on already merged profiles.
PGO background merge
The background script itself waits for close_write events on *.profraw files in the observed directory. It writes the file names into a batch file:
# On every *.profraw file written to in the directory <observe_dir>,
# add the file name to list of files to process in a batch.
inotifywait -q -m -o $batch_file -e close_write \
--format '%f' \
--include $files_regex \
$observe_dir > /dev/null 2>&1 &
Once the batch size reaches the minimum size, we merge the profiles in the batch file and delete them when we’re done. This saves disk space when building large projects.
# llvm-profdata itself is instrumented as well so we need to
# tell it where to write its own profile data.
# TODO(kwk): Eventually use this in the final merge?
export TMPDIR=/tmp
export LLVM_PROFILE_FILE="%t/llvm-profdata.tmp"
pushd $observe_dir
llvm-profdata merge \
--compress-all-sections \
--sparse \
`[ -e $target_merge_file ] && echo "$target_merge_file"` \
$(cat $batch_file_in_process) \
-o $target_merge_file
# IMPORTANT: Free up disk space!
rm -fv $(cat $batch_file_in_process)
popd
rm -f $TMPDIR/llvm-profdata.tmp
Now, for the simple application in this experiment this might look like overkill, but trust me, we need this for building bigger projects like chromium.
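The batching described above boils down to two small operations: checking whether enough file names have accumulated, and merging one batch into the running target. Below is a hypothetical sketch of both. It assumes the batch file holds bare file names (inotifywait’s %f format) and is processed from inside the observed directory, as the pushd in the fragment above does. Unlike that fragment, it writes to a temporary file and renames it, so a failed merge cannot clobber the accumulated profile, and it deletes raw profiles only after a successful merge. PROFDATA is an assumption that makes the merge command swappable.

```shell
#!/usr/bin/env bash
# Hypothetical batch step for a background merger.
# batch_is_full BATCH MIN: true once BATCH lists at least MIN file names.
batch_is_full() {
  local batch=$1 min=$2
  [ -f "$batch" ] && [ "$(wc -l < "$batch")" -ge "$min" ]
}

# merge_batch TARGET BATCH: merge the raw profiles named in BATCH (one per
# line, relative to the current directory) into TARGET, together with the
# previous TARGET. The caller is responsible for starting a fresh batch file.
merge_batch() {
  local target=$1 batch=$2
  local -a files inputs=()
  mapfile -t files < "$batch"
  [ ${#files[@]} -gt 0 ] || return 0
  [ -e "$target" ] && inputs+=("$target")   # fold in what was merged before
  inputs+=("${files[@]}")
  # Write to a temporary file first so a failed merge cannot clobber TARGET.
  "${PROFDATA:-llvm-profdata}" merge --sparse "${inputs[@]}" -o "$target.tmp" || return 1
  mv -f -- "$target.tmp" "$target"
  rm -f -- "${files[@]}"                    # free disk space
}
```

A background loop would call batch_is_full periodically and, once it returns true, hand a snapshot of the batch to merge_batch.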
How to toggle off PGO profile package generation?
When the PGO instrumented LLVM is installed, we can still turn off the generation of profile files by putting %global __llvm_pgo_subpackage %{nil} in the spec file (e.g. in myapp.spec).
Important
Currently there’s no sanity check of whether a package can produce PGO profiles at all. If there’s no compiler, or the compiler is not clang, my patch doesn’t work. But right now we don’t care much about this and consider it an optimization for later. I just wanted to let you know.
Do you use PGO for cross-compilation?
No. By default we optimize for each individual architecture. We think that this is good for now. Cases in which you cross-compile on one architecture for another exist, but they are not considered here (for now).
Resources
Here’s a list of places to find out more about PGO and RPM Package building.
- For building LLVM with PGO: https://llvm.org/docs/HowToBuildWithPGO.html#building-clang-with-pgo
- PGO in general: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
- llvm-profdata: https://llvm.org/docs/CommandGuide/llvm-profdata.html#profdata-merge
- Source-based coverage: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program