Enabling GCC Graphite and LTO on Gentoo

Jul 08 2013

In this article, we will be enabling the GCC options marked with “To use this code transformation, GCC has to be configured with –with-ppl and –with-cloog to enable the Graphite loop transformation infrastructure. “, referred to as graphite. In addition, Link Time Optimization(LTO) refers to the options -flto and -fuse-linker-plugin. -fuse-linker-plugin tells GCC to use an external linker and -flto defers link time optimizations until the final link step so that the optimizer can work across the whole program rather than on individual object files. OpenMP is for parallelization and I save it for another article since I do not have benchmarks to show that it improves performance.
To enable graphite, LTO, and OpenMP it is recommended to first install the latest version of GCC. More recent versions of GCC contain bug fixes for features we are about to enable. The GCC upgrade process is detailed in the guide http://www.gentoo.org/doc/en/gcc-upgrading.xml.

After upgrading GCC, we start enabling the newer features. The steps follow the guidelines on the Gentoo wiki roughly. However,  it is unnecessary to rebuild everything after upgrading GCC as explained in the official GCC upgrade guide. libtool allows packages to be built against shared libraries without rebuilding every time the toolchain is upgraded.
Before adding the graphite USE flag, it is essential to build cloog

emerge dev-libs/cloog dev-libs/cloog-ppl

Then add graphite to CFLAGS and enable graphite USE flags

GRAPHITE="-floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block"
CFLAGS="-flto=8 ${GRAPHITE} -ftree-vectorize"
CXXFLAGS="${CFLAGS}"
LDFLAGS="${CFLAGS} -fuse-linker-plugin"
# GCC >= 4.8.2 requires the lto flag
USE="graphite lto"

These flags can be appended to the ones you’re already using, although certain flags may conflict with the graphite optimizations. Since graphite flags such as -ftree-loop-distribution is intended to enable further vectorization, I also enabled the -ftree-vectorize flag. The -fuse-linker-plugin is a linker flag while -flto=n turns on the standard link-time optimizer. I set n to 8 for 8 threads while linking. Another factor that could affect performance is -flto-partition, which sets the partitioning algorithm.
To use the linker plugin with LTO, set the linker to Gold:

binutils-config --linker ld.gold

In addition, before compiling the system it is necessary to build GCC with graphite:

emerge gcc
emerge -e world

Dealing with breakages

Several packages failed to compile with LTO and graphite enabled. These flags can be disabled on a per package basis. First create the files in /etc/portage/env/ called no-lto.conf and no-graphite.conf. In no-graphite.conf, disable your graphite flags:

CFLAGS="${CFLAGS} -fno-loop-interchange -fno-tree-loop-distribution -fno-loop-strip-mine -fno-loop-block"
CXXFLAGS="${CXXFLAGS} -fno-loop-interchange -fno-tree-loop-distribution -fno-loop-strip-mine -fno-loop-block"
LDFLAGS="${LDFLAGS} -fno-loop-interchange -fno-tree-loop-distribution -fno-loop-strip-mine -fno-loop-block"

Similary, disable your LTO flags in no-lto.conf:

CFLAGS="${CFLAGS} -fno-lto -fno-use-linker-plugin"
CXXFLAGS="${CXXFLAGS} -fno-lto -fno-use-linker-plugin"
LDFLAGS="${LDFLAGS} -fno-lto -fno-use-linker-plugin"

Every time a package breaks, add a line to /etc/portage/package.env that either disables graphite or LTO. For example, I made these adjustments:

net-misc/curl no-graphite.conf
media-libs/mesa no-graphite.conf
sys-apps/findutils no-lto.conf
sys-apps/gawk no-lto.conf

After adding a line you can resume the system rebuild with emerge --resume.

Notes

Before re-compiling your system, setting moving /var/tmp to RAM speeds things up. The Sabayon wiki has an article on performance, just make sure you reboot after changing /etc/fstab.

After building the current version of PPL, this message is displayed for upgrades:

* After an upgrade of PPL it is important that you rebuild
* dev-libs/cloog-ppl.
*
* If you use gcc-config to switch to an older compiler version than
* the one PPL was built with, PPL must be rebuilt with that version.
*
* In both cases failure to do this will get you this error when
* graphite flags are used:
*
* sorry, unimplemented: Graphite loop optimizations cannot be used

2 responses so far

  • Daniel

    Hi there!
    Thank you for information.
    I want to ask you how to set ‘n’ in -flto=n option.
    I use an core 2 duo (2 cores)
    It’s somehow related to the number of the cores?
    Thank you,
    Daniel

  • yuguang

    You would set it to the number of threads supported by your CPU plus one. In your case, it’s 3. If you want to check, run “grep “processor” /proc/cpuinfo | wc -l”.