Enabling GCC Graphite and LTO on Gentoo

Jul 08 2013 Published by under Linux

In this article, we will enable the GCC options that the GCC manual marks with “To use this code transformation, GCC has to be configured with --with-ppl and --with-cloog to enable the Graphite loop transformation infrastructure.”, referred to here as graphite. In addition, Link Time Optimization (LTO) refers to the options -flto and -fuse-linker-plugin. With -flto, GCC stores its intermediate representation in the object files and runs the optimizer again at the final link step, so that it can work across the whole program rather than on individual object files. -fuse-linker-plugin enables the linker plugin, which tells the link-time optimizer which symbols are accessed from outside the LTO code and lets it pull LTO objects out of static archives; it requires a linker with plugin support, such as gold or GNU ld 2.21 or newer. OpenMP is for parallelization, and I will save it for another article since I do not have benchmarks showing that it improves performance.
To enable graphite, LTO, and OpenMP, it is recommended to first install the latest version of GCC, since more recent versions contain bug fixes for the features we are about to enable. The GCC upgrade process is detailed in the guide http://www.gentoo.org/doc/en/gcc-upgrading.xml.

After upgrading GCC, we can start enabling the newer features. The steps roughly follow the guidelines on the Gentoo wiki. However, as the official GCC upgrade guide explains, it is unnecessary to rebuild everything just because GCC was upgraded: libtool allows packages to be built against shared libraries without rebuilding them every time the toolchain is upgraded. (The full rebuild later in this article is needed only because the new optimization flags have to be applied to every package.)
Before adding the graphite USE flag, it is essential to build CLooG:

emerge dev-libs/cloog dev-libs/cloog-ppl

Then add the graphite flags to CFLAGS in make.conf and enable the graphite and lto USE flags:

GRAPHITE="-floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block"
CFLAGS="-flto=8 ${GRAPHITE} -ftree-vectorize"
CXXFLAGS="${CFLAGS}"
LDFLAGS="${CFLAGS} -fuse-linker-plugin"
# GCC >= 4.8.2 requires the lto USE flag
USE="graphite lto"

These flags can be appended to the ones you are already using, although some existing flags may conflict with the graphite optimizations. Strictly speaking, -ftree-loop-distribution is not a Graphite option, but like the graphite loop transformations it is intended to enable further optimizations such as vectorization, so I also enabled -ftree-vectorize. -fuse-linker-plugin only takes effect at link time, which is why it goes in LDFLAGS, while -flto=n turns on the standard link-time optimizer and runs its final stage in n parallel jobs. I set n to 8 to use 8 parallel jobs while linking. Another option that could affect performance is -flto-partition, which selects the algorithm used to split the program into partitions for parallel link-time optimization.
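The example above hard-codes -flto=8. As a hedged sketch, assuming GNU coreutils' nproc is available, the job count can instead be derived from the number of online CPUs; the helper name lto_jobs_flag is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print an -flto=N flag where N is the number of
# online CPUs reported by nproc (GNU coreutils). Falls back to 1 if nproc
# is missing or prints something unexpected.
lto_jobs_flag() {
    n=$(nproc 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    printf '%s\n' "-flto=$n"
}

lto_jobs_flag
```

Portage parses make.conf itself rather than running it through a shell, so paste the printed value into CFLAGS instead of writing $(nproc) there.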
To use the linker plugin with LTO, set the linker to Gold:

binutils-config --linker ld.gold
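To confirm that the switch took effect, the first line of ld --version should start with "GNU gold" (the classic BFD linker reports "GNU ld" instead). A minimal sketch; linker_is_gold is a hypothetical helper that reads the version banner on standard input, so it can be tried without changing the system linker.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the version banner on stdin belongs to
# GNU gold rather than the BFD linker (GNU ld).
linker_is_gold() {
    head -n 1 | grep -q '^GNU gold'
}

# On a live system, check the linker that binutils-config selected.
if command -v ld >/dev/null 2>&1; then
    if ld --version | linker_is_gold; then
        echo "ld is GNU gold"
    else
        echo "ld is not GNU gold"
    fi
fi
```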

Before rebuilding the system, rebuild GCC itself so that the new USE flags take effect, then rebuild every package with the new flags:

emerge gcc
emerge -e world
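After GCC has been rebuilt, the "Configured with:" line that gcc -v prints on stderr shows whether the Graphite prerequisites were enabled. A hedged sketch, assuming the ebuild passes --with-cloog when the graphite USE flag is set (and --without-cloog otherwise); gcc_has_graphite is a hypothetical helper that reads gcc -v output on standard input.

```shell
#!/bin/sh
# Hypothetical helper: read `gcc -v` output on stdin and succeed if the
# "Configured with:" line contains --with-cloog.
# The trailing ([= ]|$) keeps related options such as --with-cloog-include
# from counting on their own; --without-cloog never matches because it
# does not contain the string --with-cloog.
gcc_has_graphite() {
    grep '^Configured with:' | grep -qE -- '--with-cloog([= ]|$)'
}

if command -v gcc >/dev/null 2>&1; then
    # gcc -v writes its configuration to stderr.
    if gcc -v 2>&1 | gcc_has_graphite; then
        echo "gcc was configured with CLooG (Graphite available)"
    else
        echo "gcc lacks CLooG; rebuild it with USE=graphite"
    fi
fi
```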

Dealing with breakages

Several packages failed to compile with LTO and graphite enabled. These flags can be disabled on a per-package basis. First, create two files in /etc/portage/env/ called no-lto.conf and no-graphite.conf. In no-graphite.conf, disable the graphite flags:

CFLAGS="${CFLAGS} -fno-loop-interchange -fno-tree-loop-distribution -fno-loop-strip-mine -fno-loop-block"
CXXFLAGS="${CXXFLAGS} -fno-loop-interchange -fno-tree-loop-distribution -fno-loop-strip-mine -fno-loop-block"
LDFLAGS="${LDFLAGS} -fno-loop-interchange -fno-tree-loop-distribution -fno-loop-strip-mine -fno-loop-block"

Similarly, disable the LTO flags in no-lto.conf:

CFLAGS="${CFLAGS} -fno-lto -fno-use-linker-plugin"
CXXFLAGS="${CXXFLAGS} -fno-lto -fno-use-linker-plugin"
LDFLAGS="${LDFLAGS} -fno-lto -fno-use-linker-plugin"
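Creating the two files by hand is straightforward, but the step can also be scripted. A sketch, assuming the flag lists shown above; write_env_files is a hypothetical helper, and its optional directory argument (defaulting to /etc/portage/env) exists only so the output can be checked in a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: write no-graphite.conf and no-lto.conf into the
# given directory (default /etc/portage/env) with the flags shown above.
write_env_files() {
    dir=${1:-/etc/portage/env}
    mkdir -p "$dir" || return 1

    graphite_off='-fno-loop-interchange -fno-tree-loop-distribution -fno-loop-strip-mine -fno-loop-block'
    lto_off='-fno-lto -fno-use-linker-plugin'

    # Unquoted heredocs: $graphite_off and $lto_off are expanded now, while
    # \${CFLAGS} etc. are escaped so the files contain the literal text
    # ${CFLAGS} for Portage to expand later.
    cat > "$dir/no-graphite.conf" <<EOF
CFLAGS="\${CFLAGS} $graphite_off"
CXXFLAGS="\${CXXFLAGS} $graphite_off"
LDFLAGS="\${LDFLAGS} $graphite_off"
EOF

    cat > "$dir/no-lto.conf" <<EOF
CFLAGS="\${CFLAGS} $lto_off"
CXXFLAGS="\${CXXFLAGS} $lto_off"
LDFLAGS="\${LDFLAGS} $lto_off"
EOF
}
```

For example, write_env_files /tmp/env-test lets you inspect the files before running write_env_files as root with no argument.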

Every time a package breaks, add a line to /etc/portage/package.env that disables either graphite or LTO for it. For example, I made these adjustments:

net-misc/curl no-graphite.conf
media-libs/mesa no-graphite.conf
sys-apps/findutils no-lto.conf
sys-apps/gawk no-lto.conf
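Because the rebuild may stop several times, it helps to add entries without creating duplicates. A minimal sketch, assuming package.env is a single file as in the example above; add_package_env is a hypothetical helper, and its optional third argument (defaulting to /etc/portage/package.env) exists only so it can be tested on a scratch file.

```shell
#!/bin/sh
# Hypothetical helper: append "ATOM CONF" to package.env unless an
# identical line is already present.
add_package_env() {
    atom=$1
    conf=$2
    file=${3:-/etc/portage/package.env}
    line="$atom $conf"

    # -x: whole-line match, -F: fixed string (atoms contain '/' and '.').
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}
```

For example, running add_package_env sys-apps/gawk no-lto.conf as root and then emerge --resume continues the rebuild with LTO disabled for gawk.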

After adding a line, you can resume the system rebuild with emerge --resume.

Notes

Before recompiling your system, moving /var/tmp to RAM (tmpfs) speeds things up. The Sabayon wiki has an article on performance; just make sure you reboot after changing /etc/fstab.
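For reference, a tmpfs entry for /var/tmp in /etc/fstab might look like the line printed below. This is a sketch: the default size (8G), the options, and the helper name tmpfs_fstab_line are assumptions to adjust for the machine's RAM; mode=1777 reproduces the usual sticky, world-writable permissions of /var/tmp.

```shell
#!/bin/sh
# Hypothetical helper: print an fstab entry that mounts /var/tmp as tmpfs.
# The default size of 8G is an assumption; choose a value that fits in RAM
# and is large enough for the biggest package being built.
tmpfs_fstab_line() {
    size=${1:-8G}
    printf 'tmpfs /var/tmp tmpfs size=%s,mode=1777,nosuid,nodev 0 0\n' "$size"
}

tmpfs_fstab_line
```

Appending the output to /etc/fstab as root and rebooting applies it. Keep in mind that everything under /var/tmp is then lost at every reboot.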

After building the current version of PPL, this message is displayed for upgrades:

* After an upgrade of PPL it is important that you rebuild
* dev-libs/cloog-ppl.
*
* If you use gcc-config to switch to an older compiler version than
* the one PPL was built with, PPL must be rebuilt with that version.
*
* In both cases failure to do this will get you this error when
* graphite flags are used:
*
* sorry, unimplemented: Graphite loop optimizations cannot be used
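When a build fails, the log can be checked for the error quoted in that message before rebuilding dev-libs/cloog-ppl (or PPL itself, after switching compilers). A hedged sketch: needs_ppl_rebuild is a hypothetical helper, and the log path in the usage line below assumes Portage's default build directory under /var/tmp/portage.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given build log contains the
# Graphite error that the PPL ebuild warns about.
needs_ppl_rebuild() {
    grep -qF -e 'Graphite loop optimizations cannot be used' "$1"
}
```

For example: needs_ppl_rebuild /var/tmp/portage/CATEGORY/PACKAGE/temp/build.log && emerge --oneshot dev-libs/cloog-ppl, where CATEGORY/PACKAGE stands for the package that failed.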


Phoronix GCC Graphite and LTO Benchmarks

Jul 08 2013 Published by under Linux

GCC Graphite and Link Time Optimization are relatively new features that many Gentoo users do not enable because they have not been shown to improve performance. I recently did a fresh install of Funtoo optimized for a Core i7 and recompiled the system to run Phoronix Test Suite benchmarks. Before we look at the benchmarks, it is important to note that the Phoronix Test Suite is not designed to test programs compiled with particular sets of GCC flags. The test suite documentation states:

… for the software being tested that may be installed already on the system by the user, the Phoronix Test Suite will ignore those installations. The Phoronix Test Suite installs all tests within its configured environment to improve the reliability of the testing process.

In other words, the test suite builds its own copies of the test programs instead of using the ones installed by Portage, so the system-wide GCC flags are not applied to them. Of the tests I ran, only the ones that exercise packages compiled with Graphite and LTO enabled are worth looking at: PyBench and gzip compression. The video and audio encoding tests, as well as the lzma compression test, are built directly with make, which bypasses the Gentoo build system. For example, the lzma test package downloads the lzma source

<?xml version="1.0"?>
<!--Phoronix Test Suite v3.0.0a3 (Iveland) [ http://www.phoronix-test-suite.com/ ]-->
<PhoronixTestSuite>
  <Downloads>
    <Package>
      <URL>http://mirror.internode.on.net/pub/gentoo-portage/distfiles/lzma-4.32.6.tar.gz, http://buildroot.uclibc.org/downloads/sources/lzma-4.32.6.tar.gz</URL>
      <MD5>211d6207fdd7f20eaaae1bbdeb357d3a</MD5>
      <FileSize>478661</FileSize>
    </Package>
  </Downloads>
</PhoronixTestSuite>

and builds it without using Gentoo’s package manager, which means it is not compiled with the GCC optimization flags from make.conf:

#!/bin/sh

mkdir $HOME/lzma_

tar -zxvf lzma-4.32.6.tar.gz
cd lzma-4.32.6
./configure --prefix=$HOME/lzma_
make -j $NUM_CPU_JOBS
echo $? > ~/install-exit-status
make install
cd ..
rm -rf lzma-4.32.6

cat > compress-lzma <<EOT
#!/bin/sh
./lzma_/bin/lzma -q -c ./compressfile > /dev/null 2>&1
EOT

chmod +x compress-lzma

On the other hand, the gzip test package doesn’t require any downloads and uses the system’s gzip:

#!/bin/sh

cat > compress-gzip <<EOT
#!/bin/sh
cat compressfile | gzip -c > /dev/null 2>&1
EOT

chmod +x compress-gzip
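Whether the system gzip was actually built with the new flags can be checked in Portage’s installed-package database, which records the CFLAGS used for each build. A sketch assuming the usual /var/db/pkg/CATEGORY/PACKAGE-VERSION/CFLAGS layout; built_with_lto is a hypothetical helper that takes such a package directory. Since GCC honours the last of -flto and -fno-lto, the sketch treats a later -fno-lto (as appended by no-lto.conf) as cancelling an earlier -flto.

```shell
#!/bin/sh
# Hypothetical helper: given a package directory from Portage's database,
# succeed if its recorded CFLAGS leave LTO enabled, i.e. the last of
# -flto / -flto=N / -fno-lto is an -flto option.
built_with_lto() {
    flags_file=$1/CFLAGS
    [ -r "$flags_file" ] || return 1
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^-flto(=|$)/) lto = 1
            else if ($i == "-fno-lto") lto = 0
        }
    } END { exit !lto }' "$flags_file"
}

# Check every installed gzip version, if the database is present.
for pkg in /var/db/pkg/app-arch/gzip-*; do
    [ -d "$pkg" ] || continue
    if built_with_lto "$pkg"; then
        echo "$pkg: built with LTO"
    else
        echo "$pkg: LTO not enabled in recorded CFLAGS"
    fi
done
```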

With that in mind, we can look at the results, which do indicate that these new GCC features improve performance:
http://openbenchmarking.org/result/1307063-UT-GCCOPTIMI03
