Monthly Archives: November 2011

If you’re running Gentoo, you probably have “-march=native” in your CFLAGS, since that flag gives gcc permission to examine the running processor and decide for itself what CPU features are available.

If you have any interest in distcc or cross-compiling, though, you most definitely do not want to use that flag; gcc will be making assumptions based on the processor your programs are compiling on, not on the processor your programs will be executing on.

There’s a fairly neat solution to this. Run this line on the machine you’re compiling for, and it will emit the gcc arguments that -march=native would translate to:

gcc -march=native -E -v - </dev/null 2>&1 | sed -n 's/.* -v - //p'

You can take the output of that command, add any additional flags you like (in my case, -O2 and -ggdb), and add it to your CFLAGS on that system. Should you choose to use distcc in the future, the machines in your compile pool will use the appropriate set of architecture-specific optimizations.
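Here's a rough sketch of that step, assuming a POSIX shell. The function name build_cflags and the variable native_flags are made up for illustration, and the sample flag string is shortened so the sketch runs even where gcc isn't installed.

```shell
#!/bin/sh
# Hypothetical helper: combine the expanded -march=native flags with
# any extra flags you want, and print a line suitable for make.conf.
build_cflags() {
    expanded=$1   # output of the gcc | sed line above
    extra=$2      # e.g. "-O2 -ggdb"
    printf 'CFLAGS="%s %s"\n' "$extra" "$expanded"
}

# On the target machine you would capture the real flags like this:
#   native_flags=$(gcc -march=native -E -v - </dev/null 2>&1 | sed -n 's/.* -v - //p')
# A shortened sample value stands in for that output here.
native_flags="-march=core2 -mcx16 -msahf -mtune=core2"
build_cflags "$native_flags" "-O2 -ggdb"
```

The printed line still needs a quick look before it goes into make.conf, since the exact flags gcc reports vary by gcc version and CPU.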

Incidentally, variations on this line have been floating around the gentoo-users mailing list since this summer; this is the cleanest form I’ve seen of it, which I pulled from a comment in Stefan G. Weichinger’s make.conf file.

I got my hands on an awesome system recently; it has two E5345 Xeon processors in it, which means it has eight physical Xeon cores. Here’s what my CFLAGS looks like for it:

CFLAGS="-O2 -pipe -D_FORTIFY_SOURCE=2 -march=core2 -mcx16 -msahf --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=core2 -ggdb"

Now, if you’re using Gentoo, you’re probably utilizing parallel make via the -j parameter in MAKEOPTS in /etc/make.conf. To find out what value to pass -j, I go by N*2 or N*2+1, where N is the number of CPU cores. That works well for my Phenom 9650, and it happens that it works well for the Intel E5345 in my setup.
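As a sketch, the N*2+1 rule of thumb can be computed like this. The helper name jobs_for_cores is hypothetical, and getconf reports online logical CPUs, which will be higher than the physical core count on machines with hyperthreading, so treat the result as a starting point.

```shell
#!/bin/sh
# Hypothetical helper: apply the N*2+1 rule of thumb to a core count.
jobs_for_cores() {
    echo $(( $1 * 2 + 1 ))
}

# getconf reports the number of online logical CPUs on Linux/glibc;
# fall back to 1 if the query is unavailable.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "MAKEOPTS=\"-j$(jobs_for_cores "$cores")\""
```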

-j only gives you a benefit when the Makefile structure of a program can be parallelized. Most aren’t structured perfectly for this; there are points where Make must get one part finished before it can do anything else. At that point, it doesn’t matter how many CPU cores you have; everything’s waiting on one instance of your compiler to finish executing before things may continue.

As it happens, the emerge command also takes a -j parameter, and it behaves very much like Make’s, except it parallelizes the building of multiple packages, rather than of multiple components within the same package. Also, as it happens, package dependencies result in the same pattern of limitations as you see with -j in Make. Worse, if you have Make dispatching 16 jobs at once, and emerge dispatching 16 jobs at once, you’ll have up to 256 jobs flying at the same time. If you’ve only got eight cores, you’ll lose so much time to CPU context switches bouncing between different processes, you’ll see your builds take longer than if you’d told it to only dispatch one job at a time…
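The worst case is just the product of the two -j values. A tiny sketch, with a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: upper bound on simultaneous compiler processes when
# emerge runs several builds and each build runs Make with its own -j.
worst_case_jobs() {
    emerge_jobs=$1
    make_jobs=$2
    echo $(( emerge_jobs * make_jobs ))
}

worst_case_jobs 16 16   # prints 256
worst_case_jobs 8 16    # prints 128
```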

It turns out that Make has another useful option: -l

-l tells Make not to spawn another process if the system load average is above a certain value. Roughly speaking, the load average is the number of processes executing or waiting to be executed by the system scheduler, averaged over the recent past (the familiar 1-, 5- and 15-minute figures from uptime).
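To get a feel for the idea, here's a sketch of that kind of check in shell. This is not how Make implements -l internally; it only illustrates the "don't start more work while the load is high" decision. The helper name load_below is made up, and reading /proc/loadavg assumes Linux.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given load is below the limit.
# awk does the comparison because sh arithmetic is integer-only.
load_below() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load < limit) }'
}

# On Linux the first field of /proc/loadavg is the 1-minute load average.
current=$(cut -d' ' -f1 /proc/loadavg 2>/dev/null || echo 0)
if load_below "$current" 10; then
    echo "load $current is under 10: another job could start"
else
    echo "load $current is at or above 10: hold off"
fi
```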

My first “emerge -e world”, with -j16 in MAKEOPTS, took 103 minutes.

With:

MAKEOPTS="-j16 -l10"

“emerge -e -j8 @world” took 89 minutes.

That’s a respectable improvement!

*thump* *thump*

Is this thing on?

And, uh, serif font? Really? Or is that just the editing interface?

I love my LUG