We have been monitoring most farm machines with munin for some time. This allows anyone to check whether a machine is heavily loaded or not before starting to use it, but it is time-consuming to check many graphs from several machines.
To quickly see the load of a machine, we now display a usage bar directly in the list of machines! We currently show 3 metrics: CPU usage, memory usage, and disk I/O load. The values are based on the last 48 hours of munin data and should give a good overview of current usage and expected performance.
Nevertheless, these values are indicative and only reflect average usage. If you plan to use a machine for heavy tasks, you should check the munin graphs to better understand its usage pattern and make sure you don't disrupt the tasks of existing users. In addition, a few values are missing or incorrect: this happens either because the machine has not been monitored by munin recently, or because its total number of CPUs is incorrectly detected.
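To see how a wrong CPU count can distort a bar: if the CPU metric is an aggregate percentage summed over all CPUs (so a fully loaded machine with n CPUs reports 100 × n percent), it must be divided by the number of CPUs to get a 0 to 100% value. The sketch below assumes that convention; the function name and the 16-CPU machine are hypothetical, not taken from the farm's code.

```python
def cpu_usage_fraction(busy_percent_total: float, detected_cpus: int) -> float:
    """Convert an aggregate CPU busy percentage into a 0.0-1.0 fraction.

    Assumption: busy_percent_total is summed over all CPUs, so a fully
    loaded machine with n CPUs reports 100 * n percent.
    """
    if detected_cpus <= 0:
        raise ValueError("detected_cpus must be positive")
    return busy_percent_total / (100.0 * detected_cpus)


# Hypothetical 16-CPU machine with 8 CPUs fully busy: 800% in aggregate.
print(cpu_usage_fraction(800, 16))  # 0.5: correct CPU count
print(cpu_usage_fraction(800, 4))   # 2.0: CPU count underestimated, bar overflows
```

Underestimating the CPU count inflates the value; overestimating it makes a busy machine look idle.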
The value displayed in the usage bars is computed as a weighted average that gives more importance to high usage values. For instance:
In each case, the arithmetic average would be the same (20% usage). However, we consider that the machine is more loaded in the third case. Intuitively, if you would like to run a new task, using a machine that is already 100% busy (even if only 20% of the time) is generally a bad idea: the new task might significantly interfere with the existing ones. In contrast, a machine that is 20% busy still has a lot of room for more tasks.
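The exact weighting used for the usage bars is not given here. As a minimal sketch, assuming each sample is weighted by its own value (so the result is sum(x²) / sum(x)), the hypothetical function below reproduces the intuition above: three made-up usage series with the same 20% arithmetic mean get very different scores.

```python
def load_weighted_average(samples: list[float]) -> float:
    """Average usage samples (0.0 to 1.0), weighting each sample by itself.

    Hypothetical formula: sum(x * x) / sum(x). A sample at 100% counts
    five times as much as a sample at 20%, so short bursts of full load
    pull the result up while a steady low load stays low.
    """
    total = sum(samples)
    if total == 0:
        return 0.0  # idle (or empty) series: avoid dividing by zero
    return sum(x * x for x in samples) / total


# Three hypothetical series of 10 samples, all with an arithmetic mean of 20%.
steady = [0.2] * 10              # 20% busy all the time
half = [0.4] * 5 + [0.0] * 5     # 40% busy half of the time
bursty = [1.0] * 2 + [0.0] * 8   # 100% busy 20% of the time

for name, series in [("steady", steady), ("half", half), ("bursty", bursty)]:
    mean = sum(series) / len(series)
    print(f"{name}: mean={mean:.2f} weighted={load_weighted_average(series):.2f}")
# steady: mean=0.20 weighted=0.20
# half: mean=0.20 weighted=0.40
# bursty: mean=0.20 weighted=1.00
```

Self-weighting is only one possible choice; a root mean square or a high percentile would also favour high values, with different sensitivity to short bursts.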
We are happy to announce that a new IBM server has been added to the farm, thanks to OSUOSL!
This POWER9 server extends the already available IBM hardware in the compile farm: POWER7 (gcc110, gcc111) and POWER8 (gcc112, gcc119). It features 128 CPU cores and 256 GB of RAM, as well as a high-performance I/O setup with 8 disks in a RAID5 configuration. Disk I/O was the main bottleneck for the existing POWER machines, so hopefully this new machine will provide much better performance in practice.
It is already possible to connect to this new machine through SSH at gcc135.fsffrance.org.
gcc123 recently suffered a disk failure, just a few months after it became operational. Today, OSUOSL replaced the disk and reinstalled the machine. Because there is no RAID on this machine, all user data stored on it was lost.
This is a good opportunity to remind everyone that the compile farm is "best effort": we make no promise whatsoever about data integrity, so make sure to always keep a copy of your important work somewhere else!
The SSH host keys of gcc123 have been restored, so SSH access should work just fine with the reinstalled system.
gcc12 just reached 3000 days of uptime, which amounts to a little more than 8 years!
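As a quick check of the "a little more than 8 years" figure, assuming an average year of 365.25 days (which accounts for leap years):

```python
UPTIME_DAYS = 3000
DAYS_PER_YEAR = 365.25  # average year length, including leap years

# divmod splits the uptime into whole years and leftover days.
years, remaining_days = divmod(UPTIME_DAYS, DAYS_PER_YEAR)
print(f"{UPTIME_DAYS} days ~ {UPTIME_DAYS / DAYS_PER_YEAR:.2f} years")
print(f"= {int(years)} years and about {int(remaining_days)} days")
# 3000 days ~ 8.21 years
# = 8 years and about 78 days
```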
Here is a bit of history on this machine hosted by FSF France:
- 2007-06-24: gcc11/12/13 installed
- 2007-07-22: gcc11/12/13 moved to datacenter, gcc01..09 stopped, gcc08 online at a temporary location
- 2007-11-25: gcc11/12/13 moved to new datacenter, downtime 1700 UTC to 2000 UTC
- 2008-05-18: gcc11 and gcc12 moved to new datacenter (same IP)
- 2009-08-14: planned downtime for gcc11/gcc12 at the end of August
- 2009-08-31: gcc12 is down, please use gcc13 until gcc12 is restored
- 2009-09-21: gcc11/12 are up in their new FSF France datacenter in Rennes
Since then, the machine (and its hosting facility) has been extremely stable. Of course, since it is still running Debian 5 "lenny" and has limited hardware resources by today's standards, its usage is quite low nowadays.
There were even older machines in the GCC compile farm: see the history of the project.
We are happy to announce that four new x86_64 servers have joined the farm, numbered gcc120 to gcc123. These are Open Compute Project servers from Facebook, each with two 8-core Xeon CPUs and 144 GB of RAM. Two of them run CentOS 7, and the other two run Debian 9.
Note that these machines are reachable on a non-default SSH port. The SSH ports are displayed in the list of machines, and there is an SSH configuration that you can copy-paste (click on Show SSH config at the top of the page).
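For reference, an OpenSSH client configuration for a host on a custom port uses the standard Host, HostName, Port and User keywords. The hypothetical helper below builds such a stanza; the hostname, port and login in the usage line are placeholders, so take the real values from the list of machines (the farm's own "Show SSH config" output may differ in details).

```python
def ssh_config_stanza(alias: str, hostname: str, port: int, user: str) -> str:
    """Build an OpenSSH client config block for a host on a custom port."""
    lines = [
        f"Host {alias}",          # short name to type: ssh <alias>
        f"    HostName {hostname}",
        f"    Port {port}",       # non-default SSH port
        f"    User {user}",       # your farm login
    ]
    return "\n".join(lines) + "\n"


# Placeholder values only: use the hostname and port shown in the list of machines.
print(ssh_config_stanza("gcc120", "gcc120.example.org", 2222, "mylogin"))
```

Appending the printed block to ~/.ssh/config lets `ssh gcc120` pick up the right port automatically.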
Many thanks to OSUOSL for setting up and hosting these servers!
We are happy to announce that 4 new x86_64 virtual machines have joined the farm:
- gcc300 runs NetBSD
- gcc301 runs Alpine Linux
- gcc302 runs OpenBSD
- gcc303 runs FreeBSD
Please note that their resources are quite limited in terms of disk, CPU & memory, so make sure to use them responsibly.
The machines are located in Calgary, Canada. Many thanks to House Gordon Software Company for providing these virtual machines!
We are pleased to announce the availability of two new machines in the compile farm: gcc210 and gcc211 are Solaris zones, hosted on an M3000 server with a SPARC64 CPU. gcc210 runs Solaris 10, while gcc211 runs Solaris 11.
We just upgraded gcc67/68 to the latest BIOS/AGESA, added some settings to the BIOS and kernel (see below), and restored SSH access to all accounts.
Please test whether the Ryzen machines are finally stable under cfarm usage patterns.
Note: we had to blacklist the ccp kernel module on gcc68, so KVM is not available there. We will restore it when a new kernel with the proper fix for gcc68's processor is available.
PS: here are the instructions we followed:
For a long time these magic kernel command line parameters and similar tricks were the only workarounds available, at least to me. However there had long been rumors of a magic AMD-provided magic firmware option that could work around the problem, generally exposed to you and me in a BIOS setting called 'Power Supply Idle Control', which you allegedly wanted to set to 'Typical current idle'. This apparently became available starting with AGESA 1.0.0.6a, which various motherboard vendors rolled into their overall BIOS at very different times. For bonus fun, apparently not all BIOS vendors even expose these AMD firmware settings, although enthusiast motherboards usually do.
(where N is the number of CPUs you have minus one.)
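The kernel parameters themselves are not reproduced here. As a small sketch, assuming "number of CPUs" means logical CPUs as reported by the OS, N can be computed with Python's os.cpu_count(); the helper name is hypothetical.

```python
import os


def highest_cpu_index() -> int:
    """Return N, the number of logical CPUs minus one.

    Linux numbers CPUs from 0, so N is also the index of the last CPU.
    """
    count = os.cpu_count()
    if count is None:  # os.cpu_count() returns None if it cannot tell
        raise RuntimeError("could not determine the number of CPUs")
    return count - 1


# Prints os.cpu_count() - 1, for example 15 when os.cpu_count() returns 16.
print(highest_cpu_index())
```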
gcc202, a SPARC64 LDOM running Debian, had been offline since May because of a hardware issue.
The issue has been found and worked around: the machine is back online, but some instability may remain.
gcc10, a 24-core Magny-Cours Opteron machine, had been offline for almost a year. It was running a very old version of Debian.
The machine has been reinstalled with Debian stretch and is now available again. Many thanks to FSF France for the dedicated trip to the datacenter! User data was preserved during the reinstallation.