Transparent public YUM repository caching

03.05.2012.

When system administrators maintain a large number of Linux-based installations, keeping them up to date and installing new software packages quickly becomes demanding. With the proliferation of virtualization solutions, the number of installations grows over time. Installing a single new package under Linux can require updates to many dependent packages. The time needed to update a machine (whether virtual or physical) also depends heavily on the available Internet bandwidth, because even minor operating system releases amount to several hundred megabytes. The net result: users are frustrated by slow Internet access, and administrators are frustrated by long installation times.

Transparent mirroring

We took our inspiration from Red Hat Network Satellite and the associated Smart Management Add-on for Red Hat Enterprise Linux, which (besides the package caching and management described in this blog) introduce central administration, management, provisioning and monitoring for enterprise installations. Many Red Hat Enterprise Linux derivatives (such as Fedora, CentOS, Scientific Linux, ClearOS etc.) use the same package management based on YUM repositories. The idea is to synchronize one machine with a public YUM mirror and then update all other machines from that local mirror. An additional twist in the described approach is that no change is required on the local installations: the mirroring mechanism is “transparent” to the clients. In our environment, it was especially important to avoid reconfiguring all virtual machines and to keep them able to install and update software even after they are migrated to an external location.

The main advantages of this approach compared to a local caching proxy such as Squid are:
• Packages can be prefetched on a schedule, during non-working hours
• The size limits usually configured on caching servers do not apply
• The mirror is a permanent cache that can be maintained indefinitely and independently of the rest of the HTTP caching infrastructure
• Local packages can be added without any changes on the local servers (although this requires patching the package list)

What to mirror?

Most installations primarily use packages from the “os” and “updates” repositories, depending on the administrator’s preferences and business needs. Custom installations often use extra packages from other public repositories, such as EPEL, RPMFORGE, etc. However, these additional public repositories are used far less heavily than the repositories carrying the packages that are part of the operating system installation. We therefore chose to mirror the repositories that contain most of the packages used in the target environment, which keeps both the disk space and the network traffic to a minimum. Caching only the “os” and “updates” repositories already makes a huge difference. Note that the described mechanism cannot be used for mirroring Red Hat Enterprise Linux installation packages. For those, it is possible to purchase separate products (Red Hat Network Proxy or Red Hat Network Satellite) that provide a cost-effective solution for installation management.

Protocol choice

Public mirror sites offer the HTTP, rsync and FTP protocols. Choosing a mirror that has high bandwidth and is geographically close to the installation site is recommended.
For FTP synchronization I recommend lftp, which has advanced features such as bandwidth limiting and mirroring, or wget, a powerful tool that can mirror a site (-m option), download from FTP sites and much more. If you work with *NIX systems, you are surely familiar with the rsync command and protocol. Since most of the content added to package repositories is new content, it is important to run the synchronization tool in “mirroring mode”, so that files already present on the local mirror are not transferred again during updates.
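
As an illustration of the “mirroring mode” mentioned above, here is a minimal sketch using rsync. The mirror host mirror.example.org, the local path /var/www/html/cachedir and the bandwidth cap are assumptions for the example, not values from our setup. The local layout (release/arch/repo) follows the path used by the mirrorlist script later in this post.

```shell
#!/bin/sh
# Sketch only: prints the rsync command for one repository so it can be
# reviewed before it is run. Host name and paths are hypothetical.

# mirror_cmd RELEASE REPO ARCH
mirror_cmd() {
    release="$1"; repo="$2"; arch="$3"
    src="rsync://mirror.example.org/centos/$release/$repo/$arch/"
    dst="/var/www/html/cachedir/$release/$arch/$repo/"
    # -a        archive mode: recurse and preserve times and permissions,
    #           so unchanged files are skipped on the next run
    # --delete  remove local files that were dropped upstream
    # --bwlimit cap the transfer rate (here 2000 KB/s)
    echo "rsync -av --delete --bwlimit=2000 $src $dst"
}

mirror_cmd 6 os x86_64
```

Once the printed command looks right, it can be run directly or the echo can be removed from the function.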

How does it work?

Once the mirror site and protocol are chosen, copy all of the packages to a local directory that has the same directory structure as the original. To get the base OS packages quickly, copy them from the CD/DVD instead of downloading them. This directory can be shared locally using, for example, the Apache web server, NFS or Samba. The easiest way is to use the Apache web server, since the default protocol in YUM repos is HTTP.
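
Seeding the local directory from installation media could look roughly like the sketch below. The mount point /media/cdrom and the destination under /var/www/html/cachedir are assumptions, so adjust them to wherever the media is mounted and wherever Apache serves the mirror from.

```shell
#!/bin/sh
# Sketch only: copy the contents of a mounted CD/DVD into the local mirror tree.

# seed_repo SRC DEST
seed_repo() {
    src="$1"; dest="$2"
    # create the target directory, then copy everything below SRC
    # (including hidden files) while preserving timestamps
    mkdir -p "$dest" &&
    cp -a "$src"/. "$dest"/
}

# Example call (hypothetical paths):
#   seed_repo /media/cdrom /var/www/html/cachedir/6/x86_64/os
```

Because timestamps are preserved, a later synchronization run only has to fetch what is missing or newer.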

The mirrorlist statement in /etc/yum.repos.d/CentOS-Base.repo looks like this:

mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os

We changed the local DNS server entry for mirrorlist.centos.org to point to the local YUM syncing server. On the syncing server itself (the local mirror), we added an entry to /etc/hosts that points mirrorlist.centos.org to the original mirror list server, so that the syncing server can still reach it. This way we made access to the local mirror transparent to all local CentOS machines. The next step is to reconfigure the Apache web server so that updating is transparent to end users. Note that you also need to address CGI, virtual servers and security settings, and update SELinux contexts after the change is made.
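
To make the /etc/hosts step more concrete, the sketch below shows what such an entry might look like, together with a small helper that reports which address a hosts-format file maps a name to. The address 192.0.2.20 is a placeholder from the documentation range. The real address of the upstream mirror list server has to be looked up before the DNS change is made.

```shell
#!/bin/sh
# Example line for /etc/hosts on the syncing server (placeholder address):
#   192.0.2.20   mirrorlist.centos.org

# lookup_hosts NAME FILE: print the first address that FILE maps to NAME.
lookup_hosts() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # strip comments before matching
        {
            # every field after the address is a host name or alias
            for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit }
        }
    ' "$2"
}

# Example call:
#   lookup_hosts mirrorlist.centos.org /etc/hosts
```

If the helper prints nothing on the syncing server, the override is missing and the fallback request in the mirrorlist script would be sent back to the syncing server itself.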

The last piece is a Perl CGI script. For the mirrored repositories, it returns the path to the shared local directory that matches the requested release version, architecture and repository (?release=$releasever&arch=$basearch&repo=os). For repositories that aren’t mirrored, it returns the default site list. We used the LWP::Simple and CGI Perl modules to accomplish this:

#!/usr/bin/perl
use strict;
use LWP::Simple qw(!head);   # import everything except head(), which CGI also exports
use CGI qw(:standard);

print "Content-type: text/plain; charset=iso-8859-1\n\n";
my $cgi = CGI->new();
my $release = $cgi->param( 'release' );
# reduce the release to its first digit (e.g. 6.2 becomes 6)
my $release_s = $release;
$release_s =~ s/(\d)(.+)?/$1/;
my $arch = $cgi->param( 'arch' );
my $repo = $cgi->param( 'repo' );

# print "$release $release_s $arch $repo";

# check for custom repo (cache)
if ($repo =~ /^(os|updates)$/) {
        # only release 6 on i386 and x86_64 is mirrored locally
        if (($release_s . $arch) =~ /^(6i386|6x86_64)$/) {
                print "http://yum-sync.domain.example.com/cachedir/$release_s/$arch/$repo\n";
                exit 0;
        }
}
# fallback to original
my $url = "http://mirrorlist.centos.org/?release=$release&arch=$arch&repo=$repo";
#print $url . "\n";
getprint($url);
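
A quick way to check the script from a client is to request the same URL that yum would request. The helper below is only a sketch that builds that URL. The curl call in the comment assumes curl is installed on the client.

```shell
#!/bin/sh
# mirrorlist_url RELEASE ARCH REPO: build the query yum sends to the mirror list.
mirrorlist_url() {
    printf 'http://mirrorlist.centos.org/?release=%s&arch=%s&repo=%s\n' "$1" "$2" "$3"
}

# On a client behind the local DNS override, fetch the list with:
#   curl -s "$(mirrorlist_url 6 i386 os)"
mirrorlist_url 6 i386 os
```

For a mirrored combination such as release 6, i386 and os, the script should answer with the single line http://yum-sync.domain.example.com/cachedir/6/i386/os. For any other repository it should pass through the public mirror list.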

Once yum update is run on your machines, you’ll notice that packages download at local network speed.

Your CentOS/SL package updates are now centralized, and the network traffic to the Internet has been reduced.

Putting the YUM syncing script into /etc/crontab to run at regular intervals keeps the local mirror up to date and lets the administrator control the bandwidth used and the timing of the mirroring job.
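
A syncing script driven from /etc/crontab might be structured like the sketch below. The script path, the schedule and the SYNC_CMD variable are assumptions made for the example. By default the sketch only prints what it would synchronize.

```shell
#!/bin/sh
# Hypothetical /usr/local/sbin/yum-sync.sh, started from /etc/crontab, e.g.:
#   30 2 * * * root /usr/local/sbin/yum-sync.sh
# (02:30 every night; the time, user and path are assumptions.)

# Print one "release arch repo" line per combination to mirror.
sync_targets() {
    for arch in i386 x86_64; do
        for repo in os updates; do
            echo "6 $arch $repo"
        done
    done
}

# SYNC_CMD is the command run for each target. It defaults to echo,
# so running the sketch as is only lists the targets.
SYNC_CMD="${SYNC_CMD:-echo would-sync}"

sync_targets | while read -r release arch repo; do
    $SYNC_CMD "$release" "$arch" "$repo"
done
```

Setting SYNC_CMD to a real mirroring command (rsync, lftp or wget wrapped in a small script) turns the dry run into the actual nightly job.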