Tuesday, March 31, 2015

Populating a local RPM repository on demand.

My work internet connection is a pitiful dual T1 back to a central office.  What year are we living in?  The central office has local mirrors of some Linux distributions, but I can only reach them over that tiny straw of a connection.  To avoid clogging the pipe, a local cache would cut down the traffic needed to update multiple machines.

This post is a dump of the config for future reference.  I originally found https://wiki.parabola.nu/Mirroring_On_Demand and modified it to work with Fedora.  Specifically, I used Fedora 19 as the host to mirror the Fedora 19 repos.  You'll see nothing is Fedora-specific.

The client boxes point their /etc/yum.repos.d/*.repo files at the cache like so:
baseurl=http://cache.example.com/pub/fedora/linux/releases/$releasever/Everything/$basearch/os/
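
For reference, a whole client repo file might look roughly like the sketch below. Only the baseurl line comes from my setup. The helper name write_cache_repo, the [fedora-cache] section id, the name= text, and the gpgcheck/gpgkey lines are assumptions, so adjust them to whatever your existing .repo files already use.

```shell
# Hypothetical helper: write a client repo definition that points at the cache.
# The repo id, name and gpg settings are assumptions; only baseurl is from the post.
write_cache_repo() {
    # $1: destination file, e.g. /etc/yum.repos.d/fedora-cache.repo
    # The quoted 'EOF' keeps $releasever and $basearch literal for yum to expand.
    cat > "$1" <<'EOF'
[fedora-cache]
name=Fedora $releasever - $basearch (local cache)
baseurl=http://cache.example.com/pub/fedora/linux/releases/$releasever/Everything/$basearch/os/
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-$releasever-$basearch
EOF
}
```

Calling write_cache_repo /etc/yum.repos.d/fedora-cache.repo as root would drop the file in place. The existing mirrorlist-based repos would then need enabled=0 so yum does not bypass the cache.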

Things get a little convoluted.

mirror-east.example.com mirrors the regular /pub/fedora repository hierarchy. In actuality, mirror-east.example.com redirects to www.storage-array.example.com for the actual files... in a new subdirectory mirrors/sites/fedora. I think it was an HTTP 302 redirect.

To help with this, I symlinked the hierarchies together.
$ ls -l /srv/www/mirror.example.com/mirrors/
lrwxrwxrwx. 1 nginx nginx 6 Oct 15  2013 sites -> ../pub

This way files all end up under pub, even if they were fetched from the storage array. It avoids needing to rewrite the save pathnames.

Yum always requests /pub/fedora, but files in that path will not have been cached. The actual files were saved under /mirrors/sites/fedora. With the symlink, re-requested files are found under the /pub/fedora path.
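
A quick way to convince yourself the symlink does its job is to check whether a given request path already exists on disk. This is only a sketch: is_cached is a made-up helper, and it assumes files land at document root plus request path, as in the proxy_store line of the config further down.

```shell
# Hypothetical helper: succeeds if a request URI is already stored on disk.
# $1: document root, $2: request URI (either the /pub or /mirrors/sites form)
is_cached() {
    [ -f "$1$2" ]
}
```

Because mirrors/sites points at ../pub, the same package should show up under both spellings of the path, e.g. is_cached /srv/www/mirror.example.com /pub/fedora/some-package.rpm (some-package.rpm being a placeholder name).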

I believe the only changes to /etc/nginx/nginx.conf were the following proxy_cache_path (and maybe proxy_temp_path) directives. The thought was to set a time limit for the repo metadata since cached versions of those would be invalidated overnight with the source mirror's rsync update. Repo data was set to expire in 16 (or 12?) hours - long enough that it was only fetched once per work day.

Config files:
 
/etc/nginx/nginx.conf

http {
    ... snip ...
    proxy_cache_path  /var/lib/nginx/tmp/repodata keys_zone=repodata:10m inactive=960m;
    proxy_temp_path   /srv/nginx/tmp;
    proxy_cache_path  /srv/nginx/cache  levels=1:2 keys_zone=cache_repodata:256m inactive=1d max_size=1g;
    ... snip ...
}


/etc/nginx/conf.d/repo-proxy.conf

# Frontend server for the mirror
upstream mirror-east {
    server mirror-east.example.com;
}

# Storage array with the files
upstream storage-array {
    server www.storage-array.example.com;
}

# Our mirror
server {
    listen       80;
    server_name  mirror.example.com mirror-east.example.com cache.example.com "";
    # access_log  off;
    # error_log off;
    root /srv/www/mirror.example.com;
    autoindex on;

    # Minimally cache databases
    location ~ repodata/.*$ {
        expires 16h;
        rewrite ^/pub/(.*)$ /mirrors/sites/$1 break;
        rewrite_log on;
        error_page 403 404 = @redir;
    }

    # Retrieve actual files.
    location /pub {
        expires 7d;
        rewrite ^/pub/(.*)$ /mirrors/sites/$1 break;
        rewrite_log on;
        error_page 403 404 = @get;
    }

    # Bogus (named) location used when the file is not on disk yet
    location @get {
        # Pass it to the repo upstream
        proxy_pass http://storage-array;
        # We trick upstream into serving the main repo subdomain
        proxy_set_header Host www.storage-array.example.com;
        # Store the files under the document root, at the requested path
        proxy_store /srv/www/mirror.example.com$request_uri;
        # Give them 664 permissions
        proxy_store_access user:rw group:rw all:r;
    }

    # Bogus location for metadata.
    location @redir {
        proxy_cache cache_repodata;
        proxy_cache_valid 12h;
        proxy_pass http://storage-array;
        proxy_set_header Host www.storage-array.example.com;
        #proxy_store /srv/www/mirror.example.com$request_uri;
        #proxy_store_access user:rw group:rw all:r;
        expires 13h;
    }
}

You probably have to create the directories:
/srv/www/mirror.example.com/{pub,mirrors}
and then the symlink:
/srv/www/mirror.example.com/mirrors/sites -> ../pub
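
Spelled out as commands, that could look like the sketch below. make_mirror_tree is just a name I made up for illustration, and the function takes the document root as an argument so it can be tried on a scratch directory first.

```shell
# Hypothetical helper: create the directory layout and the sites -> ../pub symlink.
# $1: document root, e.g. /srv/www/mirror.example.com
make_mirror_tree() {
    root=$1
    mkdir -p "$root/pub" "$root/mirrors"
    # -f replaces an existing link; -n stops ln from following it on a re-run
    ln -sfn ../pub "$root/mirrors/sites"
}
```

Run it as root with make_mirror_tree /srv/www/mirror.example.com, then hand pub to the nginx user (chown nginx:nginx) so proxy_store is allowed to write there.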

SELinux needs proper labeling.
I used system_u:object_r:httpd_sys_content_t:s0 for the read-only top-level /srv/www/mirror.example.com and system_u:object_r:httpd_sys_rw_content_t:s0 for the subdirectory that gets populated.  Originally I modified some boolean (I think...), but that seems to have been lost on reboot.
$ ls -ldZ /srv/www/mirror.example.com{,/pub}
drwxrwxr-x. root  nginx system_u:object_r:httpd_sys_content_t:s0 /srv/www/mirror.example.com
drwxrwxr-x. nginx nginx system_u:object_r:httpd_sys_rw_content_t:s0 /srv/www/mirror.example.com/pub
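
I did not keep the exact commands, but one way to get those labels is to record them with semanage fcontext and apply them with restorecon, which also makes them survive a filesystem relabel. Treat this as a sketch: label_mirror_tree is a hypothetical wrapper, and it assumes the semanage tool (from the policycoreutils-python package on Fedora) is installed.

```shell
# Hypothetical helper: record and apply the SELinux file contexts shown above.
# $1: document root, e.g. /srv/www/mirror.example.com
label_mirror_tree() {
    root=$1
    # Read-only content type for the whole tree
    semanage fcontext -a -t httpd_sys_content_t "${root}(/.*)?"
    # Writable content type for pub, where proxy_store saves files
    semanage fcontext -a -t httpd_sys_rw_content_t "${root}/pub(/.*)?"
    # Apply the recorded contexts to what is already on disk
    restorecon -R "$root"
}
```

Run as root: label_mirror_tree /srv/www/mirror.example.com, then re-check with ls -ldZ.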

This scheme does not clean up after itself, i.e. updates don't evict the previous version. repomanage --old only lists the older package revisions, so pipe its output to rm to actually clear them out:
repomanage --old . | xargs rm -f
