cfvers introduction

Iustin Pop

   $Id: manual.xml 222 2005-10-30 12:58:22Z iusty $

   Copyright © 2003, 2004, 2005 Iustin Pop, <iusty@k1024.org>

   This document explains the concept and usage of cfvers version
   0.5.4, a system tool designed to help with versioning the
   configuration files on a system.
     _________________________________________________________

   Table of Contents
   1. About this document
   2. Introduction
   3. Installation

        3.1. Database configuration

   4. Quick start
   5. Concepts

        5.1. The repository
        5.2. The areas
        5.3. The items

              5.3.1. Regular vs. virtual items

        5.4. The entries
        5.5. The revisions

   6. Common operations

        6.1. Repository initialization
        6.2. Area related
        6.3. File operations

              6.3.1. Storing files
              6.3.2. Searching for files
              6.3.3. Retrieving files

        6.4. Handling deletions

   7. Limitations

        7.1. POSIX VFS layer limitations

1. About this document

   This is the user manual for the cfvers project; its homepage is at
   http://www.nongnu.org/cfvers/. You can also get new versions of this
   document there.

   Revision: $Id: manual.xml 222 2005-10-30 12:58:22Z iusty $
     _________________________________________________________

2. Introduction

   Making backups is an important aspect of system administration. The
   techniques of backing up data are explained in any good document
   about system administration, and they won't be explained here again.

   However, text configuration files are better suited to versioning
   systems than to full/incremental backups, which are targeted at
   binary files and miscellaneous data. Unfortunately, versioning
   systems are not very good at working directly on a live system:
   the main reasons are the creation of extra files, the inability to
   cope with special files, and the failure to keep permissions intact.

   The working model of the classic versioning systems consists of one
   (or more) central repositories (very precious) and a multitude of
   developers' workspaces, which hold semi-important data; by this I
   mean it's ok to delete or otherwise break a developer's workspace
   when no changes have been made to it - all information can be
   restored from the central repository.

   In contrast, a versioning system designed for system configuration
   has its priorities almost reversed: the critical issue is with the
   filesystem, and the repository is secondary to that. This means that
   such software must obey the following rules:

     * keep the system's integrity: the software must not do anything
       to the filesystem it hasn't been asked to do
     * treat the metadata of versioned items as being as important as
       the data
     * when in doubt about the success of an operation, abort rather
       than damage the workspace

   cfvers has been designed with these objectives in mind[1].
     _________________________________________________________

3. Installation

   There are three components which need installing:

     * the python library
     * the command line utilities, cfv and cfvadmin
     * the cfversd server and its configuration files

   If you don't run the server, you can run the cfv/cfvadmin scripts
   from the install directory, since it contains the python library and
   it will be picked up from there. However, the recommended way is to
   install the python library in its proper place and the scripts to
   /usr/local/bin or /usr/bin.

   The default ./configure invocation will install all of these in
   their proper locations: scripts in bin, the server in sbin and the
   library in lib/python2.3.

   The configuration files needed by the server (in /etc/cfvers, if not
   overridden by command line arguments) are:

     * the logger configuration file, logging.cfg
     * the server configuration file, cfversd.conf

   Note that all these are needed for proper functioning. Also, before
   running the server, you should set up a proper environment (the Pyro
   library, which is used by the server and the clients, allows some
   settings to be customized only through environment variables). The
   most important one is PYRO_STORAGE. This variable should point to a
   writable directory used for temporary files. If it is not set, Pyro
   will use the current directory (which could even be / for a daemon
   started from the init scripts). The other variables are not
   required, but if you want to customize some parameters of the
   client-server communication, please see the Pyro documentation.
   Available settings include, for example, whether to use compression
   and how many connections to accept.
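
   As a minimal sketch of the environment check described above, the
   following Python snippet verifies that PYRO_STORAGE names an
   existing, writable directory before the server is started. The
   helper name check_pyro_storage is a hypothetical one chosen for
   illustration; it is not part of cfvers or Pyro.

```python
import os
import tempfile


def check_pyro_storage(environ):
    """Return the PYRO_STORAGE directory if it is usable, else raise.

    Hypothetical helper for illustration; not part of cfvers or Pyro.
    """
    path = environ.get("PYRO_STORAGE")
    if not path:
        raise RuntimeError("PYRO_STORAGE is not set")
    if not os.path.isdir(path):
        raise RuntimeError("PYRO_STORAGE is not a directory: %s" % path)
    if not os.access(path, os.W_OK):
        raise RuntimeError("PYRO_STORAGE is not writable: %s" % path)
    return path


# A temporary directory stands in for a real storage location.
with tempfile.TemporaryDirectory() as tmp:
    print(check_pyro_storage({"PYRO_STORAGE": tmp}))
```

   In a start-up script the same function could be called with
   os.environ, so that the server refuses to start with an unusable
   storage directory instead of silently falling back to the current
   directory.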
     _________________________________________________________

3.1. Database configuration

   If you use the sqlite backend, no customization is necessary. Just
   choose a writable file in a writable directory; both must be
   writable by the user who will be accessing the database (this is
   the server in remote configurations and the tools in local
   configurations).

   If you are using the postgresql backend, you need to create a
   database and (preferably) a separate user for the database. Remember
   the username and password as you will need to fill them in the
   configuration files.

   Also, for the postgresql backend, the --name argument to cfv find
   works only if you install the plpythonu server-side language and
   create the following function in the database:
CREATE OR REPLACE FUNCTION fnmatch (text, text) RETURNS boolean
LANGUAGE plpythonu AS '
import fnmatch
return fnmatch.fnmatch(args[0], args[1])
';
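
   The function above delegates to the fnmatch module of the Python
   standard library, which implements shell-style wildcard matching.
   As a small sketch of what that matching does (the sample names are
   made up for illustration), note that the first argument is the name
   and the second is the pattern:

```python
import fnmatch

# fnmatch.fnmatch(name, pattern) applies shell-style wildcards:
# "*" matches any run of characters, "?" a single character,
# and "[seq]" any single character listed in seq.
names = ["passwd", "passwd-", "group", "shadow"]

matches = [n for n in names if fnmatch.fnmatch(n, "passwd*")]
print(matches)
```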
     _________________________________________________________

4. Quick start

   How to create your first repository
    1.
         a. decide whether to use a client-server setup or direct access
            to the repository (this can also be remote, in the case of
            postgresql)
         b. decide on which back-end to use (either sqlite or
            postgresql for now)
    2. Based on the above answers, create the configuration files.
          + local repository, sqlite; just create the configuration
            file ~/.cfvers:
[server]
server_type=local
repo_meth=sqlite
repo_data=/path/to/file.db
area=default
          + local repository, postgresql (first create a postgresql
            database).
[server]
server_type=local
repo_meth=postgresql
repo_data=dbname=mydb user=myuser password=mypass
area=default
          + remote repository; create the server configuration file
            (e.g. /etc/cfvers/cfversd.conf):
               o for sqlite:
[server]
port = 9999
pidfile = /var/run/cfvers/cfversd.pid

[repository]
method=sqlite
connect=/var/lib/cfvers/database

[auth]
users=user1

[user_user1]
client_password=cpw
server_password=spw
valid_from=127.0.0.1,192.168.0.2
areas=default
admin=true
               o for postgresql:
[server]
port = 9999
pidfile = /var/run/cfvers/cfversd.pid

[repository]
method=postgresql
connect=dbname=mydb user=myuser password=mypass

[auth]
users=user1

[user_user1]
client_password=cpw
server_password=spw
valid_from=127.0.0.1,192.168.0.2
areas=default
admin=true
            then create the client configuration file (~/.cfvers):
[server]
server_type=remote
host=192.168.0.1
port=9999
username=user1
client_password=cpw
server_password=spw
area=default
       then start the server: /usr/sbin/cfversd -c
       /etc/cfvers/cfversd.conf
    3. run cfvadmin init in order to create the initial repository.
    4. run cfv add ITEMS... in order to register the items you want
       versioned.
    5. run cfv store in order to store the first version.
    6. after every change to the system's configuration, rerun the
       cfv store command in order to update the versioned items. New
       items you want stored must be registered in a separate call
       (cfv add).
    7. schedule a cron job to watch for differences or do automatic
       commits.
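
   The client configuration files shown in step 2 use an INI-like
   layout. As a sketch, assuming such a file can be parsed with
   Python's standard configparser module (cfvers itself may read it
   differently), the local postgresql example can be loaded as
   follows. Only the first "=" on a line separates the key from the
   value, so the repo_data connection string is kept whole.

```python
import configparser

# Contents of the local postgresql example from step 2.
SAMPLE = """\
[server]
server_type=local
repo_meth=postgresql
repo_data=dbname=mydb user=myuser password=mypass
area=default
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
server = parser["server"]

print(server["server_type"])
print(server["repo_data"])
```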
     _________________________________________________________

5. Concepts

   I tried to keep cfvers as simple as possible. But I don't think I
   succeeded.
     _________________________________________________________

5.1. The repository

   The repository is where the files are stored. The repository is
   manipulated using the cfvadmin command.

   Right now, there are two backends implemented for the repository:
   postgresql-based and sqlite-based. The sqlite backend is very useful
   for small or standalone installations.
     _________________________________________________________

5.2. The areas

   The repository contains areas in which files are stored; this allows
   files from different servers to be stored in the same repository. A
   repository must contain at least one area in order to be able to
   contain files. Areas are created with the cfvadmin create command
   and displayed with cfvadmin info.

   An area has the following attributes:

   name
          The name of the area; you use this when referring to the area
          from the client, either in configuration files or with the -a
          option to the cfv command

   root
          The root path on the filesystem for the files contained in
          this area; this allows you to define for example areas for
          chroot jails and refer to the files in the area using the
          path in the chroot.

          Default value: /

   description
          A text describing the area, anything you like

   ctime
          The creation time of the area
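
   To illustrate the root attribute, here is a sketch of how a name
   inside an area could be mapped to a path on the real filesystem.
   The function name filesystem_path and the chroot location
   /var/chroot/www are assumptions made for this example; cfvers may
   resolve paths differently.

```python
import posixpath


def filesystem_path(area_root, item_name):
    """Map an item name inside an area to a path on the real filesystem.

    Hypothetical helper; cfvers may resolve paths differently.
    """
    # Strip the leading "/" so that join() does not discard the area root.
    relative = item_name.lstrip("/")
    return posixpath.normpath(posixpath.join(area_root, relative))


# With the default root the name is used unchanged.
print(filesystem_path("/", "/etc/passwd"))
# With a chroot area, the same name points inside the jail.
print(filesystem_path("/var/chroot/www", "/etc/passwd"))
```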
     _________________________________________________________

5.3. The items

   The files to be versioned are represented by items. Note that an
   item doesn't contain actual file information; it only represents
   the intent to track a file.

   The attributes of an item:

   name
          The filename which this item represents; this is what will be
          tracked by cfvers;

   flags
          The entries of an item are affected by the item's flag
          attribute. Currently, the flags can affect the following:

          + Amount of information to store. An entry can store for a
            file:
               o metadata (name, type, size,
                 access/creation/modification times, owner/group, etc.)
               o checksum of the contents (for regular files, symbolic
                 links and directories)
               o file contents (for regular files, symbolic links and
                 directories)
            An entry can store only metadata, metadata and checksum, or
            all information about a file. This is selected at
            registration time using the cfv add --store=level command,
            where level is one of metadata, checksum, full.
          + The kind of the item:
               o Regular file: if the flag is one of metadata,
                 checksum or contents, the file will be stored as a
                 regular file.
               o Virtual file: if the flag is virtual, the file will
                 be stored as a virtual file.

   ctime
          Creation time (=registration time) for this item.

   area
          The area to which this item belongs.

   command
          If the item is a virtual one, this is the command line used
          to generate the contents.
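
   The three storage levels described under flags can be pictured with
   the following sketch for a regular file. It is only an
   illustration: the function name snapshot and the dictionary keys
   are assumptions, not the way cfvers stores entries.

```python
import hashlib
import os
import tempfile

LEVELS = ("metadata", "checksum", "full")


def snapshot(path, level):
    """Collect what an entry could hold for a regular file at a level.

    Illustrative only; the dictionary keys are assumptions.
    """
    if level not in LEVELS:
        raise ValueError("unknown level: %r" % (level,))
    st = os.lstat(path)
    info = {"size": st.st_size, "mode": st.st_mode, "mtime": st.st_mtime,
            "uid": st.st_uid, "gid": st.st_gid}
    if level in ("checksum", "full"):
        with open(path, "rb") as handle:
            data = handle.read()
        info["sha1sum"] = hashlib.sha1(data).hexdigest()
        if level == "full":
            info["filecontents"] = data
    return info


# Show which fields each level records for a small sample file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "a")
    with open(sample, "wb") as handle:
        handle.write(b"Test\n")
    for level in LEVELS:
        print(level, sorted(snapshot(sample, level)))
```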
     _________________________________________________________

5.3.1. Regular vs. virtual items

   Usually you will want to track regular files. This is accomplished
   by defining an item with a certain name; that name will be used as
   the name of the file to store in the repository.

   However, there is another possibility: a virtual file. A virtual
   file is one whose contents are taken from the output of a command,
   not from a file in the filesystem. This can be useful for versioning
   system state, for example: partition tables, either as dd
   if=/dev/hda bs=512 count=1 or as sfdisk -d /dev/hda, system hardware
   configuration, as lspci -v, etc.

   The command attribute of the item is used to generate the contents
   of the file. For the moment, both the standard output and the
   standard error are saved together. The exit code of the command is
   saved in the entry's exitcode attribute.
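
   The behaviour described above (standard output and standard error
   saved together, exit code kept separately) can be sketched in
   Python as follows. The function name run_virtual is hypothetical,
   and the demo command is a stand-in so that the example runs
   anywhere; how cfvers actually invokes the command is not shown
   here.

```python
import subprocess
import sys


def run_virtual(command):
    """Run a command, returning (combined output, exit code)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        check=False,               # a non-zero exit code is data, not an error
    )
    return result.stdout, result.returncode


# A stand-in command that writes to both streams and exits with code 3.
demo = [sys.executable, "-c",
        "import sys; print('out'); print('err', file=sys.stderr); sys.exit(3)"]
output, exitcode = run_virtual(demo)
print(exitcode)
```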
     _________________________________________________________

5.4. The entries

   An entry represents the information about an item at a certain point
   in time.

   The properties of an entry can be split into two groups: its own
   attributes and the attributes of the file it represents. Its own
   attributes are:

   item
          The item to which this entry belongs

   revno
          The revision number of the revision this entry belongs to

   status
          The status of this entry, meaning what kind of change to the
          file it represents. Currently, it can take one of the
          following values:

          + A - the entry represents the addition of an item to the
            area; it does not have any other contents (i.e. the file
            properties haven't been stored yet)
          + M - modified; this is a regular entry about a file being
            updated
          + D - deleted; this is an entry about a file which can no
            longer be found in the filesystem; see Section 6.4 for more
            details about deletions

   If the entry has the status "M", the file properties will contain:

   filetype, size, mode, atime, mtime, ctime, inode, device, nlink,
          uid/gid, uname/gname, rdev, blocks, blksize
          metadata properties of the file

   sha1sum
          the checksum of the file contents; applicable to regular
          files, symbolic links and directories;

   filecontents
          the file contents; applicable to regular files, symbolic
          links and directories; for directories, the contents are the
          list of filenames separated by newlines
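
   For directories, the filecontents and sha1sum properties described
   above can be pictured with this sketch. Sorting the names is an
   assumption made here to keep the result stable; the text does not
   say in which order cfvers lists them, and the function name
   directory_contents is hypothetical.

```python
import hashlib
import os
import tempfile


def directory_contents(path):
    """Return a directory's contents as newline-separated filenames."""
    # Sorting is an assumption made here to keep the result stable.
    names = sorted(os.listdir(path))
    return b"\n".join(os.fsencode(name) for name in names)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("passwd", "group"):
        open(os.path.join(tmp, name), "w").close()
    contents = directory_contents(tmp)
    checksum = hashlib.sha1(contents).hexdigest()

print(contents)
```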
     _________________________________________________________

5.5. The revisions

   A revision groups together entries which represent the state of the
   items tracked at a certain moment in time.

   area
          The area to which this revision belongs.

   revno
          The revision number of this revision.

   server
          The server on which this revision was made.

   logmsg
          The log message.

   ctime
          The creation time of this revision.

   uid, uname, gid, gname
          The numeric and textual representation of the credentials of
          the process which created this revision.

   commiter
          A textual description of the person or process that made this
          revision; useful when revisions are made as root but you
          need a more detailed description.
     _________________________________________________________

6. Common operations

6.1. Repository initialization

   This should be done only once, otherwise it destroys your data.

   Example 1. Repository initialization
$ cfvadmin init

   Example 2. Forced repository initialization
$ cfvadmin init --force
     _________________________________________________________

6.2. Area related

   Generally, you only work with areas at the initial setup of your
   repositories, or when adding new servers to the setup. There are
   only two operations possible on areas: creation of a new area and
   displaying area information.

   Example 3. Area creation
$ cfvadmin create -d "my area" -p / area45


   Example 4. Displaying area information
$ cfvadmin info
Local repository has 1 area(s)
-------------------------
Name: default
Created at 2004-09-26 04:49:03+0300
Root path: /
Description: Default area
Revision number: 2
Number of items: 102529
$
     _________________________________________________________

6.3. File operations

   The item/entry operations can be split roughly into three groups:

   storing files
   searching for files
   retrieving files
     _________________________________________________________

6.3.1. Storing files

   The first step in order to track a file is to register it with the
   system:

   Example 5. Registering files
$ cfv add -m "Log message" /etc/passwd /etc/group /etc/hostname
Status: Added, revision 1
Time begin: 2004-09-26 15:35:02 EEST
Time end:   2004-09-26 15:35:03 EEST
Total skipped (error): 0
Total registered: 3
Total skipped (item already registered): 0
Total skipped (invalid name): 0
$

   Then you need to instruct the system to store the contents of
   those files:

   Example 6. Storing files
$ ./cfv store -m "Stored files"
Status: Stored revision 2
Time begin: 2004-09-26 15:37:01 EEST
Time end:   2004-09-26 15:37:02 EEST
Total stored: 3
Total skipped (not changed): 0
Total skipped (error): 0
Total skipped (not registered): 0
Total marked deleted: 0
$

   This is all there is to storing files.
     _________________________________________________________

6.3.2. Searching for files

   You can make two kinds of searches: for files with certain
   attributes, or for files for which the filesystem is not in sync
   with the repository.

   Example 7. Search files by attribute
$ cfv find --name passwd -l
-rw-r--r--     2 root     root           92 2004-04-30 00:32:04 /etc/pam.d/passwd
-rw-r--r--     2 root     root         1594 2004-07-20 23:01:57 /etc/passwd
$ cfv find --regex '.*[a-k]nes[^/]'
/etc/X11/xkb/geometry/kinesis
/etc/gconf/schemas/glines.schemas
/etc/snmp/mib2c.column_defines.conf
/etc/xpdf/xpdfrc-japanese
$ cfv find --size '>' 950000 -d
-------------------------
Entry for /etc/gconf/schemas/gnome-terminal.schemas
File registerd at: 2004-09-26T15:45:18+0
Available revisions: 2

-------------------------
Entry for /etc/gconf/schemas/metacity.schemas
File registerd at: 2004-09-26T15:45:18+0
Available revisions: 2

$
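
   The --regex search in Example 7 can be reproduced with Python's re
   module. This is only a sketch: it assumes the pattern has to match
   from the start of the path but not up to its end, which is
   consistent with the four paths listed in the example output. The
   extra path /etc/passwd is added here as a non-matching case.

```python
import re

# Paths taken from the Example 7 output, plus one that should not match.
paths = [
    "/etc/X11/xkb/geometry/kinesis",
    "/etc/gconf/schemas/glines.schemas",
    "/etc/snmp/mib2c.column_defines.conf",
    "/etc/xpdf/xpdfrc-japanese",
    "/etc/passwd",
]
pattern = re.compile(r".*[a-k]nes[^/]")

# re.match anchors at the start of the string only.
matched = [p for p in paths if pattern.match(p)]
for path in matched:
    print(path)
```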

   Example 8. Searching for modified files
$ ./cfv diff -l
/tmp/a
$ ./cfv diff
===== Item /tmp/a (rev 2 -> current)
File contents:
--- /tmp/a Sun Sep 26 15:59:05 2004 (rev 2)
+++ /tmp/a Sun Sep 26 15:59:17 2004 (current)
@@ -1,1 +1,1 @@
-Sun Sep 26 15:59:05 EEST 2004
+Test


Attribute mtime:
- 2004-09-26 15:59:05 EEST
+ 2004-09-26 15:59:17 EEST

Attribute ctime:
- 2004-09-26 15:59:05 EEST
+ 2004-09-26 15:59:17 EEST

Attribute size:
- 30
+ 5

Attribute sha1sum:
- dc926ccb39a0c823680bdfeefe59057a6af727fc
+ 1c68ea370b40c06fcaf7f26c8b1dba9d9caf5dea

$ ./cfv diff -l -c mtime
/tmp/a
$
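
   The attribute part of the diff output in Example 8 amounts to
   comparing two sets of attributes and reporting the ones that
   differ. A minimal sketch, with a hypothetical function name and
   with the size and sha1sum values copied from the example (the mode
   value is made up to show an unchanged attribute):

```python
def changed_attributes(stored, current):
    """Return {name: (old, new)} for attributes that differ."""
    names = sorted(set(stored) | set(current))
    return {
        name: (stored.get(name), current.get(name))
        for name in names
        if stored.get(name) != current.get(name)
    }


# size and sha1sum come from Example 8; mode is an invented value.
rev2 = {"size": 30, "mode": 0o644,
        "sha1sum": "dc926ccb39a0c823680bdfeefe59057a6af727fc"}
now = {"size": 5, "mode": 0o644,
       "sha1sum": "1c68ea370b40c06fcaf7f26c8b1dba9d9caf5dea"}

for name, (old, new) in changed_attributes(rev2, now).items():
    print("Attribute %s:\n- %s\n+ %s\n" % (name, old, new))
```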
     _________________________________________________________

6.3.3. Retrieving files

   Once you have found the files you want to retrieve, there are
   several things you can do with them:

     * restore them to the filesystem
     * display their contents
     * display information about their metadata (like stat)
     * export them in a tar archive
     * create a checksum file (SHA1SUM) for external tools to check

   Example 9. Retrieving files
$ cfv retrieve /tmp/a
Total retrieved (fully): 1
$ cfv cat /tmp/a
Sun Sep 26 15:59:05 EEST 2004
$ cfv export -Ftar -o /tmp/x.tar
$ cfv export -Fsha1sum
dc926ccb39a0c823680bdfeefe59057a6af727fc  tmp/a
$
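
   The checksum listing printed by cfv export -Fsha1sum above has one
   digest and one relative path per line, separated by two spaces. As
   a sketch of the kind of check an external tool could perform on
   such a listing (the function name verify_sha1sum and the sample
   file are assumptions for illustration):

```python
import hashlib
import os
import tempfile


def verify_sha1sum(lines, root):
    """Check "digest  relative/path" lines against files under root.

    Returns the list of paths whose digest does not match.
    """
    failed = []
    for line in lines:
        digest, _, name = line.rstrip("\n").partition("  ")
        with open(os.path.join(root, name), "rb") as handle:
            actual = hashlib.sha1(handle.read()).hexdigest()
        if actual != digest:
            failed.append(name)
    return failed


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "tmp"))
    with open(os.path.join(root, "tmp", "a"), "wb") as handle:
        handle.write(b"Test\n")
    good = hashlib.sha1(b"Test\n").hexdigest() + "  tmp/a"
    bad = "0" * 40 + "  tmp/a"
    ok_result = verify_sha1sum([good], root)
    bad_result = verify_sha1sum([bad], root)

print(ok_result, bad_result)
```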
     _________________________________________________________

6.4. Handling deletions

   When a file which is tracked has been removed from the filesystem,
   cfvers will notice this at the next store command and will register
   this deletion. The item in question will be displayed (by default)
   in the output of the command. Then, as long as the file hasn't been
   recreated, cfvers will ignore it. As soon as the file exists again,
   it will be tracked normally.

   The deletion of a file is registered as an entry with status "D" in
   the repository. When it appears again, it will have a new status "M"
   entry.
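
   The rules above can be summarised as a small decision function.
   This is a sketch of the described behaviour, not cfvers code: the
   function name and parameters are hypothetical, and treating a
   missing file whose latest entry is "A" like any other deletion is
   an assumption.

```python
def next_status(last_status, exists, changed):
    """Decide which entry, if any, a store run records for one item.

    last_status: "A", "M" or "D" (status of the item's latest entry)
    exists:      whether the file is currently present
    changed:     whether it differs from the latest stored entry
    Returns "M", "D" or None (nothing recorded).
    """
    if not exists:
        # Record the deletion once, then ignore the missing file.
        return None if last_status == "D" else "D"
    if last_status in ("A", "D"):
        # First store after registration, or the file has reappeared.
        return "M"
    return "M" if changed else None


print(next_status("M", exists=False, changed=False))
print(next_status("D", exists=False, changed=False))
print(next_status("D", exists=True, changed=False))
```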
     _________________________________________________________

7. Limitations

   This section should be very big. It's small because I didn't have
   time to fill it, not because cfvers is complete :-)
     _________________________________________________________

7.1. POSIX VFS layer limitations

   These are limitations or design decisions inherent to the POSIX
   specification or the GNU/Linux implementation. While developing
   cfvers, I found:

     * You can't change the ctime of an inode. This is by design in the
       POSIX filesystem layer: the ctime is for metadata modifications,
       and the mtime/atime pair for data write/read accesses. Thus a
       ctime modification would itself trigger a ctime modification,
       since the ctime is part of the metadata, rendering the ctime
       modification useless :). A read attribute for the metadata would
       be inappropriate, I think, because such reads are made in great
       numbers.
     * utimes(2) and chmod(2) act on the destination of a symlink
       (when given an argument which is a symlink). I can't think why
       anyone would like this (you could always expand the symlink
       using readlink, but right now you can't act on the symlink!).
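
   The ctime limitation from the first point can be observed with a
   short Python sketch: the access and modification times of a file
   can be set to an arbitrary value, but the ctime cannot. The
   timestamp used is just an arbitrary moment in the past, and the
   variable names are chosen for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example")
    with open(path, "w") as handle:
        handle.write("Test\n")

    # An arbitrary time in the past (late September 2004).
    wanted = 1096203545
    os.utime(path, (wanted, wanted))  # sets atime and mtime only

    st = os.stat(path)
    mtime_restored = int(st.st_mtime) == wanted
    ctime_restored = int(st.st_ctime) == wanted

print("mtime restored:", mtime_restored)
print("ctime restored:", ctime_restored)
```

   On a typical GNU/Linux filesystem the ctime ends up at the moment
   of the os.utime call itself, since that call is a metadata change.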

  Notes

   [1]

   However, nobody said it attained these goals - after all, it's
   software!