cfvers introduction

Iustin Pop

$Id: manual.xml 222 2005-10-30 12:58:22Z iusty $

Copyright © 2003, 2004, 2005 Iustin Pop

This document explains the concept and usage of cfvers version 0.5.4, a system tool designed to help with versioning the configuration files on a system.
_________________________________________________________

Table of Contents

1. About this document
2. Introduction
3. Installation
   3.1. Database configuration
4. Quick start
5. Concepts
   5.1. The repository
   5.2. The areas
   5.3. The items
      5.3.1. Regular vs. virtual items
   5.4. The entries
   5.5. The revisions
6. Common operations
   6.1. Repository initialization
   6.2. Area related
   6.3. File operations
      6.3.1. Storing files
      6.3.2. Searching for files
      6.3.3. Retrieving files
   6.4. Handling deletions
7. Limitations
   7.1. POSIX VFS layer limitations

1. About this document

This is the user manual for the cfvers project; the homepage is at http://www.nongnu.org/cfvers/. You can also get new versions of this document there.

Revision: $Id: manual.xml 222 2005-10-30 12:58:22Z iusty $
_________________________________________________________

2. Introduction

Making backups is an important aspect of system administration. The techniques of backing up data are explained in any good document about system administration, and they won't be explained here again.

However, text configuration files are better suited to versioning systems than to full or incremental backups, which are targeted at binary files and miscellaneous data. Unfortunately, classic versioning systems do not work well directly on a live system: they create extra files, and they cannot cope with special files or keep permissions intact.

The working model of the classic versioning systems consists of one (or more) central repositories, which are very precious, and a multitude of developers' workspaces, which hold semi-important data. By this I mean that it is fine to delete or otherwise break a developer's workspace when no changes have been made to it: all information can be restored from the central repository.

In contrast, a versioning system designed for system configuration has its priorities almost reversed: the critical issue is with the filesystem, and the repository is secondary to that. This means that such software must obey the following rules:

  * keep the system's integrity: the software must not do anything to the filesystem it hasn't been asked to do
  * treat the meta-data of versioned items as being as important as the data
  * when in doubt about the success of an operation, abort rather than damage the workspace

cfvers has been designed with these objectives in mind[1].
_________________________________________________________

3. Installation

There are three components which need installing:

  * the Python library
  * the command line utilities, cfv and cfvadmin
  * the cfversd server and its configuration files

If you don't run the server, you can run the cfv/cfvadmin scripts from the install directory, since it contains the Python library, which will be picked up from there. However, the recommended way is to install the Python library in its proper place and the scripts in /usr/local/bin or /usr/bin.

The default ./configure invocation installs each of these in its standard location: the scripts in bin, the server in sbin and the library in lib/python2.3.

The configuration files needed by the server (in /etc/cfvers, if not overridden by command line arguments) are:

  * the logger configuration file, logging.cfg
  * the server configuration file, cfversd.conf

Note that both files are needed for proper functioning.
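
As a quick sanity check before starting the server, the presence of the two configuration files can be verified with a few lines of Python. This is only a sketch: missing_config_files is a hypothetical helper written for this manual, not part of cfvers, and it assumes the default /etc/cfvers location and the file names listed above.

```python
import os

# Assumption: the default directory and file names quoted in the text above.
CONFIG_DIR = "/etc/cfvers"
REQUIRED_FILES = ("logging.cfg", "cfversd.conf")


def missing_config_files(config_dir=CONFIG_DIR, required=REQUIRED_FILES):
    """Return the required file names that are absent from config_dir.

    Hypothetical helper for illustration; cfvers itself does not ship it.
    """
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(config_dir, name))
    ]


if __name__ == "__main__":
    missing = missing_config_files()
    if missing:
        print("Missing in %s: %s" % (CONFIG_DIR, ", ".join(missing)))
    else:
        print("All server configuration files are present.")
```

If the directory was overridden on the command line, pass the same directory as config_dir.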

Also, before running the server, you should set up a proper environment (the Pyro library, which is used by the server and the clients, allows some settings to be customized only through environment variables). The most important one is PYRO_STORAGE. This variable should point to a writable directory used for temporary files. If it is not set, Pyro will use the current directory (which could even be / for a daemon started from the init scripts).

The other variables are not needed, but if you want to customize some parameters of the client-server communication, please see the Pyro documentation. Available settings include, for example, whether to use compression, how many connections to accept, etc.
_________________________________________________________

3.1. Database configuration

If you will use the sqlite backend, no customization is necessary. Just choose a file in a directory, both writable by the user who will be accessing the database (the server in remote configurations, the tools in local configurations).

If you are using the postgresql backend, you need to create a database and (preferably) a separate user for the database. Remember the username and password, as you will need to fill them in the configuration files.

Also, for the postgresql backend, the --name argument to cfv find works only if you install the plpythonu server-side language and create the following function in the database:

    CREATE OR REPLACE FUNCTION fnmatch (text, text) RETURNS boolean
    LANGUAGE plpythonu AS '
    import fnmatch
    return fnmatch.fnmatch(args[0], args[1])
    ';
_________________________________________________________

4. Quick start

How to create your first repository

 1. a. Decide whether to use a client-server setup or direct access to the repository (this can also be remote, in the case of postgresql).
    b. Decide which back-end to use (either sqlite or postgresql for now).
 2. Based on the above answers, create the configuration files.
      + local repository, sqlite; just create the configuration file ~/.cfvers:

            [server]
            server_type=local
            repo_meth=sqlite
            repo_data=/path/to/file.db
            area=default

      + local repository, postgresql (first create a postgresql database):

            [server]
            server_type=local
            repo_meth=postgresql
            repo_data=dbname=mydb user=myuser password=mypass
            area=default

      + remote repository; create the server configuration file (e.g. /etc/cfvers/cfversd.conf):

           o for sqlite:

                 [server]
                 port = 9999
                 pidfile = /var/run/cfvers/cfversd.pid

                 [repository]
                 method=sqlite
                 connect=/var/lib/cfvers/database

                 [auth]
                 users=user1

                 [user_user1]
                 client_password=cpw
                 server_password=spw
                 valid_from=127.0.0.1,192.168.0.2
                 areas=default
                 admin=true

           o for postgresql:

                 [server]
                 port = 9999
                 pidfile = /var/run/cfvers/cfversd.pid

                 [repository]
                 method=postgresql
                 connect=dbname=mydb user=myuser password=mypass

                 [auth]
                 users=user1

                 [user_user1]
                 client_password=cpw
                 server_password=spw
                 valid_from=127.0.0.1,192.168.0.2
                 areas=default
                 admin=true

        then create the client configuration file (~/.cfvers):

            [server]
            server_type=remote
            host=192.168.0.1
            port=9999
            username=user1
            client_password=cpw
            server_password=spw
            area=default

        then start the server:

            /usr/sbin/cfversd -c /etc/cfvers/cfversd.conf

 3. Run cfvadmin init in order to create the initial repository.
 4. Run cfv add ITEMS... in order to register the items you want versioned.
 5. Run cfv store in order to store the first version.
 6. After every change to the system's configuration, rerun the cfv store command in order to update the versioned items. New items you want stored must be given in a separate call (cfv add).
 7. Schedule a cron job to watch for differences or do automatic commits.
_________________________________________________________

5. Concepts

I tried to keep cfvers as simple as possible. But I don't think I succeeded.
_________________________________________________________

5.1. The repository

The repository is where the files are stored.
The repository is manipulated using the cfvadmin command.

Right now, there are two backends implemented for the repository: postgresql-based and sqlite-based. The sqlite backend is very useful for small or standalone installations.
_________________________________________________________

5.2. The areas

The repository contains areas in which files are stored; this allows you to store files from different servers in the same repository. A repository must contain at least one area in order to be able to contain files.

The areas are created with the cfvadmin create command and displayed with cfvadmin info.

An area has the following attributes:

name
    The name of the area; you use this when referring to the area from the client, either in configuration files or with the -a option to the cfv command

root
    The root path on the filesystem for the files contained in this area; this allows you to define, for example, areas for chroot jails and refer to the files in the area using the path in the chroot. Default value: /

description
    A text describing the area, anything you like

ctime
    The creation time of the area
_________________________________________________________

5.3. The items

The files to be versioned are represented by items. Note that an item doesn't contain actual file information; it represents the intent to track a file.

The attributes of an item:

name
    The filename which this item represents; this is what will be tracked by cfvers

flags
    The entries of an item are affected by the item's flags attribute. Currently, the flags can affect the following:

      + Amount of information to store. An entry can store for a file:

           o metadata (name, type, size, access/creation/modification times, owner/group, etc.)
           o checksum of the contents (for regular files, symbolic links and directories)
           o file contents (for regular files, symbolic links and directories)

        An entry can store only metadata, metadata and checksum, or all information about a file.
        This is selected at registration time using the cfv add --store=level command, where level is one of metadata, checksum, full.

      + The kind of the item:

           o Regular file: if the flags is one of metadata, checksum or contents, the file will be stored as a regular file.
           o Virtual file: if the flags is virtual, the file will be stored as a virtual file.

ctime
    Creation time (=registration time) for this item.

area
    The area to which this item belongs.

command
    If the item is a virtual one, this is the command line used to generate the contents.
_________________________________________________________

5.3.1. Regular vs. virtual items

Usually you will want to track regular files. This is accomplished by defining an item with a certain name; that name will be used as the name of the file to store in the repository.

However, there is another possibility: a virtual file. A virtual file is one whose contents are taken from the output of a command, not from a file in the filesystem. This can be useful for versioning system state, for example: partition tables, either as dd if=/dev/hda bs=512 count=1 or as sfdisk -d /dev/hda; system hardware configuration, as lspci -v; etc.

The command attribute of the item is used to generate the contents of the file. For the moment, both the standard output and the standard error are saved together. The exit code of the command is saved in the entry's exitcode attribute.
_________________________________________________________

5.4. The entries

An entry represents the information about an item at a certain point in time. The properties of an entry can be split into two groups: its own attributes and the attributes of the file it represents.

Its own attributes are:

item
    The item to which this entry belongs

revno
    The revision number of the revision this entry belongs to

status
    The status of this entry, meaning what kind of change to the file it represents.
    Currently, it can take one of the following values:

      + A - the entry represents the addition of an item to the area; it does not have any other contents (i.e. the file properties haven't been stored yet)
      + M - modified; this is a regular entry about a file being updated
      + D - deleted; this is an entry about a file which can no longer be found in the filesystem; see Section 6.4 for more details about deletions

If the entry has the status "M", the file properties will contain:

filetype, size, mode, atime, mtime, ctime, inode, device, nlink, uid/gid, uname/gname, rdev, blocks, blksize
    metadata properties of the file

sha1sum
    the checksum of the file contents; applicable to regular files, symbolic links and directories

filecontents
    the file contents; applicable to regular files, symbolic links and directories; for directories, the contents are the list of filenames separated by newlines
_________________________________________________________

5.5. The revisions

A revision groups together entries which represent the state of the tracked items at a certain moment in time.

The attributes of a revision:

area
    The area to which this revision belongs.

revno
    The revision number of this revision.

server
    The server on which this revision was made.

logmsg
    The log message.

ctime
    The creation time of this revision.

uid, uname, gid, gname
    The numeric and textual representation of the credentials of the process which created this revision.

commiter
    A textual description of the person or process that made this revision; useful when revisions are made as root but you need a more detailed description.
_________________________________________________________

6. Common operations

6.1. Repository initialization

This should be done only once, otherwise it destroys your data.

Example 1. Repository initialization

    $ cfvadmin init

Example 2. Forced repository initialization

    $ cfvadmin init --force
_________________________________________________________

6.2. Area related

Generally, you only work with areas at the initial setup of your repositories, or when adding new servers to the setup. There are only two operations possible on an area: creating a new area and displaying area information.

Example 3. Area creation

    $ cfvadmin create -d "my area" -p / area45

Example 4. Displaying area information

    $ cfvadmin info
    Local repository has 1 area(s)
    -------------------------
    Name: default
    Created at 2004-09-26 04:49:03+0300
    Root path: /
    Description: Default area
    Revision number: 2
    Number of items: 102529
    $
_________________________________________________________

6.3. File operations

The item/entry operations can be split roughly into three groups:

  * storing files
  * searching for files
  * retrieving files
_________________________________________________________

6.3.1. Storing files

The first step in order to track a file is to register it with the system:

Example 5. Registering files

    $ cfv add -m "Log message" /etc/passwd /etc/group /etc/hostname
    Status: Added, revision 1
    Time begin: 2004-09-26 15:35:02 EEST
    Time end: 2004-09-26 15:35:03 EEST
    Total skipped (error): 0
    Total registered: 3
    Total skipped (item already registered): 0
    Total skipped (invalid name): 0
    $

Then you need to actually order the system to store the contents of those files:

Example 6. Storing files

    $ ./cfv store -m "Stored files"
    Status: Stored revision 2
    Time begin: 2004-09-26 15:37:01 EEST
    Time end: 2004-09-26 15:37:02 EEST
    Total stored: 3
    Total skipped (not changed): 0
    Total skipped (error): 0
    Total skipped (not registered): 0
    Total marked deleted: 0
    $

This is all there is to storing files.
_________________________________________________________

6.3.2. Searching for files

You can make two kinds of searches: for files with certain attributes, or for files for which the filesystem is not in sync with the repository.

Example 7. Search files by attribute

    $ cfv find --name passwd -l
    -rw-r--r-- 2 root root 92 2004-04-30 00:32:04 /etc/pam.d/passwd
    -rw-r--r-- 2 root root 1594 2004-07-20 23:01:57 /etc/passwd
    $ cfv find --regex '.*[a-k]nes[^/]'
    /etc/X11/xkb/geometry/kinesis
    /etc/gconf/schemas/glines.schemas
    /etc/snmp/mib2c.column_defines.conf
    /etc/xpdf/xpdfrc-japanese
    $ cfv find --size '>' 950000 -d
    -------------------------
    Entry for /etc/gconf/schemas/gnome-terminal.schemas
    File registerd at: 2004-09-26T15:45:18+0
    Available revisions: 2
    -------------------------
    Entry for /etc/gconf/schemas/metacity.schemas
    File registerd at: 2004-09-26T15:45:18+0
    Available revisions: 2
    $

Example 8. Searching for modified files

    $ ./cfv diff -l
    /tmp/a
    $ ./cfv diff
    ===== Item /tmp/a (rev 2 -> current)
    File contents:
    --- /tmp/a Sun Sep 26 15:59:05 2004 (rev 2)
    +++ /tmp/a Sun Sep 26 15:59:17 2004 (current)
    @@ -1,1 +1,1 @@
    -Sun Sep 26 15:59:05 EEST 2004
    +Test
    Attribute mtime:
    - 2004-09-26 15:59:05 EEST
    + 2004-09-26 15:59:17 EEST
    Attribute ctime:
    - 2004-09-26 15:59:05 EEST
    + 2004-09-26 15:59:17 EEST
    Attribute size:
    - 30
    + 5
    Attribute sha1sum:
    - dc926ccb39a0c823680bdfeefe59057a6af727fc
    + 1c68ea370b40c06fcaf7f26c8b1dba9d9caf5dea
    $ ./cfv diff -l -c mtime
    /tmp/a
    $
_________________________________________________________

6.3.3. Retrieving files

Once you have found the files you want to retrieve, there are several things you can do with them:

  * restore them to the filesystem
  * display their contents
  * display information about their metadata (like stat)
  * export them in a tar archive
  * create a checksum file (SHA1SUM) for external tools to check

Example 9. Retrieving files

    $ cfv retrieve /tmp/a
    Total retrieved (fully): 1
    $ cfv cat /tmp/a
    Sun Sep 26 15:59:05 EEST 2004
    $ cfv export -Ftar -o /tmp/x.tar
    $ cfv export -Fsha1sum
    dc926ccb39a0c823680bdfeefe59057a6af727fc  tmp/a
    $
_________________________________________________________

6.4. Handling deletions

When a tracked file has been removed from the filesystem, cfvers will notice this at the next store command and will register the deletion. The item in question will be displayed (by default) in the output of the command. Then, as long as the file hasn't been recreated, cfvers will ignore it. As soon as the file exists again, it will be tracked normally.

The deletion of a file is registered as an entry with status "D" in the repository. When the file appears again, it will get a new entry with status "M".
_________________________________________________________

7. Limitations

This section should be very big. It's small because I didn't have time to fill it, not because cfvers is complete :-)
_________________________________________________________

7.1. POSIX VFS layer limitations

These are limitations or design decisions inherent to the POSIX specification or the GNU/Linux implementation. While developing cfvers, I found:

  * You can't change the ctime of an inode. This is by design in the POSIX filesystem layer: the ctime is for metadata modifications, and the mtime/atime pair for data write/read accesses. Thus setting the ctime would itself count as a metadata change and trigger a new ctime update, since the ctime is part of the metadata, rendering the modification useless :). A separate "last read" time for the metadata would be inappropriate, I think, because such reads happen very frequently.
  * utimes(2) and chmod(2) act on the target of a symlink (when given an argument which is a symlink). I can't think why anyone would want this: you could always resolve the symlink yourself using readlink, but right now you cannot act on the symlink itself.

Notes

[1] However, nobody said it attained these goals - after all, it's software!
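
The symlink behaviour described in Section 7.1 can be observed with a short Python sketch. It is only an illustration and not part of cfvers: utime_through_symlink is a hypothetical name, and the sketch assumes a POSIX system where os.utime follows symbolic links by default.

```python
import os
import tempfile


def utime_through_symlink(timestamp=1000000000):
    """Call os.utime() on a symlink and report which inode was changed.

    Returns (target_mtime, link_mtime_after, link_mtime_before).
    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        with open(target, "w") as handle:
            handle.write("data\n")
        os.symlink(target, link)

        # lstat() reports on the symlink itself, stat() on its target.
        link_before = os.lstat(link).st_mtime

        # The path given is the symlink, but by default the call follows it.
        os.utime(link, (timestamp, timestamp))

        return (
            os.stat(target).st_mtime,
            os.lstat(link).st_mtime,
            link_before,
        )


if __name__ == "__main__":
    target_mtime, link_after, link_before = utime_through_symlink()
    print("target mtime:", target_mtime)
    print("link mtime unchanged:", link_after == link_before)
```

The target's mtime takes the requested value, while the symlink's own mtime stays as it was.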