README for filedupe v1.0

What is it?
===========

filedupe is a utility for those times when you're acquiring a lot of
files from a less-than-sane source that can't seem to avoid duplicating
them constantly (e.g. Usenet). It keeps a DBM database of the files it
has seen, indexed by md5sum and file size. All things considered, I'm
willing to miss that one file which has the same size and 128-bit
checksum as one I already have.

Compiling
=========

Before compiling, please edit filedupe.c and change DBNAME to the
path/filename where you'd actually like to keep your database. I made
this a compile-time definition because, well, it was a quick stupid
hack, and it made the command-line syntax brainless.

To compile, just type make. It should work fine on a Linux/glibc
system. There's not much there, so I imagine porting difficulties will
be minimal.

Usage
=====

filedupe

Basically, the only things which should go on the command line are the
names of directories to be dupechecked. filedupe only works with
regular files; all others are happily skipped.

The names of all files which are duplicates are printed to stdout. Any
other informative messages go to stderr.

Bugs
====

None seen yet. However, there is a chance that when it encounters two
files with the same md5sum, it could mangle the size check. I have not
actually encountered a real-world case in which this happens.

Contact
=======

I can be reached at for comments/patches/bugs, whatever.

Copyright
=========

filedupe.c and this README are Copyright 1998 Sam Creasey. They are
licensed under the GNU General Public License, version 2 or later.

md5.c and md5.h are copyrighted by the Free Software Foundation, and
are also covered by the GPL. The versions of md5.h and md5.c here were
taken from the package textutils-1.22, available from your favorite GNU
mirror.

Consult the file COPYING for more information.
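
Example sketch
==============

This README does not show how filedupe builds its database keys or how
it skips non-regular files. As a rough illustration only, here is one
way that indexing by md5sum and file size, plus the regular-file check,
could look in C. The names make_key and is_regular_file and the
"<md5 hex>:<size>" key layout are hypothetical and not taken from
filedupe.c. The 16-byte digest is assumed to have been computed already,
and following symlinks via stat() is also an assumption.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

#define MD5_LEN 16
/* 32 hex digits + ':' + up to 20 decimal digits + terminating NUL */
#define KEY_MAX (2 * MD5_LEN + 1 + 20 + 1)

/*
 * Hypothetical helper: build a lookup key of the form "<md5 hex>:<size>"
 * from an already computed MD5 digest and a file size.
 * Returns the key length, or -1 if the output buffer is too small.
 */
static int make_key(const unsigned char digest[MD5_LEN],
                    unsigned long long size, char *out, size_t outlen)
{
    static const char hex[] = "0123456789abcdef";
    size_t i;
    int n;

    if (outlen < KEY_MAX)
        return -1;

    /* Two lowercase hex digits per digest byte. */
    for (i = 0; i < MD5_LEN; i++) {
        out[2 * i]     = hex[digest[i] >> 4];
        out[2 * i + 1] = hex[digest[i] & 0x0f];
    }

    /* Append the size so that both values take part in the lookup. */
    n = snprintf(out + 2 * MD5_LEN, outlen - 2 * MD5_LEN, ":%llu", size);
    if (n < 0)
        return -1;
    return 2 * MD5_LEN + n;
}

/*
 * Hypothetical helper: return 1 if path names a regular file, 0 for
 * anything else (directories, devices, or a failed stat()).
 */
static int is_regular_file(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;
    return S_ISREG(st.st_mode) ? 1 : 0;
}
```

Under these assumptions, a caller would skip any path for which
is_regular_file() returns 0, compute the digest of the rest, build the
key, and look it up in the database. A hit would mean the file is a
duplicate whose name goes to stdout; a miss would mean the key is stored
for next time.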