
                       IExtract
                       ========


Utility to extract the description out of certain file-types
Copyright (C) 2002 - 2007 Markus Schwab <g17m0@lycos.com>
-----------------------------------------------------------------------


At the moment the following document types are supported (unless disabled while
configuring):

  - PNG: PNG images can contain text-chunks with keyword-value pairs. The
         content of the keywords "Title", "Author" and "Description" is
         extracted.


  - GIF: GIF images can contain "comment extension" blocks. The contents of
         these blocks are extracted. There are no title or author entries.


  - JPEG: The program recognizes three types of comments:
            - After a JPEG comment marker (0xfffe)
            - In an APP1-Exif marker (as used by Windows XP)
            - In an APPD marker (as written by Photoshop)


  - HTML: The text between <title> and </title> and/or the contents of
          the meta tags (the tags defined by HTML 4.0 and Dublin Core are
          supported) are extracted.


  - PDF: The content of the "document information dictionary" is extracted
          (the content of the "Subject" key is returned as comment).
          Note that encrypted information is not decrypted!


  - OpenOffice documents (*.sxw, *.sxc, *.sxd, *.sdi, *.sxm); the entries of
          the properties dialog are extracted


  - OpenOffice 2 documents (*.odt, *.ods, *.odp, *.odg); the entries of the
          properties dialog are extracted


  - StarOffice documents (*.sdw, *.sdc, *.sdd, *.sda); the entries of the
          properties dialog are extracted


  - MS Office documents (*.doc, *.xls, *.ppt); the entries of the properties
          dialog are extracted.

          Thanks to the Apache Jakarta POI project for publishing their
          insights about the MS Office format(s) (see
          http://jakarta.apache.org/poi/index.html for details).


  - MP3 files: Extracts the contents of the ID3 tag.


  - Ogg Vorbis files: Extracts the contents of the comment header


  - RTF documents: Extracts the contents of the \info section


  - Abiword documents (*.abw): Extracts the contents of the property dialog

  - Other types can easily be added by the use of plugins (see below)
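
As an illustration of the first JPEG case above (text following a comment
marker), here is a minimal sketch. It is not the code used by IExtract; the
function name jpegComments is made up for this example, and it assumes the
usual JPEG segment layout (0xff, marker byte, two-byte big-endian length which
includes the length bytes themselves). It stops at the start of the image data
and does not handle markers without a length field.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of IExtract): collects the text of every
// comment segment (marker 0xfffe) found before the image data starts.
std::vector<std::string> jpegComments (const std::vector<std::uint8_t>& data) {
   std::vector<std::string> comments;

   // Every JPEG stream starts with the SOI marker 0xffd8
   if (data.size () < 2 || data[0] != 0xFF || data[1] != 0xD8)
      return comments;

   std::size_t pos (2);
   while (pos + 4 <= data.size ()) {
      if (data[pos] != 0xFF)
         break;                                 // Not at a marker: give up

      std::uint8_t marker (data[pos + 1]);
      if (marker == 0xFF) {                     // Fill byte before a marker
         ++pos;
         continue;
      }
      if (marker == 0xDA || marker == 0xD9)     // Start of scan or end of image
         break;

      // The length is big-endian and includes its own two bytes
      std::size_t len ((std::size_t (data[pos + 2]) << 8) | data[pos + 3]);
      if (len < 2 || pos + 2 + len > data.size ())
         break;                                 // Truncated or corrupt segment

      if (marker == 0xFE)                       // Comment segment
         comments.emplace_back (data.begin () + pos + 4,
                                data.begin () + pos + 2 + len);
      pos += 2 + len;
   }
   return comments;
}
```

Passing the bytes of a file to jpegComments returns one string per comment
segment; for anything that does not start with 0xffd8 the result is empty.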


Feel free to send me any documents that are not processed correctly, or to
suggest other file types you would like to see supported.

The results can be written as (human-readable) text (separated by spaces),
quoted comma-separated text (to be machine-interpreted, e.g. imported into a
database or a spreadsheet), HTML (table), XML (defaults to XHTML) or LaTeX
(tabular). Note that some extra formatting of the text might be done (like
checking for LaTeX or HTML special characters).
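
To give an idea of what such extra formatting typically involves, here is a
small sketch of quoting a field for comma-separated output and of escaping
HTML special characters. The function names quoteCsv and escapeHtml are made
up for this example, and the rules shown (doubling embedded quotes, replacing
&, <, > and ") are the common conventions, which is an assumption; the exact
rules applied by IExtract may differ.

```cpp
#include <string>

// Hypothetical helper: wraps a field in double quotes for comma-separated
// output; embedded quotes are doubled (common CSV convention, assumed here).
std::string quoteCsv (const std::string& field) {
   std::string out ("\"");
   for (char ch : field) {
      if (ch == '"')
         out += '"';                 // Double every embedded quote
      out += ch;
   }
   out += '"';
   return out;
}

// Hypothetical helper: replaces the characters with a special meaning in
// HTML by the matching character entities.
std::string escapeHtml (const std::string& text) {
   std::string out;
   for (char ch : text)
      switch (ch) {
      case '&': out += "&amp;"; break;
      case '<': out += "&lt;"; break;
      case '>': out += "&gt;"; break;
      case '"': out += "&quot;"; break;
      default:  out += ch;
      }
   return out;
}
```

A description containing commas or quotes then stays a single field when
imported into a spreadsheet, and a title containing "<" does not break the
generated HTML table.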

The extraction of the description can be performed with threads. Note that this
can actually make the program slower if the thread searching for files doesn't
find enough files to be processed. And it definitely does *not* speed things up
on single-processor systems!


Plugins
-------

IExtract can use plugins to support further file formats. A plugin is a
shared library (DLL) which must contain a function called "processFile", which
extracts the information from a file, and - if the contents of the file
determine its type - another one called "getFileType", which checks whether a
file is of the handled type. An example can be found in src/Plugins/Text.cpp.

These shared libraries are added by a "Handler"-section in an INI-file:

  [Handler]
  txt=libText
  db=libDB
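
The rough shape of a plugin might look like the following sketch. Only the two
function names are taken from the description above; the parameters, the
return values and the "first line is the description" logic are assumptions
made for this example. The real interface is the one used in
src/Plugins/Text.cpp, so check there before writing a plugin.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// ASSUMPTION: both signatures are invented for illustration; see
// src/Plugins/Text.cpp for the interface IExtract really expects.

// Returns 1 if the file looks like plain text (no NUL byte within the
// first 512 bytes), else 0.
extern "C" int getFileType (const char* path) {
   std::FILE* file (std::fopen (path, "rb"));
   if (!file)
      return 0;

   char block[512];
   std::size_t read (std::fread (block, 1, sizeof (block), file));
   std::fclose (file);
   return std::memchr (block, '\0', read) ? 0 : 1;
}

// Copies the first line of the file (without the line break) into the
// passed buffer; returns 1 on success, 0 on error.
extern "C" int processFile (const char* path, char* description,
                            std::size_t size) {
   if (!description || !size)
      return 0;

   std::FILE* file (std::fopen (path, "r"));
   if (!file)
      return 0;

   // fgets reads at most size - 1 characters and always terminates the text
   char* line (std::fgets (description, static_cast<int> (size), file));
   std::fclose (file);
   if (!line)
      return 0;

   description[std::strcspn (description, "\r\n")] = '\0';
   return 1;
}
```

The functions are declared extern "C" so that their names are not mangled and
can be looked up in the shared library by name.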


Installation
------------

See the file INSTALL


Windows
-------

Windows is supported in principle, though because of missing standards this is
not that easy (at least if your setup differs from mine). See the end of the
file INSTALL.


Documentation
-------------

The documentation can be found in the doc subdirectory (in HTML format).


How to report bugs and/or send patches
--------------------------------------

Bug reports and patches should be sent to the e-mail address of the author
(g17m0@lycos.com). Also feel free to send comments.

If you report a bug, please be sure to include anything which might be of use,
like:

  - The version of the utility.
  - The version of the used libYGP library.
  - How to reproduce the bug; a file provoking it would be great.
  - In case of a crash there should also be a stack dump in your system log
    which might help in localising the bug.
  - Anything else you think might be helpful.