The first process of the Lire log analysis framework is
the log file normalisation process, summarized in
Figure 1.2. This process is centered around the DLF,
the Distilled Log Format, which acts as a kind of
universal log format: each product-specific log file is
transformed into a format common to all products
providing similar functionality. In Lire's terminology,
a class of applications providing similar functionality
(e.g. MTAs delivering email) is called a superservice.
Still in Lire's terminology, a service (e.g. postfix or
sendmail) refers to a native log format that is
converted into its superservice's DLF. One can view the
DLF as a table in which each row is a logged event and
each field holds a piece of information about that
event.
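As a purely illustrative sketch, the field names below are
invented for the example and are not taken from Lire's actual
email schema; such a table could be pictured in Perl as a list
of rows, each row being a set of named fields:

    use strict;
    use warnings;

    # Hypothetical DLF "table" for an email superservice: each entry is
    # one logged event (a row), and the hash keys are its DLF fields.
    # The field names are illustrative, not Lire's real email schema.
    my @dlf_rows = (
        { time => 1066312201, from => 'alice@example.com',
          to   => 'bob@example.org',  size => 2048 },
        { time => 1066312264, from => 'carol@example.net',
          to   => 'dave@example.org', size => 512 },
    );

    # Every row exposes the same fields, whichever product wrote the
    # original log line.
    for my $row (@dlf_rows) {
        printf "%d %s -> %s (%d bytes)\n",
            $row->{time}, $row->{from}, $row->{to}, $row->{size};
    }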
Since the information logged by an email server is completely
different from that logged by a web server, each superservice
has its own data model. In Lire, this data model is called a
DLF schema. DLF schemas are defined in XML files using the DLF
Schema Markup Language; a schema describes which fields are
available for each logged event.

One interesting aspect of Lire is that although the email DLF
is used by all email servers, its data model isn't restricted
to the lowest common denominator of the log formats supported
by the individual servers. In Lire's architecture, a
superservice's schema can represent the information logged by
the most sophisticated product. When some piece of information
isn't available in a given log format, the DLF file simply
won't contain it, and the reports that need it won't be
included.

This architecture means that supporting a new service, i.e. a
new log format, in Lire only requires writing a plugin called
a DLF converter. This is a simple Perl script that parses the
native log format and maps the information according to the
schema.
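To give an idea of what such a converter involves, the following
is a minimal sketch rather than Lire's actual converter
interface: the native log format, the field names and the output
format are all assumptions made for illustration. It parses a
hypothetical native log line with a regular expression and maps
the captured values onto DLF fields:

    use strict;
    use warnings;

    # Minimal sketch of what a DLF converter does conceptually: parse
    # each native log line and emit the corresponding DLF fields. The
    # native format, the field names and the output are assumptions for
    # illustration, not Lire's real email schema or converter API.
    while ( my $line = <STDIN> ) {
        # Hypothetical native format:
        # "<epoch> from=<addr> to=<addr> size=<bytes>"
        next unless $line =~ /^(\d+)\s+from=(\S+)\s+to=(\S+)\s+size=(\d+)/;

        my %dlf = (
            time => $1,
            from => $2,
            to   => $3,
            size => $4,
        );

        # A real converter would hand these fields to the framework;
        # here we simply print them in a fixed order.
        print join( ' ', @dlf{qw(time from to size)} ), "\n";
    }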