=head1 NAME
Html2Wml -- Program that can convert HTML pages to WML pages
=head1 SYNOPSIS
Html2Wml can be used as either a shell command:
    $ html2wml file.html
or as a CGI:
    /cgi-bin/html2wml.cgi?url=/index.html
In both cases, the file can be either a local file or a URL.
=head1 DESCRIPTION
Html2Wml converts HTML pages to WML decks, suitable for being viewed on a
Wap device. The program can be launched from a shell to statically convert
a set of pages, or as a CGI to convert a particular (potentially dynamic)
HTML resource.
Although the result is not guaranteed to be valid WML, it should be for
most pages. Good HTML pages will most probably produce valid WML decks.
To check and correct your pages, you can use W3C's software: the
I<HTML Validator>, available online at http://validator.w3.org,
and I<HTML Tidy>, written by Dave Raggett.
Html2Wml provides the following features:
=over 4
=item *
translation of the links
=item *
limitation of the card size by splitting the result into several cards
=item *
inclusion of files (similar to the SSI)
=item *
compilation of the result (using the WML Tools, see L<"LINKS">)
=item *
a debug mode to check the result using validation functions
=back
=head1 OPTIONS
Please note that most of these options are also available when calling
Html2Wml as a CGI. In this case, boolean options are given the value
"1" or "0", and other options simply receive the value they
expect. For example, C<--ascii> becomes C<ascii=1> or C<ascii=0>. See the
file F for an example on how to call Html2Wml as a CGI.
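As a rough illustration of the rule above, the following Python sketch builds a CGI URL from a set of options. The script path and the C<url> parameter come from the synopsis. That each CGI parameter is simply the long option name without its leading dashes is an assumption made here, and the helper C<cgi_url> is hypothetical.

```python
from urllib.parse import urlencode

def cgi_url(page, script="/cgi-bin/html2wml.cgi", **options):
    """Build a CGI URL (hypothetical helper).

    ASSUMPTION: each CGI parameter is the long option name without "--".
    """
    query = {"url": page}
    for name, value in options.items():
        # Boolean options are passed as "1" or "0"; other values as-is.
        if isinstance(value, bool):
            value = "1" if value else "0"
        # Python keyword names use "_" where the options use "-".
        query[name.replace("_", "-")] = value
    # Keep "/" readable in the url parameter, as in the synopsis.
    return script + "?" + urlencode(query, safe="/")

print(cgi_url("/index.html", ascii=True, max_card_size=1000))
```

Passing C<ascii=False> instead would yield C<ascii=0> in the query string.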
=head2 Conversion Options
=over 4
=item -a, --ascii
When this option is on, named HTML entities and non-ASCII characters are
converted to US-ASCII characters using the same 7-bit approximations as
Lynx. For example, C<©> is translated to "(c)", and C<ß> is
translated to "ss". This option is off by default.
=item --[no]collapse
This option tells Html2Wml to collapse redundant whitespace, tabs,
carriage returns, line feeds and empty paragraphs. The aim
is to reduce the size of the WML document as much as possible. Collapsing
empty paragraphs is necessary for two reasons. First, this avoids empty
screens (and on a device with only 4 lines of display, an empty screen can
be quite annoying). Second, Html2Wml creates many empty paragraphs when
converting, because of the way the syntax reconstructor is programmed.
Deleting these empty paragraphs is as necessary as cleaning the kitchen :-)
If this really bothers you, you can deactivate this behaviour with the
B<--nocollapse> option.
=item --ignore-images
This option tells Html2Wml to completely ignore all image links.
=item --[no]img-alt-text
This option tells Html2Wml to replace the image tags with their
corresponding alternative text (as with a text mode web browser).
This option is on by default.
=item --[no]linearize
This option is on by default. It makes Html2Wml flatten the HTML
tables (they are linearized), as Lynx does. I think this is better than
trying to use the native WML tables. First, they have extremely limited
features and possibilities compared to HTML tables. In particular, they
can't be nested. This is to be expected, because Wap devices are not
supposed to have a big CPU running at some zillions of hertz, and the
calculations needed to render tables are the most complicated and
CPU-hungry part of HTML.
Second, as they can't be nested, and as typical HTML pages make heavy use
of nested tables to create their layout, it's impossible to decide which
one should be kept. So the best thing is to keep none of them.
B<[Note]> Although you can deactivate this behaviour, and although
there is internal support for tables, the unlinearized mode has not
been heavily tested with nested tables, and it may produce unexpected
results.
=item -n, --numeric-non-ascii
This option tells Html2Wml to convert all non-ASCII characters to
numeric entities, e.g., "E<eacute>" becomes C<&#233;>, and "E<szlig>"
becomes C<&#223;>. By default, this option is off.
=item -p, --nopre
This option tells Html2Wml not to use the E<lt>preE<gt>
tag. This option was added because the compiler from WML Tools 0.0.4
doesn't support this tag.
=back
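To make the two character-conversion options more concrete, here is a minimal Python sketch of what B<--ascii> and B<--numeric-non-ascii> do to a string. It is not Html2Wml's own code: the function names are hypothetical, and the approximation table only contains the two examples quoted in this document, whereas the real table (borrowed from Lynx) is much larger.

```python
# Only the two approximations quoted in this document; the real table is larger.
ASCII_APPROX = {"\u00a9": "(c)", "\u00df": "ss"}

def to_ascii(text):
    """--ascii style: replace known non-ASCII characters by 7-bit approximations."""
    return "".join(ASCII_APPROX.get(ch, ch) for ch in text)

def to_numeric_entities(text):
    """--numeric-non-ascii style: replace non-ASCII characters by &#NNN; entities."""
    return "".join(ch if ord(ch) < 128 else "&#%d;" % ord(ch) for ch in text)

print(to_ascii("\u00a9 Stra\u00dfe"))       # copyright sign and sharp s approximated
print(to_numeric_entities("caf\u00e9"))    # e-acute replaced by a numeric entity
```

Characters missing from the table are left untouched by C<to_ascii> in this sketch.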
=head2 Links Reconstruction Options
=over 4
=item --hreftmpl=I
This option sets the template that will be used to reconstruct the
C<href>-type links. See L<"LINKS RECONSTRUCTION"> for more information.
=item --srctmpl=I
This option sets the template that will be used to reconstruct the
C<src>-type links. See L<"LINKS RECONSTRUCTION"> for more information.
=back
=head2 Splitting Options
=over 4
=item -s, --max-card-size=I
This option allows you to limit the size (in bytes) of the generated
cards. Default is 1,500 bytes, which should be small enough to be loaded
on most Wap devices. See L<"DECK SLICING"> for more information.
=item -t, --card-split-threshold=I
This option sets the threshold of the split event, which can occur
when the size of the current card is between C<max-card-size> -
C<card-split-threshold> and C<max-card-size>. Default value is
50. See L<"DECK SLICING"> for more information.
=item --next-card-label=I
This option sets the label of the link that points to the next card.
Default is "[&gt;&gt;]", which will be rendered as "[E<gt>E<gt>]".
=item --prev-card-label=I
This option sets the label of the link that points to the previous card.
Default is "[&lt;&lt;]", which will be rendered as "[E<lt>E<lt>]".
=back
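The interplay between B<--max-card-size> and B<--card-split-threshold> can be sketched in Python as a simple range check. The default values are the ones given above. Treating both bounds as inclusive is an assumption, and C<in_split_window> is a hypothetical name, not a function of Html2Wml.

```python
def in_split_window(card_size, max_card_size=1500, card_split_threshold=50):
    """Return True when a split event may occur for the current card size.

    ASSUMPTION: both ends of the window are inclusive.
    """
    lower_bound = max_card_size - card_split_threshold
    return lower_bound <= card_size <= max_card_size

# With the defaults, the window runs from 1450 to 1500 bytes.
for size in (1400, 1450, 1500):
    print(size, in_split_window(size))
```

A larger threshold widens the window, which gives more opportunities to split the card at a convenient point.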
=head2 HTTP Authentication
=over 4
=item -U, --http-user=I
Use this option to set the username for an authenticated request.
=item -P, --http-passwd=I
Use this option to set the password for an authenticated request.
=back
=head2 Proxy Support
=over 4
=item -[no]Y, --[no]proxy
Use this option to activate proxy support. By default, proxy support
is activated. See L<"PROXY SUPPORT">.
=back
=head2 Output Options
=over 4
=item -k, --compile
Setting this option tells Html2Wml to use the compiler from WML Tools
to compile the WML deck. If you want to create a real Wap site, you should
seriously consider using this option in order to reduce the size of the WML
decks. Remember that Wap devices have very little memory. If this is
not enough, use the splitting options.
Take a look in F for more information on how to use
a WML compiler with Html2Wml.
=item -o, --output
Use this option (in shell mode) to specify an output file.
By default, Html2Wml prints the result to standard output.
=back
=head2 Debugging Options
=over 4
=item -d, --debug[=I]
This option activates the debug mode. It prints the output
with line numbering, along with the result of the XML check. If the WML
compiler was called, the result is also printed in hexadecimal and ASCII
forms. When called as a CGI, all of this is printed as HTML, so that you
can use any web browser for that purpose.
=item --xmlcheck
When this option is on, Html2Wml sends the WML output to XML::Parser to
check its well-formedness.
=back
=head1 DECK SLICING
I<Deck slicing> is a feature that Html2Wml provides in order to
match the low memory capabilities of most Wap devices. Many can't handle
cards larger than 2,000 bytes, therefore the cards must be sufficiently
small to be viewed by all Wap devices. To achieve this, you should compile
your WML deck, which reduces the size of the deck by about 50%, but even
then your cards may be too big. This is where the deck slicing feature
comes in. It allows you to limit the size of the cards, currently only
I<before> the compilation stage.
=head2 Slice by cards or by decks
On some Wap phones, slicing the deck is not sufficient: the WML browser
still tries to download the whole deck instead of just picking one
card at a time. A solution is to slice the WML document by decks.
See the figure below.
     _____________          _____________
    |    deck     |        |   deck #1   |
    |  _________  |        |  _________  |
    | | card #1 | |        | |  card   | |
    | |_________| |        | |_________| |
    |  _________  |        |_____________|
    | | card #2 | |
    | |_________| |             . . .
    |  _________  |
    | |   ...   | |         _____________
    | |_________| |        |   deck #n   |
    |  _________  |        |  _________  |
    | | card #n | |        | |  card   | |
    | |_________| |        | |_________| |
    |_____________|        |_____________|
      WML document           WML document
     sliced by cards        sliced by decks
What this means is that Html2Wml generates several WML documents.
In CGI mode, only the appropriate deck is sent, selected by the id
given as a parameter. If no id was given, the first deck is sent.
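The selection rule in CGI mode can be sketched in a few lines of Python. This document does not specify the format of the id, so the sketch assumes it is the 1-based position of the deck. The name C<select_deck> is hypothetical.

```python
def select_deck(decks, deck_id=None):
    """Return the deck to send in CGI mode.

    ASSUMPTION: the id is the 1-based position of the deck.
    """
    if deck_id is None:
        # No id given: the first deck is sent.
        return decks[0]
    return decks[int(deck_id) - 1]

decks = ["<wml>deck 1</wml>", "<wml>deck 2</wml>", "<wml>deck 3</wml>"]
print(select_deck(decks))        # first deck
print(select_deck(decks, "2"))   # second deck
```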
=head2 Note on size calculation
Currently, Html2Wml estimates the size of the card on the fly, by
summing the length of the strings that compose the WML output, texts and
tags. I say "estimates" and not "calculates" because computing the exact
size would require many more calculations than the way it is done now.
One may object that there are only additions, which is correct, but knowing
the I<exact> size is not necessary. Indeed, if you compile the WML, most of
the strings of the tags will be removed, but not all.
For example, take an image tag:
C<E<lt>img src="images/dog.jpg" alt="Photo of a dog"E<gt>>.
When compiled, the string C<"img"> will be replaced by a one-byte value.
The same goes for the strings C<"src"> and C<"alt">, and the spaces, double
quotes and equal signs will be stripped. Only the text between double quotes
will be preserved... but not in every case.
Indeed, in order to go a step further, the compiler can also encode
parts of the arguments as binary. For example, the string C<"http://www.">
can be encoded as a single byte (C<8F> in this case). Or, if the attribute
is C<href>, the string C<href="http://> can become the byte C<4B>.
As you can see, there is no need to know the exact size of the textual
form of the WML, as it will always be far larger than the size of the
compiled form. That's why I don't count all the characters that may
actually be written.
Also, it's because I'm quite lazy ;-)
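The estimation described in this section amounts to summing string lengths. The Python sketch below shows that idea, followed by a deliberately simplified greedy slicer. The greedy strategy is a stand-in chosen for illustration and ignores the split threshold, so it is not Html2Wml's actual algorithm. Both function names are hypothetical, and counting bytes in UTF-8 is an assumption.

```python
def estimate_size(fragments):
    """Estimate a card's size by summing the byte length of its output strings.

    ASSUMPTION: lengths are counted on the UTF-8 encoding of each string.
    """
    return sum(len(fragment.encode("utf-8")) for fragment in fragments)

def slice_into_cards(fragments, max_card_size=1500):
    """Greedy stand-in: open a new card when the next fragment would not fit."""
    cards, current, size = [], [], 0
    for fragment in fragments:
        length = len(fragment.encode("utf-8"))
        if current and size + length > max_card_size:
            cards.append(current)
            current, size = [], 0
        current.append(fragment)
        size += length
    if current:
        cards.append(current)
    return cards

fragments = ["<p>", "Hello", "</p>"]
print(estimate_size(fragments))
print(slice_into_cards(fragments, max_card_size=8))
```

Note that a single fragment larger than the limit still ends up alone in an oversized card here; a real slicer would have to split the text itself.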
=head2 Why compiling the WML deck?
If you intend to create real WML pages, you should really
consider always compiling them. If you're not convinced, here is an
illustration.
Take the following WML code snippet:
    <a href="http://www.yahoo.com/">Yahoo!</a>
It's the basic and classical way to code a hyperlink. It takes 42 bytes
to code this, because it is presented in a human-readable form.
The WAP Forum has defined a compact binary representation of WML in its
specification, which is called "compiled WML". It's a binary format,
therefore you, a mere human, can't read that, but your computer can. And
it's much faster for it to read a binary format than to read a textual
format.
The previous example would be, once compiled (and printed here as
hexadecimal):
    1C 4A 8F 03 y a h o o 00 85 01 03 Y a h o o ! 00 01
This only takes 21 bytes. Half the size of the human-readable form.
For a Wap device, this means both less to download and less work to
read it. Therefore the document can be processed in
a short time compared to the textual version of the same document.
There is one last argument, and not the least important: many Wap devices
only read binary WML.
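The two byte counts quoted in this section can be checked with a short Python sketch. The compiled byte sequence is copied from the hexadecimal listing above. The textual form is assumed to be a plain link to Yahoo!, which is the string that matches both the 42-byte figure and the compiled listing.

```python
# ASSUMPTION: the human-readable snippet is this plain hyperlink.
textual = '<a href="http://www.yahoo.com/">Yahoo!</a>'

# Compiled form, transcribed from the hexadecimal listing in this section.
compiled = (
    bytes([0x1C, 0x4A, 0x8F, 0x03]) + b"yahoo"
    + bytes([0x00, 0x85, 0x01, 0x03]) + b"Yahoo!"
    + bytes([0x00, 0x01])
)

textual_size = len(textual.encode("ascii"))
compiled_size = len(compiled)
print(textual_size, compiled_size, compiled_size / textual_size)
```

Under that assumption the sizes come out at 42 and 21 bytes, a ratio of one half.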
=head1 ACTIONS
Actions are a feature similar to (but with far fewer features than!) the
SSI (Server Side Includes) available on good servers like Apache. In order
not to interfere with the real SSI, but to keep the syntax easy to learn,
the Actions syntax differs from it in only a few points.
=head2 Syntax
Basically, the syntax to execute an action is:
Note that the angle brackets are part of the syntax. Except for that
point, Actions syntax is very similar to SSI syntax.
=head2 Available actions
Only a few actions are currently available, but more can be implemented
on request.
=over 4
=item include
=over 8
=item Description
Includes a file in the document at the current point. Please note
that Html2Wml neither checks nor parses the file, and if the file
cannot be found, it will silently die (this is the same behavior as SSI).
=item Parameters
C -- The file is fetched over HTTP.
C -- The file is read from the local disk.
=back
=item fsize
=over 8
=item Description
Returns the size of a file at the current point of the document.
=item Parameters
C -- The file is fetched over HTTP.
C -- The file is read from the local disk.
=item Notes
If you use the file parameter, an absolute path is recommended.
=back
=item skip
=over 8
=item Description
Skips everything until the first C action.
=back
=back
=head2 Generic parameters
The following parameters can be used for any action.
=over 4
=item for=I