wdcnt [-p|-z] [-e] files ...
wdcnt [-p|-z] [-e] < file
wdcnt -v
wdcnt counts English or Japanese words in files or on standard
input. It ignores punctuation, digits, quotation marks, and HTML
tags. The output is sorted by frequency of occurrence and can be
plotted directly with gnuplot(1) as follows:
gnuplot> set log xy
gnuplot> plot "< wdcnt file"
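The counting described above can be sketched in Python for the English case. This is only an illustration, not wdcnt's actual implementation: the function name count_words, the two regular expressions, and the decision to keep case as-is are all assumptions, and Japanese segmentation via KAKASI is not covered.

```python
import re
from collections import Counter

# Both patterns are assumptions made for this sketch, not wdcnt's own rules.
TAG = re.compile(r"<[^>]*>")      # crude matcher for HTML tags
WORD = re.compile(r"[A-Za-z]+")   # runs of ASCII letters only


def count_words(text):
    """Return (word, count) pairs, most frequent first."""
    text = TAG.sub(" ", text)     # replace HTML tags with a space
    words = WORD.findall(text)    # skips punctuation, digits, and quotes
    return Counter(words).most_common()


sample = '<p>the cat, the dog and "the" 3 birds</p>'
for word, n in count_words(sample):
    print(n, word)
```

Because only letter runs are kept, a contraction such as "don't" is split into two words; a real tool may treat such cases differently.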
- -p
  Reports probabilities instead of occurrence counts. The counts
  are normalized so that they sum to 1.0.
- -z
  Reports relative frequencies instead of occurrence counts. The
  most frequent word has the value 1.0.
- -e
  Does not use KAKASI. This option is NOT useful for Japanese
  documents.
- -v, -h
  Prints usage and version, then exits.
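The difference between the -p and -z normalizations can be sketched as follows. The helper names probabilities and relative_frequencies are hypothetical, and the reading of -p as "divide by the total count" is an assumption based on the description above rather than on wdcnt's source.

```python
def probabilities(counts):
    """Divide each count by the total, so the results sum to 1.0 (assumed -p)."""
    total = sum(counts)
    return [c / total for c in counts]


def relative_frequencies(counts):
    """Divide each count by the largest count, so the top word is 1.0 (assumed -z)."""
    top = max(counts)
    return [c / top for c in counts]


counts = [4, 2, 2]                    # occurrence counts, most frequent first
print(probabilities(counts))          # [0.5, 0.25, 0.25]
print(relative_frequencies(counts))   # [1.0, 0.5, 0.5]
```

Both helpers assume a non-empty list of positive counts. Each normalization only rescales the values, so the shape of a log-log plot is unchanged and only the vertical scale differs.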
For English documents, a traditional one-liner is known:
% cat files ... | tr -s '\040' '\012' | sort | uniq -c | sort -n -r
Ruby/KAKASI <URL:http://www.ruby-lang.org/en/raa.html#Ruby%2FKAKASI>,
ruby(1) <URL:http://www.ruby-lang.org/>,
kakasi(1) <URL:http://kakasi.namazu.org/>,
gnuplot(1), tr(1), sort(1), uniq(1)
Word separation is not accurate.
Gotoken <URL:mailto:gotoken@notwork.org>