2002-7-24  David A. Wheeler

    * Released version 2.14. Improved Pascal detection, improved
      Pascal counting, added a reference to CCCC.

2002-7-24  David A. Wheeler

    * Modified Pascal counting; the older (*..*) commenting structure
      is now supported. Note that the Pascal counter is still
      imperfect; it doesn't handle the prioritization between the
      {...} and (*..*) commenting systems, and it can be fooled by
      strings that include a comment start indicator. Rewrites are
      welcome; however, for most people the current code is
      sufficient.
    * Documented the weaknesses in the Pascal counter as BUGS.

2002-7-24  Ian West

    * Improved heuristic for detecting Pascal programs in
      break_filelist. Sloccount will now categorize files as Pascal
      if they have the file type ".pas" as well as ".p", though it
      still checks the contents to make sure it's really Pascal.
      The heuristic was modified so that a file is also considered
      Pascal if it contains "module" and "end.", or "program",
      "begin", and "end.", in addition to the existing cases.
      (Ian West used sloccount to analyze a system containing about
      1.2 million lines of code in almost 10,000 files; ninety
      percent of it is Ada, and the bulk of the remainder is split
      between Pascal and SQL. The following is Ian's more detailed
      explanation for the change):

      VAX Pascal uses "module" instead of "program" for files that
      have no program block and therefore no "begin". There is also
      no requirement for a Pascal file to have procedures or
      functions, which is the case for files that are equivalents of
      C headers. So I modified the function to allow files to be
      accepted that only contain either: "module" and "end."; or
      "program", "begin", and "end.". I considered adding checks for
      "const", "type", and "var" but decided they were not necessary.
      I have added the extra cases without changing the existing
      logic so as not to upset any cases for "unit". It is possible
      to optimize the logic somewhat, but I felt clarity was better
      than efficiency.
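
      [Illustration] The two added acceptance cases can be sketched
      roughly as follows. This is not the break_filelist code (which
      is Perl); the function names are hypothetical, the existing
      "unit" cases are deliberately not modelled, and the comment
      stripping is a naive assumption that ignores string literals.

```python
import re


def strip_pascal_comments(text: str) -> str:
    """Naively remove {...} and (*...*) comments (string literals are ignored)."""
    text = re.sub(r"\{.*?\}", " ", text, flags=re.DOTALL)
    text = re.sub(r"\(\*.*?\*\)", " ", text, flags=re.DOTALL)
    return text


def matches_added_pascal_cases(text: str) -> bool:
    """Return True for the two added cases only:
    "module" + "end.", or "program" + "begin" + "end.".
    Pascal is case-insensitive, so the text is lowercased first.
    """
    code = strip_pascal_comments(text).lower()

    def has_word(word: str) -> bool:
        return re.search(r"\b" + word + r"\b", code) is not None

    # Both cases require a terminating "end."
    if re.search(r"\bend\s*\.", code) is None:
        return False
    if has_word("module"):
        return True
    return has_word("program") and has_word("begin")
```

      Keywords that appear only inside comments do not count in this
      sketch, because comments are removed before the keyword search.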

      I found that some of my Pascal files were getting through only
      because the word "unit" appeared in certain comments. So I
      moved the line for filtering out comments above the lines that
      look for the keywords. Pascal in general allows comments in the
      form (*...*) as well as {...}, so I added a line to remove
      these. After making these changes, all my files were correctly
      categorized. I also verified that the sample Pascal files from
      p2c still had the same counts.

      Thank you for developing SLOCCount. It is a very useful tool.

2002-7-15  David A. Wheeler

    * Added a reference to CCCC; http://cccc.sourceforge.net/

2002-5-31  David A. Wheeler

    * Released version 2.13.
    * Code cleanups. Turned on gcc warnings ("-Wall" option) and
      cleaned up all code that set off a warning. This should make
      the code more portable as well as cleaner. Made a minor speed
      optimization on an error branch.

2002-3-30  David A. Wheeler

    * Released version 2.12.
    * Added a "testcode" directory with some sample source code files
      for testing. It's small now, but growth is expected.
      Contributions for this test directory (especially for
      edge/oddball cases) are welcome.

2002-3-25  David A. Wheeler

    * Changed first-line recognizers so that the first line (#!) will
      be matched ignoring case. For most Unix/Linux systems uppercase
      script statements won't work, but they may appear in files from
      Windows users.
    * Now recognize SpeedyCGI, a persistent CGI interface for Perl.
      SpeedyCGI has most of the speed advantages of FastCGI, but has
      the security advantages of CGI and has the CGI interface (from
      the application writer's point of view). SpeedyCGI Perl scripts
      have #!/usr/bin/speedy lines instead of #!/usr/bin/perl.
      More information about SpeedyCGI can be found at
      http://daemoninc.com/speedycgi/
      Thanks to Priyadi Iman Nurcahyo for noticing this.

2002-3-15  David A. Wheeler

    * Added filter to remove calls to sudo, so that first lines such
      as "#!/usr/bin/sudo /usr/bin/python" are correctly identified.

2002-3-7  David A. Wheeler

    * Added cross-references to LOCC and CodeCount. They don't do
      what I want.. which is why I wrote my own! .. but others may
      find them useful.

2002-2-28  David A. Wheeler

    * Released version 2.11.
    * Added support for C#. Any ".cs" file is presumed to be a C#
      file. The C SLOC counter is used to count SLOC. Note that C#
      doesn't have a "header" type (Java doesn't either), so
      disambiguating headers isn't needed.
    * Added support for regular Haskell source files (.hs). Their
      syntax is sufficiently similar that just the regular C SLOC
      counter works. Note that literate Haskell files (.lhs) are
      _not_ supported, so be sure to process .lhs files into .hs
      files before counting. There are two different .lhs
      conventions; for more info, see:
      http://www.haskell.org/onlinereport/literate.html
    * Tweaked COBOL counter slightly. Added support in fixed
      (default) format for "*" and "/" as comment markers in
      column 1.
    * Modified list of file extensions known not to be source code,
      based on suffixes(7). This speeds things very slightly, but the
      main goal is to make the "unknown" list smaller. That way, it's
      much easier to see if many source code files were incorrectly
      ignored. In particular, compressed formats (e.g., ".tgz") and
      multimedia formats (".wav") were added.
    * Modified documentation to make things clear: If you want source
      in a compressed file to be counted (e.g. .zip, .tar, .tgz), you
      need to uncompress the file first!!
    * Modified documentation to clarify that literate programming
      files must be expanded first.
    * Now recognize ".ph" as Perl (it's "Perl header" code). Please
      let me know if this creates many false positives (i.e., if
      there are programs using ".ph" in other ways).
    * File count_unknown_ext modified slightly so that it now
      examines ~/.slocdata. Modified documentation so that its use is
      recommended and explained. It's been there for a while, but
      with poor documentation I bet few understand its value.
    * Modified output to clearly say that it's Open Source Software /
      Free Software, licensed under the GPL. It was already stated
      that way in the documentation and code, but clearly stating
      this on every run makes it even harder to miss.

2002-2-27  David A. Wheeler

    * Released version 2.10.
    * COBOL support added! Now ".cbl" and ".cob" are recognized as
      COBOL extensions, as well as their uppercase ".CBL" and ".COB".
      The COBOL counter works as follows: it detects if a "freeform"
      command has been given. Unless a freeform command has been
      given, a comment has "*" or "/" in column 7, and a SLOC is a
      non-comment line with at least one non-whitespace character in
      column 8 or later (including columns 72 or greater; it's
      arguable if a line that's empty before column 72 is really a
      line or a comment, but I've decided to count such odd things as
      lines). If we've gone free-format, a comment is a line that has
      optional whitespace and then "*".. otherwise, a line with
      non-whitespace is a SLOC.
      Is this good enough? I think so, but I'm not a major COBOL
      user. Feedback from real COBOL users would be welcome.
      A source for COBOL test programs is:
      http://www.csis.ul.ie/cobol/examples/default.htm
      Information on COBOL syntax was gathered from various
      locations, including:
      http://cs.hofstra.edu/~vmaffea1/cobol.html
      http://support.merant.com/websupport/docs/microfocus/books/nx31books/lrintr.htm
    * Modified handling of uppercase filename extensions so they'll
      be recognized as well as the more typically lowercase
      extensions. If a file has one or more uppercase letters - and
      NO lowercase letters - it's assumed that it may be a refugee
      from an old OS that supported only uppercase filenames. In that
      circumstance, if the filename extension doesn't match the set
      of known extensions, it's made into lowercase and recompared
      against the set of extensions for source code files. This
      heuristic should improve recognition of source file types for
      "old" programs using upper-case-only characters.
      I do have a concern that this may be "too greedy" an algorithm,
      i.e., it might claim that some files that aren't really source
      code are now source code. I don't think it will be a problem,
      though; many people create filename extensions that only differ
      by case in most circumstances; the ".c" vs. ".C" thing is an
      exception, and since Windows folds case it's not a very
      portable practice. This is a pretty conservative heuristic; I
      found Cobol programs with lowercase filenames and uppercase
      extensions ("x.CBL"), which wouldn't be matched by this
      heuristic. For Cobol and Fortran I put in special ".F", ".CBL",
      and ".COB" patterns to catch them. With those two actions, the
      program should manage to correctly identify more source files
      without incorrectly matching non-source files.
    * ".f77" is now also accepted as a Fortran77 extension. Thanks to
      http://www.webopedia.com/quick_ref/fileextensionsfull.html
      which has lots of extension information.
    * Fixed a bug in handling top-level directories where there were
      NO source files at all; in certain cases this would create
      spurious error messages. (Fix in compute_all).

2002-1-7  David A. Wheeler

    * Released version 2.09.

2002-1-9  David A. Wheeler

    * Added support for the Ruby programming language, thanks to
      patches from Josef Spillner.
    * Documentation change: added more discussion about COCOMO, in
      particular why its cost estimates appeared so large. Some
      programmers think of just the coding part, and only what they'd
      get paid directly.. but that's less than 10% of the costs.

2002-1-7  David A. Wheeler

    * Minor documentation fix - the example for --effort in
      sloccount.html wasn't quite right (the base documentation for
      --effort was right, it was just the example that was wrong).
      My thanks to Kevin the Blue for pointing this out.

2002-1-3  David A. Wheeler

    * Released version 2.08.

2002-1-3  David A. Wheeler

    * Based on suggestions by Greg Sjaardema:
      * Modified c_count.c, function count_file to close the stream
        after the file is analyzed.
        Otherwise, this can cause problems with too many open files
        on some systems, particularly on operating systems with small
        limits (e.g., Solaris).
      * Added '.F' as a Fortran extension.

2002-1-2  David A. Wheeler

    * Released version 2.07.

2002-1-2  Vaclav Slavik

    * Modified the RPM .spec file in the following ways:
      * By default the RPM package now installs into /usr (so
        binaries go into /usr/bin). Note that those who use the
        makefile directly ("make install"), including tarball users,
        will still default to /usr/local instead. You can still make
        the RPM install to /usr/local by using the prefix option,
        e.g.:
            rpm -Uvh --prefix=/usr/local sloccount*.rpm
      * Made it use the %{_prefix} variable, i.e. changing it to
        install in /usr/local or /usr is a matter of changing one
        line.
      * Use wildcards in the %files section, so that you don't have
        to modify the specfile when you add a new executable.
      * Mods to make it possible to build the RPM as non-root (i.e.
        BuildRoot support, %defattr in %files, PREFIX passed to make
        install).

2002-1-2  Jesus M. Gonzalez Barahona

    * Added support for Modula-3 (.m3, .i3).
    * ".sc" files are counted as Lisp.
    * Modified sloccount to handle EVEN LARGER systems (i.e., so
      sloccount will scale even more). In a few cases, parameters
      were passed on the command line, and for very large systems
      (e.g., Debian GNU/Linux) the command line became too long.
      This caused a large number of changes to different files to
      remove these scalability limitations.
    * All *_count programs now accept "-f filename" and "-f -"
      options, where 'filename' is a file with a list of filenames to
      count. Internally the "-f" option with a filename is always
      used, so that an arbitrarily long list of files can be measured
      and so that "ps" will show more status information.
    * compute_sloc_lang modified accordingly.
    * get_sloc now has a "--stdin" option.
    * Some small fixes here and there.
    * This closes Debian bug #126503.

2001-12-28  David A. Wheeler

    * Released sloccount 2.06.

2001-12-27  David A. Wheeler

    * Fixed a minor bug in break_filelist, which caused (in extremely
      unusual circumstances) a problem when disambiguating C from C++
      files in complicated situations where this difference was hard
      to tell. The symptom: When analyzing some packages (for
      instance, afterstep-1.6.10 as packaged in Debian 2.2) you would
      get the following error:
          Use of uninitialized value in pattern match (m//) at
          /usr/bin/break_filelist line 962.
      This could only happen after many other disambiguating rules
      failed to determine if a file was C or C++ code, so the problem
      was quite rare. My thanks to Jesus M. Gonzalez-Barahona (in
      Mostoles, Spain) for the patch that fixes this problem.
    * Modified man page, explaining the problems of filenames with
      newlines, and also noting the problems with directories
      beginning with "-" (they might be confused as options).
    * Minor improvements to Changelog text, so that the changes over
      time were documented more clearly.
    * Note that CEPIS "Upgrade" includes a paper that depends on
      sloccount. This is "Counting Potatoes: the Size of Debian 2.2"
      which counts the size of Debian 2.2 (instead of Red Hat Linux,
      which is what I counted). The original release is at: . I
      understand that they'll make some tweaks and release a revision
      of the paper on the Debian website. It's interesting; Debian
      2.2 (released in 2000, and which did NOT have KDE), has 56
      million physical SLOC and would have cost $1.8 billion USD to
      develop traditionally. That's more than Red Hat; see . Top
      languages: C (71.12%), C++ (9.79%), LISP, Shell, Perl, Fortran,
      Tcl, Objective-C, Assembler, Ada, and Python in that order.
      My thanks to the authors!

2001-10-25  David A. Wheeler

    * Released sloccount 2.05.
    * Added support for detecting and counting PHP code. This was
      slightly tricky, because PHP's syntax has a few "gotchas" like
      "here document" strings, the closing tag working even inside
      C++ or sh style comments, and so on. Note - HTML files (.html,
      .htm, etc) are not examined for PHP code.
      You really shouldn't put a lot of PHP code in HTML documents,
      because it's a maintenance problem later anyway. The tool
      assigns every file a single type.. which is a problem, because
      HTML files could have multiple simultaneous embedded types
      (PHP, javascript, and HTML text). If the tool was modified to
      assign multiple languages to a single file, I'm not sure how to
      handle the file counts (counts of files for each language).
      For the moment, I just assign HTML to "html".
    * Modified output so that it adds a header before the language
      list.

2001-10-23  David A. Wheeler

    * Released sloccount 2.01 - a minor modification to support
      Cygwin users.
    * Modified compute_all to make it more portable (== became =);
      in particular this should help users using Cygwin.
    * Modified documentation to note that, if you install Cygwin, you
      HAVE to use Unix newlines (not DOS newlines) for the Cygwin
      install. Thanks to Mark Ericson for the bug report & for
      helping me track that down.
    * Minor cleanups to the ChangeLog.

2001-08-26  David A. Wheeler

    * Released sloccount 2.0 - it's getting a new version number
      because its internal data format changed. You'll have to
      re-analyze your system for the new sloccount to work.
    * Improved the heuristics to identify files (esp. .h files) as C,
      C++, or objective-C. The code now recognizes ".H" (as well as
      ".h") as header files. The code realizes that ".cpp" files that
      begin with .\" or ,\" aren't really C++ files - XFree86 stores
      many man pages with these extensions (ugh).
    * Added the ability to "--append" analyses. This means that you
      can analyze some projects, and then repeatedly add new
      projects. sloccount even stores and recovers md5 checksums, so
      it even detects duplicates across the projects (the "first"
      project gets the duplicate).
    * Added the ability to mark a data directory so that it's not
      erased (just create a file named "sloc_noerase" in the data
      directory). From then on, sloccount won't erase it until you
      remove the file.
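
      [Illustration] The cross-project duplicate rule noted for
      "--append" in this entry (the "first" project gets the
      duplicate) might be sketched as below. This is an in-memory
      illustration only, not how sloccount stores its data: the
      function name and data layout are hypothetical, and Python's
      hashlib stands in for the md5sum program the tool requires.

```python
import hashlib


def assign_duplicates(projects):
    """Split files into kept files and duplicates, in project order.

    projects: ordered list of (project_name, {filename: content_bytes}).
    Returns (kept, dups); each dups item is
    (project, filename, (first_project, first_filename)).
    """
    first_seen = {}  # md5 hex digest -> (project, filename) seen first
    kept, dups = [], []
    for project, files in projects:
        for name, content in files.items():
            digest = hashlib.md5(content).hexdigest()
            if digest in first_seen:
                # Same content was already counted in an earlier project.
                dups.append((project, name, first_seen[digest]))
            else:
                first_seen[digest] = (project, name)
                kept.append((project, name))
    return kept, dups
```

      For example, if "projA" and "projB" both contain a byte-for-byte
      identical file, only the copy in "projA" (analyzed first) would
      be counted in this sketch.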
    * Many changes made aren't user-visible. Completely re-organized
      break_filelist, which was getting incredibly baroque. I've
      improved the sloccount code so that adding new languages is
      much simpler; before, it required a number of changes in
      different places, which was bad.
    * SLOCCount now creates far fewer files, which is important for
      analyzing big systems (I was starting to run out of inodes when
      analyzing entire GNU/Linux distributions). Previous versions
      created stub files in every child directory for every possible
      language, even those that weren't used; since most projects
      only use a few languages, this was costly in terms of inodes.
      Also, the totals for each language for a given child directory
      are now in a single file (all-physical.sloc) instead of being
      in separate files; this not only reduces inode counts, but it
      also greatly simplifies later processing & eliminated a bug
      (now, to process all physical SLOC counts in a given child
      directory, just process that one file).

2001-06-22  David A. Wheeler

    * Per Prabhu Ramachandran's suggestion, recognize ".H" files as
      ".h"/".hpp" files (note the upper case).

2001-06-20  David A. Wheeler

    * Released version 1.9. This eliminates installation errors with
      "sql_count" and "makefile_count", detects PostgreSQL embedded C
      (in addition to Oracle and Informix), improves detection of
      Pascal code, and includes support for analyzing licenses (if a
      directory has the file PROGRAM_LICENSE, the file's contents are
      assumed to have the license name for that top-level program).
      It eliminates a portability problem, so hopefully it'll be
      easier to run it on Unix-like systems. It _still_ requires the
      "md5sum" program to run.

2001-06-14  David A. Wheeler

    * Changed the logic in make_filelists. This version no longer
      requires a "-L" option to test, which the GNU programs
      supported but others (e.g., Solaris) didn't. It still doesn't
      normally follow symlinks.
      Not following subordinate symlinks is important for handling
      oddities such as pine's build directory
      /usr/src/redhat/BUILD/pine4.33/ldap in Red Hat 7.1, which
      includes symlinks to directories not actually inside the
      package at all (/usr/include and /usr/lib).
    * Added display of licenses in the summary form, if license
      information is available.
    * Added undocumented programs rpm_unpacker and extract_license.
      These are not installed at this time; they're just provided as
      a useful starting point if someone wants them.

2001-06-12  David A. Wheeler

    * Added support for license counting. If the top directory of a
      program has a file named "PROGRAM_LICENSE", it's copied to the
      .slocdata entry, and it's reported as part of a licensing
      total. Note that the file LICENSE is ignored; that's often more
      complex.

2001-06-08  David A. Wheeler

    * Fixed RPM spec file - it accidentally didn't install
      makefile_count and sql_count. This would produce spurious
      errors and inhibited the option of counting makefiles and SQL.
      Also fixed the makefile to include sql_count in the executable
      list.

2001-05-16  David A. Wheeler

    * Added support for auto-detecting ".pgc" files, which are
      embedded PostgreSQL - they are assumed to be C files (they
      COULD be C++ instead; while this will affect categorization it
      won't affect final SLOC counts). Also, if there's a ".c" with a
      corresponding ".pgc" file, the ".c" file is assumed to be
      auto-generated.
    * Thus, SLOCCount now supports embedded database commands for
      Oracle, Informix, and PostgreSQL. MySQL doesn't use an
      "embedded" approach, but uses a library approach that SLOCCount
      could already handle.
    * Fixed documentation: HTML reserved characters misused,
      sql_count undocumented.

2001-05-14  David A. Wheeler

    * Added modifications from Gordon Hart to improve detection of
      Pascal source code files. Pascal files which only have a "unit"
      in them (not a full program), or have "interface" or
      "implementation", are now detected as Pascal programs.
      The original Pascal specification didn't support units, but
      there are Pascal programs which use them. This should result in
      more accurate counts of Pascal software that uses units. He
      also reminded me that Pascal is case-insensitive, spurring a
      modification in the detection routines (for those who insist on
      uppercase keywords.. a truly UGLY format, but we need to
      support it to correctly identify such source code as Pascal).
    * Modified the documentation to note that I prefer unified diffs.
      I also added a reference to the TODO file, and from here on
      I'll post the TODO file separately on my web site.

2001-05-02  David A. Wheeler

    * Released version 1.8. Added several features to support
      measuring programs with embedded database commands. This
      includes supporting many Oracle & Informix embedded file types
      (.pc, .pcc, .pad, .ec, .ecp). It also optionally counts SQL
      files (.sql) and makefiles (makefile, Makefile, etc.), though
      by default they are NOT included in lines-of-code counts.
      See the (new) TODO file for limitations on makefile
      identification.

2001-04-30  David A. Wheeler

    * Per suggestion from Gary Myer, added optional "--addlang"
      option to add languages not NORMALLY counted. Currently it only
      supports "makefile" and "sql". The scheme for detecting
      automatically generated makefiles could use improvement.
      Normally, makefiles and sql won't be counted in the final
      reports, but the front-end will make the calculations and if
      requested their values will be provided.
    * Added an "SQL" counter and a "makefile" counter.
    * Per suggestions from Gary Myer, added detection for files where
      database commands (Oracle and Informix) are embedded in the
      code:
          .pc  -> Oracle preprocessed C code
          .pcc -> Oracle preprocessed C++ code
          .pad -> Oracle preprocessed Ada code
          .ec  -> Informix preprocessed C code
          .ecp -> Informix preprocessed C code which calls the C
                  preprocessor before calling the Informix
                  preprocessor.
      Handling ".pc" has heuristics, since many use ".pc" to mean
      "stuff about PCs".
      Certain filenames are not counted as C files (e.g.,
      "makefile.pc" and "README.pc") if they end in ".pc". Note that
      if you stick C++ code into .pc files, it's counted as C.
      These embedded files are normal source files of the respective
      language, with database commands stuck into them, e.g.,
          EXEC SQL select FIELD into :variable from TABLE;
      which performs a select statement and puts the result into the
      variable. The database preprocessor simply reads this file,
      converts all "EXEC SQL" statements into the appropriate calls,
      and outputs a normal program.
      Currently the "automatically generated" detectors don't detect
      this case. For the moment, just make sure the generated files
      aren't around while running SLOCCount.
      Currently the following are not handled (future release?):
          .pco -> Oracle preprocessed Cobol code
          .pfo -> Oracle preprocessed Fortran code
      I don't have a Cobol counter. The Fortran counter only works
      for f77, and I doubt .pfo is limited to that.

2001-04-27  David A. Wheeler

    * Per suggestions from Gary Myer, added ".a" and ".so" to the
      "not" list, since these are libraries not source, and added the
      filename "Root" to the "not" file list ("Root" has special
      meaning to CVS).
    * Added a note about needing "md5sum" (Gary Myer).
    * Added a TODO file. If something's on the TODO list that you'd
      like, please write the code and send it in.
    * Noted that running on Cygwin is MUCH slower than when running
      on Linux. Truth in advertising is only fair.

2001-04-26  David A. Wheeler

    * Released version 1.6: the big change is support for running on
      Windows. Windows users must install Cygwin first.
    * Modified makefile so that SLOCCount can run on Windows systems
      if "Cygwin" is installed. The basic modifications to do this
      were developed by John Clezy -- Thanks!!! I spent time merging
      his makefile and mine so that a single makefile could be used
      on both Windows and Unix.
    * Documented how to install and run SLOCCount on Windows using
      Cygwin.
    * Changed default prefix to /usr/local; you can set PREFIX to
      change this, e.g., "make PREFIX=/usr".
    * When counting a single project, sloccount now also reports
      "Estimated average number of developers", which is simply the
      person-months divided by months. As with all estimates, take it
      with an ocean of salt. This isn't reported for multiproject
      queries; properly doing this would require "packing" to
      compensate for the fact that small projects complete before
      large ones if started simultaneously.
    * Improved man page (fixed a typo, etc.).

2001-01-10  David A. Wheeler

    * Released version 1.4. This is an "ease of use" release, greatly
      simplifying the installation and use of SLOCCount. The new
      front-end tool "sloccount" does all the work in one step - now
      just type "sloccount DIRECTORY" and it's all counted. An RPM
      makes installation trivial for RPM-based systems. A man page is
      now available. There are now rules for "make install" and
      "make uninstall" too. Other improvements include a schedule
      estimator and options to control the effort and schedule
      estimators.

2001-01-07  David A. Wheeler

    * Added an estimator of schedule as well as effort.
    * Added various options to control the effort and cost
      estimation: "--effort", "--personcost", "--overhead", and
      "--schedule". Now people can (through options) control the
      assumptions made in the effort and cost estimations from the
      command line. The output now shows the effort estimation model
      used.
    * Changed the output slightly to pretty it up and note that it's
      development EFFORT not TIME that is shown.
    * Added a note at bottom asking for credit. I don't ask for any
      money, but I'd like some credit if you refer to the data the
      tool generates; a gentle reminder in the output seemed like the
      easiest way to ask for this credit.
    * Created an RPM package; now RPM-based systems can EASILY
      install it. It's a relocatable package, so hopefully "alien"
      can easily translate it to other formats (such as Debian's .deb
      format).
    * Created a "man" page for sloccount.

2001-01-06  David A. Wheeler

    * Added front-end tool "sloccount", GREATLY improving
      ease-of-use. The tool "sloccount" invokes all the other
      SLOCCount tools in the right order, performing a count of a
      typical project or set of projects. From now on, this is
      expected to be the "usual" interface, though the pieces will
      still be documented to help those with more unusual needs.
      From now on, "SLOCCount" is the entire package, and "sloccount"
      is this front-end tool.
    * Added "--datadir" option to make_filelists (to support
      "sloccount").
    * get_sloc: No longer displays languages with 0 counts.
    * Documentation: documented "sloccount"; this caused major
      changes, since "sloccount" is now the recommended interface for
      all but those with complicated requirements.
    * compute_filecount: minor optimization/simplification.

2001-01-05  David A. Wheeler

    * Released version 1.2.
    * Changed the name of many programs, as part of a general
      clean-up. I changed "compute_all" to "compute_sloc", and
      eliminated most of the other "compute_*" files (replacing them
      with "compute_sloc_lang"). I also changed "get_data" to
      "get_sloc". This is part of a general clean-up, so that if
      someone wants to package this program for installation they
      don't have a thousand tiny programs polluting the namespace.
      Adding "sloc" to the names makes namespace collisions less
      likely. I also worked to make the program simpler.
    * Made a number of documentation fixes - my thanks to Clyde Roby
      for giving me feedback.
    * Changed all "*_count" programs to consistently print at the end
      "Total:" on a line by itself, followed on the next line by the
      total lines of code all by itself. This makes the new program
      get_sloc_detail simpler to implement, and also enables
      get_sloc_detail to perform some error detection.
    * Changed name of compressed file to ".tar.gz" and modified docs
      appropriately.
      The problem is a bug in Netscape 4.7 clients running on
      Windows; it appears that ".tgz" files don't get fully
      downloaded from my hosting webserver because no type
      information is provided. Originally, I tried to change the
      website to fix this by creating ".htaccess" files, but that
      didn't work with either:
          AddEncoding x-gzip gz tgz
          AddType application/x-tar .tgz
      or:
          AddEncoding application/octet-stream tgz
      So, we'll switch to .tar.gz, which works.
      My thanks to Christopher Lott for this feedback.
    * Removed a few garbage files.
    * Added information to documentation on how to handle HUGE sets
      of data directory children, i.e., where you can't even use "*"
      to list the data directory children. I don't have a directory
      of that kind of scale, so I can't test it directly, but I can
      at least discuss how to do it; it SHOULD work.
    * Changed makefile so that "ChangeLog" is now visible on the web.

2001-01-04  David A. Wheeler

    * Minor fixes to documentation.
    * Added "--crossdups" option to break_filelist.
    * Documented count_unknown_ext.
    * Created new tool, "get_sloc_detail", and documented it. Now you
      can get a complete report of all the SLOC data in one big file
      (e.g., for exporting to another tool for analysis).

2001-01-03  David A. Wheeler

    * First public release, version "1.0", of "SLOCCount".
      Main website: http://www.dwheeler.com/sloccount