Web spider: crawls the SGN website(s) for invalid markup, broken links, and orphan pages

Linklint © Copyright 1997-2001 James B. Bowlin http://www.linklint.org/
Tidy © Copyright 1998-2003 World Wide Web Consortium http://tidy.sourceforge.net/ 

To run: perl spider.pl [commandfile] [-options]

The recommended command file is "localhostcommand.in". See the Linklint documentation for information on command files: http://www.linklint.org/doc/index.html

Other arguments are optional:
-r    check remote links

Example: perl spider.pl localhostcommand.in -r

Necessary files:

spider.pl
localhostcommand.in Command file for Linklint
                    Each user must change -root (where the localhost files live) and -doc (where output is written)
linklint.pl
tidy                Must be installed separately (apt-get install tidy, or download from http://tidy.sourceforge.net)
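
Before a first run it can help to confirm that everything listed above is in place. The following is a minimal sketch, not part of spider.pl: the function names missing_files and missing_tools are hypothetical, and it assumes the three files sit together in one directory and that perl and tidy should be on PATH.

```shell
# Hypothetical helper: print one "missing: NAME" line for each required
# file that is absent from directory $1. Prints nothing if all are present.
missing_files() {
    dir=$1
    for f in spider.pl linklint.pl localhostcommand.in; do
        [ -f "$dir/$f" ] || echo "missing: $f"
    done
}

# Hypothetical helper: print one "missing: NAME" line for each external
# program that cannot be found on PATH.
missing_tools() {
    for t in perl tidy; do
        command -v "$t" >/dev/null 2>&1 || echo "missing: $t"
    done
}

# Check the current directory and the current PATH.
missing_files .
missing_tools
```

No output means nothing was found missing.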

Optional files:
file_list.in       Pages that may be reported as orphans but can be ignored (generic index files, included files, etc.)
known_orphans.in   Pages that are already known to be orphans (http://internal.sgn.cornell.edu/orphan.html)

Output files:
index.html          Hyperlinked index to all site-check files created
urlindex.html       Hyperlinked index to all remote-url-check files created (only if remote links are checked)
See http://www.linklint.org/doc/outputs.html for a detailed description of every output file created by Linklint.
html.out            List of all html files
orphan.out          List of possible orphan pages with pages from file_list.in removed
orphan_unknown.out  List of possible orphan pages with previously-seen orphan pages removed (might not be created)
orphan_fixed.out    List of possibly fixed former-orphan pages (might not be created)
tidy.out            Result from Tidy
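
The relationship between the orphan lists can be pictured as successive list subtraction. The sketch below is an illustration only and is not taken from spider.pl. It assumes one page path per line in every file, and it assumes that the "previously-seen" orphans are the entries of known_orphans.in. The function name subtract_list, the file candidates.txt, and all sample paths are hypothetical.

```shell
# Print the lines of list $1 that do not appear in list $2.
# Assumes one page path per line in both files (not confirmed by spider.pl).
# grep flags: -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
subtract_list() {
    grep -vxFf "$2" "$1" || true   # ignore grep's non-zero exit; it exits 1 when nothing is left
}

work=$(mktemp -d)

# Hypothetical sample data; real paths and formats may differ.
printf '%s\n' /index.html /old/page.html /inc/header.html /draft.html > "$work/candidates.txt"
printf '%s\n' /index.html /inc/header.html > "$work/file_list.in"
printf '%s\n' /old/page.html > "$work/known_orphans.in"

# Candidates minus ignorable pages, then minus orphans already known.
subtract_list "$work/candidates.txt" "$work/file_list.in" > "$work/orphan.out"
subtract_list "$work/orphan.out" "$work/known_orphans.in" > "$work/orphan_unknown.out"

cat "$work/orphan_unknown.out"   # prints /draft.html

rm -rf "$work"
```

With this sample data, orphan.out would hold /old/page.html and /draft.html, and only /draft.html would remain as a new, unknown orphan.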
