lintian: r1066 - in trunk: checks debian lib testset testset/binary/debian

To: debian-lint-maint@lists.debian.org
Subject: lintian: r1066 - in trunk: checks debian lib testset testset/binary/debian
From: rra@debian.org
Date: Sat, 08 Dec 2007 06:23:45 +0100
Message-id: <E1J0sAD-0005lI-QH@mordor.wolffelaar.nl>
Author: rra
Date: 2007-12-08 06:23:28 +0100 (Sat, 08 Dec 2007)
New Revision: 1066

Added:
   trunk/lib/Spelling.pm
Removed:
   trunk/checks/spelling
   trunk/checks/spelling.desc
Modified:
   trunk/checks/changelog-file
   trunk/checks/changelog-file.desc
   trunk/checks/copyright-file
   trunk/checks/copyright-file.desc
   trunk/checks/debian-readme
   trunk/checks/debian-readme.desc
   trunk/checks/description
   trunk/checks/description.desc
   trunk/checks/menus
   trunk/checks/menus.desc
   trunk/debian/changelog
   trunk/testset/binary/debian/NEWS.Debian
   trunk/testset/binary/debian/doc-base
   trunk/testset/tags.binary
   trunk/testset/tags.manpages
Log:
* checks/changelog-file{.desc,}:
  + [RA] Check the latest entry of the Debian changelog and any
    NEWS.Debian file for common spelling errors.  (Closes: #36017)
* checks/copyright-file{.desc,}:
  + [RA] Moved spelling-error-in-copyright check to here.
* checks/debian-readme{.desc,}:
  + [RA] Moved spelling-error-in-readme-debian check to here.
* checks/description{.desc,}:
  + [RA] Moved spelling-error-in-description check to here.
* checks/menus{.desc,}:
  + [RA] Substantial overhaul and expansion of the doc-base control file
    checks.  Patch from Robert Luberda.  (Closes: #448783)
* checks/spelling{.desc,}:
  + [RA] Subsumed into other check scripts and lib/Spelling.pm.
* lib/Spelling.pm:
  + [RA] New module to do general spelling checks for specific
    misspellings.  Based on the previous checks/spelling and a patch by
    Robert Luberda.
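
The diffs below call spelling_check(TAG, TEXT) and, for doc-base fields,
spelling_check(TAG, TEXT, LOCATION), but the body of lib/Spelling.pm is not
shown in this part of the mail. As a rough illustration of a table-driven
check of that kind, here is a minimal sketch. The function name
find_misspellings, the four-entry table, and the punctuation handling are
assumptions made for this example and are not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical miniature corrections table; the entries are borrowed from
# the list in the removed checks/spelling, which is far longer.
my %corrections = (
    accomodate => 'accommodate',
    adress     => 'address',
    debain     => 'Debian',
    libary     => 'library',
);

# Return a list of [misspelling, correction] pairs found in $text.  Words
# are lowercased and stripped of leading and trailing non-letters before
# the table lookup (an assumption of this sketch).
sub find_misspellings {
    my ($text) = @_;
    my @found;
    for my $word (split ' ', $text) {
        (my $bare = lc $word) =~ s/^[^a-z]+|[^a-z]+$//g;
        next unless length $bare;
        push @found, [$bare, $corrections{$bare}]
            if exists $corrections{$bare};
    }
    return @found;
}

for my $hit (find_misspellings("Fix the libary adress, again.")) {
    print "$hit->[0] -> $hit->[1]\n";
}
# prints:
#   libary -> library
#   adress -> address
```

A lookup table only catches the misspellings it lists, which matches the
wording of the new tag descriptions: Lintian has a list of common
misspellings and no dictionary.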

Modified: trunk/checks/changelog-file
===================================================================
--- trunk/checks/changelog-file	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/checks/changelog-file	2007-12-08 05:23:28 UTC (rev 1066)
@@ -20,6 +20,7 @@
 
 package Lintian::changelog_file;
 use strict;
+use Spelling;
 use Tags;
 use Util;
 use Parse::DebianChangelog;
@@ -172,6 +173,7 @@
         if ($entry->Distribution =~ /unreleased/i) {
             tag "debian-news-entry-has-strange-distribution", $entry->Distribution;
         }
+        spelling_check('spelling-error-in-news-debian', $entry->Changes);
     }
 }
 
@@ -309,6 +311,7 @@
     while ($entry =~ /(closes\s*(?:bug)?\#?\s?\d{6,})[^\w]/ig) {
 	tag "possible-missing-colon-in-closes", "$1" if $1;
     }
+    spelling_check('spelling-error-in-changelog', $entry);
 }
 
 # read the changelog itself

Modified: trunk/checks/changelog-file.desc
===================================================================
--- trunk/checks/changelog-file.desc	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/checks/changelog-file.desc	2007-12-08 05:23:28 UTC (rev 1066)
@@ -193,3 +193,15 @@
  matter, but it may be confusing to users reading the entry if the
  distribution doesn't match the distribution for the same entry in the
  Debian changelog file.
+
+Tag: spelling-error-in-changelog
+Type: warning
+Info: Lintian found a spelling error in the latest entry of the Debian
+ changelog.  Lintian has a list of common misspellings that it looks for.
+ It does not have a dictionary like a spelling checker does.
+
+Tag: spelling-error-in-news-debian
+Type: warning
+Info: Lintian found a spelling error in the latest entry of the
+ NEWS.Debian file.  Lintian has a list of common misspellings that it
+ looks for.  It does not have a dictionary like a spelling checker does.

Modified: trunk/checks/copyright-file
===================================================================
--- trunk/checks/copyright-file	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/checks/copyright-file	2007-12-08 05:23:28 UTC (rev 1066)
@@ -21,6 +21,7 @@
 package Lintian::copyright_file;
 use strict;
 use Dep;
+use Spelling;
 use Tags;
 use Util;
 
@@ -230,6 +231,8 @@
     tag "copyright-contains-dh_make-todo-boilerplate", "";
 }
 
+spelling_check('spelling-error-in-copyright', $_);
+
 } # </run>
 
 # -----------------------------------

Modified: trunk/checks/copyright-file.desc
===================================================================
--- trunk/checks/copyright-file.desc	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/checks/copyright-file.desc	2007-12-08 05:23:28 UTC (rev 1066)
@@ -189,3 +189,9 @@
  file, which indicates that you either didn't check the whole source
  to find additional copyright/license, or that you didn't remove that
  paragraph after having done so.
+
+Tag: spelling-error-in-copyright
+Type: warning
+Info: Lintian found a spelling error in the copyright file.  Lintian has a
+ list of common misspellings that it looks for.  It does not have a
+ dictionary like a spelling checker does.

Modified: trunk/checks/debian-readme
===================================================================
--- trunk/checks/debian-readme	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/checks/debian-readme	2007-12-08 05:23:28 UTC (rev 1066)
@@ -20,6 +20,7 @@
 
 package Lintian::debian_readme;
 use strict;
+use Spelling;
 use Tags;
 
 sub run {
@@ -52,6 +53,8 @@
     tag("readme-debian-contains-debmake-default-email-address");
 }
 
+spelling_check('spelling-error-in-readme-debian', $readme);
+
 }
 
 1;

Modified: trunk/checks/debian-readme.desc
===================================================================
--- trunk/checks/debian-readme.desc	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/checks/debian-readme.desc	2007-12-08 05:23:28 UTC (rev 1066)
@@ -23,3 +23,9 @@
 Type: warning
 Info: The README.Debian file contains an email address (&lt;..@unknown&gt;)
  that was not updated to the maintainer's real address.
+
+Tag: spelling-error-in-readme-debian
+Type: warning
+Info: Lintian found a spelling error in the README.Debian file.  Lintian
+ has a list of common misspellings that it looks for.  It does not have a
+ dictionary like a spelling checker does.

Modified: trunk/checks/description
===================================================================
--- trunk/checks/description	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/checks/description	2007-12-08 05:23:28 UTC (rev 1066)
@@ -20,6 +20,7 @@
 
 package Lintian::description;
 use strict;
+use Spelling;
 use Tags;
 use Util;
 
@@ -35,6 +36,7 @@
 my $template = 0;
 my $unindented_list = 0;
 my $synopsis;
+my $description;
 
 # description?
 unless (-f $cf) {
@@ -45,7 +47,9 @@
 open(IN, '<', $cf) or fail("cannot open $cf for reading: $!");
 
 # 1st line contains synopsis
-chop($synopsis = <IN>);
+$synopsis = <IN>;
+$description = $synopsis;
+chomp $synopsis;
 
 if ($synopsis =~ m/^\s*$/) {
     tag "description-synopsis-is-empty", "";
@@ -73,7 +77,8 @@
 }
 
 while (<IN>) {
-    chop;
+    $description .= $_;
+    chomp;
     next if m/^\s*$/o;
     next if m/^\.\s*$/o;
 
@@ -131,6 +136,10 @@
     tag "extended-description-is-empty", "" unless $type eq 'udeb';
 }
 
+if ($description) {
+    spelling_check('spelling-error-in-description', $description);
 }
 
+}
+
 1;
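
The description diff above also replaces chop with chomp. The difference
only shows on a line with no trailing newline, such as the last line of a
file that does not end in one. The helper name strip_both is hypothetical
and exists only for this illustration.

```perl
use strict;
use warnings;

# Apply chop and chomp to copies of the same line and return both results.
sub strip_both {
    my ($line) = @_;
    my ($chopped, $chomped) = ($line, $line);
    chop $chopped;     # removes the last character, whatever it is
    chomp $chomped;    # removes a trailing newline only
    return ($chopped, $chomped);
}

for my $line ("synopsis\n", "synopsis") {
    my ($chopped, $chomped) = strip_both($line);
    print "chop: '$chopped'  chomp: '$chomped'\n";
}
# prints:
#   chop: 'synopsis'  chomp: 'synopsis'
#   chop: 'synopsi'  chomp: 'synopsis'
```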

Modified: trunk/checks/description.desc
===================================================================
--- trunk/checks/description.desc	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/checks/description.desc	2007-12-08 05:23:28 UTC (rev 1066)
@@ -104,3 +104,9 @@
  dpkg now supports Homepage: as a regular field in
  <tt>debian/control</tt>.  This header should be moved from the extended
  description to the fields for the relevant source or binary packages.
+
+Tag: spelling-error-in-description
+Type: warning
+Info: Lintian found a spelling error in the package description.  Lintian
+ has a list of common misspellings that it looks for.  It does not have a
+ dictionary like a spelling checker does.

Modified: trunk/checks/menus
===================================================================
--- trunk/checks/menus	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/checks/menus	2007-12-08 05:23:28 UTC (rev 1066)
@@ -24,6 +24,7 @@
 use strict;
 use lib "$ENV{'LINTIAN_ROOT'}/checks/";
 use common_data;
+use Spelling;
 use Tags;
 use Util;
 
@@ -31,6 +32,21 @@
 my %all_files = ();
 my %all_links = ();
 
+# Known fields for doc-base files.  The value is 1 for required fields and 0
+# for optional fields.
+my %known_docbase_main_fields = (
+	'document' => 1,
+	'title'    => 1,
+	'section'  => 1,
+	'abstract' => 0,
+	'author'   => 0
+);
+my %known_docbase_format_fields = (
+	'format'  => 1,
+	'files'   => 1,
+	'index'   => 0
+);
+
 sub run {
 
 $pkg = shift;
@@ -160,97 +176,11 @@
 
     # check the contents of the doc-base file(s)
     opendir DOCBASEDIR, "doc-base" or fail("cannot read doc-base directory.");
-    while (my $dbfile = readdir DOCBASEDIR) {
+    my $dbfile;
+    while (defined ($dbfile = readdir DOCBASEDIR)) {
 	# don't try to parse executables, plus we already warned about it
 	next if -x "doc-base/$dbfile";
-	open (IN, '<', "doc-base/$dbfile") or
-	    fail("cannot open doc-base file $dbfile for reading.");
-
-	# Check if files referenced by doc-base are included in the package.
-	# The Index field should refer to only one file without wildcards.
-	# The Files field is a whitespace-separated list of files and may
-	# contain wildcards.  We skip without validating wildcard patterns
-	# containing character classes since otherwise we'd need to deal with
-	# wildcards inside character classes and aren't there yet.
-	#
-	# Defer checking files until we've read all possible continuation
-	# lines for the field.	As a result, all tags will be reported on the
-	# last continuation line of the field, rather than possibly where the
-	# offending file name is.
-	my (@files, $field, $sawindex, $sawdocument, $format, $insection);
-	while (1) {
-	    $_ = <IN>;
-	    if ((!defined ($_) || /^\S/ || /^$/) && $field) {
-		# Figure out the right line number.  It's actually the
-		# previous line, since we read ahead for continuation lines,
-		# unless we're at the end of the file.
-		my $line = $. - 1 + (defined ($_) ? 0 : 1);
-		if ($field eq 'index' && @files > 1) {
-		    tag "doc-base-index-references-multiple-files", "$dbfile:$line";
-		}
-		for my $file (@files) {
-		    if ($file =~ m%^/usr/doc%) {
-			tag "doc-base-file-references-usr-doc", "$dbfile:$line";
-		    }
-		    my $realfile = delink ($file);
-
-		    # openoffice.org-dev-doc has thousands of files listed so
-		    # try to use the hash if possible.
-		    my $found;
-		    if ($realfile =~ /[*?]/) {
-			my $regex = quotemeta ($realfile);
-			unless ($field eq 'index') {
-			    next if $regex =~ /\[/;
-			    $regex =~ s%\\\*%[^/]*%g;
-			    $regex =~ s%\\\?%[^/]%g;
-			    $regex .= '/?';
-			}
-			$found = grep { /^$regex\z/ } keys %all_files;
-		    } else {
-			$found = $all_files{$realfile} || $all_files{"$realfile/"};
-		    }
-		    unless ($found) {
-			tag "doc-base-file-references-missing-file", "$dbfile:$line", $file;
-		    }
-		}
-		undef @files;
-		undef $field;
-	    }
-	    if (defined ($_) && /^(Index|Files)\s*:\s*(.*?)\s*$/i) {
-		$field = lc $1;
-		@files = split (' ', $2);
-		if ($field eq 'index') {
-		    $sawindex = 1;
-		}
-	    } elsif (defined ($_) && /^Format\s*:\s*(.*?)\s*$/i) {
-		$format = lc $1;
-		tag "doc-base-file-unknown-format", "$dbfile:$.", $format
-		    unless $known_doc_base_formats{$format};
-	    } elsif (defined ($_) && /^Document\s*:/i) {
-		$sawdocument = 1;
-                tag "doc-base-document-field-ends-in-whitespace", "$dbfile:$."
-                    if /[ \t]$/;
-	    } elsif (defined ($_) && /^\s/ && $field) {
-		push (@files, split ' ');
-	    }
-	    if (defined ($_) && /^\s*\S/) {
-		$insection = 1;
-	    }
-	    if (!defined ($_) || /^$/) {
-		tag "doc-base-file-no-format", "$dbfile:$."
-		    if ($insection && !($format || $sawdocument));
-		if ($format && ($format eq 'html' || $format eq 'info')) {
-		    tag "doc-base-file-no-index", "$dbfile:$."
-			unless $sawindex;
-		}
-		last unless defined $_;
-		undef $format;
-		undef $sawdocument;
-		undef $sawindex;
-		undef $insection;
-	    }
-	}
-	close IN;
+	check_doc_base_file($dbfile);
     }
     closedir DOCBASEDIR;
 } else {
@@ -286,6 +216,244 @@
 
 # -----------------------------------
 
+sub check_doc_base_file {
+    my $dbfile = shift;
+
+    open (IN, '<', "doc-base/$dbfile")
+        or fail("cannot open doc-base file $dbfile for reading.");
+
+    my (@files, $field, @vals);
+    my $knownfields = \%known_docbase_main_fields;
+    my $line        = 0;  # global
+    my %sawfields   = (); # local for each section of control file
+    my %sawformats  = (); # global for control file
+
+    while (<IN>) {
+        chomp;
+
+        # New field.  check previous field, if we have any.
+        if (/^(\S+)\s*:\s*(.*)$/) {
+            my (@new) = ($1, $2);
+            if ($field) {
+                check_doc_base_field($dbfile, $line, $field, \@vals,
+                                     \%sawfields, \%sawformats, $knownfields);
+            }
+            $field = lc $new[0];
+            @vals  = ($new[1]);
+            $line  = $.;
+
+        # Continuation of previously defined field.
+        } elsif ($field && /^\s+\S/) {
+            push (@vals, $_);
+
+            # All tags will be reported on the last continuation line of the
+            # doc-base field.
+            $line  = $.;
+
+        # Sections' separator.
+        } elsif (/^(\s*)$/) {
+            tag "doc-base-file-separator-extra-whitespaces", "$dbfile:$."
+                if $1;
+            next unless $field; # skip successive empty lines
+
+            # Check previously defined field and section.
+            check_doc_base_field($dbfile, $line, $field, \@vals, \%sawfields,
+                                 \%sawformats, $knownfields);
+            check_doc_base_file_section($dbfile, $line + 1, \%sawfields,
+                                        \%sawformats, $knownfields);
+
+            # Initialize variables for new section.
+            undef $field;
+            undef $line;
+            @vals      = ();
+            %sawfields = ();
+
+            # Each section except the first one is format section.
+            $knownfields = \%known_docbase_format_fields;
+
+        # Everything else is a syntax error.
+        } else {
+            tag "doc-base-file-syntax-error", "$dbfile:$.";
+        }
+    }
+
+    # Check the last field/section of the control file.
+    if ($field) {
+        check_doc_base_field($dbfile, $line, $field, \@vals, \%sawfields,
+                             \%sawformats, $knownfields);
+        check_doc_base_file_section($dbfile, $line, \%sawfields, \%sawformats,
+                                    $knownfields);
+    }
+
+    # Make sure we saw at least one format.
+    tag "doc-base-file-no-format-section", "$dbfile:$." unless %sawformats;
+
+    close IN;
+}
+
+# Checks one field of a doc-base control file.  $vals is array ref containing
+# all lines of the field.  Modifies $sawfields and $sawformats.
+sub check_doc_base_field {
+    my ($dbfile, $line, $field, $vals, $sawfields, $sawformats,
+        $knownfields) = @_;
+
+    tag "doc-base-file-unknown-field", "$dbfile:$line", "$field"
+        unless defined $knownfields->{$field};
+    tag "doc-base-file-duplicated-field", "$dbfile:$line", "$field"
+        if $sawfields->{$field};
+    $sawfields->{$field} = 1;
+
+    # Index/Files field.
+    #
+    # Check if files referenced by doc-base are included in the package.  The
+    # Index field should refer to only one file without wildcards.  The Files
+    # field is a whitespace-separated list of files and may contain wildcards.
+    # We skip without validating wildcard patterns containing character
+    # classes since otherwise we'd need to deal with wildcards inside
+    # character classes and aren't there yet.
+    if ($field eq 'index' or $field eq 'files') {
+        my @files = map { split (' ', $_) } @$vals;
+
+        if ($field eq 'index' && @files > 1) {
+            tag "doc-base-index-references-multiple-files", "$dbfile:$line";
+        }
+        for my $file (@files) {
+            if ($file =~ m%^/usr/doc%) {
+                tag "doc-base-file-references-usr-doc", "$dbfile:$line";
+            }
+            my $realfile = delink ($file);
+
+            # openoffice.org-dev-doc has thousands of files listed so try to
+            # use the hash if possible.
+            my $found;
+            if ($realfile =~ /[*?]/) {
+                my $regex = quotemeta ($realfile);
+                unless ($field eq 'index') {
+                    next if $regex =~ /\[/;
+                    $regex =~ s%\\\*%[^/]*%g;
+                    $regex =~ s%\\\?%[^/]%g;
+                    $regex .= '/?';
+                }
+                $found = grep { /^$regex\z/ } keys %all_files;
+            } else {
+                $found = $all_files{$realfile} || $all_files{"$realfile/"};
+            }
+            unless ($found) {
+                tag "doc-base-file-references-missing-file", "$dbfile:$line",
+                    $file;
+            }
+        }
+        undef @files;
+
+    # Format field.
+    } elsif ($field eq 'format') {
+        my $format = join (' ', @$vals);
+        $format =~ s/^\s+//o;
+        $format =~ s/\s+$//o;
+        $format = lc $format;
+        tag "doc-base-file-unknown-format", "$dbfile:$line", $format
+            unless $known_doc_base_formats{$format};
+        tag "doc-base-file-duplicated-format", "$dbfile:$line", $format
+            if $sawformats->{$format};
+        $sawformats->{$format} = 1;
+
+        # Save the current format for the later section check.
+        $sawformats->{' *current* '} = $format;
+
+    # Document field.
+    } elsif ($field eq 'document') {
+        $_ = join (' ', @$vals);
+
+        tag "doc-base-invalid-document-field", "$dbfile:$line", "$_"
+            unless /^[a-z0-9+.-]+$/;
+        tag "doc-base-document-field-ends-in-whitespace", "$dbfile:$line"
+            if /[ \t]$/;
+        tag "doc-base-document-field-not-in-first-line", "$dbfile:$line"
+            unless $line == 1;
+
+    # Title field.
+    } elsif ($field eq 'title') {
+        if (@$vals) {
+            spelling_check("spelling-error-in-doc-base-title-field",
+                           join (' ', @$vals), "$dbfile:$line");
+        }
+
+    # Abstract field.
+    } elsif ($field eq 'abstract') {
+        # The three following variables are used for checking if the field is
+        # correctly phrased.  We detect if each line (except for the first
+        # line and lines containing single dot) of the field starts with the
+        # same number of spaces, not followed by the same non-space character,
+        # and the number of spaces is > 1.
+        #
+        # We try to match fields like this:
+        #  ||Abstract: The Boost web site provides free peer-reviewed portable
+        #  ||  C++ source libraries.  The emphasis is on libraries which work
+        #  ||  well with the C++ Standard Library.  One goal is to establish
+        #
+        # but not like this:
+        #  ||Abstract:  This is "Ding"
+        #  ||  * a dictionary lookup program for Unix,
+        #  ||  * DIctionary Nice Grep,
+        my $leadsp    = undef; # string with leading spaces from second line
+        my $charafter = undef; # first non-whitespace char of second line
+        my $leadsp_ok = 1;     # are spaces OK?
+
+        # Intentionally skipping the first line.
+        for my $idx (1 .. $#{$vals}) {
+            $_ = $vals->[$idx];
+            if (/manage\s+online\s+manuals\s.*Debian/o) {
+                tag "doc-base-abstract-field-is-template", "$dbfile:$line"
+                    unless $pkg eq "doc-base";
+            } elsif (/^(\s+)\.(\s*)$/o and ($1 ne " " or $2)) {
+                tag "doc-base-abstract-field-separator-extra-whitespaces",
+                    "$dbfile:" . ($line - $#{$vals} + $idx);
+            } elsif (!$leadsp && /^(\s+)(\S)/o) {
+                # The regexp should always match.
+                ($leadsp, $charafter) = ($1, $2);
+                $leadsp_ok = $leadsp eq " ";
+            } elsif (!$leadsp_ok && /^(\s+)(\S)/o) {
+                # The regexp should always match.
+                undef $charafter if $charafter && $charafter ne $2;
+                $leadsp_ok = 1
+                    if ($1 ne $leadsp) || ($1 eq $leadsp && $charafter);
+            }
+        }
+        unless ($leadsp_ok) {
+            tag "doc-base-abstract-might-contain-extra-leading-whitespaces",
+                "$dbfile:$line";
+        }
+
+        # Check spelling.
+        if (@$vals) {
+            spelling_check("spelling-error-in-doc-base-abstract-field",
+                           join (' ', @$vals), "$dbfile:$line");
+        }
+    }
+}
+
+# Checks the section of the doc-base control file.  Tries to find required
+# fields missing in the section.
+sub check_doc_base_file_section {
+    my ($dbfile, $line, $sawfields, $sawformats, $knownfields) = @_;
+
+    tag "doc-base-file-no-format", "$dbfile:$line"
+        if ((defined $sawfields->{'files'} || defined $sawfields->{'index'})
+            && !(defined $sawfields->{'format'}));
+
+    # The current format is set by check_doc_base_field.
+    if ($sawfields->{'format'}) {
+        my $format =  $sawformats->{' *current* '};
+        tag "doc-base-file-no-index", "$dbfile:$line"
+            if ($format && ($format eq 'html' || $format eq 'info')
+                && !$sawfields->{'index'});
+    }
+    for my $field (sort keys %$knownfields) {
+        tag "doc-base-file-lacks-required-field", "$dbfile:$line", "$field"
+            if ($knownfields->{$field} == 1 && !$sawfields->{$field});
+    }
+}
+
 # Add file and link to %all_files and %all_links.  Note that both files and
 # links have to include a leading /.
 sub add_file_link_info {
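
The new doc-base code above treats the first section of a control file as
the main section and every later section as a format section, then looks
for missing required fields in each. The following self-contained sketch
shows that idea in isolation. It is not the code from this commit: the
names parse_sections and missing_fields and the sample control file are
assumptions for illustration, and the sketch keeps only the required-field
check from the many tags the real code emits.

```perl
use strict;
use warnings;

# Required (1) and optional (0) fields, mirroring the two tables in the diff.
my %main_fields   = (document => 1, title => 1, section => 1,
                     abstract => 0, author => 0);
my %format_fields = (format => 1, files => 1, index => 0);

# Split control-file text into sections.  Each section is a hash mapping the
# lowercased field name to its value, with continuation lines appended.
sub parse_sections {
    my ($text) = @_;
    my (@sections, %current, $field);
    for my $line (split /\n/, $text) {
        if ($line =~ /^(\S+)\s*:\s*(.*)$/) {
            $field = lc $1;
            $current{$field} = $2;
        } elsif (defined $field && $line =~ /^\s+\S/) {
            $current{$field} .= "\n$line";
        } elsif ($line =~ /^\s*$/) {
            # An empty line closes the current section, if there is one.
            push @sections, +{ %current } if %current;
            %current = ();
            undef $field;
        }
    }
    push @sections, +{ %current } if %current;
    return @sections;
}

# List required fields missing from each section.  The first section is
# checked against the main fields, every later one against the format fields.
sub missing_fields {
    my (@sections) = @_;
    my @missing;
    for my $i (0 .. $#sections) {
        my $known = $i == 0 ? \%main_fields : \%format_fields;
        for my $name (sort keys %$known) {
            push @missing, "section " . ($i + 1) . ": $name"
                if $known->{$name} && !exists $sections[$i]{$name};
        }
    }
    return @missing;
}

# Hypothetical control file whose format section lacks the Files field.
my $control = <<'END';
Document: example-manual
Title: Example Manual
Section: Text

Format: HTML
Index: /usr/share/doc/example/index.html
END

print "$_\n" for missing_fields(parse_sections($control));
# prints: section 2: files
```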

Modified: trunk/checks/menus.desc
===================================================================
--- trunk/checks/menus.desc	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/checks/menus.desc	2007-12-08 05:23:28 UTC (rev 1066)
@@ -161,7 +161,7 @@
 Info: The Index field in a doc-base file should reference the single index
  file for that document.  Any other files belonging to the same document
  should be listed in the Files field.
-Ref: Debian doc-base Manual section 2.3
+Ref: Debian doc-base Manual section 2.3.2.2
 
 Tag: doc-base-file-references-missing-file
 Type: error
@@ -175,7 +175,7 @@
 Info: The Format field in this doc-base control file declares a format
  that is not supported.  Recognized formats are "HTML", "Text", "PDF",
  "PostScript", "Info", "DVI", and "DebianDoc-SGML" (case-insensitive).
-Ref: Debian doc-base Manual section 2.3
+Ref: Debian doc-base Manual section 2.3.2.2
 
 Tag: doc-base-file-no-format
 Type: error
@@ -183,6 +183,12 @@
  format.  Each section after the first must specify a format.
 Ref: Debian doc-base Manual section 2.3.2.2
 
+Tag: doc-base-file-no-format-section
+Type: error
+Info: This doc-base control file didn't specify any format
+ section.
+Ref: Debian doc-base Manual section 2.3.2.2
+
 Tag: doc-base-file-no-index
 Type: error
 Info: Format sections in doc-base control files for HTML or Info documents
@@ -197,3 +203,86 @@
  doc-base (at least as of 0.8.5) cannot cope with such fields and
  debhelper 5.0.57 or earlier may create files ending in whitespace when
  installing such files.
+
+Tag: doc-base-document-field-not-in-first-line
+Type: error
+Info: The Document field in a doc-base control file must be on the first
+ line of the file.  While unregistering documents, doc-base 0.8 and later
+ parses only the first line of the control file for performance
+ reasons.
+Ref: Debian doc-base Manual section 2.3.2.1
+
+Tag: doc-base-file-unknown-field
+Type: error
+Info: The doc-base control file contains a field that is either unknown
+ or not valid for the section where it was found.  Possible reasons are a
+ typo in the field name, a missing empty line between control file
+ sections, or an extra empty line separating sections.
+Ref: Debian doc-base Manual sections 2.3.2.1 and 2.3.2.2
+
+Tag: doc-base-file-duplicated-field
+Type: error
+Info: The doc-base control file contains a duplicated field.
+
+Tag: doc-base-file-duplicated-format
+Type: error
+Info: The doc-base control file contains a duplicated format.  Doc-base
+ files must not register different documents in one control file.
+Ref: Debian doc-base Manual section 2.3.2.2
+
+Tag: doc-base-file-lacks-required-field
+Type: error
+Info: The doc-base control file does not contain a required field for the
+ appropriate section.
+Ref: Debian doc-base Manual sections 2.3.2.1 and 2.3.2.2
+
+Tag: doc-base-invalid-document-field
+Type: error
+Info: The Document field should consist only of letters (a-z), digits
+ (0-9), plus (+) or minus (-) signs, and dots (.).  In particular,
+ uppercase letters are not allowed.
+Ref: Debian doc-base Manual section 2.2
+
+Tag: doc-base-abstract-field-is-template
+Type: warning
+Info: The Abstract field of doc-base contains a "manage online manuals"
+ phrase, which was copied verbatim from an example control file found in
+ section 2.3.1 of the Debian doc-base Manual.
+
+Tag: doc-base-abstract-might-contain-extra-leading-whitespaces
+Type: warning
+Info: Continuation lines of the Abstract field of a doc-base control file
+ should start with only one space unless they are meant to be displayed
+ verbatim by frontends.
+Ref: Debian doc-base Manual section 2.3.2
+
+Tag: doc-base-abstract-field-separator-extra-whitespaces
+Type: warning
+Info: Unnecessary spaces were found in the paragraph separator line of the
+ doc-base's Abstract field.  The separator line should consist of a single
+ space followed by a single dot.
+Ref: Debian doc-base Manual section 2.3.2
+
+Tag: spelling-error-in-doc-base-title-field
+Type: error
+Info: Lintian found a spelling error in the Title field of this doc-base
+ control file.  Lintian has a list of common misspellings that it looks
+ for.  It does not have a dictionary like a spelling checker does.
+
+Tag: spelling-error-in-doc-base-abstract-field
+Type: error
+Info: Lintian found a spelling error in the Abstract field of this
+ doc-base control file.  Lintian has a list of common misspellings that
+ it looks for.  It does not have a dictionary like a spelling checker does.
+
+Tag: doc-base-file-syntax-error
+Type: error
+Info: Lintian found a syntax error in the doc-base control file.
+Ref: Debian doc-base Manual section 2.3.2.2
+
+Tag: doc-base-file-separator-extra-whitespaces
+Type: warning
+Info: Unnecessary spaces were found in the doc-base file sections'
+ separator.  The section separator is an empty line and should not contain
+ any whitespace.
+Ref: Debian doc-base Manual section 2.3.2

Deleted: trunk/checks/spelling
===================================================================
--- trunk/checks/spelling	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/checks/spelling	2007-12-08 05:23:28 UTC (rev 1066)
@@ -1,387 +0,0 @@
-# spelling -- lintian check script -*- perl -*-
-
-# Look for common spelling errors in the package description and the
-# copyright file.
-
-# Copyright (C) 1998 Richard Braakman
-#
-# This program is free software; you can redistribute it and/or modify
-# it under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
-# (at your option) any later version.
-#
-# This program is distributed in the hope that it will be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-# GNU General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program.  If not, you can find it on the World Wide
-# Web at http://www.gnu.org/copyleft/gpl.html, or write to the Free
-# Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
-# MA 02110-1301, USA.
-
-package Lintian::spelling;
-use strict;
-use Tags;
-
-# All spelling errors that have been observed "in the wild" in package
-# descriptions are added here, on the grounds that if they occurred
-# once they are more likely to occur again.
-
-# Misspellings of "compatibility", "separate", and "similar" are 
-# particularly common.
-
-# Be careful with corrections that involve punctuation, since the check
-# is a bit rough with punctuation.  For example, I had to delete the
-# correction of "builtin" to "built-in".
-
-my %corrections = qw(
-		     accesnt accent
-		     accelleration acceleration
-		     accessable accessible
-		     accomodate accommodate
-		     acess access
-		     acording according
-		     additionaly additionally
-		     adress address
-		     adresses addresses
-		     adviced advised
-		     albumns albums
-		     alegorical allegorical
-		     algorith algorithm
-		     allpication application
-		     altough although
-		     alows allows
-		     amoung among
-		     amout amount
-		     analysator analyzer
-		     ang and
-		     appropiate appropriate
-		     arraival arrival
-		     artifical artificial
-		     artillary artillery
-		     attemps attempts
-		     authentification authentication
-		     automaticly automatically
-		     automatize automate
-		     automatized automated
-		     automatizes automates
-		     auxilliary auxiliary
-		     availavility availability
-		     availble available
-		     avaliable available
-		     availiable available
-		     backgroud background
-		     baloons balloons
-		     becomming becoming
-		     becuase because
-		     calender calendar
-		     cariage carriage
-		     challanges challenges
-		     changable changeable
-		     charachters characters
-		     charcter character
-		     choosen chosen
-		     colorfull colorful
-		     comand command
-		     commerical commercial
-		     comminucation communication
-		     commoditiy commodity
-		     compability compatibility
-		     compatability compatibility
-		     compatable compatible
-		     compatibiliy compatibility
-		     compatibilty compatibility
-		     compleatly completely
-		     complient compliant
-		     compres compress
-		     containes contains
-		     containts contains
-		     contence contents
-		     continous continuous
-		     contraints constraints
-		     convertor converter
-		     convinient convenient
-		     cryptocraphic cryptographic
-		     deamon daemon
-		     debain Debian
-		     debians Debian\'s
-		     decompres decompress
-		     definate definite
-		     definately definitely
-		     dependancies dependencies
-		     dependancy dependency
-		     dependant dependent
-		     developement development
-		     developped developed
-		     deveolpment development
-		     devided divided
-		     dictionnary dictionary
-		     diplay display
-		     disapeared disappeared
-		     dissapears disappears
-		     documentaion documentation
-		     docuentation documentation
-		     documantation documentation
-		     dont don\'t
-		     easilly easily
-		     ecspecially especially
-		     edditable editable
-		     editting editing
-		     eletronic electronic
-		     enchanced enhanced
-		     encorporating incorporating
-		     enlightnment enlightenment
-		     enterily entirely
-		     enviroiment environment
-		     environement environment
-		     excellant excellent
-		     exlcude exclude
-		     exprimental experimental
-		     extention extension
-		     failuer failure
-		     familar familiar
-		     fatser faster
-		     fetaures features
-		     forse force
-		     fortan fortran
-		     framwork framework
-		     fuction function
-		     fuctions functions
-		     functionnality functionality
-		     functonality functionality
-		     functionaly functionally
-		     futhermore furthermore
-		     generiously generously
-		     grahical graphical
-		     grahpical graphical
-		     grapic graphic
-		     guage gauge
-		     halfs halves
-		     heirarchically hierarchically
-		     helpfull helpful
-		     hierachy hierarchy
-		     hierarchie hierarchy
-		     howver however
-		     implemantation implementation
-		     incomming incoming
-		     incompatabilities incompatibilities
-		     indended intended
-		     indendation indentation
-		     independant independent
-		     informatiom information
-		     initalize initialize
-		     inofficial unofficial
-		     integreated integrated
-		     integrety integrity
-		     integrey integrity
-		     intendet intended
-		     interchangable interchangeable
-		     intermittant intermittent
-		     jave java
-		     langage language
-		     langauage language
-		     langugage language
-		     lauch launch
-		     lesstiff lesstif
-		     libaries libraries
-		     libary library
-		     licenceing licencing
-		     loggin login
-		     logile logfile
-		     loggging logging
-		     maintainance maintenance
-		     maintainence maintenance
-		     makeing making
-		     managable manageable
-		     manoeuvering maneuvering
-		     mathimatic mathematic
-		     mathimatics mathematics
-		     mathimatical mathematical
-		     ment meant
-		     modulues modules
-		     monochromo monochrome
-		     multidimensionnal multidimensional
-		     navagating navigating
-		     nead need
-		     neccesary necessary
-		     neccessary necessary
-		     necesary necessary
-		     nescessary necessary
-		     noticable noticeable
-		     optionnal optional
-		     orientatied orientated
-		     orientied oriented
-		     pacakge package
-		     pachage package
-		     packacge package
-		     packege package
-		     packge package
-		     pakage package
-		     particularily particularly
-		     persistant persistent
-		     plattform platform
-		     ploting plotting
-		     protable portable
-		     posible possible
-		     powerfull powerful
-		     prefered preferred
-		     prefferably preferably
-		     prepaired prepared
-		     princliple principle
-		     priorty priority
-		     proccesors processors
-		     proces process
-		     processsing processing
-		     processessing processing
-		     progams programs
-		     programers programmers
-		     programm program
-		     programms programs
-		     promps prompts
-		     pronnounced pronounced
-		     prononciation pronunciation
-		     pronouce pronounce
-		     protcol protocol
-		     protocoll protocol
-		     recieve receive
-		     recieved received
-		     redircet redirect
-		     regulamentations regulations
-		     remoote remote
-		     repectively respectively
-		     replacments replacements
-		     requiere require
-		     runnning running
-		     safly safely
-		     savable saveable
-		     searchs searches
-		     separatly separately
-		     seperate separate
-		     seperated separated
-		     seperately separately
-		     seperatly separately
-		     serveral several
-		     setts sets
-		     similiar similar
-		     simliar similar
-		     speach speech
-		     splitted split
-		     standart standard
-		     staically statically
-		     staticly statically
-		     succesful successful
-		     succesfully successfully
-		     suplied supplied
-		     suport support
-		     suppport support
-		     supportin supporting
-		     synchonized synchronized
-		     syncronize synchronize
-		     syncronizing synchronizing
-		     syncronus synchronous
-		     syste system
-		     sythesis synthesis
-		     taht that
-		     throught through
-		     useable usable
-		     usefull useful
-		     usera users
-		     usetnet Usenet
-		     utilites utilities
-		     utillities utilities
-		     utilties utilities
-		     utiltity utility
-		     utitlty utility
-		     variantions variations
-		     varient variant
-		     verson version
-		     vicefersa vice-versa
-		     yur your
-		     wheter whether
-		     wierd weird
-		     xwindows X
-		    );
-# The format above doesn't allow spaces
-$corrections{'alot'} = 'a lot';
-
-my %corrections_language_names = qw(
-				    english English
-				    french French
-				    german German
-				    russian Russian
-				   );
-
-sub run {
-
-my $pkg = shift;
-my $type = shift;
-
-# Read in entire files at one gulp.
-local $/ = undef;
-
-# Check defined(), because for some reason <CPY> returns the undefined
-# value if the file is length 0.
-
-if (open(DESC, '<', "fields/description")) {
-    my $description = <DESC>;
-    close(DESC);
-    spelling_check("spelling-error-in-description", $description)
-	if defined($description);
-}
-
-if (open(CPY, '<', "copyright")) {
-    my $copyright = <CPY>;
-    close(CPY);
-    spelling_check("spelling-error-in-copyright", $copyright)
-	if defined($copyright);
-}
-
-if (open(RMD, '<', "README.Debian")) {
-    my $readme = <RMD>;
-    close(RMD);
-    spelling_check("spelling-error-in-readme-debian", $readme)
-	if defined($readme);
-}
-
-#if (open(CHG, '<', "changelog.Debian")) {
-#    $changelog = <CHG>;
-#    close(CHG);
-#    spelling_check("spelling-error-in-debian-changelog", $changelog)
-#	if defined($changelog);
-#}
-
-}
-
-# -----------------------------------
-
-sub spelling_check {
-    my $tag = shift;
-    my $file = shift;
-
-    foreach my $word (split(/\s+/, $file)) {
-	# before lowercasing the word, check if it's a non-uppercased
-	# language name
-	if (exists $corrections_language_names{$word}) {
-	    tag($tag, $word, $corrections_language_names{$word});
-        }
-	$word = lc $word;
-	# try deleting the non-alphabetic parts from the word.
-	# Treat apostrophes specially: only delete them if they occur
-	# at the beginning or end of the word.
-	$word =~ s/(^\')|[^\w\xc0-\xd6\xd8-\xf6\xf8-\xff\']+|(\'$)//g;
-	if (exists $corrections{$word}) {
-	    tag($tag, $word, $corrections{$word});
-        }
-    }
-    # special case for correcting a multi-word string
-    # $corrections{'Debian/GNU Linux'} = 'Debian GNU/Linux';
-    if ($file =~ m,Debian/GNU Linux,) {
-	tag($tag, "Debian/GNU Linux", "Debian GNU/Linux");
-    }
-}
-
-1;
-
-# vim: syntax=perl

Deleted: trunk/checks/spelling.desc
===================================================================
--- trunk/checks/spelling.desc	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/checks/spelling.desc	2007-12-08 05:23:28 UTC (rev 1066)
@@ -1,25 +0,0 @@
-Check-Script: spelling
-Author: Richard Braakman <dark@xs4all.nl>
-Abbrev: spl
-Type: binary, udeb
-Unpack-Level: 1
-Info: This script looks for common spelling errors.
-Needs-Info: copyright-file, debian-readme
-
-Tag: spelling-error-in-description
-Type: error
-Info: Lintian found a spelling error in the package description.
- Lintian has a list of common misspellings that it looks for;
- it does not have a dictionary like a spelling checker does.
-
-Tag: spelling-error-in-copyright
-Type: error
-Info: Lintian found a spelling error in the copyright file.
- Lintian has a list of common misspellings that it looks for;
- it does not have a dictionary like a spelling checker does.
-
-Tag: spelling-error-in-readme-debian
-Type: error
-Info: Lintian found a spelling error in the README.Debian file.
- Lintian has a list of common misspellings that it looks for;
- it does not have a dictionary like a spelling checker does.

Modified: trunk/debian/changelog
===================================================================
--- trunk/debian/changelog	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/debian/changelog	2007-12-08 05:23:28 UTC (rev 1066)
@@ -2,11 +2,20 @@
 
   * checks/*.desc:
     + [RA] Remove the unused Standards-Version header.
+  * checks/changelog-file{.desc,}:
+    + [RA] Check the latest entry of the Debian changelog and any
+      NEWS.Debian file for common spelling errors.  (Closes: #36017)
+  * checks/copyright-file{.desc,}:
+    + [RA] Moved spelling-error-in-copyright check to here.
   * checks/debconf:
     + [RA] Go back to not warning about "no" in boolean debconf
       questions.  The word is too common in normal English prose for
       reasons other than assuming a particular debconf interface.  Thanks,
       Rafael Laboissiere.  (Closes: #453177)
+  * checks/debian-readme{.desc,}:
+    + [RA] Moved spelling-error-in-readme-debian check to here.
+  * checks/description{.desc,}:
+    + [RA] Moved spelling-error-in-description check to here.
   * checks/fields:
     + [RA] Python documentation packages should still be in section doc.
       Thanks, Michal Čihař.  (Closes: #454688)
@@ -28,12 +37,17 @@
       Ubuntu patch.
     + [RA] Fix the malformed-override long description.  Thanks, Stefan
       Fritsch.
+  * checks/menus{.desc,}:
+    + [RA] Substantial overhaul and expansion of the doc-base control file
+      checks.  Patch from Robert Luberda.  (Closes: #448783)
   * checks/nmu:
     + [RA] No packages with ubuntu in the version number are NMUs.  Merged
       from the Ubuntu patch.
   * checks/patch-systems:
     + [RA] Ignore blank lines in 00list and don't report them as patches
       without descriptions.  Thanks, Julien BLACHE.  (Closes: #454730)
+  * checks/spelling{.desc,}:
+    + [RA] Subsumed into other check scripts and lib/Spelling.pm.
 
   * frontend/lintian:
     + [RA] If the version number indicates an Ubuntu package, check
@@ -45,8 +59,13 @@
       optional again.  Thanks, Stefan Fritsch.  (Closes: #454790)
     + [RA] Check overrides for implausible tags.
 
- -- Russ Allbery <rra@debian.org>  Fri, 07 Dec 2007 16:22:42 -0800
+  * lib/Spelling.pm:
+    + [RA] New module to do general spelling checks for specific
+      misspellings.  Based on the previous checks/spelling and a patch by
+      Robert Luberda.
 
+ -- Russ Allbery <rra@debian.org>  Fri, 07 Dec 2007 21:22:55 -0800
+
 lintian (1.23.38) unstable; urgency=low
 
   * The "HE's brown paper bag bug" release

Added: trunk/lib/Spelling.pm
===================================================================
--- trunk/lib/Spelling.pm	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/lib/Spelling.pm	2007-12-08 05:23:28 UTC (rev 1066)
@@ -0,0 +1,367 @@
+# -*- perl -*-
+# Spelling -- check for common spelling errors
+
+# Copyright (C) 1998 Richard Braakman
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, you can find it on the World Wide
+# Web at http://www.gnu.org/copyleft/gpl.html, or write to the Free
+# Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
+# MA 02110-1301, USA.
+
+package Spelling;
+use strict;
+use Tags;
+
+use Exporter;
+our @ISA = qw(Exporter);
+our @EXPORT = qw(spelling_check);
+
+# All spelling errors that have been observed "in the wild" in package
+# descriptions are added here, on the grounds that if they occurred once they
+# are more likely to occur again.
+
+# Misspellings of "compatibility", "separate", and "similar" are particularly
+# common.
+
+# Be careful with corrections that involve punctuation, since the check is a
+# bit rough with punctuation.  For example, I had to delete the correction of
+# "builtin" to "built-in".
+
+our %CORRECTIONS = qw(
+                      accesnt accent
+                      accelleration acceleration
+                      accessable accessible
+                      accomodate accommodate
+                      acess access
+                      acording according
+                      additionaly additionally
+                      adress address
+                      adresses addresses
+                      adviced advised
+                      albumns albums
+                      alegorical allegorical
+                      algorith algorithm
+                      allpication application
+                      altough although
+                      alows allows
+                      amoung among
+                      amout amount
+                      analysator analyzer
+                      ang and
+                      appropiate appropriate
+                      arraival arrival
+                      artifical artificial
+                      artillary artillery
+                      attemps attempts
+                      authentification authentication
+                      automaticly automatically
+                      automatize automate
+                      automatized automated
+                      automatizes automates
+                      auxilliary auxiliary
+                      availavility availability
+                      availble available
+                      avaliable available
+                      availiable available
+                      backgroud background
+                      baloons balloons
+                      becomming becoming
+                      becuase because
+                      calender calendar
+                      cariage carriage
+                      challanges challenges
+                      changable changeable
+                      charachters characters
+                      charcter character
+                      choosen chosen
+                      colorfull colorful
+                      comand command
+                      commerical commercial
+                      comminucation communication
+                      commoditiy commodity
+                      compability compatibility
+                      compatability compatibility
+                      compatable compatible
+                      compatibiliy compatibility
+                      compatibilty compatibility
+                      compleatly completely
+                      complient compliant
+                      compres compress
+                      containes contains
+                      containts contains
+                      contence contents
+                      continous continuous
+                      contraints constraints
+                      convertor converter
+                      convinient convenient
+                      cryptocraphic cryptographic
+                      deamon daemon
+                      debain Debian
+                      debians Debian\'s
+                      decompres decompress
+                      definate definite
+                      definately definitely
+                      dependancies dependencies
+                      dependancy dependency
+                      dependant dependent
+                      developement development
+                      developped developed
+                      deveolpment development
+                      devided divided
+                      dictionnary dictionary
+                      diplay display
+                      disapeared disappeared
+                      dissapears disappears
+                      documentaion documentation
+                      docuentation documentation
+                      documantation documentation
+                      dont don\'t
+                      easilly easily
+                      ecspecially especially
+                      edditable editable
+                      editting editing
+                      eletronic electronic
+                      enchanced enhanced
+                      encorporating incorporating
+                      enlightnment enlightenment
+                      enterily entirely
+                      enviroiment environment
+                      environement environment
+                      excellant excellent
+                      exlcude exclude
+                      exprimental experimental
+                      extention extension
+                      failuer failure
+                      familar familiar
+                      fatser faster
+                      fetaures features
+                      forse force
+                      fortan fortran
+                      framwork framework
+                      fuction function
+                      fuctions functions
+                      functionnality functionality
+                      functonality functionality
+                      functionaly functionally
+                      futhermore furthermore
+                      generiously generously
+                      grahical graphical
+                      grahpical graphical
+                      grapic graphic
+                      guage gauge
+                      halfs halves
+                      heirarchically hierarchically
+                      helpfull helpful
+                      hierachy hierarchy
+                      hierarchie hierarchy
+                      howver however
+                      implemantation implementation
+                      incomming incoming
+                      incompatabilities incompatibilities
+                      indended intended
+                      indendation indentation
+                      independant independent
+                      informatiom information
+                      initalize initialize
+                      inofficial unofficial
+                      integreated integrated
+                      integrety integrity
+                      integrey integrity
+                      intendet intended
+                      interchangable interchangeable
+                      intermittant intermittent
+                      jave java
+                      langage language
+                      langauage language
+                      langugage language
+                      lauch launch
+                      lesstiff lesstif
+                      libaries libraries
+                      libary library
+                      licenceing licencing
+                      loggin login
+                      logile logfile
+                      loggging logging
+                      maintainance maintenance
+                      maintainence maintenance
+                      makeing making
+                      managable manageable
+                      manoeuvering maneuvering
+                      mathimatic mathematic
+                      mathimatics mathematics
+                      mathimatical mathematical
+                      ment meant
+                      modulues modules
+                      monochromo monochrome
+                      multidimensionnal multidimensional
+                      navagating navigating
+                      nead need
+                      neccesary necessary
+                      neccessary necessary
+                      necesary necessary
+                      nescessary necessary
+                      noticable noticeable
+                      optionnal optional
+                      orientatied orientated
+                      orientied oriented
+                      pacakge package
+                      pachage package
+                      packacge package
+                      packege package
+                      packge package
+                      pakage package
+                      particularily particularly
+                      persistant persistent
+                      plattform platform
+                      ploting plotting
+                      protable portable
+                      posible possible
+                      powerfull powerful
+                      prefered preferred
+                      prefferably preferably
+                      prepaired prepared
+                      princliple principle
+                      priorty priority
+                      proccesors processors
+                      proces process
+                      processsing processing
+                      processessing processing
+                      progams programs
+                      programers programmers
+                      programm program
+                      programms programs
+                      promps prompts
+                      pronnounced pronounced
+                      prononciation pronunciation
+                      pronouce pronounce
+                      protcol protocol
+                      protocoll protocol
+                      recieve receive
+                      recieved received
+                      redircet redirect
+                      regulamentations regulations
+                      remoote remote
+                      repectively respectively
+                      replacments replacements
+                      requiere require
+                      runnning running
+                      safly safely
+                      savable saveable
+                      searchs searches
+                      separatly separately
+                      seperate separate
+                      seperated separated
+                      seperately separately
+                      seperatly separately
+                      serveral several
+                      setts sets
+                      similiar similar
+                      simliar similar
+                      speach speech
+                      splitted split
+                      standart standard
+                      staically statically
+                      staticly statically
+                      succesful successful
+                      succesfully successfully
+                      suplied supplied
+                      suport support
+                      suppport support
+                      supportin supporting
+                      synchonized synchronized
+                      syncronize synchronize
+                      syncronizing synchronizing
+                      syncronus synchronous
+                      syste system
+                      sythesis synthesis
+                      taht that
+                      throught through
+                      useable usable
+                      usefull useful
+                      usera users
+                      usetnet Usenet
+                      utilites utilities
+                      utillities utilities
+                      utilties utilities
+                      utiltity utility
+                      utitlty utility
+                      variantions variations
+                      varient variant
+                      verson version
+                      vicefersa vice-versa
+                      yur your
+                      wheter whether
+                      wierd weird
+                      xwindows X
+                     );
+
+# The format above doesn't allow spaces.
+$CORRECTIONS{'alot'} = 'a lot';
+
+# Corrections to apply before lowercasing the word.  Be careful about adding
+# things to this list, since currently there's no detection of literal text
+# and one might get false positives on, for example, configuration fragments
+# in README.Debian.
+our %CORRECTIONS_CASE = qw(
+                           english English
+                           french French
+                           german German
+                           russian Russian
+                          );
+
+# -----------------------------------
+
+sub _tag {
+    my @args = grep { defined($_) } @_;
+    tag(@args);
+}
+
+# Check spelling of $text and report the tag $tag if we find anything.
+# $filename, if included, is given as the first argument to the tag.  If it's
+# not defined, it will be omitted.
+sub spelling_check {
+    my ($tag, $text, $filename) = @_;
+
+    for my $word (split(/\s+/, $text)) {
+        if (exists $CORRECTIONS_CASE{$word}) {
+            _tag($tag, $filename, $word, $CORRECTIONS_CASE{$word});
+            next;
+        }
+        $word = lc $word;
+
+        # Try deleting the non-alphabetic parts from the word.  Treat
+        # apostrophes specially: only delete them if they occur at the
+        # beginning or end of the word.
+        #
+        # FIXME: Should do something that's aware of Unicode character
+        # classes rather than only handling ISO 8859-15 characters.
+        $word =~ s/(^\')|[^\w\xc0-\xd6\xd8-\xf6\xf8-\xff\']+|(\'$)//g;
+        if (exists $CORRECTIONS{$word}) {
+            _tag($tag, $filename, $word, $CORRECTIONS{$word});
+        }
+    }
+
+    # Special case for correcting a multi-word string.
+    if ($text =~ m,Debian/GNU Linux,) {
+        _tag($tag, $filename, "Debian/GNU Linux", "Debian GNU/Linux");
+    }
+}
+
+1;
+
+# Local Variables:
+# indent-tabs-mode: nil
+# cperl-indent-level: 4
+# End:
+# vim: syntax=perl sw=4 sts=4 ts=4 et shiftround
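
The word normalization in spelling_check above can be tried outside lintian
with a small stand-alone sketch. It is not part of this commit. The helper
name find_misspellings and its return value (a list of [word, correction]
pairs instead of calls to tag()) are assumptions made for illustration, and
the tables hold only a few entries copied from the diff.

```perl
#!/usr/bin/perl
# Hypothetical stand-alone sketch; not part of lintian or of this commit.
use strict;
use warnings;

# A small subset of the corrections tables shown in the diff above.
my %CORRECTIONS = (
    usefull  => 'useful',
    seperate => 'separate',
    dont     => "don't",
);
my %CORRECTIONS_CASE = (english => 'English');

# find_misspellings() is an assumed helper name.  It applies the same
# per-word steps as spelling_check but returns [word, correction] pairs
# rather than emitting lintian tags.
sub find_misspellings {
    my ($text) = @_;
    my @found;
    for my $word (split(/\s+/, $text)) {
        # Case-sensitive table first, before the word is lowercased.
        if (exists $CORRECTIONS_CASE{$word}) {
            push @found, [$word, $CORRECTIONS_CASE{$word}];
            next;
        }
        $word = lc $word;

        # Strip non-word characters; apostrophes are removed only at the
        # beginning or end of the word (same regex as in the diff).
        $word =~ s/(^\')|[^\w\xc0-\xd6\xd8-\xf6\xf8-\xff\']+|(\'$)//g;
        push @found, [$word, $CORRECTIONS{$word}]
            if exists $CORRECTIONS{$word};
    }
    return @found;
}

for my $hit (find_misspellings("A usefull, seperate english 'Dont' test.")) {
    print "$hit->[0] -> $hit->[1]\n";
}
```

Run on that sample sentence, the sketch should report usefull, seperate,
english, and dont, in that order. The trailing comma and the surrounding
quotes are stripped before the lookup, which is why punctuation-sensitive
corrections such as "builtin" to "built-in" are risky for this approach.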

Modified: trunk/testset/binary/debian/NEWS.Debian
===================================================================
--- trunk/testset/binary/debian/NEWS.Debian	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/testset/binary/debian/NEWS.Debian	2007-12-08 05:23:28 UTC (rev 1066)
@@ -1,6 +1,7 @@
 binary (4-1.1) UNRELEASED; urgency=low
 
   This is a Debian NEWS entry that isn't encoded properly in UTF-8: �
+  It also has a usefull speling error.
 
  -- Russ Allbery <rra@debian.org>  Sun, 14 Oct 2007 17:11:36 -0700
 

Modified: trunk/testset/binary/debian/doc-base
===================================================================
--- trunk/testset/binary/debian/doc-base	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/testset/binary/debian/doc-base	2007-12-08 05:23:28 UTC (rev 1066)
@@ -2,8 +2,8 @@
 Title: Broken binary doc-base control file
 Author: Russ Allbery
 Abstract: This control file exercises various tests of doc-base control
- files, including several things that aren't tested yet.  The third and
- fourth one has trailing whitespace.
+  files, including several things that aren't tested yet.  The third and
+  fourth one has trailing whitespace.
 Section: Non/Existant
 Unknown: Some field
 

Modified: trunk/testset/tags.binary
===================================================================
--- trunk/testset/tags.binary	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/testset/tags.binary	2007-12-08 05:23:28 UTC (rev 1066)
@@ -5,14 +5,22 @@
 E: binary: depends-on-x-metapackage depends: xorg
 E: binary: desktop-entry-missing-required-key /usr/share/applications/goodbye.desktop Name
 E: binary: doc-base-document-field-ends-in-whitespace binary:1
+E: binary: doc-base-file-duplicated-format binary:32 html
+E: binary: doc-base-file-lacks-required-field binary:18 files
+E: binary: doc-base-file-lacks-required-field binary:22 files
+E: binary: doc-base-file-lacks-required-field binary:22 format
 E: binary: doc-base-file-no-format binary:22
+E: binary: doc-base-file-no-format-section space :0
 E: binary: doc-base-file-no-index binary:31
 E: binary: doc-base-file-references-missing-file binary:13 /usr/share/doc/binary/binary.sgml.gz
 E: binary: doc-base-file-references-missing-file binary:17 /usr/share/doc/binary/binary.txt
 E: binary: doc-base-file-references-missing-file binary:21 /usr/share/doc/binary/html/ch4.html
 E: binary: doc-base-file-references-missing-file binary:27 /usr/share/doc/binary/hml/*.html
 E: binary: doc-base-file-references-missing-file binary:30 /usr/share/info/binary.info.gz
+E: binary: doc-base-file-unknown-field binary:14 unknown
+E: binary: doc-base-file-unknown-field binary:8 unknown
 E: binary: doc-base-index-references-multiple-files binary:21
+E: binary: doc-base-invalid-document-field binary:1 binary!docs  
 E: binary: executable-desktop-file /usr/share/applications/goodbye.desktop 0755
 E: binary: file-directly-in-usr-share usr/share/baz
 E: binary: lengthy-symlink usr/share/doc/binary/html/ch2.html ../html/./ch1.html
@@ -62,6 +70,8 @@
 W: binary: desktop-entry-contains-unknown-key /usr/share/applications/goodbye.desktop:7 icon
 W: binary: desktop-entry-invalid-category WeirdStuff /usr/share/applications/goodbye.desktop
 W: binary: desktop-entry-uses-reserved-category Screensaver /usr/share/applications/goodbye.desktop
+W: binary: doc-base-abstract-field-separator-extra-whitespaces binary:5
+W: binary: doc-base-abstract-field-separator-extra-whitespaces binary:6
 W: binary: doc-base-file-unknown-format binary:16 esp
 W: binary: executable-not-elf-or-script ./usr/bin/iminusrbin
 W: binary: executable-not-elf-or-script ./usr/share/applications/goodbye.desktop
@@ -93,12 +103,13 @@
 W: binary: old-fsf-address-in-copyright-file
 W: binary: package-contains-hardlink usr/bar2 -> usr/share/baz
 W: binary: package-contains-upstream-install-documentation usr/share/doc/binary/INSTALL
+W: binary: spelling-error-in-news-debian usefull useful
 W: binary: su-to-root-with-usr-sbin /usr/lib/menu/binary:4
 W: binary: su-to-root-with-usr-sbin /usr/share/menu/binary:4
 W: binary: symlink-should-be-relative usr/share/doc/binary/html/ch3.html /usr/share/doc/binary/htm/ch1.html
 W: binary: syntax-error-in-debian-changelog line 16 "couldn't parse date The, 15 Apr 2004 23:33:51 +0200"
-W: binary: syntax-error-in-debian-news-file line 11 "badly formatted trailer line"
-W: binary: syntax-error-in-debian-news-file line 11 "found eof where expected more change data or trailer"
+W: binary: syntax-error-in-debian-news-file line 12 "badly formatted trailer line"
+W: binary: syntax-error-in-debian-news-file line 12 "found eof where expected more change data or trailer"
 W: binary: unquoted-string-in-menu-item /usr/lib/menu/binary needs:1
 W: binary: unquoted-string-in-menu-item /usr/lib/menu/binary needs:2
 W: binary: unquoted-string-in-menu-item /usr/share/menu/binary needs:1

Modified: trunk/testset/tags.manpages
===================================================================
--- trunk/testset/tags.manpages	2007-12-08 00:46:55 UTC (rev 1065)
+++ trunk/testset/tags.manpages	2007-12-08 05:23:28 UTC (rev 1066)
@@ -59,3 +59,4 @@
 W: manpages: manpage-has-bad-whatis-entry usr/share/man/man6/usr-games-binary.6.gz
 W: manpages: manpage-has-useless-whatis-entry usr/share/man/man1/true.1.gz
 W: manpages: package-contains-empty-directory usr/share/man/man1/not-a-man-page.1.gz/
+W: manpages: spelling-error-in-changelog english English