Bug#448783: marked as done (lintian: more doc-base checks)

To: Russ Allbery <rra@debian.org>
Subject: Bug#448783: marked as done (lintian: more doc-base checks)
From: owner@bugs.debian.org (Debian Bug Tracking System)
Date: Sat, 08 Dec 2007 06:51:07 +0000
Message-id: <handler.448783.D448783.11970966281079.ackdone@bugs.debian.org>
References: <E1J0tEJ-0005vQ-4r@ries.debian.org> <20071031221053.GA10764@vox.robbo.home>

Your message dated Sat, 08 Dec 2007 06:32:03 +0000
with message-id <E1J0tEJ-0005vQ-4r@ries.debian.org>
and subject line Bug#448783: fixed in lintian 1.23.39
has caused the attached Bug report to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere.  Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)

--- Begin Message ---

To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: lintian: more doc-base checks
From: Robert Luberda <robert@debian.org>
Date: Wed, 31 Oct 2007 23:10:53 +0100
Message-id: <20071031221053.GA10764@vox.robbo.home>

Package: lintian
Version: 1.23.36
Severity: wishlist
Tags: patch

Hi,

I prepared a patch that makes lintian more robust about the contents
of doc-base control files. The following obvious checks are included:
- missing required fields,
- unrecognised fields,
- spelling errors,
- duplicated fields or formats.
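
As a rough illustration of the field checks listed above, here is a minimal
sketch. It is not the code from the patch: `section_problems` is a
hypothetical helper, continuation lines are ignored, and only the first
(main) section of a control file is modelled. The required/optional table
mirrors the one the patch adds to checks/menus.

```perl
use strict;
use warnings;

# Fields of the first section: 1 = required, 0 = optional (as in the patch).
my %known_main_fields = (
    document => 1, title => 1, section => 1, abstract => 0, author => 0,
);

# Hypothetical helper: returns a list of problem strings for one section.
sub section_problems {
    my ($known, @lines) = @_;
    my (%seen, @problems);
    for my $l (@lines) {
        next unless $l =~ /^(\S+?)\s*:/;    # only "Field: value" lines
        my $field = lc $1;
        push @problems, "unknown field: $field" unless exists $known->{$field};
        push @problems, "duplicated field: $field" if $seen{$field}++;
    }
    for my $field (sort keys %$known) {
        push @problems, "missing required field: $field"
            if $known->{$field} && !$seen{$field};
    }
    return @problems;
}

my @problems = section_problems(\%known_main_fields,
    'Document: foo', 'Title: Foo manual', 'Titel: oops', 'Title: again');
print "$_\n" for @problems;
```

The sample input should report the misspelled "Titel" as unknown, the second
"Title" as duplicated, and "Section" as missing.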

The patch also adds a check for possibly incorrect continuation
lines in the Abstract field. Many packages put extra spaces at the
front of the lines, causing the field to be displayed verbatim by
dwww or dhelp, which is usually not intended. The check is based on
a heuristic, but it seems to be correct (i.e. there are no false
positives).
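
The idea behind the heuristic can be sketched as follows. This is a
simplified version written for illustration, not the logic from the patch:
`looks_over_indented` is a hypothetical name, and the sketch leaves out the
patch's exemption for lines that all begin with the same character (such as
bullet lists), which are probably meant to be verbatim.

```perl
use strict;
use warnings;

# Hypothetical simplified heuristic: true when every continuation line
# (ignoring " ." paragraph separators) starts with the same run of
# whitespace and that run is longer than a single space.
sub looks_over_indented {
    my @cont = grep { !/^\s+\.\s*$/ } @_;
    return 0 unless @cont;
    my %indents;
    for my $l (@cont) {
        return 0 unless $l =~ /^(\s+)\S/;   # not a continuation line
        $indents{$1} = 1;
    }
    my @distinct = keys %indents;
    return (@distinct == 1 && $distinct[0] ne ' ') ? 1 : 0;
}

# Two-space indentation on every line: likely unintended.
print looks_over_indented('  C++ source libraries.',
                          '  well with the C++ Standard Library.'), "\n";
# Single-space indentation with a paragraph separator: fine.
print looks_over_indented(' one space only', ' .', ' still one space'), "\n";
```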

Additionally I implemented checks for invalid characters in
the Document field and unnecessary spaces in separator lines.
The checks might be rather controversial since many control
files fail on them.  (Especially the "invalid characters" check
makes me think about allowing uppercase letters in the Document 
field.)
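
The "invalid characters" check comes down to one pattern match. A minimal
sketch, assuming the same character class the patch uses (`valid_document_id`
is a hypothetical name, and `\z` is used here in place of `$` so that a
trailing newline is rejected as well):

```perl
use strict;
use warnings;

# Accept only lowercase letters, digits, plus, dot and minus.
sub valid_document_id {
    my ($id) = @_;
    return $id =~ /^[a-z0-9+.-]+\z/ ? 1 : 0;
}

for my $id ('lintian-manual', 'Lintian-Manual', 'foo bar') {
    printf "%-16s %s\n", $id, valid_document_id($id) ? 'ok' : 'invalid';
}
```

Because the class has no uppercase range, an identifier such as
"Lintian-Manual" is rejected, which is what makes this check fail on so many
existing control files.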

Please find the patch attached to this mail. The spelling_common.pm
module was split out of the lintian spelling check; the main
difference is an additional argument to the spelling_check() routine.
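
To show what the extra argument buys, here is a small stand-alone sketch.
It is an assumption-laden stand-in, not the module itself:
`find_misspellings` is a hypothetical function that takes the tag name as
its first argument, like the new spelling_check(), but returns its findings
instead of emitting lintian tags, and the word list is cut down to two
entries.

```perl
use strict;
use warnings;

my %corrections = (seperate => 'separate', usefull => 'useful');

# Hypothetical stand-in for spelling_check(): returns one
# [tag, location, word, correction] entry per misspelled word.
sub find_misspellings {
    my ($tag, $text, $location) = @_;
    my @hits;
    for my $word (split /\s+/, $text) {
        (my $w = lc $word) =~ s/[^a-z']+//g;   # drop punctuation and digits
        push @hits, [$tag, $location, $w, $corrections{$w}]
            if exists $corrections{$w};
    }
    return @hits;
}

my @hits = find_misspellings('spelling-error-in-doc-base-title-field',
                             'A usefull manual', 'foo.doc-base:2');
print join(' ', @$_), "\n" for @hits;
```

Passing the tag in lets one routine serve several checks, each reporting
under its own tag name.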

To check the changes I ran both unpatched and patched `lintian -C menus'
on almost all packages containing at least one doc-base file, and didn't
find any errors. I put the logs at
http://people.debian.org/~robert/lintian-doc-base-logs.tar.bz2 .

I would be grateful if you could apply the patch to lintian.

Best Regards,
robert



-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (990, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.22
Locale: LANG=pl_PL, LC_CTYPE=pl_PL (charmap=ISO-8859-2)
Shell: /bin/sh linked to /bin/pdksh

Versions of packages lintian depends on:
ii  binutils            2.18.1~cvs20071027-1 The GNU assembler, linker and bina
ii  diffstat            1.45-2               produces graph of changes introduc
ii  dpkg-dev            1.14.7               package building tools for Debian
ii  file                4.21-3               Determines file type using "magic"
ii  gettext             0.16.1-2             GNU Internationalization utilities
ii  intltool-debian     0.35.0+20060710.1    Help i18n of RFC822 compliant conf
ii  libparse-debianchan 1.1.1-1              parse Debian changelogs and output
ii  man-db              2.5.0-3              on-line manual pager
ii  perl [libdigest-md5 5.8.8-11.1           Larry Wall's Practical Extraction 

lintian recommends no packages.

-- no debconf information

diff -Nur checks.old/menus checks/menus
--- checks.old/menus	2007-10-16 05:41:15.000000000 +0200
+++ checks/menus	2007-10-30 19:54:28.000000000 +0100
@@ -24,6 +24,7 @@
 use strict;
 use lib "$ENV{'LINTIAN_ROOT'}/checks/";
 use common_data;
+use spelling_common;
 use Tags;
 use Util;
 
@@ -31,6 +32,21 @@
 my %all_files = ();
 my %all_links = ();
 
+
+my %known_docbase_main_fields = ( 
+	'document' => 1,
+	'title'    => 1,
+	'section'  => 1,
+	'abstract' => 0,
+	'author'   => 0
+);
+my %known_docbase_format_fields = (
+	'format'  => 1,
+	'files'   => 1,
+	'index'   => 0
+);	
+
+
 sub run {
 
 $pkg = shift;
@@ -163,94 +179,7 @@
     while (my $dbfile = readdir DOCBASEDIR) {
 	# don't try to parse executables, plus we already warned about it
 	next if -x "doc-base/$dbfile";
-	open (IN, '<', "doc-base/$dbfile") or
-	    fail("cannot open doc-base file $dbfile for reading.");
-
-	# Check if files referenced by doc-base are included in the package.
-	# The Index field should refer to only one file without wildcards.
-	# The Files field is a whitespace-separated list of files and may
-	# contain wildcards.  We skip without validating wildcard patterns
-	# containing character classes since otherwise we'd need to deal with
-	# wildcards inside character classes and aren't there yet.
-	#
-	# Defer checking files until we've read all possible continuation
-	# lines for the field.	As a result, all tags will be reported on the
-	# last continuation line of the field, rather than possibly where the
-	# offending file name is.
-	my (@files, $field, $sawindex, $sawdocument, $format, $insection);
-	while (1) {
-	    $_ = <IN>;
-	    if ((!defined ($_) || /^\S/ || /^$/) && $field) {
-		# Figure out the right line number.  It's actually the
-		# previous line, since we read ahead for continuation lines,
-		# unless we're at the end of the file.
-		my $line = $. - 1 + (defined ($_) ? 0 : 1);
-		if ($field eq 'index' && @files > 1) {
-		    tag "doc-base-index-references-multiple-files", "$dbfile:$line";
-		}
-		for my $file (@files) {
-		    if ($file =~ m%^/usr/doc%) {
-			tag "doc-base-file-references-usr-doc", "$dbfile:$line";
-		    }
-		    my $realfile = delink ($file);
-
-		    # openoffice.org-dev-doc has thousands of files listed so
-		    # try to use the hash if possible.
-		    my $found;
-		    if ($realfile =~ /[*?]/) {
-			my $regex = quotemeta ($realfile);
-			unless ($field eq 'index') {
-			    next if $regex =~ /\[/;
-			    $regex =~ s%\\\*%[^/]*%g;
-			    $regex =~ s%\\\?%[^/]%g;
-			    $regex .= '/?';
-			}
-			$found = grep { /^$regex\z/ } keys %all_files;
-		    } else {
-			$found = $all_files{$realfile} || $all_files{"$realfile/"};
-		    }
-		    unless ($found) {
-			tag "doc-base-file-references-missing-file", "$dbfile:$line", $file;
-		    }
-		}
-		undef @files;
-		undef $field;
-	    }
-	    if (defined ($_) && /^(Index|Files)\s*:\s*(.*?)\s*$/i) {
-		$field = lc $1;
-		@files = split (' ', $2);
-		if ($field eq 'index') {
-		    $sawindex = 1;
-		}
-	    } elsif (defined ($_) && /^Format\s*:\s*(.*?)\s*$/i) {
-		$format = lc $1;
-		tag "doc-base-file-unknown-format", "$dbfile:$.", $format
-		    unless $known_doc_base_formats{$format};
-	    } elsif (defined ($_) && /^Document\s*:/i) {
-		$sawdocument = 1;
-                tag "doc-base-document-field-ends-in-whitespace", "$dbfile:$."
-                    if /[ \t]$/;
-	    } elsif (defined ($_) && /^\s/ && $field) {
-		push (@files, split ' ');
-	    }
-	    if (defined ($_) && /^\s*\S/) {
-		$insection = 1;
-	    }
-	    if (!defined ($_) || /^$/) {
-		tag "doc-base-file-no-format", "$dbfile:$."
-		    if ($insection && !($format || $sawdocument));
-		if ($format && ($format eq 'html' || $format eq 'info')) {
-		    tag "doc-base-file-no-index", "$dbfile:$."
-			unless $sawindex;
-		}
-		last unless defined $_;
-		undef $format;
-		undef $sawdocument;
-		undef $sawindex;
-		undef $insection;
-	    }
-	}
-	close IN;
+	check_doc_base_file($dbfile);
     }
     closedir DOCBASEDIR;
 } else {
@@ -285,6 +214,233 @@
 }
 
 # -----------------------------------
+#
+
+
+sub check_doc_base_file {
+  my $dbfile = shift;
+
+  open (IN, '<', "doc-base/$dbfile") or
+    fail("cannot open doc-base file $dbfile for reading.");
+
+  my (@files, $field, @vals, %sawfields, %sawformats);
+  my $knownfields=\%known_docbase_main_fields;
+  my $line    = 0;  # global
+  %sawfields  = (); # local for each section of control file
+  %sawformats = (); # global for control file
+
+  while (<IN>) {
+    chomp();
+
+    if (/^(\S+)\s*:\s*(.*)$/) { # new field
+      # check previous field, if we have any
+      check_doc_base_field($dbfile, $line, $field, \@vals, \%sawfields, \%sawformats, $knownfields)
+        if $field;
+
+      $field  = lc $1;
+      @vals   = ($2);
+      $line   = $.;
+
+    } elsif ($field && /^\s+\S/) { # continuation of previously defined field
+      push (@vals, $_);
+      $line  = $.;    # all tags will be reported on the last continuation line
+                      # of doc-base field
+
+
+    } elsif (/^(\s*)$/) { # sections' separator
+      tag "doc-base-file-separator-extra-whitespaces", "$dbfile:$." if $1;
+
+      next unless $field; # skip successive empty lines
+
+      # check previously defined field & section
+      check_doc_base_field($dbfile, $line, $field, \@vals, \%sawfields, \%sawformats, $knownfields);
+      check_doc_base_file_section($dbfile, $line+1, \%sawfields, \%sawformats, $knownfields);
+
+      # initialise variables for new section
+      undef $field;
+      undef $line;
+      @vals       = ();
+      %sawfields  = ();
+      $knownfields=\%known_docbase_format_fields; # each section except the first one is format section
+
+    } else {  # everything else is a syntax error
+      tag "doc-base-file-syntax-error", "$dbfile:$.";
+    }
+  }
+
+  # check the last field/section of the control file
+  if ($field) {
+    check_doc_base_field($dbfile, $line, $field, \@vals, \%sawfields, \%sawformats, $knownfields);
+    check_doc_base_file_section($dbfile, $line, \%sawfields, \%sawformats, $knownfields);
+  }
+
+  tag "doc-base-file-no-format-section", "$dbfile:$." unless %sawformats;
+
+  close IN;
+}
+
+
+# Checks one field of doc-base control file
+# $vals is array ref containing all lines of the field
+# Modifies $sawfields and $sawformats
+sub check_doc_base_field {
+  my ($dbfile, $line, $field, $vals, $sawfields, $sawformats, $knownfields) = @_;
+
+
+
+  tag "doc-base-file-unknown-field", "$dbfile:$line", "$field"
+    unless defined $knownfields->{$field};
+  tag "doc-base-file-duplicated-field", "$dbfile:$line", "$field"
+    if $sawfields->{$field};
+  $sawfields->{$field} = 1;
+
+# Index/Files field
+  if ($field eq 'index' or $field eq 'files') {
+    # Check if files referenced by doc-base are included in the package.
+    # The Index field should refer to only one file without wildcards.
+    # The Files field is a whitespace-separated list of files and may
+    # contain wildcards.  We skip without validating wildcard patterns
+    # containing character classes since otherwise we'd need to deal with
+    # wildcards inside character classes and aren't there yet.
+
+    my @files = map { split ('\s+', $_) } @$vals;
+
+    if ($field eq 'index' && @files > 1) {
+      tag "doc-base-index-references-multiple-files", "$dbfile:$line";
+    }
+    for my $file (@files) {
+      if ($file =~ m%^/usr/doc%) {
+        tag "doc-base-file-references-usr-doc", "$dbfile:$line";
+      }
+      my $realfile = delink ($file);
+
+      # openoffice.org-dev-doc has thousands of files listed so
+      # try to use the hash if possible.
+      my $found;
+      if ($realfile =~ /[*?]/) {
+        my $regex = quotemeta ($realfile);
+        unless ($field eq 'index') {
+          next if $regex =~ /\[/;
+          $regex =~ s%\\\*%[^/]*%g;
+          $regex =~ s%\\\?%[^/]%g;
+          $regex .= '/?';
+        }
+        $found = grep { /^$regex\z/ } keys %all_files;
+      } else {
+        $found = $all_files{$realfile} || $all_files{"$realfile/"};
+      }
+      unless ($found) {
+        tag "doc-base-file-references-missing-file", "$dbfile:$line", $file;
+      }
+    }
+   undef @files;
+
+# Format field
+  } elsif ($field eq 'format') {
+    my $format = join (' ', @$vals);
+    $format =~ s/^\s+//o;
+    $format =~ s/\s+$//o;
+    $format = lc $format;
+
+    tag "doc-base-file-unknown-format", "$dbfile:$line", $format
+      unless $known_doc_base_formats{$format};
+    tag "doc-base-file-duplicated-format", "$dbfile:$line", $format
+      if $sawformats->{$format};
+    $sawformats->{$format} = 1;
+    # save the current format for the later section check
+    $sawformats->{' *current* '} = $format;
+
+# Document field
+  } elsif ($field eq 'document') {
+    $_ = join (' ', @$vals);
+
+    tag "doc-base-invalid-document-field", "$dbfile:$line", "$_"
+      unless /^[a-z0-9+.-]+$/;
+    tag "doc-base-document-field-ends-in-whitespace", "$dbfile:$line"
+      if /[ \t]$/;
+    tag "doc-base-document-field-not-in-first-line", "$dbfile:$line"
+      unless $line == 1;
+
+# Title field
+  } elsif ($field eq 'title') {
+
+    spelling_check("spelling-error-in-doc-base-title-field", join (' ', @$vals), "$dbfile:$line")
+      if @$vals;
+
+# Abstract field
+  } elsif ($field eq 'abstract') {
+
+
+    # The three following variables are used for checking if the field is correctly phrased.
+    # We detect if each line (except for the first line and lines containing single dot)
+    # of the field starts with the same number of spaces, not followed by the same non-space
+    # character, and the number of spaces is > 1.
+    #
+    # We try to match fields like this:
+    #  ||Abstract: The Boost web site provides free peer-reviewed portable
+    #  ||  C++ source libraries.  The emphasis is on libraries which work
+    #  ||  well with the C++ Standard Library.  One goal is to establish
+    # but not like this:
+    #  ||Abstract:  This is "Ding"
+    #  ||  * a dictionary lookup program for Unix,
+    #  ||  * DIctionary Nice Grep,
+    my $leadsp           = undef; # string with leading spaces from the second line
+    my $charafter        = undef; # first non-whitespace char of the second line
+    my $leadsp_different = 1;     # are spaces OK?
+
+    for my $idx (1 .. $#{$vals}) { # intentionally skipping the first line
+      $_ = $vals->[$idx];
+      if (/manage\s+online\s+manuals\s.*Debian/o) {
+        tag "doc-base-abstract-field-is-template", "$dbfile:$line" unless $pkg eq "doc-base";
+
+      } elsif (/^(\s+)\.(\s*)$/o) {
+        tag "doc-base-abstract-field-separator-extra-whitespaces", "$dbfile:" . ($line - $#{$vals} + $idx)
+          if $1 ne " " || $2;
+
+      } elsif (!$leadsp && /^(\s+)(\S)/o) { # the regexp should always match
+        ($leadsp, $charafter) = ($1, $2);
+        $leadsp_different     = $leadsp eq " ";
+
+      } elsif (!$leadsp_different && /^(\s+)(\S)/o) { # the regexp should always match
+      	undef $charafter if $charafter && $charafter ne $2;
+        $leadsp_different     = 1 if ($1 ne $leadsp)
+                                    or ($1 eq $leadsp  && $charafter);
+      }
+    }
+    tag "doc-base-abstract-might-contain-extra-leading-whitespaces", "$dbfile:$line"
+      unless $leadsp_different;
+
+    spelling_check("spelling-error-in-doc-base-abstract-field", join (' ', @$vals), "$dbfile:$line")
+      if @$vals;
+
+ }
+}
+
+
+# Checks section of doc-base control file
+# Tries to find required fields missing in the section
+sub check_doc_base_file_section {
+  my ($dbfile, $line, $sawfields, $sawformats, $knownfields) = @_;
+
+  tag "doc-base-file-no-format", "$dbfile:$line"
+    if (defined $sawfields->{'files'} || defined $sawfields->{'index'})
+      && ! (defined $sawfields->{'format'});
+
+  if ($sawfields->{'format'}) {
+    my $format =  $sawformats->{' *current* '}; # set by check_doc_base_field
+
+    tag "doc-base-file-no-index", "$dbfile:$line"
+      if $format && ($format eq 'html' || $format eq 'info')
+         && !$sawfields->{'index'};
+  }
+
+  map { tag "doc-base-file-lacks-required-field", "$dbfile:$line", "$_"
+      if $knownfields->{$_} == 1 && !$sawfields->{$_}
+    } sort (keys %$knownfields);
+}
+
+
+
 
 # Add file and link to %all_files and %all_links.  Note that both files and
 # links have to include a leading /.
diff -Nur checks.old/menus.desc checks/menus.desc
--- checks.old/menus.desc	2007-10-15 06:14:24.000000000 +0200
+++ checks/menus.desc	2007-10-30 19:52:30.000000000 +0100
@@ -162,7 +162,7 @@
 Info: The Index field in a doc-base file should reference the single index
  file for that document.  Any other files belonging to the same document
  should be listed in the Files field.
-Ref: Debian doc-base Manual section 2.3
+Ref: Debian doc-base Manual section 2.3.2.2
 
 Tag: doc-base-file-references-missing-file
 Type: error
@@ -176,7 +176,7 @@
 Info: The Format field in this doc-base control file declares a format
  that is not supported.  Recognized formats are "HTML", "Text", "PDF",
  "PostScript", "Info", "DVI", and "DebianDoc-SGML" (case-insensitive).
-Ref: Debian doc-base Manual section 2.3
+Ref: Debian doc-base Manual section 2.3.2.2
 
 Tag: doc-base-file-no-format
 Type: error
@@ -184,6 +184,12 @@
  format.  Each section after the first must specify a format.
 Ref: Debian doc-base Manual section 2.3.2.2
 
+Tag: doc-base-file-no-format-section
+Type: error
+Info: This doc-base control file didn't specify any format
+ section.
+Ref: Debian doc-base Manual section 2.3.2.2
+
 Tag: doc-base-file-no-index
 Type: error
 Info: Format sections in doc-base control files for HTML or Info documents
@@ -198,3 +204,85 @@
  doc-base (at least as of 0.8.5) cannot cope with such fields and
  debhelper 5.0.57 or earlier may create files ending in whitespace when
  installing such files.
+
+Tag: doc-base-document-field-not-in-first-line
+Type: error
+Info: The Document field in a doc-base control file must be located on
+ the first line of the file.  While unregistering documents, doc-base 0.8
+ and later parses only the first line of the control file for performance
+ reasons.
+Ref: Debian doc-base Manual section 2.3.2.1
+
+Tag: doc-base-file-unknown-field
+Type: error
+Info: The doc-base control file contains a field which is either unknown
+ or not valid for the section where it was found.  Possible reasons for this
+ error are: a typo in field name, missing empty line between control file
+ sections, or an extra empty line separating sections.
+Ref: Debian doc-base Manual sections 2.3.2.1 and 2.3.2.2
+
+Tag: doc-base-file-duplicated-field
+Type: error
+Info: The doc-base control file contains a duplicated field.
+
+Tag: doc-base-file-duplicated-format
+Type: error
+Info: The doc-base control file contains a duplicated format.
+ Doc-base files must not register different documents in
+ one control file.
+Ref: Debian doc-base Manual section 2.3.2.2
+
+Tag: doc-base-file-lacks-required-field
+Type: error
+Info: The doc-base control file does not contain a required field for
+ the appropriate section.
+Ref: Debian doc-base Manual sections 2.3.2.1 and 2.3.2.2
+
+Tag: doc-base-invalid-document-field
+Type: error
+Info: The Document field should consist only of lowercase letters (a-z), digits (0-9), plus (+)
+ or minus (-) signs, and dots (.).
+Ref: Debian doc-base Manual section 2.2
+
+Tag: doc-base-abstract-field-is-template
+Type: warning
+Info: The Abstract field of doc-base contains a "manage online manuals" phrase,
+ which was copied verbatim from an example control file found in section 2.3.1
+ of the Debian doc-base Manual.
+
+Tag: doc-base-abstract-might-contain-extra-leading-whitespaces
+Type: warning
+Info: Continuation lines of the Abstract field of a doc-base control file
+ should start with only one space, unless they are meant to be displayed
+ verbatim by frontends.
+Ref: Debian doc-base Manual section 2.3.2
+
+Tag: doc-base-abstract-field-separator-extra-whitespaces
+Type: warning
+Info: Unnecessary spaces were found in the paragraph separator line
+ of the doc-base's Abstract field.  The separator line should consist
+ of a single space followed by a single dot.
+Ref: Debian doc-base Manual section 2.3.2
+
+Tag: spelling-error-in-doc-base-title-field
+Type: error
+Info: Lintian found a spelling error in the Title field of doc-base
+ control file.  Lintian has a list of common misspellings that it
+ looks for; it does not have a dictionary like a spelling checker does.
+
+Tag: spelling-error-in-doc-base-abstract-field
+Type: error
+Info: Lintian found a spelling error in the Abstract field of doc-base
+ control file.  Lintian has a list of common misspellings that it
+ looks for; it does not have a dictionary like a spelling checker does.
+
+Tag: doc-base-file-syntax-error
+Type: error
+Info: Lintian found a syntax error in the doc-base control file.
+Ref: Debian doc-base Manual section 2.3.2.2
+
+Tag: doc-base-file-separator-extra-whitespaces
+Type: warning
+Info: Unnecessary spaces were found in the doc-base file sections'
+ separator. The section separator is an empty line.
+Ref: Debian doc-base Manual section 2.3.2
diff -Nur checks.old/spelling_common.pm checks/spelling_common.pm
--- checks.old/spelling_common.pm	1970-01-01 01:00:00.000000000 +0100
+++ checks/spelling_common.pm	2007-10-29 20:26:06.000000000 +0100
@@ -0,0 +1,356 @@
+# spelling -- lintian check script -*- perl -*-
+
+# Look for common spelling errors in the package description and the
+# copyright file.
+
+# Copyright (C) 1998 Richard Braakman
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, you can find it on the World Wide
+# Web at http://www.gnu.org/copyleft/gpl.html, or write to the Free
+# Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
+# MA 02110-1301, USA.
+
+package spelling_common;
+use strict;
+use Tags;
+
+use base qw(Exporter);
+our @EXPORT = qw(spelling_check);     
+
+# All spelling errors that have been observed "in the wild" in package
+# descriptions are added here, on the grounds that if they occurred
+# once they are more likely to occur again.
+
+# Misspellings of "compatibility", "separate", and "similar" are 
+# particularly common.
+
+# Be careful with corrections that involve punctuation, since the check
+# is a bit rough with punctuation.  For example, I had to delete the
+# correction of "builtin" to "built-in".
+
+my %corrections = qw(
+		     accesnt accent
+		     accelleration acceleration
+		     accessable accessible
+		     accomodate accommodate
+		     acess access
+		     acording according
+		     additionaly additionally
+		     adress address
+		     adresses addresses
+		     adviced advised
+		     albumns albums
+		     alegorical allegorical
+		     algorith algorithm
+		     allpication application
+		     altough although
+		     alows allows
+		     amoung among
+		     amout amount
+		     analysator analyzer
+		     ang and
+		     appropiate appropriate
+		     arraival arrival
+		     artifical artificial
+		     artillary artillery
+		     attemps attempts
+		     authentification authentication
+		     automaticly automatically
+		     automatize automate
+		     automatized automated
+		     automatizes automates
+		     auxilliary auxiliary
+		     availavility availability
+		     availble available
+		     avaliable available
+		     availiable available
+		     backgroud background
+		     baloons balloons
+		     becomming becoming
+		     becuase because
+		     calender calendar
+		     cariage carriage
+		     challanges challenges
+		     changable changeable
+		     charachters characters
+		     charcter character
+		     choosen chosen
+		     colorfull colorful
+		     comand command
+		     commerical commercial
+		     comminucation communication
+		     commoditiy commodity
+		     compability compatibility
+		     compatability compatibility
+		     compatable compatible
+		     compatibiliy compatibility
+		     compatibilty compatibility
+		     compleatly completely
+		     complient compliant
+		     compres compress
+		     containes contains
+		     containts contains
+		     contence contents
+		     continous continuous
+		     contraints constraints
+		     convertor converter
+		     convinient convenient
+		     cryptocraphic cryptographic
+		     deamon daemon
+		     debain Debian
+		     debians Debian\'s
+		     decompres decompress
+		     definate definite
+		     definately definitely
+		     dependancies dependencies
+		     dependancy dependency
+		     dependant dependent
+		     developement development
+		     developped developed
+		     deveolpment development
+		     devided divided
+		     dictionnary dictionary
+		     diplay display
+		     disapeared disappeared
+		     dissapears disappears
+		     documentaion documentation
+		     docuentation documentation
+		     documantation documentation
+		     dont don\'t
+		     easilly easily
+		     ecspecially especially
+		     edditable editable
+		     editting editing
+		     eletronic electronic
+		     enchanced enhanced
+		     encorporating incorporating
+		     enlightnment enlightenment
+		     enterily entirely
+		     enviroiment environment
+		     environement environment
+		     excellant excellent
+		     exlcude exclude
+		     exprimental experimental
+		     extention extension
+		     failuer failure
+		     familar familiar
+		     fatser faster
+		     fetaures features
+		     forse force
+		     fortan fortran
+		     framwork framework
+		     fuction function
+		     fuctions functions
+		     functionnality functionality
+		     functonality functionality
+		     functionaly functionally
+		     futhermore furthermore
+		     generiously generously
+		     grahical graphical
+		     grahpical graphical
+		     grapic graphic
+		     guage gauge
+		     halfs halves
+		     heirarchically hierarchically
+		     helpfull helpful
+		     hierachy hierarchy
+		     hierarchie hierarchy
+		     howver however
+		     implemantation implementation
+		     incomming incoming
+		     incompatabilities incompatibilities
+		     indended intended
+		     indendation indentation
+		     independant independent
+		     informatiom information
+		     initalize initialize
+		     inofficial unofficial
+		     integreated integrated
+		     integrety integrity
+		     integrey integrity
+		     intendet intended
+		     interchangable interchangeable
+		     intermittant intermittent
+		     jave java
+		     langage language
+		     langauage language
+		     langugage language
+		     lauch launch
+		     lesstiff lesstif
+		     libaries libraries
+		     libary library
+		     licenceing licencing
+		     loggin login
+		     logile logfile
+		     loggging logging
+		     maintainance maintenance
+		     maintainence maintenance
+		     makeing making
+		     managable manageable
+		     manoeuvering maneuvering
+		     mathimatic mathematic
+		     mathimatics mathematics
+		     mathimatical mathematical
+		     ment meant
+		     modulues modules
+		     monochromo monochrome
+		     multidimensionnal multidimensional
+		     navagating navigating
+		     nead need
+		     neccesary necessary
+		     neccessary necessary
+		     necesary necessary
+		     nescessary necessary
+		     noticable noticeable
+		     optionnal optional
+		     orientatied orientated
+		     orientied oriented
+		     pacakge package
+		     pachage package
+		     packacge package
+		     packege package
+		     packge package
+		     pakage package
+		     particularily particularly
+		     persistant persistent
+		     plattform platform
+		     ploting plotting
+		     protable portable
+		     posible possible
+		     powerfull powerful
+		     prefered preferred
+		     prefferably preferably
+		     prepaired prepared
+		     princliple principle
+		     priorty priority
+		     proccesors processors
+		     proces process
+		     processsing processing
+		     processessing processing
+		     progams programs
+		     programers programmers
+		     programm program
+		     programms programs
+		     promps prompts
+		     pronnounced pronounced
+		     prononciation pronunciation
+		     pronouce pronounce
+		     protcol protocol
+		     protocoll protocol
+		     recieve receive
+		     recieved received
+		     redircet redirect
+		     regulamentations regulations
+		     remoote remote
+		     repectively respectively
+		     replacments replacements
+		     requiere require
+		     runnning running
+		     safly safely
+		     savable saveable
+		     searchs searches
+		     separatly separately
+		     seperate separate
+		     seperated separated
+		     seperately separately
+		     seperatly separately
+		     serveral several
+		     setts sets
+		     similiar similar
+		     simliar similar
+		     speach speech
+		     splitted split
+		     standart standard
+		     staically statically
+		     staticly statically
+		     succesful successful
+		     succesfully successfully
+		     suplied supplied
+		     suport support
+		     suppport support
+		     supportin supporting
+		     synchonized synchronized
+		     syncronize synchronize
+		     syncronizing synchronizing
+		     syncronus synchronous
+		     syste system
+		     sythesis synthesis
+		     taht that
+		     throught through
+		     useable usable
+		     usefull useful
+		     usera users
+		     usetnet Usenet
+		     utilites utilities
+		     utillities utilities
+		     utilties utilities
+		     utiltity utility
+		     utitlty utility
+		     variantions variations
+		     varient variant
+		     verson version
+		     vicefersa vice-versa
+		     yur your
+		     wheter whether
+		     wierd weird
+		     xwindows X
+		    );
+# The format above doesn't allow spaces
+$corrections{'alot'} = 'a lot';
+
+my %corrections_language_names = qw(
+				    english English
+				    french French
+				    german German
+				    russian Russian
+				   );
+
+# -----------------------------------
+
+sub _tag {
+    my @args = grep { defined($_)} @_;
+    tag(@args);
+}
+
+sub spelling_check {
+    my $tag = shift;
+    my $file = shift;
+    my $filename = shift;
+
+
+    foreach my $word (split(/\s+/, $file)) {
+	# before lowercasing the word, check if it's a non-uppercased
+	# language name
+	if (exists $corrections_language_names{$word}) {
+	    _tag($tag, $filename, $word, $corrections_language_names{$word});
+        }
+	$word = lc $word;
+	# try deleting the non-alphabetic parts from the word.
+	# Treat apostrophes specially: only delete them if they occur
+	# at the beginning or end of the word.
+	$word =~ s/(^\')|[^\w\xc0-\xd6\xd8-\xf6\xf8-\xff\']+|(\'$)//g;
+	if (exists $corrections{$word}) {
+	    _tag($tag, $filename, $word, $corrections{$word});
+        }
+    }
+    # special case for correcting a multi-word string
+    # $corrections{'Debian/GNU Linux'} = 'Debian GNU/Linux';
+    if ($file =~ m,Debian/GNU Linux,) {
+	_tag($tag, $filename, "Debian/GNU Linux", "Debian GNU/Linux");
+    }
+}
+
+1;
+
+# vim: syntax=perl

Attachment: signature.asc
Description: Digital signature

--- End Message ---

--- Begin Message ---

To: 448783-close@bugs.debian.org
Subject: Bug#448783: fixed in lintian 1.23.39
From: Russ Allbery <rra@debian.org>
Date: Sat, 08 Dec 2007 06:32:03 +0000
Message-id: <E1J0tEJ-0005vQ-4r@ries.debian.org>

Source: lintian
Source-Version: 1.23.39

We believe that the bug you reported is fixed in the latest version of
lintian, which is due to be installed in the Debian FTP archive:

lintian_1.23.39.dsc
  to pool/main/l/lintian/lintian_1.23.39.dsc
lintian_1.23.39.tar.gz
  to pool/main/l/lintian/lintian_1.23.39.tar.gz
lintian_1.23.39_all.deb
  to pool/main/l/lintian/lintian_1.23.39_all.deb



A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 448783@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Russ Allbery <rra@debian.org> (supplier of updated lintian package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.7
Date: Fri, 07 Dec 2007 22:12:56 -0800
Source: lintian
Binary: lintian
Architecture: source all
Version: 1.23.39
Distribution: unstable
Urgency: low
Maintainer: Debian Lintian Maintainers <lintian-maint@debian.org>
Changed-By: Russ Allbery <rra@debian.org>
Description: 
 lintian    - Debian package checker
Closes: 36017 356051 435963 448783 453177 454358 454688 454723 454730 454790
Changes: 
 lintian (1.23.39) unstable; urgency=low
 .
   The "Ubuntu and doc-base patch merge, with extra fixes" release.
 .
   * checks/*.desc:
     + [RA] Remove the unused Standards-Version header.
   * checks/changelog-file{.desc,}:
     + [RA] Check the latest entry of the Debian changelog and any
       NEWS.Debian file for common spelling errors.  (Closes: #36017)
     + [RA] If this looks like a new package (Debian revision of -1 and
       only one changelog entry), warn if it doesn't close a bug.  Thanks,
       Margarita Manterola.  (Closes: #356051)
     + [RA] Check for lines over 80 columns in the most recent entry.
       Thanks, Guillem Jover.  (Closes: #435963)
   * checks/copyright-file{.desc,}:
     + [RA] Moved spelling-error-in-copyright check to here.
   * checks/debconf:
     + [RA] Go back to not warning about "no" in boolean debconf
       questions.  The word is too common in normal English prose for
       reasons other than assuming a particular debconf interface.  Thanks,
       Rafael Laboissiere.  (Closes: #453177)
   * checks/debian-readme{.desc,}:
     + [RA] Moved spelling-error-in-readme-debian check to here.
   * checks/description{.desc,}:
     + [RA] Moved spelling-error-in-description check to here.
   * checks/fields:
     + [RA] Python documentation packages should still be in section doc.
       Thanks, Michal Čihař.  (Closes: #454688)
     + [RA] Warn about lib.*-dev packages not in section libdevel.
     + [RA] Warn about debug packages that aren't priority: extra.  Thanks,
       Joerg Jaspert.  (Closes: #454358)
     + [RA] Ignore Original-Maintainer if the version contains ubuntu.
     + [RA] Only warn about Section for Python packages starting with
       python-, not py, since py picks up too many things that aren't
       Python modules.
     + [RA] Only warn about Section for Perl packages matching lib.*-perl
       to avoid false positives for things like dh-make-perl.  Thanks,
       Damyan Ivanov.  (Closes: #454723)
   * checks/files:
     + [RA] Warn about packages providing files in /usr/lib/debug that
       aren't named -dbg.  Thanks, Joerg Jaspert.
   * checks/lintian.desc:
     + [RA] Add bad-ubuntu-distribution-in-changes-file, merged from the
       Ubuntu patch.
     + [RA] Fix the malformed-override long description.  Thanks, Stefan
       Fritsch.
   * checks/menus{.desc,}:
     + [RA] Substantial overhaul and expansion of the doc-base control file
       checks.  Patch from Robert Luberda.  (Closes: #448783)
   * checks/nmu:
     + [RA] No packages with ubuntu in the version number are NMUs.  Merged
       from the Ubuntu patch.
   * checks/patch-systems:
     + [RA] Ignore blank lines in 00list and don't report them as patches
       without descriptions.  Thanks, Julien BLACHE.  (Closes: #454730)
   * checks/spelling{.desc,}:
     + [RA] Subsumed into other check scripts and lib/Spelling.pm.
 .
   * frontend/lintian:
     + [RA] If the version number indicates an Ubuntu package, check
       against a different list of allowable distributions.  Merged from
       the Ubuntu patch.
     + [RA] Skip check and collection *.desc files whose names start with a
       period (mostly to avoid testing artifacts from editor lock files).
     + [RA] Restore previous override parsing and make the package name
       optional again.  Thanks, Stefan Fritsch.  (Closes: #454790)
     + [RA] Check overrides for implausible tags.
 .
   * lib/Spelling.pm:
     + [RA] New module to do general spelling checks for specific
       misspellings.  Based on the previous checks/spelling and a patch by
       Robert Luberda.
Files: 
 721e5584c1bbb5fe0115d8f14b2e6d6d 904 devel optional lintian_1.23.39.dsc
 60c7fafc093656f1da0b2533896e82a2 362802 devel optional lintian_1.23.39.tar.gz
 07ca14cf6ddc073163bf7eb74df4cb4b 306822 devel optional lintian_1.23.39_all.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHWjip+YXjQAr8dHYRAjVtAJ9Nkax/MkPdLDaPTuEzztjNESLZEQCfasIK
+zPl8GEi7OWape5uGN+U9xo=
=ENZp
-----END PGP SIGNATURE-----

--- End Message ---
