Why does aspell suggest the very word that it fails to check? - aspell

Here is the command I run:
> echo "civilization" | aspell -a
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.6.1)
& civilization 3 0: civilization, civilizations, civilization's
Why does aspell flag the word as misspelled and then suggest that very word ("civilization") as the correction? In contrast, hunspell seems to get this right:
> echo "civilization" | hunspell
Hunspell 1.3.2
*
but that is probably because the two spell checkers use different dictionaries.
EDIT: Running this on a different machine and different/older aspell version seems to work though:
> echo civilization | aspell -a
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.3)
*

Uppercase and lowercase
What do you get if you try it with Civilization?
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
T:\msys\1.0\src\aspell-0.60.6\.libs>echo "zivilisation" | aspell -a
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.6)
& zivilisation 3 1: Zivilisation, Zivilisationen, Sterilisation
T:\msys\1.0\src\aspell-0.60.6\.libs>echo "Zivilisation" | aspell -a
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.6)
*
T:\msys\1.0\src\aspell-0.60.6\.libs>

According to Kevin Atkinson (the aspell maintainer), that's a bug. He wasn't sure whether a report was already open for it, or if or when it would be fixed.
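
For anyone reading the output above: in the ispell-compatible pipe mode (-a), a line starting with * means the word was accepted, a line starting with & means it was rejected and suggestions follow (the two numbers are the suggestion count and the word's offset in the input line), and a line starting with # means it was rejected with no suggestions. Below is a rough sketch of a helper that labels such lines. The function name classify_ispell_line is made up for illustration and is not part of aspell.

```shell
#!/bin/sh
# Hypothetical helper (not part of aspell): label one line of `aspell -a` output.
classify_ispell_line() {
    case "$1" in
        '@(#)'*) echo "banner" ;;
        '*'*)    echo "correct" ;;
        '&'*)    echo "misspelled, suggestions follow" ;;
        '#'*)    echo "misspelled, no suggestions" ;;
        *)       echo "other" ;;
    esac
}

# The two result lines from the transcripts above.
classify_ispell_line "& civilization 3 0: civilization, civilizations, civilization's"
classify_ispell_line "*"
```

With the buggy 0.60.6.1 output the first call reports a misspelling even though the first suggestion is the word itself, which is exactly the inconsistency being asked about.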

Related

Grep behaving differently in OpenBSD to Linux. Cannot get command working

This command securely connects to a web domain to grab my external IP address. It works flawlessly on a Debian Linux system, but not on my OpenBSD system. The curl command works fine; the problem is with the grep command, which isn't grabbing the IP that curl pipes to it.
Does -Eo not work on OpenBSD? I cannot tell from the man page.
USERAGENT="Mozilla/4.0"
WEB_LOCATION="https://duckduckgo.com/?q=whats+my+ip"
curl -s --retry 3 --max-time 5 --tlsv1.2 --user-agent "$USERAGENT" "$WEB_LOCATION" | grep -Eo '\<[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}\>'
******* RESOLVED (Kinda) ********
I worked out that, for some reason, this particular pattern:
grep -Eo '\<[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}\>'
was not working on OpenBSD, but this long version does:
grep -Eo '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}'
Why this is the case is very confusing, as the first search pattern works fine on all the versions of Debian Linux I have used!
The issue is with the word-boundary patterns in your regexp: OpenBSD's grep spells them [[:<:]] and [[:>:]], whereas GNU grep, which Debian (and most other Linux distributions) ships, uses \< and \> respectively.
grep -Eo '[[:<:]][[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}[[:>:]]'
should work.
Read the man page for details.
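
If the same script has to run on both systems, one option is to probe which word-boundary spelling the local grep understands and build the pattern from that. This is only a sketch: the names WB_START, WB_END and extract_ipv4 are made up for illustration, and the sample address is from the documentation range rather than a real lookup.

```shell
#!/bin/sh
# Probe: does this grep treat \< and \> as word boundaries?
if printf 'x\n' | grep -Eq '\<x\>' 2>/dev/null; then
    WB_START='\<'
    WB_END='\>'
else
    # Fall back to the bracket spelling described in the answer above.
    WB_START='[[:<:]]'
    WB_END='[[:>:]]'
fi

# Hypothetical helper: print every dotted-quad found on stdin, one per line.
extract_ipv4() {
    grep -Eo "${WB_START}[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}${WB_END}"
}

printf 'Your IP address is 203.0.113.7 today\n' | extract_ipv4
```

The curl pipeline from the question could then end in `| extract_ipv4` instead of a hard-coded grep pattern.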

regex: plus sign vs asterisk

The asterisk or star tells the engine to attempt to match the preceding token zero or more times. The plus tells the engine to attempt to match the preceding token once or more.
Based on the definition, I was wondering why the plus sign returns more matches than the asterisk sign.
echo "ABC ddd kkk DDD" | grep -Eo "[A-Z]+"
returns
ABC
DDD
echo "ABC ddd kkk DDD" | grep -Eo "[A-Z]*"
returns
ABC
As far as I can tell, it doesn't. With GNU grep versions 2.5.3, 2.6.3, 2.10, and 2.12, I get:
$ echo "ABC ddd kkk DDD" | grep -Eo "[A-Z]+"
ABC
DDD
$ echo "ABC ddd kkk DDD" | grep -Eo "[A-Z]*"
ABC
DDD
Please double-check your second example. If you can confirm that you get only one line of output, it might be a bug in your grep. If you're using GNU grep, what's the output of grep --version? If not, what OS are you using, and (if you know) what grep implementation?
UPDATE :
I just built and installed GNU grep 2.5.1 (the version you're using) from source, and I confirm your output. It appears to be a bug in that version of grep, apparently corrected between 2.5.1a and 2.5.3. GNU grep 2.5.1 is about 12 years old; can you install a newer version? Looking through the ChangeLog for 2.5.3, I suspect this may have been the fix:
2005-08-24 Charles Levert <charles_levert@gna.org>
* src/grep.c (print_line_middle): In case of an empty match,
make minimal progress and continue instead of aborting process
of the remainder of the line, in case there's still an upcoming
non-empty match.
* tests/foad1.sh: Add two tests for this.
* doc/grep.texi, doc/grep.1: Document this behavior, since
--only-matching and --color are GNU extensions which are
otherwise unspecified by POSIX or other standards.
Even if you don't have full access on the machine you're using, you should still be able to download the source tarball from ftp://ftp.gnu.org/gnu/grep/ and install it under your home directory (assuming your system has a working compiler and associated tools).
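
The "empty match" mentioned in that ChangeLog entry is the key difference between the two operators: [A-Z]* can match zero characters, so it matches at positions where [A-Z]+ cannot. A small sketch that makes this visible by counting matching lines on input with no uppercase letters at all (LC_ALL=C is set here only so that [A-Z] means ASCII uppercase regardless of locale):

```shell
#!/bin/sh
# "+" needs at least one uppercase letter, so a lowercase-only line does not match.
plus_count=$(printf 'ddd\n' | LC_ALL=C grep -Ec '[A-Z]+' || true)

# "*" also accepts zero letters, so the empty match makes the line count as matching.
star_count=$(printf 'ddd\n' | LC_ALL=C grep -Ec '[A-Z]*' || true)

echo "plus: $plus_count, star: $star_count"
```

This prints "plus: 0, star: 1". In the original example, the empty match sits right after "ABC" (before the space), which is where the old print_line_middle gave up on the rest of the line.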

How do you use a regular expression to find the characters between a given prefix and suffix?

In
AXyz122311Xyslasd22344ssaa Aklsssx#sdddf#4=sadsss kaaAASds
How do we extract the characters "slas", which are preceded by "11Xy" and followed by "d223", in UNIX using a regular expression?
This is what lookahead and lookbehind assertions will do.
Have you tried something like this?
(?<=11Xy).+(?=d223)
Update
You can use grep -o to display only the matched text in a *nix environment.
It may be late, but the downvoters should have mentioned that *NIX grep has limitations: lookahead, lookbehind, and similar features do not work in most versions.
http://www.regular-expressions.info/grep.html
Since neither grep nor egrep support any of the special features such as lazy repetition or lookaround,
GNU grep, however, does support this through the -P option, which uses Perl-compatible regular expressions (it is not limited to 3.x; older GNU grep builds compiled with PCRE support have it too):
https://www.gnu.org/software/grep/manual/grep.html#The-Backslash-Character-and-Special-Expressions
-P
--perl-regexp Interpret the pattern as a Perl-compatible regular expression (PCRE). This is highly experimental, particularly when combined with the -z (--null-data) option, and ‘grep -P’ may warn of unimplemented features.
On upgrading my grep and using -P, it works like a charm
$ cat test.txt | ggrep -oP '(?<=11Xy)(.*?)(?=d223)'
slas
$ ggrep --version
ggrep (GNU grep) 3.1
Packaged by Homebrew
Copyright (C) 2017 Free Software Foundation, Inc.
...
On some operating systems, especially macOS, the default grep is BSD grep, so you have to install GNU grep (for example with Homebrew) to get -P:
$ grep -V
grep (BSD grep) 2.5.1-FreeBSD
$ brew install grep
...
$ ggrep -V
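
If installing GNU grep is not an option, the same text can be pulled out without lookaround by using a capture group in sed instead. This is a different technique from the -P answer above, offered only as a sketch. The function name between_markers is made up for illustration, and the markers are hard-coded to the ones from the question.

```shell
#!/bin/sh
# Hypothetical helper: print what lies between "11Xy" and "d223" on each input line.
# -n plus the trailing p prints only the lines where the substitution succeeded.
between_markers() {
    sed -n 's/.*11Xy\(.*\)d223.*/\1/p'
}

printf '%s\n' 'AXyz122311Xyslasd22344ssaa Aklsssx#sdddf#4=sadsss kaaAASds' | between_markers
```

For the sample line this prints "slas". Note that sed's .* is greedy, so a line containing either marker more than once may give a longer capture than the lazy (.*?) in the Perl-style pattern would.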

grep with regexp: whitespace doesn't match unless I add an assertion

GNU grep 2.5.4 on bash 4.1.5(1) on Ubuntu 10.04
This matches
$ echo "this is a line" | grep 'a[[:space:]]\+line'
this is a line
But this doesn't
$ echo "this is a line" | grep 'a\s\+line'
But this matches too
$ echo "this is a line" | grep 'a\s\+\bline'
this is a line
I don't understand why #2 does not match (whereas #1 does) and why #3 also shows a match. What's the difference here?
Take a look at your grep manpage. Perl added a lot of regular expression extensions that weren't in the original specification. However, because they proved so useful, many programs adopted them.
Unfortunately, grep is sometimes stuck in the past because you want to make sure your grep command remains compatible with older versions of grep.
Some systems have egrep with some extensions. Others allow you to use grep -E to get them. Still others have a grep -P that allows you to use Perl extensions. I believe Linux systems' grep command can use the -P extension which is not available in most Unix systems unless someone has replaced the grep with the GNU version. Newer versions of Mac OS X also support the -P switch, but not older versions.
grep's default syntax doesn't support the complete set of regular expression extensions, so try -P to enable Perl regular expressions. With -P you don't need to escape the +, i.e.:
echo "this is a line" | grep -P 'a\s+line'
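
Where -P is not available, a more portable route is to stay with the POSIX character class from example #1 and switch to -E so the + needs no backslash. This is only a sketch and does not explain why \s misbehaves in that particular grep version. The function name match_space_run is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep lines where "a" and "line" are separated by whitespace.
# [[:space:]] is a POSIX class; with -E the + quantifier is written unescaped.
match_space_run() {
    grep -E 'a[[:space:]]+line'
}

printf '%s\n' 'this is a line' | match_space_run
```

This prints the input line back, the same as example #1 in the question.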

What is the Linux equivalent of the 'what' command on AIX?

I am used to using what to find a version string in my program, which is normally defined as a string in the C++ code, starting with "@(#)".
Now I cannot find it in Linux. Can anyone tell me what I am supposed to do? Thanks a lot!
The what command is part of the Source Code Control System (SCCS), which is not commonly available on Linux (if there is a Linux version at all). You can try to emulate it with the strings command:
strings a.out | fgrep '@(#)'
Reimplementations of what are available in CSSC (an SCCS-to-modern version control conversion package) and in BSD.
Try this:
strings myprogram | grep '@('
As @larsmans said, the what command is part of SCCS. GNU CSSC is the GNU replacement for SCCS.
In addition to SCCS, ident is the equivalent for RCS (and quite a few tools use the same marker as RCS, CVS being the first of these).
The following command gives output closest to that of what:
strings filename | grep -o '@(#).*' | sed 's/^@(#)//' | sed 's/"$//'
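
The pipelines above can be wrapped into a small filter. This is a sketch, assuming (as with the real what) that the identification text ends at the first double quote, greater-than sign, or backslash. The function name what_filter is made up for illustration, and the filter reads text on stdin so it can be tried without a binary at hand.

```shell
#!/bin/sh
# Hypothetical helper: print the text that follows each @(#) marker on stdin.
# grep -o isolates each marker plus its text; sed then strips the marker itself.
what_filter() {
    grep -o '@(#)[^">\\]*' | sed 's/^@(#)//'
}

# Usage against a real program (assumes `strings` from binutils is installed):
#   strings ./myprogram | what_filter

printf '%s\n' 'junk @(#)myprog 1.2.3" more junk' | what_filter
```

For the sample line this prints "myprog 1.2.3".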