How do I find broken NMEA log sentences with grep?

My GPS logger occasionally leaves "unfinished" lines at the end of the log files. I think they're only at the end, but I want to check all lines just in case.
A sample complete sentence looks like:
$GPRMC,005727.000,A,3751.9418,S,14502.2569,E,0.00,339.17,210808,,,A*76
The line should start with a $ sign and end with an * and a two-character hex checksum. I don't care if the checksum is correct, just that it's present. It also needs to ignore "ADVER" sentences, which don't have the checksum and are at the start of every file.
The following Python code might work:
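import re
from pathlib import Path  # standard library; the original used the path.py package
# A complete sentence: "$", anything, "*", then two hex digits at end of line.
nmea = re.compile(r"^\$.+\*[0-9A-F]{2}$")
for log in Path("gpslogs").glob("*.log"):
    # splitlines() also drops DOS CR+LF line endings
    for line in log.read_text(errors="replace").splitlines():
        # ADVER sentences carry no checksum, so skip them
        if not nmea.match(line) and "ADVER" not in line:
            print("%s\n\t%s\n" % (log, line))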
Is there a way to do that with grep or awk or something simple? I haven't really figured out how to get grep to do what I want.
Update: Thanks @Motti and @Paul, I was able to get the following to do almost what I wanted, but had to use single quotes and remove the trailing $ before it would work:
grep -nvE '^\$.*\*[0-9A-F]{2}' *.log | grep -v ADVER | grep -v ADPMB
Two further questions arise: how can I make it ignore blank lines? And can I combine the last two greps?

Minimal testing shows that this should do it:
grep -Ev '^\$.*\*[0-9A-Fa-f]{2}$' a.txt | grep -v ADVER
-E use extended regexp
-v Show lines that do not match
^ starts with
\$ a literal dollar sign
.* anything
\* an asterisk
[0-9A-Fa-f] hexadecimal digit
{2} exactly two of the previous
$ end of line
| grep -v ADVER weed out the ADVER lines
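For checking the pattern outside the shell, here is a minimal Python sketch of the same test. It assumes Python's re syntax behaves like ERE for these particular constructs (anchors, escaped $ and *, a bracket class and {2}); the first sample line is from the question, the truncated one is made up for illustration.

```python
import re

# Same pattern as the grep answer: "$" at the start, then anything,
# then a literal "*" and exactly two hex digits at the end of the line.
NMEA = re.compile(r"^\$.*\*[0-9A-Fa-f]{2}$")

samples = [
    "$GPRMC,005727.000,A,3751.9418,S,14502.2569,E,0.00,339.17,210808,,,A*76",
    "$GPRMC,005727.000,A,3751.94",  # hypothetical truncated line
]

# Like grep -v: keep only the lines that do NOT match the pattern.
broken = [line for line in samples if not NMEA.match(line)]
print(broken)
```

Only the truncated line survives the filter, which is what grep -v reports.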
HTH, Motti.

@Motti's answer doesn't ignore ADVER lines, but you can easily pipe the results of that grep to another:
grep -Ev '^\$.*\*[0-9A-Fa-f]{2}$' a.txt | grep -v ADVER

@Tom (rephrased): I had to remove the trailing $ for it to work
Removing the $ means that the line may end with something else (e.g., the following will be accepted):
$GPRMC,005727.000,A,3751.9418,S,14502.2569,E,0.00,339.17,210808,,,A*76xxx
@Tom: And can I combine the last two greps?
grep -Ev "ADVER|ADPMB"
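Both follow-up questions (skipping blank lines and merging the two exclusions into one alternation) can be seen together in a small Python sketch. It mirrors the logic of the pipeline rather than grep itself, and the sample log lines, including the ADVER one, are hypothetical.

```python
import re

NMEA = re.compile(r"^\$.*\*[0-9A-Fa-f]{2}$")
# One alternation replaces the two separate "grep -v" calls.
IGNORE = re.compile(r"ADVER|ADPMB")

def broken_lines(lines):
    """Yield (line_number, line) for lines that look unfinished."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank (or whitespace-only) line: skip it
        if IGNORE.search(line):
            continue  # ADVER/ADPMB sentences have no checksum
        if not NMEA.match(line):
            yield number, line

log = [
    "$ADVER,hypothetical header without checksum",
    "",
    "$GPRMC,005727.000,A,3751.9418,S,14502.2569,E,0.00,339.17,210808,,,A*76",
    "$GPRMC,0057",  # hypothetical truncated last line
]
print(list(broken_lines(log)))
```

The line numbers play the role of grep's -n output.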

@Motti: Combining the greps isn't working; it's having no effect.
I understand that without the trailing $ something else may follow the checksum and still match, but it didn't work at all with it, so I had no choice...
GNU grep 2.5.3 and GNU bash 3.2.39(1) if that makes any difference.
And it looks like the log files are using DOS line-breaks (CR+LF). Does grep need a switch to handle that properly?

@Tom
GNU grep 2.5.3 and GNU bash 3.2.39(1) if that makes any difference.
And it looks like the log files are using DOS line-breaks (CR+LF). Does grep need a switch to handle that properly?
I'm using grep (GNU grep) 2.4.2 on Windows (for shame!) and it works for me (and DOS line breaks are accepted naturally). I don't really have access to other OSs at the moment, so I'm sorry, but I won't be able to help you any further :o(
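A likely explanation for the trailing-$ problem, assuming the logs really end in CR+LF: GNU grep on Linux treats only LF as the line terminator, so every line it sees still ends in a CR, and "two hex digits followed by end of line" never matches. That would also fit the Windows report, where that grep build apparently treats CR+LF as the line ending. A minimal Python sketch of the same effect:

```python
import re

NMEA = re.compile(r"^\$.*\*[0-9A-Fa-f]{2}$")

# A sentence from the question with the DOS carriage return still attached.
dos_line = "$GPRMC,005727.000,A,3751.9418,S,14502.2569,E,0.00,339.17,210808,,,A*76\r"

# With the CR present, the hex digits are not at the end, so "$" fails.
print(bool(NMEA.match(dos_line)))               # False
# Stripping the CR first restores the match.
print(bool(NMEA.match(dos_line.rstrip("\r"))))  # True
```

In the shell, the corresponding idea is to delete the CRs (for example with tr -d '\r') before grepping, so the anchored pattern can be kept.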

Related

Can I perform a 'non-global' grep and capture only the first match found for each line of input?

I understand that what I'm asking can be accomplished using awk or sed; I'm asking here how to do this using GREP.
Given the following input:
.bash_profile
.config/ranger/bookmarks
.oh-my-zsh/README.md
I want to use GREP to get:
.bash_profile
.config/
.oh-my-zsh/
Currently I'm trying
grep -Po '([^/]*[/]?){1}'
Which results in output:
.bash_profile
.config/
ranger/
bookmarks
.oh-my-zsh/
README.md
Is there some simple way to use GREP to only get the first matched string on each line?
I think you can grep for characters other than /, like this:
grep -Eo '^[^/]+'
There is a similar question with a solution on another Stack Exchange site.
You don't need grep for this at all.
cut -d / -f 1
The -o option says to print every substring which matches your pattern, instead of printing each matching line. Your current pattern matches every string which doesn't contain slashes (optionally including a trailing slash); but it's easy to switch to one which only matches this pattern at the beginning of a line.
grep -Eo '^[^/]*/?' file
Notice the addition of the ^ beginning of line anchor, and the omission of the -P option (which you were not really using anyway) as well as the silly beginner error {1}.
(I should add that plain grep doesn't support parentheses or repetitions; grep -E would support these constructs just fine, or you could switch to the POSIX BRE variation, which requires a backslash to use round or curly parentheses as metacharacters. You can probably ignore these details and just use grep -E everywhere unless you really need the features of grep -P, though also be aware that -P is not portable.)
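The difference between "every match" and "the first match, anchored" can be sketched in Python. Here re.findall stands in for grep -o; it is not identical (it also returns empty matches, which grep -o never prints, so they are filtered out). The input paths are the ones from the question.

```python
import re

paths = [".bash_profile", ".config/ranger/bookmarks", ".oh-my-zsh/README.md"]

# Like grep -o with the unanchored pattern: every non-empty match on each
# line is reported, hence several pieces per line.
every = [m for p in paths for m in re.findall(r"[^/]*/?", p) if m]
print(every)

# Anchoring with "^" and taking one match per line yields the first
# component; the optional "/?" keeps the trailing slash the question asked for.
first = [re.match(r"^[^/]+/?", p).group(0) for p in paths]
print(first)
```

The first list reproduces the output from the question; the second is the desired output.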

Get text between two patterns with egrep and awk

I'm trying to parse a command's help file to grab all the arguments the command accepts.
Here is some text from the help file:
* --digest:
Set the digest for fingerprinting (defaults to the digest used when
signing the cert). Valid values depends on your openssl and openssl ruby
extension version.
* --debug:
Enable full debugging.
* --help:
Print this help message
* --verbose:
Enable verbosity.
* --version:
Print the puppet version number
I want to just grab --argument and nothing else.
I almost got it with this command, but it's still including the ":", which I want to exclude:
puppet cert --help | egrep '^* --(.*):$' | awk '{print $2}'
--all:
--allow-dns-alt-names:
--digest:
--debug:
--help:
--verbose:
--version:
Why is '^* --(.*):$' including the ":"? Shouldn't it be matching everything between '^* --' and ':$'?
shouldn't it be matching everything between ^* -- and :$ ?
Actually, no. You're capturing a group, but egrep never prints just the group: it prints the whole matching line, and awk '{print $2}' then prints the second whitespace-separated field of that line, which is "--digest:" with the colon included. I suggest using the -P flag to get Perl regex and lookarounds. In your case, this might be enough:
$ puppet cert --help | grep -Po '^\* \K--[\w-]+'
Note that I also used the -o option, to print only the matched content, not the whole line. This eliminates the usage of awk.
A more complete line based on your initial thoughts and more lookarounds:
$ puppet cert --help | grep -Po '^\* \K--.*(?=:)'
Edit: as noted in the comments and in mklement0's fine answer, this requires GNU grep. You can, however, do the same with Perl itself, which is almost certainly already installed on your system.
$ puppet cert --help | perl -nle 'print $1 if /^\* (--[\w-]+)/'
This works like a line of code inside a loop, which -nle generates automatically: -n wraps the code in a loop over the input lines, -l removes each line's newline on input and adds one back on print, and -e supplies the line of code.
The line of Perl code prints the first captured group if the line matches the regex. So it combines ideas from your original solution too.
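The same "print capture group 1 only if the line matches" idea, sketched in Python on a few lines of the help text (the --allow-dns-alt-names header is taken from the output shown in the question). The [\w-]+ class is used instead of \w+ so that hyphenated option names are captured whole; with \w+ that option would come out as just "--allow".

```python
import re

# A few lines from the help text in the question.
help_lines = [
    "* --allow-dns-alt-names:",
    "* --digest:",
    "Set the digest for fingerprinting (defaults to the digest used when",
    "* --debug:",
    "Enable full debugging.",
]

# Like the Perl one-liner: emit group 1 only for lines that match.
OPTION = re.compile(r"^\* (--[\w-]+)")

options = []
for line in help_lines:
    m = OPTION.match(line)
    if m:
        options.append(m.group(1))
print(options)
```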
For a fully POSIX-compliant answer, see mklement0's answer on this page.
To provide a POSIX-compliant alternative to sidyll's elegant GNU grep answer (which also explains why the OP's approach didn't work):
Update: Avinash Raj points out in a comment that sed is an option, which indeed allows for a POSIX-compliant single-tool solution: sed allows us to match entire lines of interest and replace them with the contents of a capture group (the part of the line of interest):
puppet cert --help | sed -n 's/^\* \(--.*\):$/\1/p'
Note that since sed is used without the (nonstandard) -r / -E option, a basic regular expression must be used, in which ( and ) must be \-escaped to act as capture-group delimiters.
Original answer:
puppet cert --help | egrep '^\* --.+:$' | awk -F '\\* |:' '{print $2}'
Note:
^* was replaced with ^\* so as to ensure that * is matched as a literal, and (.*) was replaced with .+, because (a) there is nothing to be gained by a capture group here, and (b) it's fair to assume that at least one letter follows the --.
-F '\\* |:' uses either literal *<space> or : as the field separator, which ensures that only the --... token (the second field) is printed.
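Why $2 is the right field can be seen by splitting one line on the same separator regex. This Python sketch assumes that re.split behaves like awk's regex field splitting for this input; both produce an empty first field because the line starts with a separator.

```python
import re

line = "* --digest:"

# After shell and awk unescaping, -F '\\* |:' is the regex  \* |:
# so the line splits into "" (before "* "), "--digest", "" (after ":").
fields = re.split(r"\* |:", line)
print(fields)     # ['', '--digest', '']

# awk numbers fields from 1, so its $2 is index 1 in Python.
print(fields[1])  # --digest
```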

Get all Commands without arguments from history (with Regex)

I have just started with learning shell commands and how to script in bash.
Now I'd like to solve the task mentioned in the title.
What I get from the history command (without line numbers):
ls [options/arguments] | grep [options/arguments]
find [...] exec- sed [...]
du [...]; awk [...] file
And how my output should look like:
ls
grep
find
sed
du
awk
I already found a solution, but it doesn't really satisfy me. So far I declared three arrays and used the readarray -t < <(...) command twice: once to save the contents of my history, and once, in combination with compgen -ac, to get all the commands I can possibly run. Then I compared the contents of both with loops, and saved the command every time it matched a line in the "history" array. A lot of effort for a simple exercise, I guess.
Another solution I thought of, is to do it with regex pattern matching.
A command usually starts at the beginning of the line, after a pipe, after an -exec, or after a semicolon, and maybe in other places I just don't know about yet.
So I need a regex which gives me only the next word after it matched one of these conditions. That's the command I've found and it seems to work:
grep -oP '(?<=|\s/)\w+'
Here it uses the pipe | as a condition. But I need to insert the others too. So I put the pattern in double quotes, created an array with all the conditions, and tried it as recommended:
grep -oP "(?<=$condition\s/)\w+"
But no matter how I insert the variable, it fails. To keep it short, I couldn't figure out how the command works, especially not the regex part.
So, how can I solve it using regular expressions? Or is there a better approach than mine?
Thank you in advance! :-)
This is simple and works quite well
history -w /dev/stdout | cut -f1 -d ' '
You can use this awk with fc command:
awk '{print $1}' <(fc -nl)
find
mkdir
find
touch
tty
printf
find
ls
fc -nl lists entries from history without the line numbers.
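Both answers take only the first word of each history line. If the commands after a pipe, a semicolon or an -exec are wanted as well, as the question asks, splitting each line on those separators is one option. A minimal Python sketch; the history lines are hypothetical, and quoting is deliberately ignored, so a | or ; inside quotes would be split too.

```python
import re

# Hypothetical history lines shaped like the ones in the question.
history = [
    "ls -la | grep foo",
    "find . -name '*.txt' -exec sed -i 's/a/b/' {} +",
    "du -sh; awk '{print $1}' file",
]

# Separators named in the question: ||, &&, |, ; and -exec (as a word).
SEPARATORS = re.compile(r"\|\||&&|[|;]|\s-exec\s")

def commands(line):
    """Return the first word of each separator-delimited segment."""
    names = []
    for segment in SEPARATORS.split(line):
        words = segment.split()
        if words:  # skip empty segments, e.g. after a trailing ";"
            names.append(words[0])
    return names

for line in history:
    print("\n".join(commands(line)))
```

For these inputs it prints ls, grep, find, sed, du and awk, one per line, matching the desired output in the question.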

Bullet Proof Text

I have no idea what is going on, but grep, awk and sed have been rendered useless in the face of a series of text files. Simply put, they will not work. I cannot pattern-match over a range with awk, even with examples I've verified myself or found online. I cannot get sed to use the p or i commands, but s still works. Both awk and sed have the odd behavior of simply printing everything, regardless of whether the pattern matches or not. And grep will find a word (via regex or string), but if I exclude the word (-v) it erases everything. And I mean everything.
I don't think pasting code will be useful, but I am amenable to this. I don't think pasting text will work either.
Is there some super secret setting that renders these programs daft? I've made sure everything is saved in utf-8, and have run tr -d '\r\n' and its permutations over everything. This is a linux box and I am beating my head against the table.
All of this is being done in a linux environment with BASH.
Any ideas?
@GordonDavisson seems to have nailed it. Your tr -d '\r\n' turned your file into one long line without a terminating newline, so you should expect grep -v <something that appears in the file> to output nothing [at best]: everything is on one line, so excluding any word excludes the whole file. And while some tools will do their best with it, you shouldn't expect UNIX tools to handle it at all, since a file without a terminating newline isn't a valid text file. Look:
$ cat file
the
quick
brown
dog
$ grep bro file
brown
$ grep -v bro file
the
quick
dog
$ tr -d '\r\n' < file > file2
$ cat file2
thequickbrowndog$
$ grep bro file2
thequickbrowndog
$ grep -v bro file2
$
Not sure what you wanted to achieve with that tr so not sure what to advise you to do with the file now.
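If the goal of the tr was to convert DOS line endings (an assumption, since the question doesn't say), deleting only the CR, as tr -d '\r' would, keeps the line structure that grep -v depends on. A minimal Python sketch of the difference, using hypothetical DOS-style contents of the example file:

```python
text = "the\r\nquick\r\nbrown\r\ndog\r\n"  # hypothetical CR+LF file contents

# Like tr -d '\r\n': every CR and every LF is deleted, one long line remains.
collapsed = text.replace("\r", "").replace("\n", "")
print(repr(collapsed))  # 'thequickbrowndog'

# Like tr -d '\r': only the CRs go, so the lines survive and an
# inverted match (grep -v bro) drops just one of them.
unix = text.replace("\r", "")
kept = [line for line in unix.splitlines() if "bro" not in line]
print(kept)             # ['the', 'quick', 'dog']
```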

Extract number embedded in string

So I run a curl command and grep for a keyword.
Here is the (sanitized) result:
...Dir');">Town / Village</a></th><th>Phone Number</th></tr><tr class="rowodd"><td><a href="javascript:calldialog('ASDF','&Mode=view&helloThereId=42',600,800);"...
I want to get the number 42 - a command line one-liner would be great.
search for the string helloThereId=
extract the number right beside it (42 in the above case)
Does anyone have any tips for this? Maybe some regex for numbers? I'm afraid I don't have enough experience to construct an elegant solution.
You could use grep with the -P (Perl-compatible regexp) option enabled.
$ grep -oP 'helloThereId=\K\d+' file
42
$ grep -oP '(?<=helloThereId=)\d+' file
42
\K here actually does the job of positive lookbehind. \K keeps the text matched so far out of the overall regex match.
References:
http://www.regular-expressions.info/keep.html
http://www.regular-expressions.info/lookaround.html
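For scripts rather than one-liners, the same extraction in Python: the standard re module has no \K, but a fixed-width lookbehind or a capture group does the same job here. The HTML fragment is taken from the sanitized curl output in the question.

```python
import re

html = ("<td><a href=\"javascript:calldialog('ASDF',"
        "'&Mode=view&helloThereId=42',600,800);\">")

# Lookbehind: match digits only when preceded by "helloThereId=".
print(re.findall(r"(?<=helloThereId=)\d+", html))  # ['42']

# Capture group: match the whole key=value text, keep only group 1.
m = re.search(r"helloThereId=(\d+)", html)
print(m.group(1) if m else None)                   # 42
```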
If your grep version supports -P (as is true for the OP, given that they're on Linux, which comes with GNU grep), Avinash Raj's answer is the way to go.
For the potential benefit of future readers, here are alternatives:
If your grep doesn't support -P, but does support -o, here's a pragmatic solution that simply extracts the number from the overall match in a 2nd step, by splitting the input into fields by =, using cut:
grep -Eo 'helloThereId=[0-9]+' file | cut -d= -f2
Finally, if your grep supports neither -P nor -o, use sed:
Here's a POSIX-compliant alternative, using sed with a basic regular expression (hence the need to emulate + with \{1,\} and to escape the parentheses):
sed -n 's/.*helloThereId=\([0-9]\{1,\}\).*/\1/p' file
This will work with any sed on any UNIX OS, even the pre-POSIX default sed on Solaris:
$ sed -n 's/.*helloThereId=\([0-9]*\).*/\1/p' file
42