search in shell - regex

Folks, I have an output file that looks like this:
Title: [name of component]
**garbage output**
**garbage output**
Test run: success 17 failure 2
**garbage output**
**garbage output**
There are many components like this, and I cannot change the output format, so I just want to grab the title and test-result lines.
My question is: how do I write a regular expression to achieve this?
I tried:
cat output | sed -e 'm/Tests run(.*)/g'
but it always complains: unknown command `m'
Methods other than regex would also be appreciated!
Thanks a lot

You don't need cat. Try:
grep -E '^Title:|^Test run' fileName
On older systems you may need to use egrep '^Title...' instead.
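As for why the sed attempt in the question fails: m/.../ is Perl match syntax, and sed reads the leading m as a command name it does not know. A minimal sketch of the same filter in sed follows, assuming a sed that accepts -E (GNU and BSD sed do); the sample file contents are made up for illustration.

```shell
# Build a small sample file (contents are illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Title: componentA
garbage output
Test run: success 17 failure 2
garbage output
Title: componentB
Test run: success 5 failure 0
EOF

# -n suppresses automatic printing; the p command prints only matching lines.
result=$(sed -n -E '/^Title:|^Test run/p' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Note that the attempt in the question searched for "Tests run" while the sample output says "Test run"; the pattern has to match the text exactly.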
Edit, in response to this follow-up:
I want to exclude titles with certain prefixes from the result, like "Title: foo XXX" or "Title: bar XXX".
There is certainly a regex for grep -E that would handle this. But while you are in your first few years of command-line work, and since you appear to be using this to clean up test.log output, it is good to use the Unix toolbox approach: get something working, then add a little more to it. For example:
grep -E '^Title:|^Test run' fileName | egrep -v '^Title: foo XXX|^Title: bar XXX'
This is the power of the Unix pipeline: if you get too much output, keep adding more grep -v stages to clean it up.
Note that grep -v (and likewise egrep -v) means: exclude lines that match the following patterns.
I hope this helps
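If a single pass is preferred over a growing pipeline, the same include-then-exclude logic can be sketched in awk. This is only one possible arrangement; the sample data and the foo/bar prefixes are placeholders taken from the follow-up above.

```shell
# Illustrative sample: one unwanted title and one wanted title.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Title: foo XXX
garbage output
Test run: success 1 failure 0
Title: componentA
garbage output
Test run: success 17 failure 2
EOF

# First rule drops the unwanted titles; second rule prints what remains
# of the title and result lines.
result=$(awk '/^Title: (foo|bar) XXX/ { next } /^Title:|^Test run/ { print }' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Like the grep pipeline, this removes only the excluded title line itself; the "Test run" line that belongs to that component is still printed.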

Related

egrep - regex filtering characters only working when run via cron?

This is baffling to me, please help :-)
I have a program which sometimes runs from the CLI and sometimes through cron, both as the same user and both in bash.
In cron I use SHELL=/bin/bash to force bash.
The offending command within the script is:
egrep -v "$^" playlist.txt | egrep -v "[^ -.[:alnum:]]" >>formattedPlaylist.txt
Basically, it should remove all blank lines from the playlist, then remove any line which contains anything other than [A-Za-z0-9 - .].
For some reason, when run as a user from cli, this does not filter out many characters, whereas if cron runs it, it works exactly as expected.
The characters which are not filtered out are:
% $ # ! * & ( ) '
Any ideas??

Try:
sed '/[^-A-Za-z0-9.\x27 ]/d;/''/d;/^\s*$/d' playlist.txt > cleaned_playlist.txt
Input text:
A goat
232423
-sdf-g
Here it goes
'keep me
$ let it go
\ this one too
Output:
A goat
232423
-sdf-g
Here it goes
'keep me

Try setting your locale explicitly.
LC_ALL=C egrep -v "$^|[^ -.[:alnum:]]" playlist.txt >>formattedPlaylist.txt
I also simplified the command by merging the two regular expressions, but the locale fix is the answer to your question.
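Separately from the locale setting, note that inside a bracket expression " -." is a range from space to period, not three literal characters. In the C locale that range covers the ASCII characters 0x20 to 0x2E, which include ! # $ % & ' ( ) and *; in other locales the contents of a range are not guaranteed. A minimal sketch that avoids the range by putting the hyphen last, where it is always literal (the input lines are made up):

```shell
# The hyphen is the last character in the bracket expression, so it is
# a literal "-" and no longer forms a range from space to period.
result=$(printf '%s\n' 'A goat' '' '232423' '-sdf-g' '$ let it go' 'a.b c' \
  | LC_ALL=C grep -Ev '^$|[^ .[:alnum:]-]')
printf '%s\n' "$result"
```

With this pattern only space, period, hyphen, and alphanumerics are allowed on a line, and blank lines are dropped by the first alternative.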

looking for regExp to return line between two strings that works with pdfgrep

Though I'm not totally new to regExp, they always give me headaches. Especially when not all forms of regular expressions can be used.
The pattern has to work with pdfgrep as the information I'm looking for is inside a pdf Document.
Obviously the document is multiline.
The resulting pattern will be used in a bash script, if that makes any difference.
The keywords can usually be found more than once in the same file, but I need only the data between the first occurrences of both keywords.
The data looks like:
some text
some more text
even more information Date
02.Feb.2014
Customer
some more text
some more information
even more information Date
02.Feb.2014
Customer
some more text
some more information
...
The result of the command should be: 02.Feb.2014
I don't know which characters might be around this date (tabs, spaces ...) and I don't want to rely on them.
I tried
pdfgrep -h 'Date(.*?)Customer' *.pdf
which gave no result at all.
Next try was
pdfgrep -h '(?<=Date)(.*)(?=Customer)' *.pdf
which resulted in an error "Invalid preceding regular expression"
The best I have come up with so far is
pdfgrep -h '(Date)[[:space:]]{,1}.{,100}[[:space:]](Customer){,1}' *.pdf
This returns all matching dates together with the first keyword. But I'd like a more elegant way, which regExp should be able to provide.
I'd appreciate any useful hint ;)
Regards
Manuel

The only document you should ever read when using grep, awk, or sed regular expressions is here. It cleared a lot of stuff up for me.
sed -n -e '/even more information Date/ {' \
-e ' n' \
-e ' s/^[[:space:]]*//' \
-e ' p' \
-e '}'
grep, sed, and awk apply their regular expressions one line at a time, so by default a single RE cannot capture text that spans several lines.
The sed command above looks for a line containing "even more information Date", moves on to the next line (n), strips its leading whitespace, and prints it (that is the line with 02.Feb.2014 on it). The -n option suppresses the default output, so sed prints a line only when told to with p.
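Since the question asks for the first occurrence only, the same idea can be extended with q so that sed stops after the first hit. The following is a sketch on made-up text that mimics the extracted PDF content; matching on "Date" at the end of a line is an assumption about how the text is laid out.

```shell
# Illustrative text with two Date/Customer blocks and stray whitespace.
sample=$(mktemp)
printf '%s\n' 'some text' 'even more information Date' '   02.Feb.2014  ' \
  'Customer' 'even more information Date' '03.Mar.2015' 'Customer' > "$sample"

# On the first line ending in "Date": read the next line (n), trim leading
# and trailing whitespace, print it (p), and quit (q) so later matches are ignored.
result=$(sed -n -e '/Date[[:space:]]*$/ {' \
  -e 'n' \
  -e 's/^[[:space:]]*//' \
  -e 's/[[:space:]]*$//' \
  -e 'p' \
  -e 'q' \
  -e '}' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```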

The hint to use gs in combination with sed does the trick. Though I had to do some testing until it worked as desired.
The command used now is:
gs -q -dBATCH -dNOPAUSE -sDEVICE=txtwrite -dFirstPage=1 -dLastPage=1 \
-sOutputFile=- /path/to/my.pdf 2>/dev/null | sed -n -e '/Date/ {' \
-e'n' -e's/^[[:space:]]*//' -e 'p' -e '}'
Thanks to all contributors :)

Linux Bash Regular Expressions, retrieving data from SNMPGet Output

I've been working on getting a few simple monitoring tools running at home, and decided to be funny and retrieve the printer data along with everything else. The SNMP portion is working quite well, but I can't seem to parse the data that my SNMPGET command retrieves properly in Linux. The current script I am using is as follows:
#!/usr/bin/env bash
# RegEx for Strings: "(.+?)"| -?\d+
RegExStr='"(.+?)"| -?\d+'
# ***
# Brother HL-2150N Printer
# ***
# Order Data: Toner Name, Toner Level, Drum Name, Drum Status, Total Pages Printed, Display Status
Input=$(snmpget -v 1 -c public 192.168.16.112 SNMPv2-SMI::mib-2.43.11.1.1.6.1.1 SNMPv2-SMI::mib-2.43.11.1.1.8.1.1 SNMPv2-SMI::mib-2.43.11.1.1.6.1.2 SNMPv2-SMI::mib-2.43.11.1.1.9.1.1 SNMPv2-SMI::mib-2.43.10.2.1.4.1.1 SNMPv2-SMI::mib-2.43.16.5.1.2.1.1 -m BROTHER-MIB)
Output1=( $(echo $Input | egrep -o $RegExStr) )
# Output
echo $Input
echo ${Output1[@]}
Which, oddly enough does not work. I'm fairly certain my regular expression ( "(.+?)" ) is correct, as I've tested it numerous times in various different syntax checkers and testers. It's supposed to select all the data that's between quotation marks ("").
Anyhow, the SNMPGET return is:
SNMPv2-SMI::mib-2.43.11.1.1.6.1.1 = STRING: "Black Toner Cartridge" SNMPv2-SMI::mib-2.43.11.1.1.8.1.1 = INTEGER: -2 SNMPv2-SMI::mib-2.43.11.1.1.6.1.2 = STRING: "Drum Unit" SNMPv2-SMI::mib-2.43.11.1.1.9.1.1 = INTEGER: -3 SNMPv2-SMI::mib-2.43.10.2.1.4.1.1 = Counter32: 13630 SNMPv2-SMI::mib-2.43.16.5.1.2.1.1 = STRING: "SLAAP "
I've tried various things myself. Using grep returns a blank string. To my understanding grep does not support every regular expression construct by itself, so I started using egrep. That returns SOMETHING, but it is everything inside the original string divided by spaces, starting at the first quotation mark.
Is there anything I'm missing? I've looked around, and adjusted my methods a few times but never seemed to get a usable array in return.
Anyhow, I appreciate any help/pointers you'd be able to give me. I'd like to be able to get this running, even if just for fun and a good learning experience. Thank you in advance though! I'll be fidgeting on with it some more myself, but will check here every now and then.

From your output:
To get all strings:
grep -oP 'STRING: *"\K[^"]*'
Black Toner Cartridge
Drum Unit
SLAAP
To get all integers:
grep -oP '(INTEGER|Counter32): *\K[^ ]*'
-2
-3
13630
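Some background on why the original attempt misbehaves: egrep uses POSIX extended regular expressions, which have no lazy quantifier (.+? is not lazy there) and no \d class; both are Perl syntax. The -P option used above switches to Perl-compatible patterns, which assumes GNU grep built with PCRE support. A self-contained sketch, with the input shortened from the snmpget output in the question:

```shell
# Sample input, shortened from the snmpget output shown in the question.
input='SNMPv2-SMI::mib-2.43.11.1.1.6.1.1 = STRING: "Black Toner Cartridge" SNMPv2-SMI::mib-2.43.11.1.1.8.1.1 = INTEGER: -2 SNMPv2-SMI::mib-2.43.11.1.1.6.1.2 = STRING: "Drum Unit" SNMPv2-SMI::mib-2.43.10.2.1.4.1.1 = Counter32: 13630'

# \K discards everything matched so far, so -o prints only what follows it.
strings=$(printf '%s\n' "$input" | grep -oP 'STRING: *"\K[^"]*')
numbers=$(printf '%s\n' "$input" | grep -oP '(INTEGER|Counter32): *\K[^ ]*')

printf '%s\n' "$strings"
printf '%s\n' "$numbers"
```

Each result holds one value per line. In bash, something like mapfile -t arr <<< "$strings" would load those lines into an array without splitting "Black Toner Cartridge" on its spaces, which is what the unquoted $(...) in the original script does.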

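With awk you can do this, using the double quote as the record separator so that every second record is a quoted string:
awk 'NR%2==0' RS=\" <<< "$Input"
Black Toner Cartridge
Drum Unit
SLAAP
Or into a variable:
Output1=$(awk 'NR%2==0' RS=\" <<< "$Input")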

Remove everything between pairs of braces with sed

I've got a string that looks like this:
[%{%B%F{blue}%}master %{%F{red}%}*%{%f%k%b%}%{%f%k%b%K{black}%B%F{green}%}]
I want to remove the substrings matching %{...}, which may or may not contain further substrings of the same order.
I should get: [master *] as the final output. My progress so far:
gsed -E 's/%\{[^\}]*\}//g'
which gives:
echo '[%{%B%F{blue}%}master %{%F{red}%}*%{%f%k%b%}%{%f%k%b%K{black}%B%F{green}%}]' | gsed -E 's/%\{[^\}]*\}//g'
[%}master %}*%B%F{green}%}]
So, this works fine for %{...} sections which do not contain %{...}. It fails for strings like %{%B%F{blue}%} (it returns %}).
What I want to do is parse the string until I find the matching }, then remove everything up to that point, rather than removing everything between %{ and the first } I encounter. I'm not sure how to do this.
I'm fully aware that there are probably multiple ways to do this; I'd prefer an answer regarding the way specified in the question if it is possible, but any ideas are more than welcome.

This might work for you:
echo '[%{%B%F{blue}%}master %{%F{red}%}*%{%f%k%b%}%{%f%k%b%K{black}%B%F{green}%}]' |
sed 's/%{/{/g;:a;s/{[^{}]*}//g;ta'
[master *]
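To spell out how that one-liner works: it first turns every %{ into a plain {, then repeatedly deletes the innermost {...} groups (those containing no further braces) until a pass makes no substitution. The sketch below is the same command split into separate -e expressions, which also keeps the label usable on sed implementations that do not accept ":a;" on one line.

```shell
input='[%{%B%F{blue}%}master %{%F{red}%}*%{%f%k%b%}%{%f%k%b%K{black}%B%F{green}%}]'

# 1. s/%{/{/g       : every "%{" becomes "{", so all openers look alike
# 2. :a             : label marking the start of the loop
# 3. s/{[^{}]*}//g  : delete every innermost brace pair
# 4. ta             : jump back to the label if step 3 changed anything
result=$(printf '%s\n' "$input" | sed \
  -e 's/%{/{/g' \
  -e ':a' \
  -e 's/{[^{}]*}//g' \
  -e 'ta')
printf '%s\n' "$result"
```

One side effect to be aware of: any plain {...} group that was never preceded by % is removed as well, which is harmless for the prompt string in the question.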

Use recursion to eat it away from the inside out.
s/%{.*?%}//g
Then wrap it in
while (there's at least one more brace)
(probably while $? -ne 0 ... or whatever return code sed uses to say "no matches!")

Try this:
sed -E 's/%{([^{}]*({[^}]*})*[^{}]*)*}//g'

bash grep - negative match

I want to flag places in my Python unittests where I have been lazy and de-activated tests.
But I also have conditional executions that are not laziness, they are motivated by performance or system conditions at time of testing. Those are the skipUnless ones and I want to ignore them entirely.
Let's take some inputs that I have put in a file, test_so_bashregex.txt, with some comments.
!ignore this, because skipUnless means I have an acceptable conditional flag
@unittest.skipUnless(do_test, do_test_msg)
def test_conditional_function():
xxx
!catch these 2, lazy test-passing
@unittest.skip("fb212.test_urls_security_usergroup Test_Detail.test_related fails with 302")
def sometest_function():
xxx
@unittest.expectedFailure
def test_another_function():
xxx
!bonus points... ignore things that are commented out
# @unittest.expectedFailure
Additionally, I can't use grep -v skipUnless in a pipe, because I really want to use egrep -A 3 xxx *.py to give some context, as in:
grep -A 3 "@unittest\." *.py
test_backend_security_meta.py: @unittest.skip("rewrite - data can be legitimately missing")
test_backend_security_meta.py- def test_storage(self):
test_backend_security_meta.py- with getMultiDb() as mdb:
test_backend_security_meta.py-
What I have tried:
Trying at https://www.debuggex.com/
I tried @unittest\.(.+)(?!(Unless\()) and that didn't work, as it matches the first 3.
Ditto @unittest\.[a-zA-Z]+(?!(Unless\())
@unittest\.skip(?!(Unless\()) worked partially, on the 2 with skip.
All of those do partial matches despite the presence of Unless.
On bash egrep, which is where this is going to end up, things don't look much better.
jluc@explore$ egrep '@unittest\..*(?!(Unless))' test_so_bashregex.txt
egrep: repetition-operator operand invalid

You could try this regex:
(?<!#\s)@unittest\.(?!skipUnless)(skip|expectedFailure).*
If you don't care whether 'skip' or 'expectedFailure' appears, you could simplify it:
(?<!#\s)@unittest\.(?!skipUnless).*
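Lookarounds such as (?<!...) and (?!...) are Perl syntax, so plain grep and egrep reject them; they need grep -P, which assumes GNU grep built with PCRE support. A small sketch of the simplified pattern on a made-up file, with the decorators written as @unittest:

```shell
# Illustrative file: one conditional skip, two lazy skips, one commented-out line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
@unittest.skipUnless(do_test, do_test_msg)
def test_conditional_function():
@unittest.skip("fails with 302")
def sometest_function():
@unittest.expectedFailure
def test_another_function():
# @unittest.expectedFailure
EOF

# (?<!#\s)       : the decorator must not be preceded by "# " (commented out)
# (?!skipUnless) : the decorator name must not be skipUnless
result=$(grep -nP '(?<!#\s)@unittest\.(?!skipUnless)' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Because the filtering happens in a single grep, -A 3 can be added to the same command to show the following lines as context.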

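How about something like this? grep seems a bit restrictive.
find . -name "*.py" -exec awk '
# Reset the flag at the start of each file.
FNR == 1 { seen_skip = 0 }
# A decorator that disables a test, but not the conditional skipUnless.
# Commented-out decorators do not match because the line must start with @.
/^[[:space:]]*@unittest\.(skip|expectedFailure)/ && !/skipUnless/ { seen_skip = 1 }
# The next def after such a decorator is the lazily disabled test.
/^[[:space:]]*def / {
if (seen_skip)
print "Being lazy at " FILENAME ": " $2
seen_skip = 0
}
' {} +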

OK, I'll put up what I found with sweaver2112's help, but if someone has a good single-stage grep-ready regex, I'll take it.
bash's egrep/grep doesn't like ?! (ref grep: repetition-operator operand invalid). end of story there.
What I have done instead is to pipe it to some extra filters: negative grep -v skipUnless and another one to strip leading comments. These 2 strip out the unwanted lines. But, then pipe their output back into another grep looking for #unittest again and again with the -A 3 flag.
If the negative greps have cleared out a line, it won't show in the last pipestage so drops out of the input. If not, I get my context right back.
egrep -A 3 -n '@unittest\.' test_so_bashregex.txt | egrep -v "^\s*#" | egrep -v "skipUnless\(" | grep @unittest -A 3
output:
7:@unittest.skip("fb212.test_urls_security_usergroup Test_Detail.test_related fails with 302")
8-def sometest_function():
9- xxx
10:@unittest.expectedFailure
11-def test_another_function():
12- xxx
And my actual output from running it on *.py, rather than my test.txt file:
egrep -A 3 -n '@unittest\.' *.py | egrep -v "\d:\s*#" | egrep -v "skipUnless\(" | grep @unittest -A 3
output:
test_backend_security_meta.py:77: @unittest.skip("rewrite - data can be legitimately missing")
test_backend_security_meta.py-78- def test_storage(self):
test_backend_security_meta.py-79- with getMultiDb() as mdb:
test_backend_security_meta.py-80-
--
test_backend_security_meta.py:98: @unittest.skip("rewrite - data can be legitimately missing")
test_backend_security_meta.py-99- def test_get_li_tag_for_object(self):
test_backend_security_meta.py-100- li = self.mgr.get_li_tag()
test_backend_security_meta.py-101-
