GREP REGEX LARGE FILE - regex

On a Mac, how do I use grep? I have a large text file (200 MB); sample data is below. I want to run grep with a regex and get back ONLY the following value in my terminal:
00424730350000190100130JEAN DANIELE &
I want everything up to 82700. Once I have this information, I can copy it into another file for other purposes. Right now I just get back tons of information.
Sample Record:
00424730350000190100130JEAN DANIELE & 82700 TINEPORK CT LAT BORAN AK 12345 3342843470224201400003980000002664300001216IWD QD0415200800004005880002281300000671IWD QM0330200500004900000001836800000431IWD QM0325199900002455270001147700000969IWD QM
Sample grep commands I wrote:
grep -E "^(.*?)82700" MYFILE.TXT
grep -E "^(.*?)[0-9]" MYFILE.TXT
This still doesn't work: it gives back tons of info, and the 82700 can be any value. Any help or suggestions? Thank you.

For the sample data
grep -E -o "^[0-9]{23}[^0-9]+[0-9]+" MYFILE.TXT
seems to do the job:
00424730350000190100130JEAN DANIELE & 82700
using grep (BSD grep) 2.5.1-FreeBSD on Darwin 14.4.0.
Please comment if this needs adjustment or further detail.
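If the trailing number should be dropped as well (the question asks for everything up to, but not including, 82700), one possible follow-up is to strip it with sed. This is only a sketch: the record below is a shortened copy of the sample from the question, and the temporary file stands in for MYFILE.TXT.

```shell
# Temporary stand-in for MYFILE.TXT, holding a shortened copy of the sample record.
tmpfile=$(mktemp)
printf '%s\n' '00424730350000190100130JEAN DANIELE & 82700 TINEPORK CT LAT BORAN AK 12345 3342843470224201400003980000002664300001216IWD QD' > "$tmpfile"

# 23 leading digits, then non-digits, then the first run of digits.
with_number=$(grep -E -o '^[0-9]{23}[^0-9]+[0-9]+' "$tmpfile")

# Remove the trailing " <digits>" so only the ID and the name remain.
without_number=$(printf '%s\n' "$with_number" | sed -E 's/ +[0-9]+$//')

printf '%s\n' "$with_number" "$without_number"
rm -f "$tmpfile"
```

Redirecting the last pipeline to a file (for example with > names.txt, a file name chosen here purely for illustration) avoids copying from the terminal.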

Related

sed filtering of tshark dump: speedup

I'm filtering output generated by tshark with sed to show only the Info and Bytes of packets. But when the tshark output gets too long, this can be too slow in some instances. I don't have much experience with sed: is there any way to speed this up? I think at the moment I run sed 5 times on each line with different expressions, but I remember researching this and finding out that there is no "or" in my sed version. Would awk be faster?
tshark -P -x -r btdebug.snoop | sed -n 's/^.*\(Reassembled.*\)$/\1/p;s/^.*\(Sent.*\)$/\1/p;s/^.*\(Rcvd.*\)$/\1/p;s/^.*\(PT=.*\)$/\1/p;s/^[0-9a-f]*\s\(\(\s[0-9a-f][0-9a-f]\)\{1,16\}\).*$/\1/p'
EDIT: https://drive.google.com/open?id=1qzXkV9F9rjGmlvDo7CoxwjXZVRbQWDe- with example outputs from just tshark and after sed.
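One way to reduce the number of substitutions, assuming a sed that accepts -E for extended regular expressions (GNU and BSD sed both do), is to fold the four keyword patterns into a single alternation and keep the hex-dump pattern as a second expression. This is a sketch, not a measured speedup: the input lines are made up for illustration rather than real tshark output, [[:space:]] replaces the non-portable \s, and lines containing more than one keyword may print differently than with the five separate expressions.

```shell
# Hypothetical input lines standing in for `tshark -P -x -r btdebug.snoop` output.
input=$(printf '%s\n' \
  '  1   0.000000 host -> controller HCI_CMD 4 Sent Reset' \
  '0000  01 03 0c 00                                       ....' \
  'some unrelated line')

# Expression 1: keep everything from the last keyword to the end of the line.
# Expression 2: keep up to 16 hex bytes that follow the offset column.
filtered=$(printf '%s\n' "$input" | sed -nE \
  -e 's/^.*((Reassembled|Sent|Rcvd|PT=).*)$/\1/p' \
  -e 's/^[0-9a-f]*[[:space:]](([[:space:]][0-9a-f]{2}){1,16}).*$/\1/p')

printf '%s\n' "$filtered"
```

Whether this (or an awk version) is actually faster would need timing against a real capture.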

The following code is showing error in AIX server but working fine in Red Hat server

string='binddn:cn=SxX.UXxxxM-E2A,OU=CA,OU=AI INFRASTRUCTURE,DC=i,DC=companyname,DC=com'
The working piece of code on Red Hat:
dn=($(grep -oi 'cn=[^():]*dc=com' <<< "$string"))
I modified the code for AIX; the modified code is
dn=($(grep -xi 'cn=[^():]*dc=com' "$string"))
The code works perfectly on the Red Hat server; the output there is
dn[0]="cn=SxX.UXxxxM-E2A,OU=CA,OU=AI INFRASTRUCTURE,DC=i,DC=companyname,DC=com"
The error in AIX is
grep: can't open binddn:cn=SxX.UXxxxM-E2A,OU=CA,OU=AI INFRASTRUCTURE,DC=i,DC=companyname,DC=com
Edited:
Another example:
string = "userbasedn:DC=i,DC=companyname,DC=com?subtree?(&(objectcategory=person)(uidNumber=*)(|(memberOf:1.2.840.113556.1.4.1941:=cn=example1,OU=GROUPS,OU=AI INFRASTRUCTURE,DC=i,DC=companyname,DC=com)(memberOf:1.2.840.11.1.4.1941:=cn=example2,OU=GROUPS,OU=AI INFRASTRUCTURE,DC=i,DC=companyname,DC=com)))
groupbasedn:DC=i,DC=companyname,DC=com?subtree?(&(objectcategory=group)(gidNumber=*))"
expected output
dn[0]=cn=example1,OU=GROUPS,OU=AI INFRASTRUCTURE,DC=i,DC=companyname,DC=com
dn[1]=cn=example2,OU=GROUPS,OU=AI INFRASTRUCTURE,DC=i,DC=companyname,DC=com
If you can use awk, try this:
echo "$string" | awk -F"cn=" 'NF>1{$0=tolower($0);for (i=2;i<=NF;i++) {split($i,a,"dc=com)");print FS a[1]"dc=com"}}'
cn=example1,ou=groups,ou=ai infrastructure,dc=i,dc=companyname,dc=com
cn=example2,ou=groups,ou=ai infrastructure,dc=i,dc=companyname,dc=com
The second argument to grep is a file name, not a string. AIX correctly reports that it cannot find a file with that name. You would get the same error on Red Hat if you tried the same command.
Unfortunately, the -x option doesn't do what you hope; it checks whether the entire line of input matches the regex. Again, you will find exactly the same behavior on Red Hat.
According to the AIX grep manual page it supports the -o option just fine, though.
The Bash "here string" syntax <<<"string" is not available if you don't have Bash, but it is easy to rephrase portably:
printf '%s\n' "$string" |
grep -oi 'cn=[^():]*dc=com'
If you don't have grep -o, try with sed:
printf '%s\n' "$string" |
sed -n 's/.*\(cn=[^():]*dc=com\).*/\1/p'
This is not exactly the same, because the first .* is greedy. If you expect more than one match on a line, you will need a slightly more complex regex.
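Where grep -o is available, several matches on one line are not a problem, because each match is printed on its own line. A minimal sketch using the first line of the second example string from the question; reading the result line by line instead of into an unquoted array is my own choice here, so that the space in "AI INFRASTRUCTURE" does not split a DN into two words.

```shell
# First line of the second example string from the question, single-quoted
# so that the shell leaves &, |, * and the parentheses alone.
string='userbasedn:DC=i,DC=companyname,DC=com?subtree?(&(objectcategory=person)(uidNumber=*)(|(memberOf:1.2.840.113556.1.4.1941:=cn=example1,OU=GROUPS,OU=AI INFRASTRUCTURE,DC=i,DC=companyname,DC=com)(memberOf:1.2.840.11.1.4.1941:=cn=example2,OU=GROUPS,OU=AI INFRASTRUCTURE,DC=i,DC=companyname,DC=com)))'

# Feed the string on standard input (no here-string needed) and print
# every match on its own line.
matches=$(printf '%s\n' "$string" | grep -oi 'cn=[^():]*dc=com')

# Handle one DN per iteration; IFS= and -r keep spaces and backslashes intact.
printf '%s\n' "$matches" | while IFS= read -r dn; do
  printf 'dn: %s\n' "$dn"
done
```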

Grep bitcoin address with regexp

I'm trying to find bitcoin addresses with grep, but without luck. What is the problem? The main command is
grep -R --regexp="^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$"
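Braces are an interval operator only in extended regex syntax; in grep's default basic syntax {25,34} is matched literally. Try egrep (or grep -E):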
egrep --regexp="^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$" filename
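The difference between the two syntaxes can be checked on a small file. A sketch under stated assumptions: the temporary file and its two lines are test data invented here, with the first line being a sample string in the expected address format.

```shell
# Temporary test file: one string in the address format, one that is not.
sample=$(mktemp)
printf '%s\n' '1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa' 'not-an-address' > "$sample"

pattern='^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$'

# Basic syntax: {25,34} is literal text, so nothing matches.
# grep -c exits non-zero when the count is 0, hence the || true.
bre_count=$(grep -c -- "$pattern" "$sample" || true)

# Extended syntax: {25,34} is a repetition count, so the first line matches.
ere_count=$(grep -c -E -- "$pattern" "$sample" || true)

printf 'basic: %s, extended: %s\n' "$bre_count" "$ere_count"
rm -f "$sample"
```

Single quotes around the pattern also keep the shell from touching the trailing $.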

Making a mistake with | (or) in regular expressions

I'm sure I'm doing something really obviously wrong here, but I can't figure out what. Using grep from a bash shell, I have a file test2.txt:
ABC123
ABC456
ABC789
DEF123
DEF456
DEF789
Now at the command line:
$ grep ABC test2.txt
ABC123
ABC456
ABC789
$ grep DEF test2.txt
DEF123
DEF456
DEF789
So those work great. Now, I expect the following command to print the whole file, but:
$ grep ABC\|DEF test2.txt
$ grep (ABC)\|(DEF) test2.txt
-bash: syntax error near unexpected token `ABC'
$ grep \(ABC\)\|\(DEF\) test2.txt
$ grep 'ABC|DEF' test2.txt
What am I doing wrong?
Turn on the extended regex with -E:
grep -E "ABC|DEF" test2.txt
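As for why the attempts in the question print nothing: in the unquoted grep ABC\|DEF the shell removes the backslash, so grep receives ABC|DEF, and in basic syntax a bare | is an ordinary character. The sketch below recreates the file from the question in a temporary location (the temporary file is my stand-in for test2.txt) and compares that with -E and with the portable alternative of passing several -e patterns.

```shell
# Recreate the six-line file from the question.
testfile=$(mktemp)
printf '%s\n' ABC123 ABC456 ABC789 DEF123 DEF456 DEF789 > "$testfile"

# Basic syntax: looks for the literal text "ABC|DEF", which never occurs.
# grep -c exits non-zero when the count is 0, hence the || true.
literal_count=$(grep -c 'ABC|DEF' "$testfile" || true)

# Extended syntax: | means alternation.
ere_count=$(grep -c -E 'ABC|DEF' "$testfile" || true)

# Portable alternative: several -e patterns, a line matches if any of them does.
multi_count=$(grep -c -e ABC -e DEF "$testfile" || true)

printf 'literal: %s, -E: %s, -e/-e: %s\n' "$literal_count" "$ere_count" "$multi_count"
rm -f "$testfile"
```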
As others have pointed out, grep's default (basic) regex syntax does not treat a bare | as alternation. Unfortunately, from there, things are a mishmash.
Some systems have an egrep that does offer the or syntax.
Some systems can use a -E or -P (for Perl) flag to extend grep syntax.
Some systems have both -E and egrep, and they do the same thing. But there are also systems out there where grep -E and egrep are not the same (sad but true).
Some systems now use extended regular expressions in their standard grep command. Apparently, yours doesn't.
Read your manpages to see what your system supports. Some systems have a manpage for re_format that explains what is and isn't supported in extended format.
Then again, you could always just use a Perl one-liner:
$ perl -ne "print if /(ABC)|(DEF)/" test2.txt
At least then you know exactly what is supported.
I don't think the standard syntax supports it. You could use the -P switch if available:
grep -P "(ABC|DEF)" test2.txt
Use egrep instead, which is the same as using grep -E:
egrep 'ABC|DEF' test2.txt

Why does this `grep -o` fail, and how should I work around it?

Given the input
echo abc123def | grep -o '[0-9]*'
On one computer (with GNU grep 2.5.4), this returns 123, and on another (with GNU grep 2.5.1) it returns the empty string. Is there some explanation for why grep 2.5.1 fails here, or is it just a bug? I'm using grep -o in this way in a bash script that I'd like to be able to run on different computers (which may have different versions of grep). Is there a "right way" to get consistent behavior?
Yes, 2.5.1's -o handling was buggy:
http://www.mail-archive.com/bug-grep@gnu.org/msg00993.html
Grep is probably not the right tool for this; sed or tr or even perl might be better depending on what the actual task is.
You can use the shell; it's faster:
$ str=abc123def
$ echo ${str//[a-z]/}
123
I had the same issue and found that egrep was installed on that machine. A quick solution was using
echo abc123def | egrep -o '[0-9]*'
This will give similar results (note that \+ is a GNU sed extension):
echo abc123def | sed -n 's/[^0-9]*\([0-9]\+\).*/\1/p'
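For a script that has to run on machines with different grep versions, one option is to sidestep grep -o altogether. The two variants below are a sketch using only POSIX features of tr and sed; they are alternatives I am suggesting, not something taken from the answers above. Note the different behavior: tr keeps every digit in the input, while the sed command keeps only the first run of digits.

```shell
input='abc123def'

# Delete every character that is not a digit.
digits_tr=$(printf '%s\n' "$input" | tr -cd '0-9')

# Keep the first run of digits; [0-9][0-9]* is the portable spelling of [0-9]\+.
digits_sed=$(printf '%s\n' "$input" | sed -n 's/[^0-9]*\([0-9][0-9]*\).*/\1/p')

printf '%s\n' "$digits_tr" "$digits_sed"
```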
Your question is a near-duplicate of this one.
Because you are using a regex, you must use either:
grep -E
egrep (like Sebastian posted).
Good luck!