I've been trying to do this for the last two days. I read a lot of tutorials and I learned a lot of new things but so far I couldn't manage to achieve what I'm trying to do. Let's say this is the command line output:
Johnny123 US 224
Johnny123 US 145
Johnny123 US 555
Johnny123 US 344
Robert UK 4322
Robert UK 52
Lucas FR 344
Lucas FR 222
Lucas FR 8945
I want to print the lines whose first field matches the first field of the last line (Lucas in this case).
So, I want to print out:
Lucas FR 344
Lucas FR 222
Lucas FR 8945
Notes:
The block I'm trying to print has a different line count each time, so I can't simply return the last 3 lines.
The first field doesn't follow a specific pattern that I could match on.
Here is another way using tac and awk:
tac file | awk 'NR==1{last=$1}$1==last' | tac
Lucas FR 344
Lucas FR 222
Lucas FR 8945
The last tac is only needed if the order is important.
awk 'NR==FNR{key=$1;next} $1==key' file file
or, if you prefer to read the file only once:
awk '{val[$1]=val[$1] $0 RS; key=$1} END{printf "%s", val[key]}' file
This might work for you (GNU sed):
sed -nr 'H;g;/^(\S+\s).*\n\1[^\n]*$/!{s/.*\n//;h};$p' file
Store lines with duplicate keys in the hold space. At change of key remove previous lines. At end-of-file print out what remains.
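The same hold-space idea can be sketched in awk, which may be easier to read. This is only a sketch: last_block is a made-up helper name, and it assumes the wanted lines form one contiguous block at the end of the input. Like the sed version, it drops earlier, non-adjacent lines with the same key.

```shell
# last_block is a hypothetical helper; it reads the lines from stdin.
last_block() {
  awk '
    $1 != key { buf = ""; key = $1 }   # key changed: discard what was collected
    { buf = buf $0 ORS }               # collect the current line
    END { printf "%s", buf }           # print the block that is left at the end
  '
}

# Usage with a shortened copy of the sample data:
printf '%s\n' 'Johnny123 US 224' 'Robert UK 52' 'Lucas FR 344' 'Lucas FR 222' | last_block
```

If lines with the same key can also appear earlier in the file and should be printed as well, the two awk commands above are the better fit.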
Related
I have a text file which I need to extract a match from in a bash script. There might be more than one match and everything else is supposed to be discarded.
Sample snippet of input.txt file content:
PART TWO OF TWO PARTS-
E RESNO 56/20 56/30 54/40 52/50 TUDEP
EAST LVLS NIL
WEST LVLS 310 320 330 340 350 360 370 380 390
EUR RTS WEST NIL
NAR NIL-
REMARKS.
1.TMI IS 142 AND OPERATORS ARE REMINDED TO INCLUDE THE
TMI NUMBER AS PART OF THE OCEANIC CLEARANCE READ BACK.
2.ADS-C AND CPDLC MANDATED OTS ARE AS FOLLOWS
TRACK A 350 360 370 380 390
TRACK B 350 360 370 380 390
I am trying to extract 142 from the line
1.TMI IS 142 AND OPERATORS ARE REMINDED TO INCLUDE THE
The match is always a number (one to three digits, may have leading zeroes) and always preceded by TMI IS.
My experiments so far led to nothing: I tried .*TMI IS ([0-9]+).* with the following sed command in my bash script
sed -n 's/.*TMI IS \([0-9]+\).*/\1/g' input.txt > output.txt
but only got an empty output.txt.
My script runs in GNU Bash-4.2. Where do I make my mistake? I ran out of ideas so your input is highly appreciated!
Thanks,
Chris
Two points to make your sed approach work:
the + quantifier must be escaped in sed basic regular expressions
to print the matched pattern, use the p flag of the s command:
sed -n 's/.*TMI IS \([0-9]\+\).*/\1/gp' input.txt
142
To get only the first match for your current format use:
sed -n 's/^\S\+TMI IS \([0-9]\+\).*/\1/gp' input.txt
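For use inside a script, the same substitution can be wrapped in a small function. This is a sketch only: extract_tmi is a made-up name, and it assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do), so + and {1,3} need no backslashes.

```shell
# extract_tmi is a hypothetical helper; it reads the message text from stdin.
extract_tmi() {
  # -n suppresses default output; the p flag prints only lines where the
  # substitution succeeded. {1,3} follows the "one to three digits" rule.
  sed -nE 's/.*TMI IS ([0-9]{1,3}).*/\1/p'
}

# Usage with three lines from the sample input:
printf '%s\n' 'REMARKS.' '1.TMI IS 142 AND OPERATORS ARE REMINDED TO INCLUDE THE' \
  'TMI NUMBER AS PART OF THE OCEANIC CLEARANCE READ BACK.' | extract_tmi
```

In the script this would be called as extract_tmi < input.txt > output.txt.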
With GNU grep:
$ grep -oP 'TMI IS \K([0-9]*)' input.txt
142
You could also do this using perl as an alternative to the above:
$ perl -nle 'print $1 if /TMI IS (\d+)/;' < input.txt
142
I'm looking for an easy way to create lists of Twitter #handles based on SocialBakers data (copy/paste into TextMate).
I've tried using the following RegEx, which I found here on StackOverflow, but unfortunately it doesn't work the way I want it to:
^(?!.*#([\w+])).*$
While the expression above deletes all lines without #handles, I'd like the RegEx to delete everything before and after the #handle as well as lines without #handles.
Example:
1
katyperry KATY PERRY (#katyperry)
Followings 158
Followers 82 085 596
Rating
5
Worst012345678910Best
2
justinbieber Justin Bieber (#justinbieber)
254 399
74 748 878
2
Worst012345678910Best
3
taylorswift13 Taylor Swift (#taylorswift13)
245
70 529 992
Desired result:
#katyperry
#justinbieber
#taylorswift13
Thanks in advance for any help!
Something like this:
perl -ne 'while (s/(#[a-z0-9_]+)//i) { print $1, "\n" }' file
This also works if a line contains several #handles: each pass of the loop removes and prints one handle.
A Twitter handle regex is #\w+. So, to remove everything else, you need to match and capture the pattern and use a backreference to this capture group, and then just match any character:
(#\w+)|.
Use DOTALL mode to also match newline symbols. Replace with $1 (or \1, depending on the tool you are using).
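Outside an editor, a similar result can be had on the command line without a replacement step. The sketch below swaps in a different technique, grep -o, which prints only the matched text. list_handles is a made-up name, and the character class is an assumption about which characters a handle may contain.

```shell
# list_handles is a hypothetical helper; it reads the pasted text from stdin.
list_handles() {
  # -o prints each match on its own line; lines without a match print nothing.
  grep -oE '#[A-Za-z0-9_]+'
}

# Usage with a few lines from the example:
printf '%s\n' '1' 'katyperry KATY PERRY (#katyperry)' 'Followings 158' \
  'justinbieber Justin Bieber (#justinbieber)' | list_handles
```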
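Straight regex, tested in Caret:
#.*[^)]
The above matches from the # up to the last character on the line that is not a closing parenthesis, so the trailing ) is excluded.
#.*\b
This one does the same thing in the Caret text editor: it matches from the # up to the last word boundary.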
How to awk and sed this:
Get usernames as well:
$ awk '/#.*/ {print}' test
katyperry KATY PERRY (#katyperry)
justinbieber Justin Bieber (#justinbieber)
taylorswift13 Taylor Swift (#taylorswift13)
Just the Handle:
$ awk -F "(" '/#.*/ {print$2}' test | sed 's/)//g'
#katyperry
#justinbieber
#taylorswift13
A look at the test file:
$ cat test
1
katyperry KATY PERRY (#katyperry)
Followings 158
Followers 82 085 596
Rating
5
Worst012345678910Best
2
justinbieber Justin Bieber (#justinbieber)
254 399
74 748 878
2
Worst012345678910Best
3
taylorswift13 Taylor Swift (#taylorswift13)
245
70 529 992
Bash Version:
$ bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin14)
Copyright (C) 2007 Free Software Foundation, Inc.
Here is an example of my datafile.txt:
jones
dave
mike
dave
nathan
ben
james
jim
dave
dave
jones
bill
john
I am using grep to find the string dave, which works fine:
grep "dave" datafile.txt >> duplicate.txt
I need to find the line number of each line on which the string dave was found:
first match dave is on line # 2
next dave is on line # 4
next dave is on line # 9
next dave is on line # 10
A second query should find the number of lines since the previous occurrence:
so first match is 0
2nd match is after 2 lines
third match is after 5 lines
fourth match is after 1 line
So I need to know the exact line number of each match as well as the line count since the previous match.
A simple awk command can do the work for you:
$ awk '/dave/{print NR}' input
2
4
9
10
What it does:
/dave/ selects the lines that contain dave
{print NR} prints NR, the current line number.
And
$ awk '/dave/{print prev?NR-prev:0; prev=NR}' input
0
2
5
1
What it does:
the prev variable holds the line number of the previous line that matched /dave/
prev?NR-prev:0 prints NR-prev if prev is set, otherwise 0
prev=NR sets prev to the current NR
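Both answers can be had in a single pass. The following is a sketch: match_gaps is a made-up name, and the pattern is passed in with -v so the function is not tied to dave.

```shell
# match_gaps is a hypothetical helper: match_gaps PATTERN < file
match_gaps() {
  awk -v pat="$1" '
    $0 ~ pat {
      gap = prev ? NR - prev : 0   # 0 for the first match
      print NR, gap                # line number, then lines since previous match
      prev = NR
    }
  '
}

# Usage with the first lines of the sample data:
printf '%s\n' jones dave mike dave nathan | match_gaps dave
```

Each output line holds the line number followed by the distance from the previous match.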
I was reading the GNU awk manual but I didn't find a regular expression with which I can match a string just once.
For example from the files aha_1.txt, aha_2.txt, aha_3.txt, .... I would like to print the second column $2 from the first time ana appears in the files (aha_1.txt, aha_2.txt, aha_3.txt, ....). In addition, the same thing when pedro appears.
aha_1.txt
luis 321 487
ana 454 345
pedro 341 435
ana 941 345
aha_2.txt
pedro 201 723
gusi 837 134
ana 319 518
cindy 738 278
ana 984 265
.
.
.
.
Meanwhile I did this, but it prints every occurrence, not just the first one:
/^ana/ {print $2 }
/^pedro/ {print $2 }
Thanks for your help :-)
Just call the exit command after printing the first value (the second column of the first line whose first field is ana).
$ awk '$1~/^ana$/{print $2; exit}' file
454
Original question
Only processing one file.
awk '/ana/ { if (ana++ == 0) print $2 }' aha.txt
or
awk '/ana/ && ana++ == 0 { print $2 }' aha.txt
Or, if you don't need to do anything else, you can exit after printing, as suggested by Avinash Raj in his answer.
Revised question
I have many files (aha.txt, aha_1.txt, aha_2.txt, ...) each file has ana inside and I need just to take the first time ana appears in each file and the output has to be one file.
That's slightly different as a question. If you have GNU grep, you can use (more or less):
grep -m1 -e ana aha*.txt
That will list the whole line, not just column 2, and will list the filenames too, so it isn't a perfect match.
Using awk, you have to work a bit more:
awk 'FILENAME != old_file { ana = 0; old_file = FILENAME }
/ana/ { if (ana++ == 0) print $2 }' aha*.txt
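A variant that handles both ana and pedro and resets at the start of each file could look like the sketch below. first_hits is a made-up name, the names are compared against the first field exactly, and the temporary files only stand in for the real aha_*.txt files.

```shell
# first_hits is a hypothetical helper: first_hits FILE...
first_hits() {
  awk '
    FNR == 1 { split("", seen) }   # first line of a file: clear the seen array
    ($1 == "ana" || $1 == "pedro") && !seen[$1]++ { print $1, $2 }
  ' "$@"
}

# Usage with throwaway copies of the two sample files:
dir=$(mktemp -d)
printf '%s\n' 'luis 321 487' 'ana 454 345' 'pedro 341 435' 'ana 941 345' > "$dir/aha_1.txt"
printf '%s\n' 'pedro 201 723' 'gusi 837 134' 'ana 319 518' 'cindy 738 278' 'ana 984 265' > "$dir/aha_2.txt"
first_hits "$dir"/aha_*.txt
rm -r "$dir"
```

Redirecting the output of first_hits to a file gives the single output file asked for.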
I have the following bogus data:
Dominik Dryja|4111 2386 0873 0189|0315
Laivonen Eero|5111 0620 0750 8041|0813
Jukka Valimaa|5111 6500 0489 0035|0415
Rafael Diaz de Leon|4111 3036 6209 4796|0516
Mr Jonathan Bird|4111 6150 0291 7415|0215
ERRANTE VINCENZO|4222 6111 0038 6639|0114
YOSHIO MOTOKI|5222 3200 0374 7129|0513
I. A. VLACHOGIANNIS|4333 0115 6936 2003|0315
Soumya Kanti Deb|4333 0590 0165 4877|1019
WU KE ZHAN|5444 8213 7236 0431|0716
I am trying to strip the spaces ONLY from the card number so that it looks like this:
Dominik Dryja|4111238608730189|0315
Laivonen Eero|5111062007508041|0813
Jukka Valimaa|5111650004890035|0415
Rafael Diaz de Leon|4111303662094796|0516
Mr Jonathan Bird|4111615002917415|0215
ERRANTE VINCENZO|4222611100386639|0114
YOSHIO MOTOKI|5222320003747129|0513
I. A. VLACHOGIANNIS|4333011569362003|0315
Soumya Kanti Deb|4333059001654877|1019
WU KE ZHAN|5444821372360431|0716
Tried sed -r '#|([0-9]{4})\ ([0-9]{4})\ ([0-9]{4})\ ([0-9]{4})|#\1\2\3\4#g'
for some reason without success. Any idea where I'm mistaken?
Thanks!
You can simplify your sed:
sed 's/\([0-9]\{4\}\) /\1/g' inFile
Assuming there's always a single space between numbers:
sed 's/\([0-9]\) \([0-9]\)/\1\2/g'
Works with your example.
The code is simple - remove all single spaces if they happen between two digits.
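If a name could ever contain digits separated by spaces, the edit can be limited to the second |-separated field. The sketch below assumes the card number is always field 2. strip_card_spaces is a made-up name.

```shell
# strip_card_spaces is a hypothetical helper; it reads the records from stdin.
strip_card_spaces() {
  # Split and rejoin on "|", remove spaces in field 2 only, print every record.
  awk 'BEGIN { FS = OFS = "|" } { gsub(/ /, "", $2) } 1'
}

# Usage with one record from the sample data:
printf '%s\n' 'Rafael Diaz de Leon|4111 3036 6209 4796|0516' | strip_card_spaces
```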