How can I extract Twitter #handles from a text with RegEx?

How can I extract Twitter #handles from a text with RegEx? - regex

I'm looking for an easy way to create lists of Twitter #handles based on SocialBakers data (copy/paste into TextMate).
I've tried using the following RegEx, which I found here on StackOverflow, but unfortunately it doesn't work the way I want it to:
^(?!.*#([\w+])).*$
While the expression above deletes all lines without #handles, I'd like the RegEx to delete everything before and after the #handle as well as lines without #handles.
Example:
1
katyperry KATY PERRY (#katyperry)
Followings 158
Followers 82 085 596
Rating
5
Worst012345678910Best
2
justinbieber Justin Bieber (#justinbieber)
254 399
74 748 878
2
Worst012345678910Best
3
taylorswift13 Taylor Swift (#taylorswift13)
245
70 529 992
Desired result:
#katyperry
#justinbieber
#taylorswift13
Thanks in advance for any help!

Something like this:
cat file | perl -ne 'while(s/(#[a-z0-9_]+)//gi) { print $1,"\n"}'
This will also work if you have lines with multiple #handles in.

A Twitter handle regex is #\w+. So, to remove everything else, you need to match and capture the pattern and use a backreference to this capture group, and then just match any character:
(#\w+)|.
Use DOTALL mode to also match newline symbols. Replace with $1 (or \1, depending on the tool you are using).
See demo

Strait REGEX Tested in Caret:
#.*[^)]
The above will search for and any given and exclude close parenthesis.
#.*\b
The above here does the same thing in Caret text editor.
How to awk and sed this:
Get usernames as well:
$ awk '/#.*/ {print}' test
katyperry KATY PERRY (#katyperry)
justinbieber Justin Bieber (#justinbieber)
taylorswift13 Taylor Swift (#taylorswift13)
Just the Handle:
$ awk -F "(" '/#.*/ {print$2}' test | sed 's/)//g'
#katyperry
#justinbieber
#taylorswift13
A look at the test file:
$ cat test
1
katyperry KATY PERRY (#katyperry)
Followings 158
Followers 82 085 596
Rating
5
Worst012345678910Best
2
justinbieber Justin Bieber (#justinbieber)
254 399
74 748 878
2
Worst012345678910Best
3
taylorswift13 Taylor Swift (#taylorswift13)
245
70 529 992
Bash Version:
$ bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin14)
Copyright (C) 2007 Free Software Foundation, Inc.

Related

Using grep, how to match beginning of line with pattern from stdin

I have a one-liner that prints out a series a numbers:
124
132
186
I am then piping this output into grep to match these numbers to the beginning of lines in another file but sometimes the second number in the line matches one of the patterns and I get an incorrect match like so:
$ get_id_command | grep -f - users.list
124 => 3456, Charles Charmichael, ccharmichael
132 => 2498, Sarah Walker, swalker
186 => 8934, John Casey, jcasey
240 => 1245, Morgan Grimes, mgrimes
What options do I need for grep to only match patterns at the beginning of the line? I would really like to keep this as a one-linter.

Prepend a circumflex to each line of your file and it will work. Circumflex does indicate the line start within the pattern. So modify your users.list as described, e.g.
sed -Ei 's|(.*)|^\1|' users.list
After that you should get the desired result by your command
$ get_id_command | grep -f - users.list

Bash/sed: delete everything from text file except match(es)

I have a text file which I need to extract a match from in a bash script. There might be more than one match and everything else is supposed to be discarded.
Sample snippet of input.txt file content:
PART TWO OF TWO PARTS-
E RESNO 56/20 56/30 54/40 52/50 TUDEP
EAST LVLS NIL
WEST LVLS 310 320 330 340 350 360 370 380 390
EUR RTS WEST NIL
NAR NIL-
REMARKS.
1.TMI IS 142 AND OPERATORS ARE REMINDED TO INCLUDE THE
TMI NUMBER AS PART OF THE OCEANIC CLEARANCE READ BACK.
2.ADS-C AND CPDLC MANDATED OTS ARE AS FOLLOWS
TRACK A 350 360 370 380 390
TRACK B 350 360 370 380 390
I try to match for 142 from the line
1.TMI IS 142 AND OPERATORS ARE REMINDED TO INCLUDE THE
The match is always a number (one to three digits, may have leading zeroes) and always preceded by TMI IS.
My experiments so far led to nothing: I tried .*TMI IS ([0-9]+).* with the following sed command in my bash script
sed -n 's/.*TMI IS \([0-9]+\).*/\1/g' input.txt > output.txt
but only got an empty output.txt.
My script runs in GNU Bash-4.2. Where do I make my mistake? I ran out of ideas so your input is highly appreciated!
Thanks,
Chris

Two moments about your sed approach to make it work:
+ quantifier should be escaped in sed basic regular expressions
to print matched pattern use p subcommand:
sed -n 's/.*TMI IS \([0-9]\+\).*/\1/gp' input.txt
142
To get only the first match for your current format use:
sed -n 's/^\S\+TMI IS \([0-9]\+\).*/\1/gp' input.txt

With GNU grep:
$ grep -oP 'TMI IS \K([0-9]*)' input.txt
142

You could also do this using perl as an alternative to the above:
$ perl -nle 'print $1 if /TMI IS (\d+)/;' < input.txt
142

Regex for soccer data

Why isn't my regex working? It just returns back the original file. My file looks like this (for a few hundred lines):
1 Germany 1765 0 Equal
2 Argentina 1631 0 Equal
3 Colombia 1488 1 Up
4 Netherlands 1456 -1 Down
5 Belgium 1444 0 Equal
6 Brazil 1291 1 Up
7 Uruguay 1243 -1 Down
8 Spain 1228 -1 Down
9 France 1202 1 Up
...
192 US Virgin Islands 28 -1 Down
And I want this:
Germany,1
Argentina,2
Colombia,3
...
US Virgin Islands,192
This is the regex I tried:
sed 's/\([0-9]*\)\t\([a-zA-Z]*\)/\2,\1/g' <fifa.csv >fifa.csv
But it just returns the original file.
EDIT:
Now I tried
sed 's/\([0-9]*\)\t\([a-zA-Z]*\)/\2,\1/g' <fifa.csv >fifa.csv
and got
,1 Germany,,1765Equal,0,
,2 Argentina,,1631Equal,0,
,3 Colombia,,1488Up,1,
,4 Netherlands,,1456-Down,1,
,5 Belgium,,1444Equal,0,

You could try the below sed command if the fields are tab-separated.
sed 's/^\([0-9]\+\)\t\([^\t]*\).*/\2,\1/' file
Add the inline-edit option -i to save the changes made.
sed -i 's/^\([0-9]\+\)\t\([^\t]*\).*/\2,\1/' file
^ means start of the line anchor. + would repeat the previous character one or more times. Basic sed uses BRE so you need to escape the + to do the functionality of repeating the previous character one or more times. [^\t]* matches any character but not of \t tab character zero or more times.

The following is what you are looking for. The -i option specifies that files are to be edited in-place.
sed -i 's/^\([0-9]\+\)\t\([^\t]*\).*/\2,\1/' fifa.csv

awk '{print( $2 "," $1)}' YourFile
not a sed but easier to manage

Sed command garbled with very easy mutiline regex in bash

I'm again garbled with sed command, because most probably i have very old version of sed but according to my limitations i couldn't change the version of 'sed' (!)
My question is this i wrote such an easy regex that fits with my string file such as:
/[^,]*$/mg
My string file is this :
23:53:20,650
23:53:20,654
23:53:20,655
23:53:20,656
23:53:21,238
23:53:21,240
23:53:21,302
23:53:21,303
23:53:21,304
23:53:21,305
23:53:21,889
23:53:21,890
23:53:21,896
23:53:21,897
23:53:21,898
23:53:21,899
23:53:22,492
23:53:22,538
23:53:22,539
23:53:23,109
23:53:23,110
23:53:23,115
23:53:23,117
23:53:23,118
23:53:23,119
23:53:23,690
23:53:23,721
23:53:23,722
23:53:24,275
23:53:24,276
23:53:24,313
23:53:24,316
23:53:24,317
23:53:24,318
23:53:24,854
23:53:24,888
23:53:24,889
23:53:24,890
23:53:24,891
23:53:50,676
23:53:50,677
23:53:50,711
23:53:50,713
23:53:50,714
23:53:51,257
23:53:51,258
23:53:51,296
23:53:51,297
23:53:51,298
23:53:51,820
23:53:51,822
23:53:51,823
23:53:52,358
23:53:52,364
23:53:52,367
23:53:52,909
23:53:52,910
23:53:52,936
23:53:52,939
23:53:52,941
23:53:52,944
23:53:52,945
23:53:52,946
23:53:52,949
23:53:52,953
23:53:52,956
23:53:52,959
23:53:52,963
23:53:52,966
23:53:52,970
23:53:52,971
23:53:52,974
23:53:52,978
23:53:52,980
23:53:52,983
23:53:52,984
23:53:52,986
23:53:52,987
23:53:52,989
23:53:52,990
23:53:52,991
23:53:52,994
23:53:52,995
23:53:52,999
23:53:53,001
23:53:53,002
23:53:53,004
23:53:53,005
23:53:53,007
23:53:53,010
23:53:53,026
23:53:53,027
23:53:53,081
23:53:53,082
23:53:53,083
23:53:53,085
07:32:54,519
07:32:54,521
07:32:54,537
07:32:54,538
07:32:54,539
07:32:54,540
07:32:54,541
07:32:54,542
07:32:54,543
07:32:54,544
07:32:54,545
07:32:54,546
07:32:54,547
07:32:54,548
07:32:54,549
07:32:54,550
I'm trying to get the values after the comma then assign them into array, when I used the sed command like :
`sed -n '/[^,]*$/mg'` file
It says command garbled, i read about multiline sed but i still couldn't reach to solution, i am new to regexes so the help will be appreciated.
Thank you in advance!

If you are using a "recent" bash, I think you can use cut and assign extracted values to an array:
numbers="$(cut -d',' -f2 filename.txt)"
array_numbers=( $numbers )

If you want to get the values after comma then you could use the below sed command which removes the values from the start upto the first comma.
sed 's/^[^,]*,//' file
OR
sed 's/^.*,//' file
Example:
$ echo '23:53:22,492' | sed 's/^[^,]*,//'
492
$ echo '23:53:22,492' | sed 's/^.*,//'
492

sed s/.*,// file
would match the till the first , are substitute the match wth nothing, which effectively gives the values after comma
for the input file
23:53:20,650
23:53:20,654
23:53:20,655
23:53:20,656
23:53:21,238
23:53:21,240
23:53:21,302
23:53:21,303
23:53:21,304
23:53:21,305
23:53:21,889
23:53:21,890
23:53:21,896
23:53:21,897
23:53:21,898
23:53:21,899
23:53:22,492
23:53:22,538
will produce output as
650
654
655
656
238
240
302
303
304
305
889
890
896
897
898
899
492
538

How do you extract IP addresses from files using a regex in a linux shell?

How to extract a text part by regexp in linux shell? Lets say, I have a file where in every line is an IP address, but on a different position. What is the simplest way to extract those IP addresses using common unix command-line tools?

You could use grep to pull them out.
grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' file.txt

Most of the examples here will match on 999.999.999.999 which is not technically a valid IP address.
The following will match on only valid IP addresses (including network and broadcast addresses).
grep -E -o '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' file.txt
Omit the -o if you want to see the entire line that matched.

This works fine for me in access logs.
cat access_log | egrep -o '([0-9]{1,3}\.){3}[0-9]{1,3}'
Let's break it part by part.
[0-9]{1,3} means one to three occurrences of the range mentioned in []. In this case it is 0-9. so it matches patterns like 10 or 183.
Followed by a '.'. We will need to escape this as '.' is a meta character and has special meaning for the shell.
So now we are at patterns like '123.' '12.' etc.
This pattern repeats itself three times(with the '.'). So we enclose it in brackets.
([0-9]{1,3}\.){3}
And lastly the pattern repeats itself but this time without the '.'. That is why we kept it separately in the 3rd step. [0-9]{1,3}
If the ips are at the beginning of each line as in my case use:
egrep -o '^([0-9]{1,3}\.){3}[0-9]{1,3}'
where '^' is an anchor that tells to search at the start of a line.

I usually start with grep, to get the regexp right.
# [multiple failed attempts here]
grep '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*' file # good?
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file # good enough
Then I'd try and convert it to sed to filter out the rest of the line. (After reading this thread, you and I aren't going to do that anymore: we're going to use grep -o instead)
sed -ne 's/.*\([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\).*/\1/p # FAIL
That's when I usually get annoyed with sed for not using the same regexes as anyone else. So I move to perl.
$ perl -nle '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/ and print $&'
Perl's good to know in any case. If you've got a teeny bit of CPAN installed, you can even make it more reliable at little cost:
$ perl -MRegexp::Common=net -nE '/$RE{net}{IPV4}/ and say $&' file(s)

You can use sed. But if you know perl, that might be easier, and more useful to know in the long run:
perl -n '/(\d+\.\d+\.\d+\.\d+)/ && print "$1\n"' < file

I wrote a little script to see my log files better, it's nothing special, but might help a lot of the people who are learning perl. It does DNS lookups on the IP addresses after it extracts them.

You can use some shell helper I made:
https://github.com/philpraxis/ipextract
included them here for convenience:
#!/bin/sh
ipextract ()
{
egrep --only-matching -E '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
}
ipextractnet ()
{
egrep --only-matching -E '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/[[:digit:]]+'
}
ipextracttcp ()
{
egrep --only-matching -E '[[:digit:]]+/tcp'
}
ipextractudp ()
{
egrep --only-matching -E '[[:digit:]]+/udp'
}
ipextractsctp ()
{
egrep --only-matching -E '[[:digit:]]+/sctp'
}
ipextractfqdn ()
{
egrep --only-matching -E '[a-zA-Z0-9]+[a-zA-Z0-9\-\.]*\.[a-zA-Z]{2,}'
}
Load it / source it (when stored in ipextract file) from shell:
$ . ipextract
Use them:
$ ipextract < /etc/hosts
127.0.0.1
255.255.255.255
$
For some example of real use:
ipextractfqdn < /var/log/snort/alert | sort -u
dmesg | ipextractudp

For those who want a ready solution for getting IP addresses from apache log and listing occurences of how many times IP address has visited website, use this line:
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' error.log | sort | uniq -c | sort -nr > occurences.txt
Nice method to ban hackers. Next you can:
Delete lines with less than 20 visits
Using regexp cut till single space so you will have only IP addresses
Using regexp cut 1-3 last numbers of IP addresses so you will have only network addresses
Add deny from and a space at the beginning of each line
Put the result file as .htaccess

grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}"

I'd suggest perl. (\d+.\d+.\d+.\d+) should probably do the trick.
EDIT: Just to make it more like a complete program, you could do something like the following (not tested):
#!/usr/bin/perl -w
use strict;
while (<>) {
if (/(\d+\.\d+\.\d+\.\d+)/) {
print "$1\n";
}
}
This handles one IP per line. If you have more than one IPs per line, you need to use the /g option. man perlretut gives you a more detailed tutorial on regular expressions.

All of the previous answers have one or more problems. The accepted answer allows ip numbers like 999.999.999.999. The currently second most upvoted answer requires prefixing with 0 such as 127.000.000.001 or 008.008.008.008 instead of 127.0.0.1 or 8.8.8.8. Apama has it almost right, but that expression requires that the ipnumber is the only thing on the line, no leading or trailing space allowed, nor can it select ip's from the middle of a line.
I think the correct regex can be found on http://www.regextester.com/22
So if you want to extract all ip-adresses from a file use:
grep -Eo "(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])" file.txt
If you don't want duplicates use:
grep -Eo "(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])" file.txt | sort | uniq
Please comment if there still are problems in this regex. It easy to find many wrong regex for this problem, I hope this one has no real issues.

Everyone here is using really long-handed regular expressions but actually understanding the regex of POSIX will allow you to use a small grep command like this for printing IP addresses.
grep -Eo "(([0-9]{1,3})\.){3}([0-9]{1,3})"
(Side note)
This doesn't ignore invalid IPs but it is very simple.

I have tried all answers but all of them had one or many problems that I list a few of them.
Some detected 123.456.789.111 as valid IP
Some don't detect 127.0.00.1 as valid IP
Some don't detect IP that start with zero like 08.8.8.8
So here I post a regex that works on all above conditions.
Note : I have extracted more than 2 millions IP without any problem with following regex.
(?:(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)\.){3}(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)

I wrote an informative blog article about this topic: How to Extract IPv4 and IPv6 IP Addresses from Plain Text Using Regex.
In the article there's a detailed guide of the most common different patterns for IPs, often required to be extracted and isolated from plain text using regular expressions.
This guide is based on CodVerter's IP Extractor source code tool for handling IP addresses extraction and detection when necessary.
If you wish to validate and capture IPv4 Address this pattern can do the job:
\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
or to validate and capture IPv4 Address with Prefix ("slash notation"):
\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?/[0-9]{1,2})\b
or to capture subnet mask or wildcard mask:
(255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)
or to filter out subnet mask addresses you do it with regex negative lookahead:
\b((?!(255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)))(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
For IPv6 validation you can go to the article link I have added at the top of this answer.
Here is an example for capturing all the common patterns (taken from CodVerter`s IP Extractor Help Sample):
If you wish you can test the IPv4 regex here.

You could use awk, as well. Something like ...
awk '{i=1; if (NF > 0) do {if ($i ~ /regexp/) print $i; i++;} while (i <= NF);}' file
May require cleaning. just a quick and dirty response to shows basically how to do it with awk.

The awk example above didn't work for me, and I needed to do it with awk specifically, so I came up with this method:
$ awk '{match($0,/[0-9]{1,3}+\.[0-9]{1,3}+\.[0-9]{1,3}+\.[0-9]{1,3}+/); ip = substr($0,RSTART,RLENGTH); print ip}' your_sample_file.log
You can also just use pipes if you're getting the data from somewhere else. Eg, ipconfig
I also realized the method matches invalid IP addresses.
Here is an extended version that only matches valid IPv4 Addresses:
$ awk 'match($0, /(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/) {print substr($0, RSTART, RLENGTH)}' sample_file.log
Hope it helps someone else.

It's a REALLY brute-force type of solution, and I haven't had time to handle things like subnet masks.
Since many awk variants lack backreferences in regex, range notation in regex {n,m}, FPAT ability, an array target for match(), I have to try my best to emulate some of that functionality here.
The regex itself is very basic, and it's very much intentional, since each of candidates that passed through the first layer filter will then be fed into the ip4 validation function to ensure values are in range.
Additionally, I use a second array to handle the duplicate scenario (although it's only de-duped in the ASCII string sense - leading zeros, for now, will show up multiple times for each unique ASCII string representation of it).
I know it's ultra brute-force and unseemly of a solution - there's only so much lemonade I can make out of the lemons I have.
echo "${bbbbbbbb}" \
\
| mawk 'function validIP4(_,__,___) {
__^=__=___=4;—-__
if(--___!=gsub("[.]","_",_)) {
return !___ }
++___
do {
if ((+_<-_)||(__<+_)||(--___<-___)) {
_="[|]"
break
} } while (sub("^[^_]+[_]","",_))
return _!="[|]"
} BEGIN { FS = RS = "^$"
__=(__= (__="[0]*([012]?[0-9])?[0-9][.]")__)__
sub("...$","",__)
} END {
gsub(/[^0-9.]+/,OFS)
gsub(__,"=&~")
gsub(/[~][^0-9.=~]+[=]/,"~=")
gsub(/^[^=~]+[=~]|[=~][^=~]+$/,"")
split($(_<_),___,"[=~]+")
for(_ in ___) {
if ( ! (____[__=___[_]]++)) {
if (validIP4(__)) {
print (__) } } } }' \
\
| gsort -t'.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n \
| gcat -n \
| rs -t -c$'\n' -C= 0 4 \
| column -s= -t \
| lgp3 5
1 00.69.84.243 76 23.108.43.3 151 79.127.56.148 226 172.241.192.165
2 00.71.110.228 77 23.108.43.19 152 80.48.119.28 227 172.245.220.154
3 00.105.215.18 78 23.108.43.55 153 80.76.60.2 228 175.196.182.58
4 00.123.2.171 79 23.108.43.94 154 80.244.229.102 229 176.74.9.62
5 00.123.228.2 80 23.108.43.120 155 81.8.52.78 230 176.214.97.55
6 00.201.223.164 81 23.108.43.208 156 83.166.241.233 231 177.128.44.131
7 01.51.106.70 82 23.108.43.244 157 85.25.4.28 232 177.129.53.114
8 01.144.14.232 83 23.108.75.98 158 85.25.91.156 233 178.88.185.2
9 01.148.85.50 84 23.108.75.164 159 85.25.91.161 234 180.180.171.123
10 01.174.10.170 85 23.225.64.59 160 85.25.117.171 235 180.183.15.198
11 02.64.120.219 86 36.37.177.186 161 85.25.150.32 236 180.250.153.129
12 02.68.128.214 87 36.94.161.219 162 85.25.201.22 237 181.36.230.242
13 02.129.196.242 88 37.48.82.87 163 85.195.104.71 238 181.191.141.43
14 02.134.127.15 89 37.144.180.52 164 85.208.211.163 239 182.253.186.140
15 03.28.246.130 90 41.65.236.56 165 85.209.149.130 240 185.24.233.208
16 03.73.194.2 91 41.65.251.86 166 88.119.195.35 241 185.61.152.137
17 03.80.77.1 92 41.79.65.241 167 91.107.15.221 242 185.74.7.51
18 03.81.77.194 93 41.161.92.138 168 91.188.246.246 243 185.93.205.236
19 03.97.200.52 94 41.164.68.42 169 93.184.8.74 244 185.138.114.113
20 3.120.173.144 95 41.164.68.194 170 94.16.15.100 245 186.3.85.131
21 03.134.97.233 96 41.205.24.155 171 94.75.76.3 246 186.5.117.82
22 03.148.72.192 97 43.255.113.232 172 94.228.204.229 247 186.46.168.42
23 03.150.113.147 98 45.5.68.18 173 95.181.150.121 248 186.96.50.39
24 03.159.46.18 99 45.5.68.25 174 95.181.151.105 249 186.154.211.106
25 03.162.181.132 100 45.43.63.230 175 110.74.200.177 250 186.167.48.138
26 03.177.45.7 101 45.67.212.99 176 112.163.123.242 251 186.202.176.153
27 03.177.45.10 102 45.67.230.13 177 113.161.59.136 252 186.233.186.60
28 03.177.45.11 103 45.71.203.110 178 115.87.196.88 253 186.251.71.193
29 03.217.169.100 104 45.87.249.80 179 116.212.155.229 254 187.217.54.84
30 03.232.215.194 105 45.122.233.76 180 117.54.114.101 255 188.94.225.177
31 04.208.138.14 106 45.131.213.170 181 117.54.114.102 256 188.95.89.81
32 04.244.75.205 107 45.158.158.29 182 117.54.114.103 257 188.133.153.143
33 5.39.189.39 108 45.179.193.70 183 119.82.241.21 258 188.138.89.50
34 05.149.219.201 109 45.183.142.126 184 120.72.20.225 259 188.138.90.226
35 5.149.219.201 110 45.184.103.68 185 121.1.41.162 260 188.166.218.243
36 5.189.229.42 111 45.184.155.7 186 123.31.30.100 261 190.128.225.115
37 07.151.182.247 112 45.189.113.63 187 125.25.33.241 262 190.217.7.73
38 07.154.221.245 113 45.189.117.237 188 125.25.206.28 263 190.217.19.243
39 07.244.242.103 114 45.192.141.247 189 133.242.146.103 264 192.3.219.94
40 08.177.248.47 115 45.229.32.190 190 137.74.93.21 265 192.99.38.64
41 08.177.248.213 116 45.250.65.15 191 137.184.57.245 266 192.140.42.83
42 08.177.248.217 117 46.99.146.232 192 139.5.151.182 267 192.155.107.59
43 8.210.83.33 118 46.243.220.70 193 139.59.233.24 268 192.254.104.201
44 8.213.128.19 119 46.246.80.6 194 139.255.58.212 269 194.5.193.183
45 8.213.128.30 120 47.74.114.83 195 140.238.19.26 270 194.114.128.149
46 8.213.128.41 121 47.88.79.154 196 151.106.13.221 271 194.233.67.98
47 8.213.128.106 122 47.91.44.217 197 151.106.18.126 272 194.233.69.41
48 8.213.128.123 123 47.243.75.115 198 152.26.229.67 273 194.233.73.103
49 8.213.128.131 124 47.254.28.2 199 152.32.143.109 274 194.233.73.104
50 8.213.128.149 125 49.156.47.162 200 153.122.106.94 275 194.233.73.105
51 8.213.128.152 126 50.195.227.153 201 153.122.107.129 276 194.233.73.107
52 8.213.128.158 127 50.235.149.74 202 154.85.35.235 277 194.233.73.109
53 8.213.128.171 128 50.250.56.129 203 154.95.36.182 278 194.233.88.38
54 8.213.128.172 129 51.68.199.120 204 154.236.162.59 279 195.80.49.3
55 8.213.128.202 130 51.77.141.29 205 154.236.168.179 280 195.80.49.4
56 8.213.128.214 131 51.81.32.81 206 154.236.177.101 281 195.80.49.5
57 8.213.129.23 132 51.159.3.223 207 154.236.179.226 282 195.80.49.6
58 8.213.129.36 133 51.178.182.23 208 157.100.26.69 283 195.80.49.7
59 8.213.129.51 134 54.80.246.241 209 159.65.69.186 284 195.80.49.253
60 8.213.129.57 135 61.9.48.169 210 159.65.133.175 285 195.80.49.254
61 8.213.129.243 136 61.9.53.157 211 159.203.13.121 286 195.158.30.232
62 8.214.41.50 137 62.75.219.49 212 160.16.242.164 287 197.149.247.82
63 8.218.213.95 138 62.75.229.77 213 161.22.34.142 288 197.243.20.178
64 09.200.156.102 139 62.78.84.159 214 164.132.137.241 289 198.46.200.70
65 13.237.147.45 140 62.138.8.42 215 167.71.207.46 290 198.229.231.13
66 20.47.108.204 141 62.204.35.69 216 167.86.81.208 291 212.112.113.178
67 20.113.24.12 142 63.161.104.189 217 167.249.180.42 292 212.154.234.46
68 23.19.7.136 143 66.29.154.103 218 168.205.100.36 293 212.174.44.87
69 23.19.10.93 144 66.29.154.105 219 169.57.1.85 294 213.32.75.44
70 23.81.127.253 145 69.163.252.140 220 170.81.35.26 295 213.230.69.193
71 23.105.78.193 146 76.118.227.8 221 170.83.60.19 296 213.230.71.230
72 23.105.78.252 147 77.83.86.65 222 170.155.5.235 297 213.230.90.106
73 23.105.86.52 148 77.83.87.217 223 171.233.151.214 298 221.159.192.122
74 23.108.42.228 149 77.104.97.3 224 172.241.156.1 299 222.158.197.138
75 23.108.42.238 150 77.236.243.125 225 172.241.192.104 300 222.252.23.5

cat ip_address.txt | grep '^[0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}[,].*$\|^.*[,][0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}[,].*$\|^.*[,][0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}$'
Lets assume the file is comma delimited and the position of ip address in the beginning ,end and somewhere in the middle
First regexp looks for the exact match of ip address in the beginning of the line.
The second regexp after the or looks for ip address in the middle.we are matching it in such a way that the number that follows ,should be exactly 1 to 3 digits .falsy ips like 12345.12.34.1 can be excluded in this.
The third regexp looks for the ip address at the end of the line

I wanted to get only IP addresses that began with "10", from any file in a directory:
grep -o -nr "[10]\{2\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" /var/www

If you are not given a specific file and you need to extract IP address then we need to do it recursively.
grep command -> Searches a text or file for matching a given string and displays the matched string .
grep -roE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
-r We can search the entire directory tree i.e. the current directory and all levels of sub-directories. It denotes recursive searching.
-o Print only the matching string
-E Use extended regular expression
If we would not have used the second grep command after the pipe we would have got the IP address along with the path where it is present

for centos6.3
ifconfig eth0 | grep 'inet addr' | awk '{print $2}' | awk 'BEGIN {FS=":"} {print $2}'

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How can I extract Twitter #handles from a text with RegEx? - regex

Something like this: cat file | perl -ne 'while(s/(#[a-z0-9_]+)//gi) { print $1,"\n"}' This will also work if you have lines with multiple #handles in.

Related

Using grep, how to match beginning of line with pattern from stdin

Bash/sed: delete everything from text file except match(es)

Regex for soccer data

Sed command garbled with very easy mutiline regex in bash

How do you extract IP addresses from files using a regex in a linux shell?

Categories

Resources