Use sed and regex to isolate data


Hey,
I'm learning sed, regex, and grouping, and how to use them to isolate data from a file.
I wrote a sed command that should give me all the IPs in auth.log and write them to logips.log. To do that I group the regex and take the second (IP) group.
sed 's/(.*)([0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3}[.][0-9]{1,3})(.*)/\/2/g' /var/log/auth.log > logips.log
But every time I end up with the whole auth.log in my logips.log. After two hours of thinking and searching, I'm asking here.
I hope someone can push me in the right direction to solve this.
Greetings

To get all IPs from auth.log, try grep instead.
grep -o -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" auth.log
-o prints only the matching part of each line
-E enables extended regular expressions
[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3} matches an IPv4-style address
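As for why the sed command in the question copies the whole file: without -E, sed uses basic regular expressions, where ( ), and { } are literal characters, so the pattern never matches and every line is printed unchanged. In addition, \/2 is not a backreference (that would be \2). A minimal sketch of both approaches follows; the sample log line is made up for illustration, and the sed variant prints at most one address per line.

```shell
# Hypothetical sample line in the style of auth.log (not from the original post).
line='Failed password for root from 192.168.1.10 port 22 ssh2'

# grep: -o prints only the matched text, -E enables extended regex.
ip_grep=$(printf '%s\n' "$line" | grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}')

# sed: -E turns ( ) and { } into grouping/repetition operators; -n plus the
# p flag prints only lines where the substitution happened; \1 is group 1.
# The [^0-9] keeps the greedy .* from swallowing leading digits of the address.
ip_sed=$(printf '%s\n' "$line" | sed -nE 's/.*[^0-9]([0-9]{1,3}(\.[0-9]{1,3}){3}).*/\1/p')

echo "$ip_grep"
echo "$ip_sed"
```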

Related

sed with capturing group

I have strings like below
VIN_oFDCAN8_8d836e25_In_data;
IPC_FD_1_oFDCAN8_8d836e25_In_data
BRAKE_FD_2_oFDCAN8_8d836e25_In_data
I want to insert _Moto in between as below
VIN_oFDCAN8_8d836e25_In_Moto_data
IPC_FD_1_oFDCAN8_8d836e25_In_Moto_data
BRAKE_FD_2_oFDCAN8_8d836e25_In_Moto_data
But when I used sed with capturing group as below
echo VIN_oFDCAN8_8d836e25_In_data | sed 's/_In_*\(_data\)/_Moto_\1/'
I get output as:
VIN_oFDCAN8_8d836e25_Moto__data
Can you please point me to right direction?

You could use a simple substitution of the _In string (assuming it occurs only once per line in your Input_file), but since you asked specifically for capture groups in sed, you could try the following.
sed 's/\(.*_In\)\(.*\)/\1_Moto\2/g' Input_file
The replacement inserts _Moto without a trailing underscore, which avoids the doubled underscore after Moto. Thanks to Bodo for pointing this out in the comments.
Issue with the OP's attempt: _In_* sits outside the capture group, so sed matches it and throws it away, keeping only \(_data\) as \1. The replacement _Moto_\1 therefore drops _In and produces _Moto__data. The command above also captures everything up to and including _In, so it can be written back with \1.
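To see the difference side by side, here is a small sketch that runs the original attempt and the two-group version on one of the sample strings from the question (the variable names are only for illustration):

```shell
s='VIN_oFDCAN8_8d836e25_In_data'

# Original attempt: _In is matched outside any group, so it is lost,
# and the replacement adds a second underscore before \1.
attempt=$(printf '%s\n' "$s" | sed 's/_In_*\(_data\)/_Moto_\1/')

# Two groups: everything up to _In is \1, the remainder is \2.
fixed=$(printf '%s\n' "$s" | sed 's/\(.*_In\)\(.*\)/\1_Moto\2/')

echo "$attempt"
echo "$fixed"
```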

$ sed 's/_[^_]*$/_Moto&/' file
VIN_oFDCAN8_8d836e25_In_Moto_data
IPC_FD_1_oFDCAN8_8d836e25_In_Moto_data
BRAKE_FD_2_oFDCAN8_8d836e25_In_Moto_data

In your case, you can replace the matching string directly with the command below:
echo VIN_oFDCAN8_8d836e25_In_data | sed 's/_In_data/_In_Moto_data/'

Grep regex treated as path

I have a script that reads strings from a text file into the $snmp_cred variable and then tries to extract the IP address from each string with grep into another variable ($snmp_ip).
while read snmp_cred; do
echo appliance $ADDM_address and $snmp_cred
snmp_ip=$(echo $snmp_cred | grep "/((25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)\d(?=#)/g")
echo IP for snmp community is $snmp_ip
done </tmp/input.txt
Content of input.txt file is:
a10networks/generic/1.3.6.1.4.1.22610.1.3.27_thunder_series4430s/10.72.168.33#public
a10networks/generic/1.3.6.1.4.1.22610.1.3.23_thunder_series1030s/172.17.48.24#public
a10networks/generic/1.3.6.1.4.1.22610.1.3.16_ax3200_12/10.251.1.101#public
The regex works in an online regex editor but fails in the bash script. The bash output is:
++ echo $'a10networks/generic/1.3.6.1.4.1.22610.1.3.27_thunder_series4430s/10.72.168.33#public\r'
++ grep '/((25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)\d(?=#)/g'
+ snmp_ip=
+ echo IP for snmp community is
IP for snmp community is
Can anyone point out what I am doing wrong?

Since you are not extracting only the matched text, you do not really need the lookahead, which POSIX regex does not support. Note that \d is not supported by the POSIX regex standard either. Also, a grep pattern should not be wrapped in regex delimiters (/.../g).
If you still need your pattern (say, to also grab the matches), pass the -oP options:
grep -oP "((25[0-5]|2[0-4]\d|[01]?[1-9]\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?[1-9]\d?)\d(?=#)"

In this statement:
snmp_ip=$($snmp_cred | grep "/((25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)\d(?=#)/g")
you are just expanding the variable without passing it to grep.
You need to either give grep a file to read (as an argument or through a redirection) or send the string to grep's STDIN.
This worked for me:
#!/bin/bash
while read -r snmp_cred; do
    #echo appliance $ADDM_address and $snmp_cred
    snmp_ip=$(grep -E -o "((25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)#" <<< "$snmp_cred")
    echo IP for snmp community is $snmp_ip
done <input.txt
output:
IP for snmp community is 10.72.168.33#
IP for snmp community is 172.17.48.24#
IP for snmp community is
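The third line comes back empty because the octet pattern [01]?[1-9][0-9]? cannot match 101 (nor 0 or 100): after an optional leading 0 or 1 it insists on a digit from 1 to 9. The sketch below swaps in a commonly used 0-255 octet pattern and strips the trailing # with a shell parameter expansion; the names octet and extract_ip are made up for this example and are not part of the original script.

```shell
# Octet 0-255; unlike [01]?[1-9][0-9]?, this also accepts 0, 100 and 101.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

extract_ip() {
    # Match "a.b.c.d#" anywhere in the string, then drop the trailing "#".
    match=$(printf '%s\n' "$1" | grep -oE "($octet\.){3}$octet#")
    printf '%s\n' "${match%#}"
}

extract_ip 'a10networks/generic/1.3.6.1.4.1.22610.1.3.16_ax3200_12/10.251.1.101#public'
extract_ip 'a10networks/generic/1.3.6.1.4.1.22610.1.3.27_thunder_series4430s/10.72.168.33#public'
```

Requiring the # right after the address is what keeps the OID-like prefix (1.3.6.1.4.1...) from being picked up.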

Sed regex to find-replace version numbers

I'm new to sed, trying to write a script to find/replace text in a file. The file (test.txt) looks like this;
hello_world (1.2.0.123)
and I'm finding that this script (which I inherited):
sed -i 's/\(^\s*hello_world \)(.*)/\1hello_world (1.2.0.456)/' test.txt
is leading to;
hello_world hello_world (1.2.0.456)
when I need it to be
hello_world (1.2.0.456)
I'm not sure how to make the first part match only the parentheses, any assistance would be appreciated.
EDIT
The whitespace before the hello_world is important
The sed line is being auto-generated using variables etc. I'm looking for a way to make this regex work without changing that. The variables I have to play with are
variable1: hello_world
variable2: hello_world (1.2.0.456)
(hopefully it's obvious where these variables sat within the sed expression)
EDIT
I got this sorted in the end, answer below if anyone else is interested.

Got it
sed -i 's/\(^\s*\)phoenix_utils (.*)/\1phoenix_utils (1.0.0.28583)/' test.txt

sed -i -e 's/^\([[:blank:]]*hello_world \).*/\1(1.0.0.28583)/' YourFile
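\1 is the content of the first \( \) group, which already includes hello_world, so \1hello_world writes it twice in your sample.
Be careful with escaping, which depends on whether -E is used: in basic regular expressions (the default) grouping parentheses must be escaped as \( \), while with -E (or -r in GNU sed) unescaped ( ) group and \( \) match literal parentheses.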
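A small sketch of the two escaping styles, run on a sample line with leading whitespace (the four spaces and the variable names are assumptions for illustration):

```shell
# Sample line with the leading whitespace the question says must be kept.
line='    hello_world (1.2.0.123)'

# Basic regex (default): \( \) group; a bare ( would be a literal parenthesis.
bre=$(printf '%s\n' "$line" | sed 's/^\([[:blank:]]*hello_world \).*/\1(1.2.0.456)/')

# Extended regex (-E): ( ) group; \( \) are literal parentheses.
ere=$(printf '%s\n' "$line" | sed -E 's/^([[:blank:]]*hello_world )\(.*\)/\1(1.2.0.456)/')

echo "$bre"
echo "$ere"
```

Both commands keep the indentation and the name through \1 and only rewrite the parenthesised version.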

How to print only matches with sed?

Okay, this is an easy one, but I can't figure it out.
Basically I want to extract all links ([^<>]*) from a big html file.
I tried to do this with sed, but I get all kinds of results, just not what I want. I know that my regexp is correct, because I can replace all the links in a file:
sed 's_[^<>]*_TEST_g'
If I run that on something like
<div>A google link</div>
<div>A google link</div>
I get
<div>TEST</div>
<div>TEST</div>
How can I get rid of everything else and just print the matches instead? My preferred end result would be:
A google link
A google link
PS. I know that my regexp is not the most flexible one, but it's enough for my intentions.

Match the whole line, put the interesting part in a group, replace by the content of the group. Use the -n option to suppress non-matching lines, and add the p modifier to print the result of the s command.
sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
Note that if there are multiple links on the line, this only prints the last link. You can improve on that, but it goes beyond simple sed usage. The simplest method is to use two steps: first insert a newline before any two links, then extract the links.
sed -n -e 's!</a>!&\n!p' | sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
This still doesn't handle HTML comments, <pre>, links that are spread over several lines, etc. When parsing HTML, use an HTML parser.
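A quick way to try the -n / p combination is to feed it a couple of lines by hand. The input below is hypothetical (the URL is a placeholder, not taken from the question); the line without a link is dropped because the substitution does not happen on it.

```shell
# Hypothetical input: one line with a link, one without.
html='<div><a href="http://example.com">A google link</a></div>
<p>no link on this line</p>'

# -n suppresses automatic printing; the p flag prints only substituted lines.
links=$(printf '%s\n' "$html" | sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p')

echo "$links"
```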

If you don't mind using perl like sed, it can cope with very diverse input:
perl -n -e 's+(<a href=.*?</a>)+ print $1, "\n" +eg;'

Assuming that there is only one hyperlink per line the following may work...
sed -e 's_.*<a href=_<a href=_' -e 's_>.*_>_'

This might work for you (GNU sed):
sed '/<a href\>/!d;s//\n&/;s/[^\n]*\n//;:a;$!{/>/!{N;ba}};y/\n/ /;s//&\n/;P;D' file

Strange behaviour with command-line perl

I have a file that I'm trying to modify using perl from the terminal in Ubuntu Linux (Natty).
The name of the file is vm.args and the first two lines are as follows:
## Name of the riak node
-name riak#127.0.0.1
I am trying to use perl to update the ip address. Below is my code:
riak_ip=`ifconfig eth1 | grep "inet addr" | cut -d ":" -f2 | cut -d " " -f1`
perl -0777 -i -pe "s/(\-name[\t ]*riak\#)[^\n]+/\1$riak_ip/g" vm.args
Let's assume the IP address I get is 10.181.106.32. The perl command gives me a result I can't understand. After I run the above in the terminal, the first two lines of my file become:
## Name of the riak node
H.181.106.32
Which is the letter H and part of the ip address.
I can't seem to figure out what I'm doing wrong and will appreciate some assistance.
Thanks in advance.

This seems to work reliably:
perl -0777 -i -pe "s/(-name\\s*riak#).*/\${1}$riak_ip/g" vm.args
The "\\1$riak_ip" seems to cause some problems since perl was seeing it as "\1172.20.2.136" if $riak_ip was 172.20.2.136. My guess is that the back reference to "1172" was causing some weirdness. Anyway, switching to the ${1} form removes the possibility for misinterpretation (pun intended).

This really should all be done in Perl, which is much better at extracting data from text than shell script. Something like this should work, but I cannot test it at present.
perl -0777 -i -pe '($ip)=`ifconfig eth1`=~/inet addr:([\d.]+)/;s/-name\s+riak#\K[\d.]+/$ip/g;' vm.args
I would be grateful if someone could confirm whether this works OK. Beware that the \K construct in Perl regexes is a recent addition and may not be in any given installation of Perl.

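The problem is that \1 gets concatenated with the first IP octet: with 10.181.106.32 the replacement becomes \110.181.106.32, and Perl reads \110 as an octal character escape, which is the letter H. To make it work despite the concatenation, the ${1} syntax needs to be used and properly quoted. This works: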
perl -0777 -i -pe "s/(\-name[\t ]*riak\#)[^\n]+/\${1}$riak_ip/g" vm.args
Consider using single quotes for the regex parts to remove one layer of quoting:
perl -0777 -i -pe 's/(-name[\t ]*riak#)[^\n]+/${1}'"$riak_ip"'/g' vm.args
(Edited/corrected according to comments, my previous suggestion was wrong.)

Sounds like a good use for the \K sequence (v5.10). And [^\n] is actually ., unless the /s modifier is used. No need for /g option unless you intend to replace the string several times.
perl -0777 -i -pe "s/\-name[\t ]*riak\#\K.+/$riak_ip/" vm.args

This would be the correct regexp:
perl -0777 -i -pe "s/(-name\s*riak#)\S+/\${1}$riak_ip/g" vm.args
Result:
## Name of the riak node
-name riak#10.181.106.32
Use \s for space characters and \S (any non-space character) to match the whole IP address. In the replacement string, ${1} is used instead of \1, with the $ escaped so that the shell does not expand it inside the double quotes. - and # are not special, so there is no need to escape them, although escaping them does no harm.
