search and print a word from a file - regex

I am trying to capture a word with a static search string.
Search String: customfield_12345
Here is the source file that I am trying to feed to the awk script:
Input file: abc.log
{"expand":"hello,foo,boo,doo","id":"546546","self":"http://localhost/abc/rest/api/latest/issue/12345","key":"abcd-4567","fields":{"customfield_12345":"$D21.0/dfgdf/string_to_capture_from_file "}}
Query: awk '{for(i=1;i<=NF;i++){if($i~/^customfield_12345/){print $i}}}' abc.log
Expected output: string_to_capture_from_file
I thought of using a combination of grep and cat, but the "-o" option is not supported on all platforms.

awk is not the best tool for your case. Figuring out the relevant separators can be a pain, and using a JSON parser as suggested by the other answer would be easier.
However, in your specific case, you can modify your query as follows:
MYVAR=$(awk -F'":"|","|{"|"}' '{for(i=1;i<=NF;i++){if($i~/customfield_12345/){i++;print $i}}}' abc.log)
echo "${MYVAR##*/}"
-F lets us set ":", ",", {" and "} as field separators. When awk encounters one of those patterns, it splits the line into several columns.
This returns $D21.0/dfgdf/string_to_capture_from_file, which you can then trim in bash with echo "${MYVAR##*/}".
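For reference, here is the whole pipeline as a self-contained sketch: it first writes the question's sample line to abc.log, then runs the awk split and the bash trim (the sample value carries a trailing space, which the sketch drops):

```shell
# Recreate the input file with the sample line from the question
cat > abc.log <<'EOF'
{"expand":"hello,foo,boo,doo","id":"546546","self":"http://localhost/abc/rest/api/latest/issue/12345","key":"abcd-4567","fields":{"customfield_12345":"$D21.0/dfgdf/string_to_capture_from_file "}}
EOF

# Split on ":", ",", {" and "}; print the field after the one matching the key
MYVAR=$(awk -F'":"|","|{"|"}' '{for(i=1;i<=NF;i++){if($i~/customfield_12345/){i++;print $i}}}' abc.log)
MYVAR=${MYVAR% }          # the sample value ends with a space; drop it

# Strip everything up to the last slash
echo "${MYVAR##*/}"       # -> string_to_capture_from_file
```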

Your input file contains a JSON string, so I would parse it as JSON instead of using a regex:
python -c "import json;json_data=open('abc.log');data = json.load(json_data);print(data['fields']['customfield_12345']);json_data.close()"

Related

Extract Source IP from log files

I want to extract "srcip=x.x.x.x" from a log file in bash. My log file looks like this:
2019:06:23-17:50:03 myhost ulogd[5692]: id="2021" severity="info" sys="SecureNet" sub="packetfilter" name="Packet dropped (GEOIP)" action="drop" fwrule="60019" initf="eth0" srcmac="3c:1e:04:92:6f:fb" dstmac="00:50:56:97:7c:af" srcip="185.53.91.50" dstip="192.168.50.10" proto="6" length="44" tos="0x00" prec="0x00" ttl="235" srcport="54522" dstport="5038" tcpflags="SYN"
I wrote awk '{print $15}' to extract srcip, but the problem is that the srcip position is not the same in each line. How can I extract srcip=x.x.x.x without relying on its position?
With any sed in any shell on every UNIX box:
$ sed -n 's/.*\(srcip="[^"]*"\).*/\1/p' file
srcip="185.53.91.50"
The following command provides the result you expect:
grep -o -P 'srcip="(\d{1,3}[.]){3}\d{1,3}"' log
The -o option prints only the matched parts. The -P option enables Perl-compatible regular expressions. The regex matches srcip=<ipv4>, and log is the name of the file you want to extract content from.
Here is a link to regex101 for an explanation for the regex: https://regex101.com/r/hjuZlM/2
An awk version
awk -F"srcip=" '{split($2,a," ");print FS a[1]}' file
srcip="185.53.91.50"
Split the line on the keyword, then take the part that follows it, up to the next space.
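If neither PCRE grep nor splitting on the keyword appeals, a plain field loop also avoids hard-coding the position; this sketch feeds in an abbreviated version of the question's sample line:

```shell
# Abbreviated sample line from the question (omitted fields do not affect the logic)
line='2019:06:23-17:50:03 myhost ulogd[5692]: id="2021" severity="info" action="drop" srcip="185.53.91.50" dstip="192.168.50.10" srcport="54522"'

# Scan the whitespace-separated fields and print the one starting with srcip=
printf '%s\n' "$line" | awk '{for (i = 1; i <= NF; i++) if ($i ~ /^srcip=/) print $i}'
# -> srcip="185.53.91.50"
```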

Extracting substring in bash script

I am not that good at bash scripting. I need to extract a substring between two words in a string. I have tried different approaches. Could someone help me, please?
This is my text "RegionName": "eu-west-1", "LatestAmiId": "ami-0ebfeadd9ccacfbb2",
Remember that the quotes and comma are part of the string. I need to extract the AMI ID alone, meaning the text between "LatestAmiId": " and ",.
Any help is appreciated.
Assuming you have this string stored in a variable named input_text, you can get the AmiId using sed like this:
ami_id=$(echo "$input_text" | sed -e 's/.*LatestAmiId": "//' -e 's/",$//')
This uses two sed expressions:
s/.*LatestAmiId": "// replaces all text up to and including LatestAmiId": " with nothing
s/",$// replaces the ", at the end of the line with nothing
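If you would rather not spawn sed at all, bash parameter expansion can do the same two trims; a sketch, with input_text holding the sample string from the question:

```shell
input_text='"RegionName": "eu-west-1", "LatestAmiId": "ami-0ebfeadd9ccacfbb2",'

# Drop everything up to and including 'LatestAmiId": "'
tmp=${input_text#*LatestAmiId\": \"}
# Drop everything from the next double quote onwards
ami_id=${tmp%%\"*}

echo "$ami_id"   # -> ami-0ebfeadd9ccacfbb2
```

Both `#` (shortest match from the front) and `%%` (longest match from the back) are POSIX, so this also works in plain sh.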
As I mentioned in the comments, jq is a tool that I have found really helpful when working with JSON objects in bash scripts. Since your input string looks like a section of a JSON response from an AWS API, I highly recommend using a JSON tool rather than a regex to extract this information.

Sed - replace value in file with regex match in another file

I am trying to code a bash script in a build process where we only have a few tools (like grep, sed, awk) and I am trying to replace a value in an ini file with a value from a regular expression match in another.
I am matching something like "^export ADDRESS=VALUE" in the file export_vars.h and putting VALUE into an ini file called config.ini, in a line containing "ADDRESS=[REPLACE]". In short, I am trying to replace [REPLACE] with VALUE in one bash command.
I have come across that sed can take an entire file and insert it into another with a command like
sed -i -e "/[REPLACE]/r export_vars.h" config.ini
I need to somehow refine this command to only read the pattern match from export_vars.h. Does anyone know how to do this?
sed is for simple substitutions on individual lines, that is all. You need to be looking at awk for what you're trying to do. Something like:
awk '
BEGIN { FS=OFS="=" }
NR==FNR {
    if ( $1 == "export ADDRESS" ) {
        value = $2
    }
    next
}
{ sub(/\[REPLACE\]/,value); print }
' export_vars.h config.ini
Untested, of course, since you didn't provide testable sample input/output.
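Since no sample input was given, here is a quick self-contained check with minimal stand-ins for the two files (contents assumed from the question's description):

```shell
# Minimal stand-ins for the two files described in the question
printf 'export ADDRESS=VALUE\n' > export_vars.h
printf 'ADDRESS=[REPLACE]\n' > config.ini

# First pass (NR==FNR) remembers VALUE; second pass substitutes it
awk '
BEGIN { FS=OFS="=" }
NR==FNR {
    if ( $1 == "export ADDRESS" ) {
        value = $2
    }
    next
}
{ sub(/\[REPLACE\]/,value); print }
' export_vars.h config.ini
# -> ADDRESS=VALUE
```

Note that sub() treats & specially in the replacement string; that is harmless for plain values like this one.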
Another in awk:
$ awk '/ADDRESS/{if(a!="")$0=a;else a=$NF}NR>FNR' export_vars.h config.ini
ADDRESS=VALUE
Explained:
$ awk '
/ADDRESS/ { # when ADDRESS is found in record
if(a!="") $0=a # if a is set (from first file), use it
else a=$NF } # otherwise set a with the last field
NR>FNR # print all records of the second file
' export_vars.h config.ini # mind the order
This solution does not tolerate space around = since $0 is replaced with $NF from the other file.

Bash - Retrieving Strings of Text in Files with Regular Expressions

I'm sorry if the title was poorly worded. Here's the idea. Let's say that I have many files and I wish to find all occurrences of a particular expression such as:
tag:"some text I wish to retrieve"
Note that the entire line above would appear in the files. I wish to copy only what is in the quotation marks after the word 'tag'.
I'm not an expert at bash by any means, but I could easily use grep to retrieve the entire line that contains the regular expression. Easy. However, I only want part of that line. The text in quotation marks varies in length. Ultimately I want to amalgamate all occurrences into one file.
For instance, I would want to take FILE 1 and FILE 2 and end up with FILE 3:
FILE 1:
whatever:"text I don't want"
something:"More text I don't want" tag:"some text I wish to retrieve"
FILE 2:
whatever:"don't want" tag:"more text I wish to retrieve" something:"nope"
FILE 3:
some text I wish to retrieve
more text I wish to retrieve
Can this be accomplished using bash? I could do it in C with a bit of effort, but I'd rather not if I don't have to.
EDIT:
"-o" is used to show only the part of the line that matches the expression. I don't know how I missed that in the man page.
You can use grep to perform this task:
grep -hrPo 'tag:"\K[^"]*' * > result
-o prints only the matched part, -h suppresses file-name prefixes, -r recurses into directories, and -P enables Perl-compatible regexes so that \K can discard the tag:" prefix from the reported match.
Here is a gnu awk version (gnu because a multi-character RS is a gawk extension):
awk -v RS="tag:" -F\" '{$1=$1} FNR>1 {print $2}' FILE*
some text I wish to retrieve
more text I wish to retrieve
This should work with all version of awk:
awk -F\" '{for (i=1;i<=NF;i++) if ($i~" tag:") print $(i+1)}' FILE*
some text I wish to retrieve
more text I wish to retrieve
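To reproduce FILE 3 from the question's samples with the portable version (the throwaway text is simplified slightly to keep the shell quoting easy):

```shell
# Recreate the two sample files (distractor text simplified)
printf 'whatever:"text not wanted"\nsomething:"More text not wanted" tag:"some text I wish to retrieve"\n' > FILE1
printf 'whatever:"not wanted" tag:"more text I wish to retrieve" something:"nope"\n' > FILE2

# Split on double quotes; the field after a field ending in ' tag:' is the value
awk -F\" '{for (i=1;i<=NF;i++) if ($i~" tag:") print $(i+1)}' FILE1 FILE2 > FILE3

cat FILE3
```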

Regular Expression to parse Common Name from Distinguished Name

I am attempting to parse (with sed) just First Last from the following DN(s) returned by the DSCL command in OSX terminal bash environment...
CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu
I have tried multiple regexes from this site and others, from questions very close to mine... mainly this question... I have tried following the advice to the best of my ability (I don't necessarily consider myself a newbie... but definitely a newbie to regex).
DSCL returns a list of DNs, and I would like to only have First Last printed to a text file. I have attempted using sed, but I can't seem to get the correct function. I am open to other commands to parse the output. Every line begins with CN= and then there is a comma between Last and OU=.
Thank you very much for your help!
I think all of the regular expression answers provided so far are buggy, insofar as they do not properly handle quoted ',' characters in the common name. For example, consider a distinguishedName like:
CN=Doe\, John,CN=Users,DC=example,DC=local
Better to use a real library able to parse the components of a distinguishedName. If you're looking for something quick on the command line, try piping your DN to a command like this:
echo "CN=Doe\, John,CN=Users,DC=activedir,DC=local" | python -c 'import ldap; import sys; print(ldap.dn.explode_dn(sys.stdin.read().strip(), notypes=1)[0])'
(depends on having the python-ldap library installed). You could cook up something similar with PHP's built-in ldap_explode_dn() function.
Two cut commands is probably the simplest (although not necessarily the best):
DSCL | cut -d, -f1 | cut -d= -f2
First, split the output from DSCL on commas and print the first field ("CN=First Last"); then split that on equal signs and print the second field.
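A quick sketch of the same pipeline with a sample DN piped in instead of live DSCL output:

```shell
printf 'CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu\n' |
  cut -d, -f1 |   # -> CN=First Last
  cut -d= -f2     # -> First Last
```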
Using sed:
sed 's/^CN=\([^,]*\).*/\1/' input_file
^ matches start of line
CN= literal string match
\([^,]*\) captures everything up to (but not including) the first comma
.* matches the rest of the line
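The same expression run against a sample line instead of an input file:

```shell
printf 'CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu\n' |
  sed 's/^CN=\([^,]*\).*/\1/'   # -> First Last
```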
http://www.gnu.org/software/gawk/manual/gawk.html#Field-Separators
awk -v RS=',' -v FS='=' '$1=="CN"{print $2}' foo.txt
I like awk too, so I print the substring from the fourth char:
DSCL | awk 'BEGIN{FS=","} {print substr($1,4)}' > filterednames.txt
(FS must be set in a BEGIN block; assigning it inside the main rule only takes effect from the second record onward, so the first line would be split on whitespace instead.)
This regex will parse a distinguished name, giving name and val as capture groups for each match.
When DN strings contain commas, they are meant to be quoted - this regex correctly handles both quoted and unquoted strings, and also handles escaped quotes in quoted strings:
(?:^|,\s?)(?:(?<name>[A-Z]+)=(?<val>"(?:[^"]|"")+"|[^,]+))+
Here it is nicely formatted:
(?:^|,\s?)
(?:
(?<name>[A-Z]+)=
(?<val>"(?:[^"]|"")+"|[^,]+)
)+
Here's a link so you can see it in action:
https://regex101.com/r/zfZX3f/2
If you want a regex to get only the CN, then this adapted version will do it:
(?:^|,\s?)(?:CN=(?<val>"(?:[^"]|"")+"|[^,]+))
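The patterns above need a PCRE-style engine (named groups, \s). For the specific backslash-escaped-comma case from the accepted answer, here is a sketch using sed -E (supported by GNU and BSD sed); note it handles \-escapes, not the ""-quoted style the regex above also covers:

```shell
dn='CN=Doe\, John,CN=Users,DC=example,DC=local'

# Capture CN's value, letting '\' escape the following character,
# then strip the escaping backslashes from the result
printf '%s\n' "$dn" |
  sed -E 's/^CN=((\\.|[^,])*).*/\1/; s/\\//g'   # -> Doe, John
```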