awk to parse the ldap data between two strings linux - regex

Hi, I want to extract the text between two strings. In my case the first string, such as KDP2002 or KDP1005, is not constant across the output: the number after KDP changes from entry to entry, and I don't want that KDP+number prefix to be printed.
$ ldapsearch -x -LLL -o ldif-wrap=no -b ou=Projects,ou=People,ou=KDI,o=KDP cn="alltest1p1" KDPHomeDirectory
dn: cn=alltest1p1,ou=Projects,ou=People,ou=KDI,o=KDP
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=scratch,NisMap=KDP2002:/proj/KDP2002_alltest1p1_scratch_c/q,Quota=20000,Id=scratch_c
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=economy,NisMap=KDP2002:/proj/KDP2002_alltest1p1/q,Quota=10000
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=scratch,NisMap=KDP2002:/proj/KDP2002_alltest1p1_scratch/q,Quota=20000,Id=scratch
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=scratch,NisMap=KDP2002:/proj/KDP2002_alltest1p1_scratch_a/q,Quota=20000,Id=scratch_a
An attempt that partially works:
$ ldapsearch -x -LLL -o ldif-wrap=no -b ou=Projects,ou=People,ou=KDI,o=KDP cn="alltest1p1" KDPHomeDirectory | grep -o -P '(?<=NisMap=).*(?=,Quota)'
KDP2002:/proj/KDP2002_alltest1p1/q
KDP2002:/proj/KDP2002_alltest1p1_scratch/q
KDP2002:/proj/KDP2002_alltest1p1_scratch_a/q
Expected output:
/proj/KDP2002_alltest1p1/q
/proj/KDP2002_alltest1p1_scratch/q
/proj/KDP2002_alltest1p1_scratch_a/q

I would use GNU sed for this task in the following way. Let file.txt contain
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=scratch,NisMap=KDP2002:/proj/KDP2002_alltest1p1_scratch_c/q,Quota=20000,Id=scratch_c
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=economy,NisMap=KDP2002:/proj/KDP2002_alltest1p1/q,Quota=10000
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=scratch,NisMap=KDP2002:/proj/KDP2002_alltest1p1_scratch/q,Quota=20000,Id=scratch
KDPHomeDirectory: nisMapName=auto.home,ou=KDI_US-CDC01,ou=Locations,ou=KDI,o=KDP#0#Quality=scratch,NisMap=KDP2002:/proj/KDP2002_alltest1p1_scratch_a/q,Quota=20000,Id=scratch_a
then
sed 's/.*KDP2002:\([^,]*\).*/\1/' file.txt
gives output
/proj/KDP2002_alltest1p1_scratch_c/q
/proj/KDP2002_alltest1p1/q
/proj/KDP2002_alltest1p1_scratch/q
/proj/KDP2002_alltest1p1_scratch_a/q
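Explanation: I use a single capturing group, denoted by \( and \), which contains zero or more (*) characters that are not a comma ([^,]). It sits right after KDP2002:, and the whole pattern is prefixed and suffixed by .* so that the replacement spans the whole line. Since the number after KDP varies in your data, KDP[0-9]*: can be used in place of KDP2002: to match any number.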
(tested in GNU sed 4.2.2)
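Because the question says the number after KDP changes between entries, here is a minimal sketch of the same substitution with the number generalized. The function name nismap_paths and the shortened sample lines (including the KDP1005 entry) are made up for illustration; only the NisMap=KDP<digits>: layout is taken from the question.

```shell
# Hypothetical helper: print the path that follows "NisMap=KDP<digits>:" on each line.
# -n together with the p flag prints only lines where the substitution happened,
# so the "dn:" line from ldapsearch is skipped instead of being echoed unchanged.
nismap_paths() {
  sed -n 's/.*NisMap=KDP[0-9]*:\([^,]*\).*/\1/p'
}

# Made-up, shortened sample with two different KDP numbers.
printf '%s\n' \
  'dn: cn=alltest1p1,ou=Projects,ou=People,ou=KDI,o=KDP' \
  'KDPHomeDirectory: nisMapName=auto.home,o=KDP#0#Quality=economy,NisMap=KDP2002:/proj/KDP2002_alltest1p1/q,Quota=10000' \
  'KDPHomeDirectory: nisMapName=auto.home,o=KDP#0#Quality=scratch,NisMap=KDP1005:/proj/KDP1005_demo/q,Quota=20000,Id=scratch' |
  nismap_paths
```

In practice the ldapsearch output from the question would be piped into nismap_paths in place of the printf.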

1st solution: With only the samples shown, please try the following GNU awk code.
awk -v RS='=KDP[0-9]+:([^,]+)' 'RT{split(RT,arr,":");print arr[2]}' Input_file
2nd solution: With any awk version, using awk's match function and the samples shown, please try the following code.
awk '
match($0,/=KDP[0-9]+:([^,]+)/){
  split(substr($0,RSTART,RLENGTH),arr,":")
  print arr[2]
}
' Input_file
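One detail of the split-on-colon step: arr[2] holds only the text up to the next colon. None of the sample paths contain a colon, so this is purely a precaution, but a sketch that keeps everything after the first colon could look like the following. The function name kdp_path is hypothetical.

```shell
# Hypothetical helper: like the match()-based solution, but keeps everything
# after the first ":" of the matched text, even if the path contains more colons.
kdp_path() {
  awk 'match($0, /=KDP[0-9]+:[^,]+/) {
    m = substr($0, RSTART, RLENGTH)      # e.g. "=KDP2002:/proj/..."
    print substr(m, index(m, ":") + 1)   # text after the first ":"
  }'
}

printf '%s\n' 'x=y,NisMap=KDP2002:/proj/KDP2002_alltest1p1/q,Quota=10000' | kdp_path
```

It uses only the two-argument match(), RSTART and RLENGTH, so it should not depend on GNU awk extensions.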

Using gnu-grep you can use:
grep -oP '=KDP\d+:\K[^,]+'
/proj/KDP2002_alltest1p1_scratch_c/q
/proj/KDP2002_alltest1p1/q
/proj/KDP2002_alltest1p1_scratch/q
/proj/KDP2002_alltest1p1_scratch_a/q
Here \K discards everything matched so far, so only the text after KDP\d+: is printed.
Alternatively you can use this gnu-awk command:
awk 'match($0, /=KDP[0-9]+:([^,]+)/, a) {print a[1]}' file
/proj/KDP2002_alltest1p1_scratch_c/q
/proj/KDP2002_alltest1p1/q
/proj/KDP2002_alltest1p1_scratch/q
/proj/KDP2002_alltest1p1_scratch_a/q

Related

Get the following character which match a string

I'm trying to retrieve a specific piece of data returned by a command. Here is my command line:
snmpwalk -v2c -c community localhost 1.3.6.1.2.1.2 | grep tun0
Which gives me this result:
IF-MIB::ifDescr.4 = STRING: tun0
From this result I want to retrieve the 4. I thought of using a regex, but maybe there is an easier way to fetch it.
Regex I tried :
\ifDescr.\s+\K\S+ https://regex101.com/r/9X04MD/1
[\n\r].*ifDescr.\s*([^\n\r]*) https://regex101.com/r/9X04MD/2
I would like to fetch it in a single command line like
snmpwalk -v2c -c community localhost 1.3.6.1.2.1.2 | grep tun0 | ?
There are many options that don't involve GNU grep's experimental -P option. For example, given just your sample input to work from (held here in the shell variable out), here's one way with any sed. The [0-9]* lets it handle interface indexes with more than one digit:
$ echo "$out" | sed 's/.*\.\([0-9]*\).*tun0/\1/'
4
or any awk:
$ echo "$out" | awk -F'[. ]' '/tun0/{print $2}'
4
I'd recommend pattern (?<=ifDescr\.)[^ =]+
Explanation:
(?<=ifDescr\.) - positive lookbehind; asserts that what precedes the match is ifDescr.
[^ =]+ match one or more characters other than space or equal sign =
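For systems where grep -P is not available, the same extraction can be sketched with awk alone. The helper name if_index is made up, and the sketch assumes each line has the shape shown in the question (OID with a numeric suffix first, interface name last).

```shell
# Hypothetical helper: print the numeric suffix of the ifDescr OID for the
# interface named in $1. Matching on $NF is exact, so "tun0" does not also
# select "tun01" the way a plain "grep tun0" would.
if_index() {
  awk -v name="$1" '$NF == name && match($1, /\.[0-9]+$/) {
    print substr($1, RSTART + 1)   # skip the dot, keep the digits
  }'
}

printf '%s\n' 'IF-MIB::ifDescr.4 = STRING: tun0' | if_index tun0
```

With the command from the question this would be used as snmpwalk -v2c -c community localhost 1.3.6.1.2.1.2 | if_index tun0, replacing the grep step.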

Find all text between $...$ delimiters using bash script

I have a text file, and I'm trying to get an array of the strings contained between $...$ delimiters (LaTeX formulas) using a bash script. My current code doesn't work; the result is empty:
#!/bin/bash
array=($(grep -o '\$([^\$]*)\$' test.txt))
echo ${array[@]}
I tested this regex here, it finds the matches. I use the following test string:
b5f1e7$bfc2439c621353$d1ce0$629f$b8b5
Expected result is
bfc2439c621353 629f
But echo returns empty. Although if I use '[0-9]\+' it works:
5 1 7 2439 621353 1 0 629 8 5
What am I doing wrong?
How about:
grep -o '\$[^$]*\$' test.txt | tr -d '$'
This basically performs your original grep (but without the parentheses, which were causing it not to match), then strips the $ characters from each match.
You may use awk with input field separator as $:
s='b5f1e7$bfc2439c621353$d1ce0$629f$b8b5'
awk -F '$' '{for (i=2; i<=NF; i+=2) print $i}' <<< "$s"
Note that this awk command doesn't validate input. If you want awk to allow for only valid inputs then you may use this gnu awk command with FPAT:
awk -v FPAT='\\$[^$]*\\$' '{for (i=1; i<=NF; i++) {gsub(/\$/, "", $i); print $i}}' <<< "$s"
bfc2439c621353
629f
What about this?
grep -Eo '\$[^$]+\$' a.txt | sed 's/\$//g'
I'm using sed to replace the $.
Try escaping your braces:
tst> grep -o '\$\([^\$]*\)\$' test.txt
$bfc2439c621353$
$629f$
of course, you then have to strip out the $ signs (-o prints the entire match). You can try sed instead:
tst> sed 's/[^\$]*\$\([^\$]*\)\$[^\$]*/\1\n/g' test.txt
bfc2439c621353
629f
Why is your expected output given b5f1e7$bfc2439c621353$d1ce0$629f$b8b5 the two elements bfc2439c621353 629f rather than the three elements bfc2439c621353 d1ce0 629f?
Here's a single grep command to extract those:
$ grep -Po '\$\K[^\$]*(?=\$)' <<<'b5f1e7$bfc2439c621353$d1ce0$629f$b8b5'
bfc2439c621353
d1ce0
629f
(This requires GNU grep as compiled with libpcre for -P)
This uses \$\K (equivalent to (?<=\$)) to look behind at the first $ and (?=\$) to look ahead to the next $. Since these are lookarounds, the delimiters are not consumed by the match, and therefore d1ce0 is available to be found.
Here's a single POSIX sed command to extract those:
$ sed 's/^[^$]*\$//; s/\$[^$]*$//; s/\$/\n/g' \
<<<'b5f1e7$bfc2439c621353$d1ce0$629f$b8b5'
bfc2439c621353
d1ce0
629f
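This avoids GNU-only options. It removes the unwanted leading and trailing portions, then replaces each $ with a newline. Note that POSIX leaves \n in the replacement text unspecified: GNU sed emits a newline, but some BSD sed versions (including older ones on OS X) print a literal n, in which case piping through tr '$' '\n' for the last step is the safer choice.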
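A sketch of the same strip-then-split idea that does not rely on how sed treats \n in the replacement: sed trims the text outside the outermost delimiters and tr does the splitting. The function name between_dollars is made up for illustration.

```shell
# Hypothetical helper: print every segment that lies between two "$" signs,
# one per line. sed removes the text before the first "$" and after the
# last "$"; tr then turns each remaining "$" into a newline.
between_dollars() {
  sed -e 's/^[^$]*\$//' -e 's/\$[^$]*$//' | tr '$' '\n'
}

printf '%s\n' 'b5f1e7$bfc2439c621353$d1ce0$629f$b8b5' | between_dollars
```

Like the sed answer, this yields all three segments (including d1ce0), and it assumes each input line contains at least two $ characters.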
Using bash regex:
var="b5f1e7\$bfc2439c621353\$d1ce0\$629f\$b8b5" # string to var
while [[ $var =~ ([^$]*\$)([^$]*)\$(.*) ]] # matching
do
  echo -n "${BASH_REMATCH[2]} " # 2nd element has the match
  var="${BASH_REMATCH[3]}" # 3rd is the rest of the string
done
echo # trailing newline
bfc2439c621353 629f

Access the word in the file with grep

I have a conf file and I use grep to read data from it, but the result isn't very useful to me as is.
How can I get just the value for a search term?
I'm using:
grep "export:" /etc/VDdatas.conf
Print:
export: HelloWorld
I want: (without "export: ")
HelloWorld
How can I do that?
If you're using GNU grep you can use PCRE and a lookbehind:
grep -P -o '(?<=export: ).*' /etc/VDdatas.conf
The -o option means to print only the part of the line that matches the regexp, and using a lookbehind for the export: prefix makes it not part of the match.
You can also use sed or awk
sed -n 's/^export: //p' /etc/VDdatas.conf
awk '/export:/ {print $2}' /etc/VDdatas.conf
I suggest you pipe the match to awk.
grep "export:" /etc/VDdatas.conf | awk -F ' ' '{print $2}'
This will print the second word in the output (after splitting the line on spaces).
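Printing $2 returns only the first word of the value. If a value could contain spaces (an assumption; the sample value HelloWorld is a single word), a sketch that strips the key and keeps the rest of the line might look like this. The function name conf_value is hypothetical.

```shell
# Hypothetical helper: read "key: value" lines on stdin and print the full
# value for the key given in $1, including any spaces inside the value.
conf_value() {
  awk -v key="$1:" '$1 == key {
    sub(/^[^:]*:[ \t]*/, "")   # drop the key, the colon and the blanks after it
    print
  }'
}

printf '%s\n' 'import: Other' 'export: Hello World' | conf_value export
```

Against the file from the question it would be called as conf_value export < /etc/VDdatas.conf.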

Extract version using grep/regex in bash

I have a file that has a line stating
version = "12.0.08-SNAPSHOT"
The word version and quoted strings can occur on multiple lines in that file.
I am looking for a single line bash statement that can output the following string:
12.0.08-SNAPSHOT
The version can have RELEASE tag too instead of SNAPSHOT.
So to summarize, given
version = "12.0.08-SNAPSHOT"
expected output: 12.0.08-SNAPSHOT
And given
version = "12.0.08-RELEASE"
expected output: 12.0.08-RELEASE
The following command prints the strings enclosed in quotes in version = "...":
grep -Po '\bversion\s*=\s*"\K.*?(?=")' yourFile
-P enables perl regexes, which allow us to use features like \K and so on.
-o only prints matched parts instead of the whole lines.
\b ensures that version starts at a word boundary and we do not match things like abcversion.
\s stands for any kind of whitespace.
\K lets grep forget that it matched the part before \K. The forgotten part will not be printed.
.*? matches as few characters as possible (the matching part will be printed) ...
(?=") ... until we see a ", which won't be included in the match either (this is called a lookahead).
Not all grep implementations support the -P option. Alternatively, you can use perl, as described in this answer:
perl -nle 'print $& if m{\bversion\s*=\s*"\K.*?(?=")}' yourFile
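Where neither grep -P nor perl is available, a rough sed equivalent can be sketched as below. It differs from the PCRE version in one respect: instead of the \b word boundary it requires version to be the first word on the line, which is an assumption about the file layout. The function name version_value is made up.

```shell
# Hypothetical helper: print the quoted value of lines of the form
#   version = "..."
# Anchoring at the start of the line (after optional blanks) keeps names
# such as "abcversion" from matching.
version_value() {
  sed -n 's/^[[:space:]]*version[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p'
}

printf '%s\n' 'version = "12.0.08-SNAPSHOT"' 'abcversion = "1.0"' | version_value
```

[^"]* plays the role of the non-greedy .*? here: it stops at the first closing quote.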
Seems like a job for cut:
$ echo 'version = "12.0.08-SNAPSHOT"' | cut -d'"' -f2
12.0.08-SNAPSHOT
$ echo 'version = "12.0.08-RELEASE"' | cut -d'"' -f2
12.0.08-RELEASE
Portable solution:
$ echo 'version = "12.0.08-RELEASE"' |sed -E 's/.*"(.*)"/\1/g'
12.0.08-RELEASE
or even:
$ perl -pe 's/.*"(.*)"/$1/'
$ awk -F"\"" '{print $2}'

Match word ignoring same word with extra character in sed

The following works in Javascript as a match:
[^\$]fileref.*
However, in sed the regex does not match anything. I would like to replace a variable reference in a bash file while ignoring the variable identifier. The premise is that I have a default placeholder file that needs to be updated from a shell script with sed -i. I can't locate a reason as to why sed is having an issue with this expression.
Sed test example:
echo -e 'fileref=old\n./executable $fileref' | sed 's/[^\$]fileref.*/fileref=replaced/g'
Output from gnu sed (ubuntu or centOS) where no match is found:
fileref=old
./executable $fileref
Desired output:
fileref=replaced
./executable $fileref
"No $ before fileref" can be expressed like this: "a character that isn't a $ or no character at all before fileref"
echo -e 'fileref=old\n./executable $fileref' | sed 's/\([^$]\|^\)fileref.*/\1fileref=replaced/g'
The same with Javascript:
var result = str.replace(/([^$]|^)fileref.*/, '$1fileref=replaced');
echo -e 'fileref=old\n./executable $fileref' | sed 's/^fileref.*$/fileref=replaced/g'
gives the following
fileref=replaced
./executable $fileref
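[^\$] is a bracket expression that must match exactly one character other than $ (or a backslash, which is literal inside brackets in sed). At the start of a line there is no character before fileref, so your pattern cannot match there; anchoring with ^ instead matches only the assignment line.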
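The "a character that isn't a $, or no character at all" idea can also be written with extended regular expressions, which avoids the GNU-specific \| operator. This is a sketch only; the function name replace_fileref is made up, and it assumes a sed that accepts -E (current GNU and BSD versions do).

```shell
# Hypothetical helper: rewrite "fileref..." unless it is preceded by "$".
# (^|[^$]) matches either the start of the line or one non-"$" character,
# and \1 puts that character (if any) back in front of the replacement.
replace_fileref() {
  sed -E 's/(^|[^$])fileref.*/\1fileref=replaced/'
}

printf '%s\n' 'fileref=old' './executable $fileref' | replace_fileref
```

The first line becomes fileref=replaced, and the second is left alone because its fileref follows a $.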