AWK regex to find a string with a pattern

I have a file that contains a lot of characters and symbols. I want to find an exact string, cut it, and assign the two parts to two variables. I have written it with grep, but I want to write it in **AWK** or sed.
Here is my example file:
.f#alU|A#Z<inCWV6a=L?o`A5vIod"%Mm+YW1RM#,L;aN
r^n<&)}[??!VcVIV**2zTest1.Test2n9**94EN~yK,$lU=9?UT.[
e`)G:FS.nGz%?#~k!20aLJ^PU-[#}0W\ !8x
cujOmEK"1;!cI134lu%0-A +/t!VIf?8uT`!
aC1QAQY>4RE$46iVjAE^eo5yR|
1?/T?<H5,%G~[|9I/c&8MY$O]%,UYQe{!{Bm[rRC[
aHC`<m?BUau#N_O>Yct.MXo[>r5^uV&26#MkYB'Kiu\Y
K(*}ldO:ZQnI8t989fi+
CrvEwmTQ80k3==,a'Jj9907+}NNy=0Op
"nzb.j-.i%z5`U*8]~#64sF'r;\x\;ylr_;q5F` A!~p*
First I want to find 2zTest1.Test2n9, then cut off the first two and last two characters, and finally get two words without the dot (.). I will send the first word to one variable and the second one to another variable.
Note: I want to find 2zTest1.Test2n9 and then cut it.
Output:
variable 1 = test1
variable 2 = test2
Thanks
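The steps in the question are: find the token, cut two characters from each end, and split at the dot. They can be sketched in Python as shown below. This is only an illustration under assumptions:

- The token is assumed to always look like "2" + a letter, the first word, one dot, the second word, then a letter + "9".
- The helper name split_token is hypothetical.
- The words are lowercased to match the expected output above.

```python
import re

# Assumed token shape: "2" + letter + word1 + "." + word2 + letter + "9" (e.g. 2zTest1.Test2n9).
TOKEN = re.compile(r"2[A-Za-z][^.\s]+\.[^.\s]+?[A-Za-z]9")


def split_token(text):
    """Hypothetical helper: return (word1, word2) in lowercase, or None if no token is found."""
    match = TOKEN.search(text)
    if match is None:
        return None
    core = match.group(0)[2:-2]         # drop the first two and the last two characters
    first, second = core.split(".", 1)  # split once, at the dot
    return first.lower(), second.lower()


line = "r^n<&)}[??!VcVIV**2zTest1.Test2n9**94EN~yK,$lU=9?UT.["
var1, var2 = split_token(line)
print(f"variable 1 = {var1}")
print(f"variable 2 = {var2}")
```

Tuple unpacking (var1, var2 = ...) plays the role of assigning the two words to two shell variables.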

With GNU sed it's (the \L lowercase escape is a GNU extension):
sed -n 's/.*\(2z\(\(.*\)\.\(.*\)\)n9\).*/variable 1 = \L\3\nvariable 2 = \L\4/p' your.file
Output:
variable 1 = test1
variable 2 = test2
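The sed command works for three reasons:

- -n turns off automatic printing.
- The p flag prints a line only when the substitution succeeded.
- \3 and \4 capture the two words, and \L lowercases them.

A rough Python analogue of that behaviour is sketched below. It assumes the input arrives as a list of lines, and the name sed_like is hypothetical:

```python
import re

# Same regex as the sed command: greedy .* on both sides, 2z ... n9 around "word1.word2".
PATTERN = re.compile(r".*(2z((.*)\.(.*))n9).*")


def sed_like(lines):
    """Emit output only for lines where the pattern matched, like sed -n 's/.../.../p'."""
    out = []
    for line in lines:
        m = PATTERN.search(line)
        if m:  # sed's p flag prints only after a successful substitution
            out.append(f"variable 1 = {m.group(3).lower()}")
            out.append(f"variable 2 = {m.group(4).lower()}")
    return out


sample = [
    ".f#alU|A#Z<inCWV6a=L?o`A5vIod\"%Mm+YW1RM#,L;aN",
    "r^n<&)}[??!VcVIV**2zTest1.Test2n9**94EN~yK,$lU=9?UT.[",
]
print("\n".join(sed_like(sample)))
```

Lines without a match produce nothing, just as sed -n stays silent for them.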

Using GNU awk:
read var1 var2 < <(
gawk 'match($0, /2[[:alpha:]]([^.]+)\.(.*)[[:alpha:]]9/, m) {
print m[1], m[2]
}' file
)
echo "var1=$var1"
echo "var2=$var2"
var1=Test1
var2=Test2
I read your comments on hek2mgl's answer; those requirements need to be in the question itself.


Unix regex get only the first match

I have the following text:
NodeMetaData MapNodeId="105141" PageFormat="OsXml" UniqueIdentifier="fd0f9ade-88e1-4b04-b338-0a8884f66423" RelativePath="Test_03/AddressMap_MyAddressMap.os.xml" LastPulledRevision="-9223372036854775808" LastPulledMd5="" LastSyncedMd5="7D0C294B9A7C09F17FD5AC0414179DD414649455297B8F73125D7FB5E39D647D" HasMergeConflicts="false"
NodeMetaData MapNodeId="105142" PageFormat="OsXml" UniqueIdentifier="85f55c40-f95c-47f2-9c97-d35881e8f762" RelativePath="Test_03/Struct_MyStruct.os.xml" LastPulledRevision="-9223372036854775808" LastPulledMd5="" LastSyncedMd5="32364BCCBCD8AA9C47D8E09A3EB06667DD9476EB155F9411FA359EFA5C1A4F4F" HasMergeConflicts="false"
There are two MapNodeId attributes, and I need to get only the first one and write it to a file.
I used the following:
set WorkingCopyRI=`( sed -n 's/.*MapNodeId=\"// ; s/\" .*//p' Result.log)`
But the variable contains the IDs from both MapNodeId attributes. What do I need to add in order to get only the first one?
You can append ;T;q (T is a GNU sed command) to your script to make it quit right after the second s command prints for the first time.
Here's a cleaner and more robust way to do the whole thing:
sed -n '/MapNodeId=/ { s/^.*\sMapNodeId="\([^"]*\)".*$/\1/p; q; }' Result.log
I'm assuming your IDs won't contain double quotes; if they can, you will have to modify the expression in group #1.
(Also, your formatting gives no clue as to whether your text occurs in multiple lines or not, but I'm assuming that the MapNodeId="..." parts appear on separate lines, otherwise you wouldn't have this problem.)
A Perl approach:
perl -ne 'if (/MapNodeId="([^"]+)"/) { print "$1\n"; last }' Result.log
The output:
105141
print "$1\n" prints the value of the first capture group, and last stops reading after the first match.
Or, if your grep supports PCRE (-P):
grep -Po 'MapNodeId="\K[^"]+' Result.log | head -n 1
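These answers share one idea: take the leftmost match and stop reading once it is found (sed's q, Perl's last, head -n 1). Below is a minimal Python sketch of that idea. It assumes one record per line, the sample records are abbreviated, and first_map_node_id is a hypothetical name:

```python
import re

MAP_NODE_ID = re.compile(r'MapNodeId="([^"]+)"')


def first_map_node_id(lines):
    """Return the first MapNodeId value, or None; stops at the first hit like sed's q."""
    for line in lines:
        match = MAP_NODE_ID.search(line)  # search() returns the leftmost match on the line
        if match:
            return match.group(1)
    return None


records = [
    'NodeMetaData MapNodeId="105141" PageFormat="OsXml"',
    'NodeMetaData MapNodeId="105142" PageFormat="OsXml"',
]
print(first_map_node_id(records))  # 105141
```

To read the real log instead, pass an open file object: with open("Result.log") as f: print(first_map_node_id(f)). The trailing newline on each line does not affect the match.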

Regexp for removing certain columns

I have an input of this format:
<apple1> <orange1> : <apple2> <orange2> : <apple3> <orange3> : ...
This input is of undefined length and consists of apple-orange pairs with varying orange and apple parts, separated by a colon.
I'd like to have this as an output:
<apple1> <orange1> : <orange2> : <orange3> : ...
That is, every apple part except the first should be removed.
Each apple part is 14 characters wide, each orange part is 19 characters wide.
I tried things like this:
sed -r 's/.{14}(.{19}):/\1:/g'
But this always ran into problems skipping the first apple part.
Can anybody provide a regexp solving this task?
Real-world example input:
appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo
foofoofoofoofobarbarbarbarbarbarb:foofoofoofoofobarbarbarbarbarbarb:foofoofoofoofobarbarbarbarbarbarb
xxxxxxxxxxxxxxooooooooooooooooooo:ppppppppppppppqqqqqqqqqqqqqqqqqqq:nnnnnnnnnnnnnnttttttttttttttttttt
Output should be this:
appleappleapplorangeorangeorangeo:orangeorangeorangeo:orangeorangeorangeo
foofoofoofoofobarbarbarbarbarbarb:barbarbarbarbarbarb:barbarbarbarbarbarb
xxxxxxxxxxxxxxooooooooooooooooooo:qqqqqqqqqqqqqqqqqqq:ttttttttttttttttttt
Your sed regex was almost correct. Match a colon followed by the 14-character apple part and the 19-character orange part, over and over, and drop the 14-character part. (Note: I use commas as the delimiters of the s command below because they're easier to read.)
$ export A='appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo:foofoofoofoofobarbarbarbarbarbarb:foofoofoofoofobarbarbarbarbarbarb:foofoofoofoofobarbarbarbarbarbarb:xxxxxxxxxxxxxxooooooooooooooooooo:ppppppppppppppqqqqqqqqqqqqqqqqqqq:nnnnnnnnnnnnnnttttttttttttttttttt'
$ echo $A | sed -Ee 's,:.{14}(.{19}),:\1,g'
appleappleapplorangeorangeorangeo:orangeorangeorangeo:orangeorangeorangeo:barbarbarbarbarbarb:barbarbarbarbarbarb:barbarbarbarbarbarb:ooooooooooooooooooo:qqqqqqqqqqqqqqqqqqq:ttttttttttttttttttt
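Here is the same global substitution in Python, as a sketch:

- re.sub replaces every non-overlapping match from left to right, just like sed's g flag.
- The first field is left alone because no colon precedes it.
- The fixed widths of 14 and 19 come from the question, and drop_later_apples is a hypothetical name.

```python
import re

# A colon, a 14-character apple part, then a 19-character orange part (captured).
APPLE_AFTER_COLON = re.compile(r":.{14}(.{19})")


def drop_later_apples(line):
    """Remove every apple part that follows a colon, keeping the colon and the orange part."""
    return APPLE_AFTER_COLON.sub(r":\1", line)


print(drop_later_apples(
    "appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo"
))  # appleappleapplorangeorangeorangeo:orangeorangeorangeo:orangeorangeorangeo
```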
This job is better suited to awk, since the input is well structured into rows and columns with a known delimiter, the colon:
awk 'BEGIN{FS=OFS=":"} {for (i=2; i<=NF; i++) $i = substr($i, 15)} 1' file
appleappleapplorangeorangeorangeo:orangeorangeorangeo:orangeorangeorangeo
foofoofoofoofobarbarbarbarbarbarb:barbarbarbarbarbarb:barbarbarbarbarbarb
xxxxxxxxxxxxxxooooooooooooooooooo:qqqqqqqqqqqqqqqqqqq:ttttttttttttttttttt
This awk command uses : as both the input and output delimiter. For every field from the 2nd onward, it replaces the field with its substring starting at position 15, which drops the 14-character apple part.
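The awk idea translates directly to Python, shown here as a sketch: split on :, keep field 1 whole, and cut the first 14 characters off every other field. Python's slice field[14:] plays the role of awk's substr($i, 15), because Python counts from 0 and awk from 1. The function name and the apple_width parameter are hypothetical:

```python
def drop_apples_by_field(line, apple_width=14):
    """Keep the first field whole; strip apple_width leading characters from every later field."""
    first, *rest = line.split(":")
    return ":".join([first] + [field[apple_width:] for field in rest])


for row in [
    "appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo",
    "a line without any colon",
]:
    print(drop_apples_by_field(row))
```

Unlike the sed regex, this does not depend on the orange part being exactly 19 characters; only the apple width matters.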
With Perl:
Our input: appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo
Let's assume:
a=appleappleappl (14 characters)
b=orangeorangeorangeo (19 characters)
c=appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo (the rest of the line, which repeats the combination of a and b)
Expected output: before the first colon (:), both a and b are kept; after it, only b is kept.
${a}${b}:${b}:${b}:... (please correct me if I am wrong)
To recap, here are the input and the expected output:
Our input: appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo
Expected output: appleappleapplorangeorangeorangeo:orangeorangeorangeo:orangeorangeorangeo
Please try this script (as mentioned earlier, it uses Perl, not shell):
%_Host#User> cat apple.pl
#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
    chomp $_ ;
    my @tmp = split /:/, $_ ;
    # First field: keep the 14-character a part and the 19-character b part.
    my ($a,$b) = (substr($tmp[0],0,14), substr($tmp[0],14,19)) ;
    my $str = "$a"."$b" ;
    foreach my $i (1..$#tmp) {
        # Later fields: drop the leading 14-character a part and keep b.
        $tmp[$i] = substr($tmp[$i], 14) ;
        $str .= ":"."$tmp[$i]" ;
    }
    print "$str\n" ;
}
%_Host#User>
Script Output:
%_Host#User> cat td_apple |./apple.pl
appleappleapplorangeorangeorangeo:orangeorangeorangeo:orangeorangeorangeo
foofoofoofoofobarbarbarbarbarbarb:barbarbarbarbarbarb:barbarbarbarbarbarb
xxxxxxxxxxxxxxooooooooooooooooooo:qqqqqqqqqqqqqqqqqqq:ttttttttttttttttttt
Sample Data:
%_Host#User> cat td_apple
appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo:appleappleapplorangeorangeorangeo
foofoofoofoofobarbarbarbarbarbarb:foofoofoofoofobarbarbarbarbarbarb:foofoofoofoofobarbarbarbarbarbarb
xxxxxxxxxxxxxxooooooooooooooooooo:ppppppppppppppqqqqqqqqqqqqqqqqqqq:nnnnnnnnnnnnnnttttttttttttttttttt
%_Host#User>
Thanks.

regex, repeat, count group

I need some help with a regex that matches this format:
The first part of the string is an email address, followed by eight columns separated by ";".
a.test#test.com;Alex;Test;Alex A.Test;Alex;12;34;56;78
The first part I have is (.*#.*com).
These are also possible source strings:
a.test#test.com;Alex;;Alex A.Test;;12;34;56;78
a.test#test.com;Alex;;Alex A.Test;Alex;;34;;78
a.test#test.com;Alex;Test;;Alex;12;34;56; and so on
You can try this regex:
^(.*#.*com)(([^";\n]*|"[^"\n]*");){8}(([^";\n]*|"[^"\n]*"))$
If you have a different number of columns after the address, change the number between { and }.
For your data, here are the captures:
1. `a.test#test.com`
2. `56;`
3. `56`
4. `78`
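Group 2 holds "56;" because a capture group inside a repeat such as {8} keeps only its last iteration. The earlier columns are matched but not remembered. The Python sketch below checks that behaviour using the simpler no-quotes regex from this answer. It then adds a hypothetical parse_row helper that uses the regex only for validation and recovers every column with split. That helper assumes no column contains a ";" of its own:

```python
import re

# Simpler regex from the answer: address, then eight "column;" repetitions, then the last column.
ROW = re.compile(r"^(.*#.*com)(([^;\n]*);){8}([^;\n]*)$")


def parse_row(line):
    """Hypothetical helper: validate with ROW, then split to get every column."""
    if ROW.match(line) is None:
        return None
    address, *columns = line.split(";")
    return address, columns


m = ROW.match("a.test#test.com;Alex;Test;Alex A.Test;Alex;12;34;56;78")
print(m.groups())  # ('a.test#test.com', '56;', '56', '78')
print(parse_row("a.test#test.com;Alex;;Alex A.Test;;12;34;56;78"))
```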
If you are sure there will be no " in your strings you can use this:
^(.*#.*com)(([^;\n]*);){8}([^;\n]*)$
Edit:
OP suggested this usage:
To use the first regex with sed, you need the -i, -n and -E flags, and you have to escape the " characters.
The result will look like this:
sed -i -n -E "/(.*#.*com)(([^\";\n]*|\"[^\"\n]*\");){8}(([^\";\n]*|\"[^\"\n]*\"))/p"
You can use something like:
".*#.*\.com;[A-Za-z]*;[A-Za-z]*;[A-Za-z .]*;[A-Za-z]*;[0-9][0-9];[0-9][0-9];[0-9][0-9];[0-9][0-9]"
This assumes every number has exactly two digits.
Using awk you can do this easily:
awk -F ';' '$1 ~ /\.com$/{print NF}' file
9
9
9
cat file
a.test#test.com;Alex;;Alex A.Test;;12;34;56;78
a.test#test.com;Alex;;Alex A.Test;Alex;;34;;78
a.test#test.com;Alex;Test;;Alex;12;34;56; and so on

unix : search a file if a string is present between two patterns

I have a file in the format given below. I want to check whether a word, e.g. 'hello', appears in the lines after schema: and before the next DocName. If it does, how many such schemas contain it?
How can I do this in one line using grep/awk/sed?
Expected output: if I search for 'hello', it appears in the 1st, 2nd and 4th schemas, so the output is 3. Even if 'hello' occurs several times in one schema (as in the first one), that schema is still counted only once.
:
:
:
DocName: abjrkj.txt
schema:
abs
askj
djsk
djsk
hello
adj
hello
DocName: abjrkj.txt
schema:
abs
askj
djsk
djsk
adj
hello
DocName: aasjrkj.txt
schema:
absasd
askjas
djsksa
djskasd
adjsg
DocName: ghhd.txt
schema:
absg
fdgaskj
dgdjsk
dgdfdjsk
drgadj
hello
:
:
:
Try this (it needs an awk that accepts a multi-character RS, such as GNU awk):
awk -v RS='DocName:' '/hello/ { ++i }
END { print i }' file
If you absolutely require a one-line solution (why??) the whitespace can be condensed to just one space.
Here is a sed solution:
sed ':a; N; s/\n/ /; $!ba; s/DocName/\n&/g' < file | sed -n '/DocName/{/hello/p}' | wc -l
How it works:

- The first sed reads the whole file into the pattern space, replacing every newline with a space.
- It then inserts a newline before every DocName, so each schema ends up on its own line.
- The second sed prints only the lines that contain both DocName and hello.
- wc -l counts those lines.

To see the matching lines while testing, drop the | wc -l part. There may be a more elegant sed solution that uses the pattern and hold spaces.
Since your input file has schemas separated by blank lines you can use awk in paragraph mode and then it's simply:
$ awk -v RS= '/hello/{++c} END{print c}' file
3
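All three answers turn each schema into one record and count the records that mention hello: awk's RS, the sed join-and-split, or awk paragraph mode. Below is a Python sketch of the same record-splitting idea. It makes a few assumptions:

- Every schema starts with a line beginning DocName: (the sample above shows no blank lines).
- Like awk's /hello/, it does a plain substring test.
- The name count_schemas_containing is hypothetical.

```python
import re


def count_schemas_containing(text, word):
    """Count DocName blocks that contain word at least once; each block counts only once."""
    # Split at every line that starts with "DocName:", giving one chunk per schema.
    blocks = re.split(r"^DocName:", text, flags=re.MULTILINE)
    return sum(1 for block in blocks if word in block)


sample = """DocName: abjrkj.txt
schema:
djsk
hello
adj
hello
DocName: abjrkj.txt
schema:
adj
hello
DocName: aasjrkj.txt
schema:
absasd
DocName: ghhd.txt
schema:
drgadj
hello
"""
print(count_schemas_containing(sample, "hello"))  # 3
```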

unix regex for adding contents in a file

I have contents in a file like:
asdfb ... 1
adfsdf ... 2
sdfdf .. 3
I want to write a Unix command that adds 1 + 2 + 3 and gives the result, 6.
As far as I know, grep and awk would be handy; any pointers would help.
I believe the following is what you're looking for. It will sum up the last field in each record for the data that is read from stdin.
awk '{ sum += $NF } END { print sum }' < file.txt
Some things to note:
With awk you don't need to declare variables; they come into existence when you first use them, and an unset variable starts out as 0 (or the empty string).
The variable NF is the number of fields in the current record. The $ operator turns a number into a field reference, so $NF is the field whose number is NF, i.e. the last field.
The END { } block runs only once, after all records have been processed by the other blocks.
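For comparison, here is the same running total in Python, as a sketch. It assumes the last whitespace-separated field of every non-blank line is an integer. awk would silently treat a non-numeric field as 0, whereas int() raises ValueError. sum_last_field is a hypothetical name:

```python
def sum_last_field(lines):
    """Add up the last whitespace-separated field of each non-blank line, like awk's $NF."""
    total = 0  # awk creates sum on first use; Python needs it initialised explicitly
    for line in lines:
        fields = line.split()
        if fields:
            total += int(fields[-1])  # fields[-1] is the last field, the analogue of $NF
    return total


print(sum_last_field(["asdfb ... 1", "adfsdf ... 2", "sdfdf .. 3"]))  # 6
```

To read the file directly: with open("file.txt") as f: print(sum_last_field(f)).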
An awk script is all you need for that, since it has grep facilities built in as part of the language.
Let's say your actual file consists of:
asdfb zz 1
adfsdf yyy 2
sdfdf xx 3
and you want to sum the third column. You can use:
echo 'asdfb zz 1
adfsdf yyy 2
sdfdf xx 3' | awk '
BEGIN {s=0;}
{s = s + $3;}
END {print s;}'
The BEGIN clause is run before processing any lines, the END clause after processing all lines.
The other clause happens for every line but you can add more clauses to change the behavior based on all sorts of things (grep-py things).
This might not exactly be what you're looking for, but I wrote a quick Ruby script to accomplish your goal:
#!/usr/bin/env ruby
total = 0
while gets
total += $1.to_i if $_ =~ /([0-9]+)$/
end
puts total
Here's one in Perl.
$ cat foo.txt
asdfb ... 1
adfsdf ... 2
sdfdf .. 3
$ perl -a -n -E '$total += $F[2]; END { say $total }' foo.txt
6
Golfed version:
perl -anE'END{say$n}$n+=$F[2]' foo.txt
6