I want to match only 3-digit numbers under 600 (in the example below, "598"), and only when they appear between a start word and an end word in the string. With the regular expression below I get a match on everything in between. Can anyone help?
Regular expression: (?<=Start)(.*)(?=End).
Test string:
Start 440 3 956 4 603 5 - 6 603 7 440 8 - 9 440 10 956 11 440 12 603 13 2005
14 440 15 598 16 1156 17 946 18 761 19 761 20 946 21 598 22 598
23 1156 24 2057 25 946 26 1194 27 946 28 946 - - - Zurich 2019 M T W T F S S - - - - 1 - 2 1058 3 542 4 852 5 - 6 1517 7 1058 8 - 9 1058 10 848 11 542 12 705 13 1306 14 1058 15 1258 16 2159 17 1617 18 700 19 863 20 700 21 1258 22 1911 23 1911 24 1617 25 1258 26 2759 27 1258 28 1258 - - - End
With \b[0-5]\d{2}\b you find all 3-digit numbers under 600.
Demo: https://regex101.com/r/0ZSbbY/2
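A minimal Python sketch of that idea, assuming it is acceptable to work in two steps: first cut out the text between Start and End, then apply \b[0-5]\d{2}\b to that slice only. The function name three_digit_under_600 and the shortened sample string are made up for illustration.

```python
import re


def three_digit_under_600(text, start="Start", end="End"):
    """Return the 3-digit numbers below 600 found between the two markers."""
    # Step 1: isolate the text between the markers.
    # DOTALL lets the span cross line breaks; .*? stops at the first end marker.
    span = re.search(re.escape(start) + r"(.*?)" + re.escape(end), text, re.DOTALL)
    if span is None:
        return []
    # Step 2: apply the 3-digit pattern to that slice only.
    return re.findall(r"\b[0-5]\d{2}\b", span.group(1))


# Shortened, hypothetical sample in the same shape as the test string.
sample = "12 300 Start 440 3 956 15 598 16 1156 - Zurich 2019 3 542 End 123"
print(three_digit_under_600(sample))  # ['440', '598', '542']
```

Numbers outside the markers (300 and 123 here) are never seen by the second pattern, so no lookarounds are needed.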
Try this pattern:
(?<=^|\D)[1-5]?\d{2}(?!.+Start)(?=\D.+End)
(?<=^|\D)[1-5]?\d{1,2} matches all 1- or 2-digit numbers, since they are all less than 600. It also finds the 3-digit numbers of the form 1**, 2**, 3**, 4** and 5**.
(?!.+Start)(?=\D.+End) these lookaheads ensure that the match is before the word End and not before the word Start, i.e. between them. As @TimBiegeleisen stated, this couldn't be done with a positive lookbehind, because it would have variable length.
Demo
#!/usr/bin/perl
use Modern::Perl;
use Data::Dumper;
my $str = 'Start 440 3 956 4 603 5 - 6 603 7 440 8 - 9 440 10 956 11 440 12 603 13 2005 14 440 15 598 16 1156 17 946 18 761 19 761 20 946 21 598 22 598 23 1156 24 2057 25 946 26 1194 27 946 28 946 - - - Zurich 2019 M T W T F S S - - - - 1 - 2 1058 3 542 4 852 5 - 6 1517 7 1058 8 - 9 1058 10 848 11 542 12 705 13 1306 14 1058 15 1258 16 2159 17 1617 18 700 19 863 20 700 21 1258 22 1911 23 1911 24 1617 25 1258 26 2759 27 1258 28 1258 - - - End';
my $threshold = 600;
my $re = qr/
(?: # start non capture group
Start # literally
| # OR
\G # iterate from last match position
) # end group
(?:(?!End).)*? # make sure there is no "End" before the number we are looking for
(?<!\d) # negative lookbehind, make sure we don't have a digit before
(\d{3}) # 3 digit number
(?!\d) # negative lookahead, make sure we don't have a digit after
/x;
# Retrieve all 3 digit numbers between Start and End
my @numbers = $str =~ /$re/g;
# Select numbers that are less than $threshold. In this case 600
@numbers = grep { $_ < $threshold } @numbers;
say Dumper \@numbers;
Output:
$VAR1 = [
440,
440,
440,
440,
440,
598,
598,
598,
542,
542
];
If you're searching for a specific number, such as the one closest to 600, I would suggest using a regexp to collect all the numbers and then some algorithm to find the matching one.
This regexp checks that your string matches the pattern and collects all the numbers in the named group "number".
^Start (([^\d]+ )*((?<number>\d+) )*)*End$
This simpler regexp collects the numbers without validating the whole string:
\d+
Iterate through your collection of numbers and find the one you need.
Sorry, I didn't notice which language you are using, so I haven't written a code snippet.
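Since no language was given, here is a small Python sketch of the collect-then-search approach. It assumes that "the matching number" means the largest number strictly below the threshold; the function name closest_below and the shortened sample string are hypothetical.

```python
import re


def closest_below(text, threshold=600):
    """Collect every number with \\d+ and return the largest one below threshold."""
    numbers = [int(token) for token in re.findall(r"\d+", text)]
    candidates = [n for n in numbers if n < threshold]
    # None signals that no number qualified.
    return max(candidates) if candidates else None


# Shortened, hypothetical sample in the same shape as the test string.
sample = "Start 440 3 956 15 598 16 1156 3 542 End"
print(closest_below(sample))  # 598
```

Any other selection rule (only 3-digit numbers, first occurrence, and so on) only changes the filter on the collected list.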
I am new to awk programming. I have a question about manipulating a text file, which is needed to draw network-based images in a visualization tool (Circos, http://circos.ca).
I have input data in which I want to manipulate values using awk/grep/sed.
There are 9 pairs (18 lines). 5 pairs (the first 10 lines) are for "from=ABCB11", and 4 pairs (the next 8 lines) are for "from=ABCC8". What I want is to extract the value from the first line of the first pair and use it to replace the value on every alternate line of the other pairs.
So the value for group-2 is 9 10, which should replace all occurrences of the value in group-2.
The next value for group-2 is 28 29, which should be replaced by 9 10.
The stop should be determined by "from=name", which here is "from=ABCB11". The rows from which the value is captured and then replaced in the following occurrences do not necessarily belong to group-2, as they do in this instance. They could belong to group-3 or group-4, up to group-10. So the second set ("from=ABCC8") could have belonged to group-4/5/6, not necessarily group-2. It is just a coincidence here.
group-2 9 10 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-3 0 1 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-2 28 29 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM1,toid=114,use=1,z=1
group-5 0 1 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM1,toid=114,use=1,z=1
group-2 29 30 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM2,toid=115,use=1,z=1
group-5 1 2 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM2,toid=115,use=1,z=1
group-2 10 11 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=DRD2,toid=158,use=1,z=1
group-3 1 2 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=DRD2,toid=158,use=1,z=1
group-2 11 12 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=EGF,toid=164,use=1,z=1
group-3 2 3 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=EGF,toid=164,use=1,z=1
group-2 21 22 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-3 12 13 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-2 0 1 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1A,toid=21,use=1,z=1
group-1 0 1 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1A,toid=21,use=1,z=1
group-2 1 2 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1B,toid=22,use=1,z=1
group-1 1 2 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1B,toid=22,use=1,z=1
group-2 2 3 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1D,toid=23,use=1,z=1
group-1 2 3 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1D,toid=23,use=1,z=1
Below is the final output I am looking for:
group-2 9 10 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-3 0 1 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-2 9 10 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM1,toid=114,use=1,z=1
group-5 0 1 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM1,toid=114,use=1,z=1
group-2 9 10 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM2,toid=115,use=1,z=1
group-5 1 2 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=CHRM2,toid=115,use=1,z=1
group-2 9 10 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=DRD2,toid=158,use=1,z=1
group-3 1 2 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=DRD2,toid=158,use=1,z=1
group-2 9 10 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=EGF,toid=164,use=1,z=1
group-3 2 3 text color=black,from=ABCB11,fromid=4,order=2,thickness=3,to=EGF,toid=164,use=1,z=1
group-2 21 22 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-3 12 13 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ACE,toid=11,use=1,z=1
group-2 21 22 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1A,toid=21,use=1,z=1
group-1 0 1 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1A,toid=21,use=1,z=1
group-2 21 22 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1B,toid=22,use=1,z=1
group-1 1 2 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1B,toid=22,use=1,z=1
group-2 21 22 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1D,toid=23,use=1,z=1
group-1 2 3 text color=black,from=ABCC8,fromid=5,order=2,thickness=3,to=ADRA1D,toid=23,use=1,z=1
Also, this is just sample data. Many pairs would have group-1, group-4, group-5, up to group-10. Here, only pairs from the lower groups are shown.
I want to loop through the lines for as long as the value in "from=name" stays the same, so that I can change all occurrences on each alternate line. Code:
awk -F, 'NR%2==1 {split($2,a,"="); print a[2]}' file.txt
The above code extracts the alternate lines and the "name" in "from=name".
The following is quite verbose (I love verbose variable names). Using your sample data, I get the output you want. This assumes that every odd-numbered line takes its values from the first line with the same "from=xxxx" information.
awk '
BEGIN {
namevar=""
val1var=""
val2var=""
linenum=0
}
{
split($0, linearr)
split(linearr[5], csvarr, ",")
if (namevar != csvarr[2]) {
namevar=csvarr[2]
val1var=linearr[2]
val2var=linearr[3]
linenum=0
}
linenum+=1
if (linenum%2==1) {
print linearr[1], val1var, val2var, linearr[4], linearr[5]
} else {
print linearr[1], linearr[2], linearr[3], linearr[4], linearr[5]
}
}' file.txt
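For comparison, the same logic as a rough Python sketch. It makes the same assumptions as the awk script: fields are whitespace-separated, the fifth field is the comma-separated attribute list, and its second item is the "from=..." entry. The function name propagate_first_coords is hypothetical, and the sample lines use a shortened attribute list.

```python
def propagate_first_coords(lines):
    """Within each run of lines sharing the same from=... entry, copy columns
    2 and 3 of the run's first line onto every odd-numbered line of the run."""
    result = []
    current_from = None
    first_start = first_end = None
    position = 0  # 1-based position of the line inside the current run
    for line in lines:
        fields = line.split()
        from_entry = fields[4].split(",")[1]  # e.g. "from=ABCB11"
        if from_entry != current_from:
            # A new run begins: remember its first pair of values.
            current_from = from_entry
            first_start, first_end = fields[1], fields[2]
            position = 0
        position += 1
        if position % 2 == 1:
            # Odd-numbered lines of the run receive the remembered values.
            fields[1], fields[2] = first_start, first_end
        result.append(" ".join(fields))
    return result


# Abbreviated, hypothetical sample in the same shape as the input data.
sample = [
    "group-2 9 10 text color=black,from=ABCB11,to=ACE",
    "group-3 0 1 text color=black,from=ABCB11,to=ACE",
    "group-2 28 29 text color=black,from=ABCB11,to=CHRM1",
    "group-5 0 1 text color=black,from=ABCB11,to=CHRM1",
    "group-2 21 22 text color=black,from=ABCC8,to=ACE",
    "group-3 12 13 text color=black,from=ABCC8,to=ACE",
    "group-2 0 1 text color=black,from=ABCC8,to=ADRA1A",
    "group-1 0 1 text color=black,from=ABCC8,to=ADRA1A",
]
updated = propagate_first_coords(sample)
print("\n".join(updated))
```

Like the awk version, this relies on lines with the same "from=" value being adjacent in the file.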