Why does my regular expression select everything? - regex

Hey guys, I'm trying to select a specific string out of some text, but I'm not a master of regular expressions.
I tried one way: it starts at the string I want, but it also matches everything after it.
My regex:
\nSCR((?s).*)(GI|SI)(.*?)\n
Text I'm matching against:
Hierbij een test
SCR
S09
/vince#test.be
05FEB
GI BRGDS OPS
middle text string (may not selected)
SCR
S09
05FEB
LHR
NPVT700 PVT701 30MAR30MAR 1000000 005CRJ FAB1900 07301NCE DD
/ RE.GBFLY/
GI BRGDS
The middle string gets selected too, but I only need the text from SCR up to the GI line.

Make the first quantifier non-greedy as well:
\nSCR((?s).*?)(GI|SI)(.*?)\n
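A minimal sketch of the difference, using Python's re module as a stand-in for whatever engine you use (an assumption; the question does not name one). Python 3.11 and later reject an inline (?s) in the middle of a pattern, so DOTALL is passed as a compile flag instead:

```python
import re

# The question's sample text as one string.
TEXT = (
    "Hierbij een test\n"
    "SCR\nS09\n/vince#test.be\n05FEB\nGI BRGDS OPS\n"
    "middle text string (may not selected)\n"
    "SCR\nS09\n05FEB\nLHR\n"
    "NPVT700 PVT701 30MAR30MAR 1000000 005CRJ FAB1900 07301NCE DD\n"
    "/ RE.GBFLY/\nGI BRGDS\n"
)

# DOTALL as a flag instead of a mid-pattern (?s). The trailing (.*?)\n is lazy,
# so it still stops at the first newline after GI/SI.
greedy = re.compile(r"\nSCR(.*)(GI|SI)(.*?)\n", re.DOTALL)
lazy = re.compile(r"\nSCR(.*?)(GI|SI)(.*?)\n", re.DOTALL)

greedy_match = greedy.search(TEXT).group(0)  # backtracks from the end: last GI line
lazy_match = lazy.search(TEXT).group(0)      # grows from the start: first GI line

print(repr(lazy_match))
```

The greedy (.*) first swallows the rest of the text and then gives characters back until GI or SI fits, so it lands on the last GI line; the lazy (.*?) grows one character at a time and stops at the first one.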
Or you could use a negative look-ahead assertion (?!expr) to capture just those lines that do not start with either GI or SI:
\nSCR((?:\n(?!GI|SI).*)*)\n(?:GI|SI).*\n
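The look-ahead version needs no DOTALL flag, because "." stops at line ends and each repetition therefore consumes exactly one line. A sketch in Python (again an assumed host language) on the question's sample text:

```python
import re

TEXT = (
    "Hierbij een test\n"
    "SCR\nS09\n/vince#test.be\n05FEB\nGI BRGDS OPS\n"
    "middle text string (may not selected)\n"
    "SCR\nS09\n05FEB\nLHR\n"
    "NPVT700 PVT701 30MAR30MAR 1000000 005CRJ FAB1900 07301NCE DD\n"
    "/ RE.GBFLY/\nGI BRGDS\n"
)

# Each (?:\n(?!GI|SI).*) step takes one line that does not start with GI or SI;
# the repetition therefore stops right before the first GI/SI line.
pattern = re.compile(r"\nSCR((?:\n(?!GI|SI).*)*)\n(?:GI|SI).*\n")

match = pattern.search(TEXT)
body = match.group(1)           # the lines between SCR and the GI/SI line
print(body.split("\n")[1:])     # [1:] drops the empty string before the first \n
```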

To match from a line starting with SCR to a line starting with GI or SI (inclusive), you would use the following regular expression:
(?m:^SCR\n(?:^(?!GI|SI).*\n)*(?:GI|SI).*)
This will:
1. Find the start of a line.
2. Match SCR and a newline.
3. Match all following lines that do not start with GI or SI.
4. Match the last line, requiring it to start with GI or SI (this prevents the match from running to the end of the string if there is no GI or SI line).
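A Python sketch of this pattern, assuming Python as the host (its re module accepts the scoped (?m:...) syntax from version 3.6 on). The pattern has no capturing groups, so re.findall returns one whole SCR block per match, and a block with no GI or SI line is not matched at all:

```python
import re

TEXT = (
    "Hierbij een test\n"
    "SCR\nS09\n/vince#test.be\n05FEB\nGI BRGDS OPS\n"
    "middle text string (may not selected)\n"
    "SCR\nS09\n05FEB\nLHR\n"
    "NPVT700 PVT701 30MAR30MAR 1000000 005CRJ FAB1900 07301NCE DD\n"
    "/ RE.GBFLY/\nGI BRGDS\n"
)

# (?m:...) turns on multiline mode only inside the group, so ^ matches at
# every line start there.
pattern = re.compile(r"(?m:^SCR\n(?:^(?!GI|SI).*\n)*(?:GI|SI).*)")

blocks = pattern.findall(TEXT)           # one string per SCR ... GI/SI block
no_terminator = pattern.findall("SCR\nA\nB\n")  # no GI/SI line, so no match

for block in blocks:
    print(block)
    print("---")
print(no_terminator)
```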

Related

Need to match the repeated words and replace it with a new one using regex

I was trying to match a pattern in the line below on Linux:
$(menu_no),ini_question3.vox,inv_question3.vox,inv_question3.vox,ini_question3.vox,to_question3.vox
From that line I need to find the repeated word and replace the repetition with some other word.
For example, here the repeated word is
inv_question3.vox,inv_question3.vox
I need this to be changed to
inv_question3.vox,end.vox
I tried to do a find and replace in the Vim editor with the command below, but it didn't work:
:s/\(inv_question*.vox\),\(inv_question*.vox\)/\1,end.vox/g
I'm not sure exactly what you want to replace, but if you want to match the duplicated string inv_question3.vox, you can try:
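let string = '$(menu_no),ini_question3.vox,inv_question3.vox,inv_question3.vox,ini_question3.vox,to_question3.vox';
// Two or more consecutive "inv_question<digit>.vox," entries; $1 keeps the last one, comma included.
let result = string.replace(/(inv_question\d\.vox,){2,}/, '$1end.vox,');
console.log(result); // $(menu_no),ini_question3.vox,inv_question3.vox,end.vox,ini_question3.vox,to_question3.vox
\d: exactly one digit.
{2,}: two or more times.
The replacement $1end.vox, then writes back one copy of the entry ($1 is inv_question3.vox, including its trailing comma) followed by end.vox,.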
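Why the Vim attempt fails: in inv_question*.vox the * applies only to the preceding n (zero or more of them) and . matches any single character. After inv_question, the pattern therefore needs one character followed directly by vox, but the text has 3.vox there. A different technique from the answer above is a backreference, which only matches when the exact same name repeats. Here is a sketch in Python (an assumed host language). In Vim's default syntax, the equivalent would presumably be :s/\(inv_question\d\+\.vox\),\1/\1,end.vox/g (untested).

```python
import re

line = "$(menu_no),ini_question3.vox,inv_question3.vox,inv_question3.vox,ini_question3.vox,to_question3.vox"

# (inv_question\d+\.vox) captures one file name; ",\1" then requires the very
# same name to follow, so only a true repetition is replaced.
result = re.sub(r"(inv_question\d+\.vox),\1", r"\1,end.vox", line)
print(result)
```

Unlike {2,} on a character pattern, the backreference leaves inv_question3.vox,inv_question4.vox alone, because the two names differ.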

How to remove the first word up to ":" from each line in a txt file?

Please see my text file data below:
roydwk27:teenaibuchytilibu5762sumonkhan:IJQRiq&76:8801627574057
deonnarsi15:latashajcclaypoolejcv5946sumonkhan:JKVWjv&20:8801627573929
ernaalo68:lindaohschletteoha1797sumonkhan:OPYZoy&84:8801628302709
dorathyshi56:fredrickaslperkinsonsle8932sumonkhan:STJKsj&30:8801621846709
londassg15:nataliaunmcredmondung5478sumonkhan:UVDEud&61:8801624792536
xiaoexu39:miriamfyboatwrightfyr3810sumonkhan:IJZAiz&47:8801626854856
I want to delete the first word up to and including the first ":",
that is, these parts:
roydwk27:
deonnarsi15:
ernaalo68:
dorathyshi56:
Actually, what I want is this: if a line starts with the sumonkhan field, there is no problem, but if something followed by ":" comes before the sumonkhan field, that part needs to be removed.
The data in my .txt file should look like this:
nataliaunmcredmondung5478sumonkhan:UVDEud&61:8801624792536
miriamfyboatwrightfyr3810sumonkhan:IJZAiz&47:8801626854856
Every line contains sumonkhan, so if a line starts with the sumonkhan field like this, it is fine; otherwise delete the word up to the first ":" (just that word, not the whole line).
I hope this regex helps. It deletes everything up to and including the first colon (:).
If you are reading a file, read it line by line and run the following regex on each line.
my $str = 'roydwk27:teenaibuchytilibu5762sumonkhan:IJQRiq&76:8801627574057';
$str =~ s/^(?:.*?):(.*)/$1/g;  # drop everything up to and including the first colon
This code is in Perl; you can write equivalent code in any other language.
See this demo at regex101.com.
^\w+:(.*)
^ // match the beginning of a line
\w+ // match one or more word characters (letters, digits, underscore)
: // match ":" literally
( // start of the capturing group
.* // match any characters (except a newline)
) // end of capturing group
In every match, the first group now holds the text you want to keep. Note the g (global) and m (multiline) modifiers; m makes ^ match at the start of every line.
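Both answers above remove the first field unconditionally, so a line that already starts with the sumonkhan field (for example nataliaunmcredmondung5478sumonkhan:UVDEud&61:8801624792536) would lose that field too. The following sketch addresses the asker's extra condition. It is a Python variation, not either answer's code, and it assumes that the field to keep always ends in "sumonkhan", as in the sample data:

```python
import re

# Two lines from the question: one with an extra leading field, one already clean.
lines = [
    "roydwk27:teenaibuchytilibu5762sumonkhan:IJQRiq&76:8801627574057",
    "nataliaunmcredmondung5478sumonkhan:UVDEud&61:8801624792536",
]

# Assumption: the field to keep always ends in "sumonkhan".
# ^[^:]*: is the first field plus its colon; the lookahead allows the removal
# only when the NEXT field ends in "sumonkhan:".
leading_field = re.compile(r"^[^:]*:(?=[^:]*sumonkhan:)")

cleaned = [leading_field.sub("", line) for line in lines]
for line in cleaned:
    print(line)
```

For a file, the same substitution would be applied to each line as it is read.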

How to select any 6 digit numbers separated with a dot on each line

I'm using Sublime Text 2 and I have a giant text file that looks like this:
12lk lkkls 92k.sk kal lk 123.456 ldfdk pak 1. s
193.482 ls k lsdk 2.w0 slk s099092 s,. s.
kllk aslk a01ma lka 983.873
Every line has only one number like XXX.XXX. I need to remove everything except that number.
Can I do that using only Sublime and regex?
Find the required string in each line, capture it, and replace the whole line with the captured string:
.*(\d{3}\.\d{3}).*
Find > Replace...
Find What: ^.*(\d\d\d\.\d\d\d).*$
Replace With: \1
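To check this Find/Replace outside the editor, here is a Python sketch (Python is an assumption; Sublime itself needs no code). re.M makes ^ and $ match at each line boundary, which is how the pattern is meant to be read here:

```python
import re

text = (
    "12lk lkkls 92k.sk kal lk 123.456 ldfdk pak 1. s\n"
    "193.482 ls k lsdk 2.w0 slk s099092 s,. s.\n"
    "kllk aslk a01ma lka 983.873\n"
)

# Each line is replaced by the XXX.XXX number captured in group 1;
# like Sublime, Python writes that backreference as \1 in the replacement.
cleaned = re.sub(r"^.*(\d\d\d\.\d\d\d).*$", r"\1", text, flags=re.M)
print(cleaned)
```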

Regex to find the innermost occurrence of strings between two delimiters

I am using TextCrawler regexp to align an existing plain text file.
The text inside the file is continuous, without line breaks.
....moredata....
,actor's list:
Amy Brenneman, Aaron Eckhart, Catherine Keener, Natassja Kinski
, Jason Patric, Ben Stiller,
movies released:
Gladiator,Matrix Reloaded,The Shawshank Redemption,Pirates of the Caribbean
- Curse of the Black Pearl,Monsters Inc,
genre:
SciFi,Romance,Drama,Action,Comedy,Advenure,Animated,Western,Horror
....moredata....
I am trying to find the string(s) between a comma and a colon and replace each one with the same text, but with a newline added before it.
I tried the following, but it matches from the outermost comma to the colon.
[,]{1}.[A-Z].*[:]
Any ideas? Where did I go wrong?
Why not use this pattern:
search: (?<=,)[^,:]+(?=:)
replace: \n$0
pattern details:
(?<=,) # lookbehind assertion: only a check that means "preceded by ,"
[^,:]+ # negated char class: all characters except , and :
(?=:) # lookahead assertion: only a check that means "followed by :"
Lookarounds are only tests that can make the pattern fail or succeed; they are not part of the match result.
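A Python sketch of the lookaround replacement on a shortened version of the question's text (Python is an assumed stand-in for TextCrawler). Python refers to the whole match as \g<0> rather than $0:

```python
import re

text = ("x,actor's list:Amy Brenneman, Aaron Eckhart,"
        "movies released:Gladiator,Matrix Reloaded,genre:SciFi,Drama")

# (?<=,) and (?=:) only check the neighbours; the match itself is just the
# label, so a run like " Aaron Eckhart" (followed by "," not ":") never matches.
result = re.sub(r"(?<=,)[^,:]+(?=:)", r"\n\g<0>", text)
print(result)
```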
The following pattern works:
Search Pattern : (,?[^:,]+:)
Replacement String : \n\1\n
For example,
given a file a.txt with these contents:
actor's list:A,B,C,movies released:D,E,F,genre:G,H,I
perl -pe "s#(,?[^:,]+:)#\n\1\n#g" a.txt
The command above produces output in the following format:
actor's list:
A,B,C
,movies released:
D,E,F
,genre:
G,H,I
I hope the above output is what you are expecting.
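For readers without Perl, here is a Python sketch of the same substitution (a translation, not the answer's own code). It also shows one detail the listing above leaves out: because the first label is matched too, the result starts with a newline.

```python
import re

line = "actor's list:A,B,C,movies released:D,E,F,genre:G,H,I"

# Same pattern as the Perl one-liner: an optional comma, a run of characters
# that are neither ":" nor ",", then the colon; each match goes on its own line.
result = re.sub(r"(,?[^:,]+:)", r"\n\1\n", line)
print(result)
```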

Regular Expression: Extract the lines

I am trying to extract name1 (first row), name2 (second row), name3 (third row) and the street name (last row) with a regex:
Company Inc.
JohnDoe
Foobar
Industrieterrein 13
The very last row is the street name and this part is already working (the text is stored in the variable "S2").
REGEXREPLACE(S2, "(.*\n)+(?!(.*\n))", "")
This expression returns the very last line. I am also able to extract the first row:
REGEXREPLACE(S2, "(\n.*)", "")
My problem is that I do not know how to extract the second and third rows.
Also, how do I test whether the text contains one, two, three or more rows?
Update:
The regex is used in the context of Scribe (an ETL tool). The problem is that I cannot execute source code; I only have the following functions:
REGEXMATCH(input, pattern)
REGEXREPLACE(input, pattern, replacement)
If the regex flavor supports lookaheads, you can count rows backwards from the end and thus get the following (assuming . does not match a newline):
(.*)$ # matching the last line
(.*)(?=(\n.*){1}$) # matching the second last line (excl. newline)
(.*)(?=(\n.*){2}$) # matching the third last line (excl. newline)
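The three patterns above differ only in the repeat count, so they can be generalized with a parameter n. The sketch below is in Python, which stands in for Scribe's engine (an assumption; Scribe's flavor may treat $ differently). The helper names nth_last_row and has_exactly_rows are hypothetical. The second helper applies the same counting idea to the question of how many rows the text has: a full match of .*(?:\n.*){k-1} means exactly k rows. In a function that does not anchor automatically, you would wrap the pattern in ^ and $.

```python
import re
from typing import Optional

S2 = "Company Inc.\nJohnDoe\nFoobar\nIndustrieterrein 13"


def nth_last_row(text: str, n: int) -> Optional[str]:
    """Hypothetical helper: the n-th row from the end (n=0 is the last row)."""
    # The lookahead requires exactly n more "\n..." rows before the end of the
    # text; "." does not match "\n" because no DOTALL flag is set.
    match = re.search(r"(.*)(?=(?:\n.*){%d}$)" % n, text)
    return match.group(1) if match else None


def has_exactly_rows(text: str, k: int) -> bool:
    """Hypothetical helper: True if text consists of exactly k rows (k >= 1)."""
    return re.fullmatch(r".*(?:\n.*){%d}" % (k - 1), text) is not None


print(nth_last_row(S2, 2), nth_last_row(S2, 1), nth_last_row(S2, 0))
print(has_exactly_rows(S2, 4), has_exactly_rows(S2, 3))
```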
Just use this regex:
(.+)+
Explanation:
.
Wildcard: Matches any single character except \n.
+
Matches the previous element one or more times.
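This answer relies on collecting every match (a "find all" step), which REGEXMATCH and REGEXREPLACE may not offer in Scribe. That part is an open question. In a language that can run code, re.findall shows the effect. Python here is an assumed stand-in:

```python
import re

S2 = "Company Inc.\nJohnDoe\nFoobar\nIndustrieterrein 13"

# findall returns group 1 of every match. "." stops at "\n", so each match is
# one row; the outer "+" ends up repeating only once, so ".+" gives the same rows.
rows = re.findall(r"(.+)+", S2)
same_rows = re.findall(r".+", S2)

print(rows)       # one string per row
print(len(rows))  # the row count
```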
As for a regular expression that will match each of four rows, how about this:
(.*?)\n(.*?)\n(.*?)\n(.*)
Each pair of parentheses captures one row, and each \n matches a line break. Note: depending on where the text comes from, you may have to use \r\n instead of just \n; try both.
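Instead of trying \n and \r\n separately, \r?\n accepts either line ending. Here is a Python sketch of that variation of the four-group pattern. Python and the function name split_address are assumptions made for illustration:

```python
import re
from typing import Optional, Tuple

# \r?\n accepts both "\n" and "\r\n"; the lazy (.*?) groups stop before the "\r".
FOUR_ROWS = re.compile(r"(.*?)\r?\n(.*?)\r?\n(.*?)\r?\n(.*)")


def split_address(text: str) -> Optional[Tuple[str, ...]]:
    """Hypothetical helper: the first four rows as (name1, name2, name3, street),
    or None if there are fewer than four rows."""
    match = FOUR_ROWS.match(text)
    return match.groups() if match else None


print(split_address("Company Inc.\nJohnDoe\nFoobar\nIndustrieterrein 13"))
print(split_address("Company Inc.\r\nJohnDoe\r\nFoobar\r\nIndustrieterrein 13"))
```

With more than four rows, the fourth group holds the fourth row rather than the last one, because "." does not cross line breaks.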
You can try the following:
((.*?)\n){3}