Regular Expression for phrases starting with TO - regex

I am pretty new to regular expressions. I want to write a regular expression that matches TO followed by the rest of the phrase, including when the phrase continues on a new line. I tried the following, but it doesn't work properly.
^TO\n?\s?[A-Za-z0-9]\n?[A-Za-z0-9]
It only highlights TO W11 correctly, which is all on one line. For the first entry it highlights only TO, and for the third entry it highlights only the first line. Basically, it doesn't read across the new lines.
Some of my data looks like this:
TO
EXTERNAL
TRAVERSE
TO W11
TO CONTROL
TRAVERSE
I would appreciate it if anybody could help me.

Make sure you use a multiline regex:
var options = RegexOptions.Multiline;
foreach (Match match in Regex.Matches(input, pattern, options))
...
More at: http://msdn.microsoft.com/en-us/library/yd1hzczs(v=vs.110).aspx

It looks like your pattern isn't matching because the start of the string is really a space and not the T character. Also, [A-Za-z0-9] matches only one character, and you want the whole word. I used the + to denote that I want one or more matches of those characters.
(TO\n?\s?[A-Za-z0-9]+)
This regex matches "TO EXTERNAL", "TO W11" and "TO CONTROL". Be sure to use the global modifier so that you get all matches, not just the first one.
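For anyone trying this outside .NET, here is a minimal Python sketch of the same idea. It assumes the sample lines from the question carry no leading whitespace, and it adds a ^ anchor together with re.MULTILINE so that TO only counts at the start of a line. The names TEXT and find_to_phrases are made up for the illustration.

```python
import re

# Sample data from the question (assumed to have no leading whitespace).
TEXT = "TO\nEXTERNAL\nTRAVERSE\nTO W11\nTO CONTROL\nTRAVERSE"

# With re.MULTILINE, ^ matches at the start of every line, not just the string.
# \n?\s? lets the word after TO sit on the same line or on the next one.
TO_PHRASE = re.compile(r"^TO\n?\s?[A-Za-z0-9]+", re.MULTILINE)


def find_to_phrases(text):
    """Return each TO phrase with the line break collapsed to a single space."""
    return [" ".join(match.split()) for match in TO_PHRASE.findall(text)]


print(find_to_phrases(TEXT))
```

re.findall already returns every match, so there is no separate global flag to set in Python.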

Related

Regular Expression Match (get multiple stuff in a group)

I am having trouble with this regular expression.
Here is the string, all on one line. I want to extract the entries in swatchColorList, specifically the words Natural Burlap, Navy, and Red.
What I have tried is '[(.*?)]' to get everything inside the brackets, but what I really want is to do it in one line. Is that possible, or do I need to do this in two steps?
Thanks
{"id":"1349306","categoryName":"Kids","imageSource":"7/optimized/8769127_fpx.tif","swatchColorList":[{"Natural Burlap":"8/optimized/8769128_fpx.tif"},{"Navy":"5/optimized/8748315_fpx.tif"},{"Red":"8/optimized/8748318_fpx.tif"}],"suppressColorSwatches":false,"primaryColor":"Natural Burlap","clickableSwatch":true,"selectedColorNameID":"Natural Burlap","moreColors":false,"suppressProductAttribute":false,"colorFamily":{"Natural Burlap":"Ivory/Cream"},"maxQuantity":6}
You can try this regex
(?<=[[,]\{\")[^"]+
If lookbehind is not supported, you can use
[[,]\{"([^"]+)
This captures the needed word in group 1.
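As a quick check of the group-1 variant, here is a small Python sketch. The JSON string is abbreviated from the question, and the opening bracket inside the character class is escaped as [\[,] because Python can warn about an unescaped [ there; the meaning of the pattern is otherwise unchanged. RAW and SWATCH_NAME are illustrative names.

```python
import re

# Abbreviated version of the JSON string from the question.
RAW = (
    '{"id":"1349306","swatchColorList":['
    '{"Natural Burlap":"8/optimized/8769128_fpx.tif"},'
    '{"Navy":"5/optimized/8748315_fpx.tif"},'
    '{"Red":"8/optimized/8748318_fpx.tif"}],'
    '"primaryColor":"Natural Burlap",'
    '"colorFamily":{"Natural Burlap":"Ivory/Cream"}}'
)

# A key counts only when its object opens right after "[" or ",",
# which is what keeps the colorFamily key out of the result.
SWATCH_NAME = re.compile(r'[\[,]\{"([^"]+)')

# With a single capturing group, findall returns the group-1 text of each match.
names = SWATCH_NAME.findall(RAW)
print(names)
```

This relies on the exact layout of the serialized string, so it would break if the producer ever added whitespace between the tokens.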
import json
str = '{"id":"1349306","categoryName":"Kids","imageSource":"7/optimized/8769127_fpx.tif","swatchColorList":[{"Natural Burlap":"8/optimized/8769128_fpx.tif"},{"Navy":"5/optimized/8748315_fpx.tif"},{"Red":"8/optimized/8748318_fpx.tif"}],"suppressColorSwatches":false,"primaryColor":"Natural Burlap","clickableSwatch":true,"selectedColorNameID":"Natural Burlap","moreColors":false,"suppressProductAttribute":false,"colorFamily":{"Natural Burlap":"Ivory/Cream"},"maxQuantity":6}'
obj = json.loads(str)
words = []
# Each swatch is a one-key dict; the key is the colour name.
for thing in obj["swatchColorList"]:
    for word in thing:
        words.append(word)
        print(word)
Output will be
Natural Burlap
Navy
Red
The words are also stored in the words list. I realize this is not a regex, but I want to discourage the use of regex on serialized object notations, because regular expressions are not intended for parsing strings with nested expressions.

Regular expression to find specific text within a string enclosed in two strings, but not the entire string

I have this type of text:
string1_dog_bit_johny_bit_string2
string1_cat_bit_johny_bit_string2
string1_crocodile_bit_johny_bit_string2
string3_crocodile_bit_johny_bit_string4
string4_crocodile_bit_johny_bit_string5
I want to find all occurrences of “bit” that occur only between string1 and string2. How do I do this with regex?
I found the question Regex Match all characters between two strings, but the regex there matches the entire string between string1 and string2, whereas I want to match just parts of that string.
I am doing a global replacement in Notepad++. I just need regex, code will not work.
Thank you in advance.
Roman
If I understand correctly, here is some code that does what you want:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var input = new List<string>
{
    "string1_dog_bit_johny_bit_string2",
    "string1_cat_bit_johny_bit_string2",
    "string1_crocodile_bit_johny_bit_string2",
    "string3_crocodile_bit_johny_bit_string4",
    "string4_crocodile_bit_johny_bit_string5"
};
// Named group that captures each occurrence of "bit".
Regex regex = new Regex(@"(?<bitGroup>bit)");
var allMatches = new List<string>();
foreach (var str in input)
{
    // Only search strings enclosed by string1 ... string2.
    if (str.StartsWith("string1") && str.EndsWith("string2"))
    {
        var matchCollection = regex.Matches(str);
        allMatches.AddRange(matchCollection.Cast<Match>().Select(match => match.Groups["bitGroup"].Value));
    }
}
Console.WriteLine("All matches {0}", allMatches.Count);
This regex will do the job:
^string1_(?:.*(bit))+.*_string2$
^ means the start of the text (or line if you use the m option like so: /<regex>/m )
$ means the end of the text
. means any character
* means the previous character/expression is repeated 0 or more times
(?:<stuff>) means a non-capturing group (<stuff> won't be captured as a result of the matching)
You could use ^string1_(.*(bit).*)*_string2$ if you don't care about performance or don't have large/many strings to check. The outer parentheses allow multiple occurrences of "bit".
If you provide us with the language you want to use, we could give more specific solutions.
edit: As you added that you're trying a replacement in Notepad++ I propose the following:
Use (?<=string1_)(.*)bit(.*)(?=_string2) as regex and $1xyz$2 as replacement pattern (replace xyz with your string). Then perform a "replace all" operation until N++ doesn't find any more matches. The problem here is that this regex will match only one bit per line per iteration, and therefore needs to be applied repeatedly.
Btw. even if a regexp matches the whole line, you can still only replace parts of it using capturing groups.
You can use the regex:
(?:string1|\G)(?:(?!string2).)*?\Kbit
regex101 demo. Tried it on notepad++ as well and it's working.
There is a description on the demo site, but if you want more explanation, let me know and I'll elaborate!
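For readers who can use code rather than Notepad++, the point that a whole-line match can still replace only part of the line can be sketched in Python. Python's built-in re module does not support \G or \K, so this sketch swaps in a different technique: capture the enclosed middle part, then replace inside it. The function name and the replacement text xyz are assumptions for the example.

```python
import re

# Capture the prefix, the enclosed middle, and the suffix separately.
ENCLOSED = re.compile(r"^(string1_)(.*)(_string2)$")


def replace_enclosed_bits(line, replacement):
    """Replace every 'bit' in a line, but only if it is wrapped by string1_ ... _string2."""
    match = ENCLOSED.match(line)
    if match is None:
        return line  # not enclosed by string1/string2: leave untouched
    prefix, middle, suffix = match.groups()
    return prefix + re.sub("bit", replacement, middle) + suffix


for sample in (
    "string1_dog_bit_johny_bit_string2",
    "string3_crocodile_bit_johny_bit_string4",
):
    print(replace_enclosed_bits(sample, "xyz"))
```

Unlike the repeated "replace all" approach, this handles every bit in a line in one pass.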

Regex to match "Warm Regards"-type email signatures

I am an absolute regex noob and have been banging my head against the wall trying to write a regex to remove email signatures from a string that look like this:
Hi There, this is an email.
Warm Regards,
Joe Bloggs
Thus far, I’ve tried variations on:
/^[\w |][R|r]egards,/
The regex should:
look at the beginning of the line (what I was aiming for with the ^),
cover variations like “Warm Regards”, “Kind Regards”, “Best Regards”, and plain old “Regards” (which I was hoping to accomplish with the [\w |] to match any word or blank and the [R|r] to cover Regards/regards),
be OK with mixed case like “warm regards” or “Warm Regards”, and
only pickup lines that are [word] Regards or just regards, so that we don’t grab email body that has the word “regards” somewhere in it.
This seems elementary, but I just can’t nail it, and I seem to err on broadening my regex too much such that any line that contains “regards” gets picked up. I’m doing this in Node.js combined with the string.search function if that matters.
This seems to fit all your requirements:
^(\w*\s)?[rR]egards,?
It has to start on a new line, then can have any word followed by a space and the word regards, or just the word regards, with the comma being optional. (Inside a character class, | is a literal pipe rather than "or", so [rR] is enough.)
If you want to wipe out everything after the regards line as well, you can add \s*.*
^(\w*\s)?[rR]egards,?\s*.*
If you are trying to remove everything from the Warm Regards line on, this should do it:
^[^<]*?(?=(.*)[Rr]egards)
Try the following regular expression
^\w* ?regards,?
with the case insensitive & global flag specified.
You can see the regular expression explanation and what it matches here: http://regex101.com/r/vR3zG5
The regular expression that matches signatures as defined in requirements #1-#4 is the following:
/^(\w+ +)?regards,? *$/im
How it works:
"^" at the beginning matches the start of a line
"(\w+ +)?" means optional segment that contains exactly one word followed by at least one space
"regards" is just a simple match
",?" optional comma at the end
" *" - the line may contain trailing spaces (it may be useful to put the same match after ^)
"$" - end of line
/.../i - means that the expression is case-insensitive
/.../m - means that ^ and $ match at line breaks
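The question is about Node.js, but the same pattern and flags carry over to other engines. Here is a minimal Python sketch that cuts the message off at the signature line; strip_signature is a hypothetical helper name, and dropping everything from the match onward is an assumption about what "remove" should mean here.

```python
import re

# Same pattern as above: the i and m flags become re.IGNORECASE and re.MULTILINE.
SIGNATURE_RE = re.compile(r"^(\w+ +)?regards,? *$", re.IGNORECASE | re.MULTILINE)


def strip_signature(text):
    """Drop everything from the first '[word] regards' line onward."""
    match = SIGNATURE_RE.search(text)
    if match is None:
        return text
    return text[: match.start()].rstrip()


email = "Hi There, this is an email.\nWarm Regards,\nJoe Bloggs"
print(strip_signature(email))

# "regards" in the middle of a sentence is left alone because of the $ anchor.
body = "With regards to your email, thanks."
print(strip_signature(body))
```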

Regular Expression - Want two matches get only one

I'm working with a regular expression and have some lines of JavaScript. My expression should deliver two matches, but it finds only one, and I don't know what the problem is.
The Lines in javascript look like this:
if(mode==1) var adresse = "?APPNAME=CampusNet&PRGNAME=ACTION&ARGUMENTS=-A7uh6sBXerQwOCd8VxEMp6x0STE.YaNZDsBnBOto8YWsmwbh7FmWgYGPUHysiL9u0.jUsPVdYQAlvwCsiktBzUaCohVBnkyistIjCR77awL5xoM3WTHYox0AQs65SoHAhMXDJVr7="; else var adresse = "?APPNAME=CampusNet&PRGNAME=ACTION&ARGUMENTS=-AHMqmg-jXIDdylCjFLuixe..udPC2hjn6Kiioq7O41HsnnaP6ylFkQLhaUkaWKINEj4l2JqL2eBSzOpmG.b5Av2AvvUxEinUhMBTt5awdgAL4SkBEgYXGejTGUxcgPE-MfiQjefc=";
My expression looks like this:
(?<Popup>(popUp\(')|(adresse...")).*\?((?<Parameters>APPNAME=CampusNet[^>"']*["']))
I want to have two matches with APPNAME...... as Parameters.
[UPDATE] As Tim Pietzcker wrote, I used the greedy version and should have used the lazy version. While he was writing that, I solved it myself by using .*? instead of .* in the middle, so the expression looks like this:
(?<Popup>(popUp\(')|(adresse...")).*?\\?((?<Parameters>APPNAME=CampusNet[^>"']*["']))
That worked. Thanks to Tim Pietzcker
Your regex matches too much - from the very first adresse until the very last " because it uses a greedy quantifier .*.
If you make that quantifier lazy, i. e.
(?<Popup>(popUp\(')|(adresse...")).*?\?((?<Parameters>APPNAME=CampusNet[^>"']*["']))
you get two matches.
Alternatively, if your data allows this, replace the dot with \S, which matches only non-whitespace characters. This will match faster (but will of course fail if the text you're trying to match could possibly contain spaces):
(?<Popup>(popUp\(')|(adresse..."))\S*\?((?<Parameters>APPNAME=CampusNet[^>"']*["']))
Usually you must apply the regex with the "global" flag to find all matches. I can't really say more until I see the complete code sample you are working with.
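The difference between the greedy and the lazy quantifier can be seen in a small Python sketch. It uses a shortened stand-in for the JavaScript line (the long ARGUMENTS values are replaced by X=1 and X=2), keeps only the adresse alternative of the original pattern, and writes the named group in Python's (?P<name>...) syntax. PREFIX, SUFFIX and TEXT are illustrative names.

```python
import re

# Shortened stand-in for the line from the question.
TEXT = (
    'if(mode==1) var adresse = "?APPNAME=CampusNet&X=1"; '
    'else var adresse = "?APPNAME=CampusNet&X=2";'
)

PREFIX = r'adresse..."'
SUFFIX = r'''\?(?P<Parameters>APPNAME=CampusNet[^>"']*["'])'''

greedy = re.compile(PREFIX + r".*" + SUFFIX)   # .* runs to the last possible "?"
lazy = re.compile(PREFIX + r".*?" + SUFFIX)    # .*? stops at the first possible "?"

# findall returns the Parameters group of every match.
greedy_matches = greedy.findall(TEXT)
lazy_matches = lazy.findall(TEXT)

print(len(greedy_matches), greedy_matches)
print(len(lazy_matches), lazy_matches)
```

The greedy version swallows the first parameter string and reports only the second one, which is the single match described in the question.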

Need regexp to find substring between two tokens

I suspect this has already been answered somewhere, but I can't find it, so...
I need to extract a string from between two tokens in a larger string, in which the second token will probably appear again meaning... (pseudo code...)
myString = "A=abc;B=def_3%^123+-;C=123;" ;
myB = getInnerString(myString, "B=", ";" ) ;
method getInnerString(inStr, startToken, endToken){
return inStr.replace( EXPRESSION, "$1");
}
so, when I run this using expression ".+B=(.+);.+"
I get "def_3%^123+-;C=123;" presumably because it just looks for the LAST instance of ';' in the string, rather than stopping at the first one it comes to.
I've tried using (?=) in search of that first ';' but it gives me the same result.
I can't seem to find a regExp reference that explains how one can specify the "NEXT" token rather than the one at the end.
any and all help greatly appreciated.
Similar question on SO:
Regex: To pull out a sub-string between two tags in a string
Regex to replace all \n in a String, but no those inside [code] [/code] tag
Replace patterns that are inside delimiters using a regular expression call
RegEx matching HTML tags and extracting text
You're using a greedy pattern by not specifying the ? in it. Try this:
".+B=(.+?);.+"
Try this:
B=([^;]+);
This matches everything between B= and ; unless it is a ;. So it matches everything between B= and the first ; thereafter.
(This is a continuation of the conversation from the comments to Evan's answer.)
Here's what happens when your (corrected) regex is applied: First, the .+ matches the whole string. Then it backtracks, giving up most of the characters it just matched until it gets to the point where the B= can match. Then the (.+?) matches (and captures) everything it sees until the next part, the semicolon, can match. Then the final .+ gobbles up the remaining characters.
All you're really interested in is the "B=" and the ";" and whatever's between them, so why match the rest of the string? The only reason you have to do that is so you can replace the whole string with the contents of the capturing group. But why bother doing that if you can access contents of the group directly? Here's a demonstration (in Java, because I can't tell what language you're using):
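import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "A=abc;B=def_3%^123+-;C=123;";
Pattern p = Pattern.compile("B=(.*?);");
Matcher m = p.matcher(s);
// find() looks for the pattern anywhere in the string; group(1) is the captured value.
if (m.find())
{
    System.out.println(m.group(1));
}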
Why do a 'replace' when a 'find' is so much more straightforward? Probably because your API makes it easier; that's why we do it in Java. Java has several regex-oriented convenience methods in its String class: replaceAll(), replaceFirst(), split(), and matches() (which returns true iff the regex matches the whole string), but not find(). And there's no convenience method for accessing capturing groups, either. We can't match the elegance of Perl one-liners like this:
print $1 if 'A=abc;B=def_3%^123+-;C=123;' =~ /B=(.*?);/;
...so we content ourselves with hacks like this:
System.out.println("A=abc;B=def_3%^123+-;C=123;"
    .replaceFirst(".+B=(.*?);.+", "$1"));
Just to be clear, I'm not saying not to use these hacks, or that there's anything wrong with Evan's answer--there isn't. I just think we should understand why we use them, and what trade-offs we're making when we do.
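For comparison, the find-rather-than-replace approach fits the getInnerString pseudocode from the question quite directly. Here is a minimal Python sketch; the function name mirrors the pseudocode, and escaping the tokens with re.escape is an added assumption so that tokens containing regex metacharacters are treated literally.

```python
import re


def get_inner_string(text, start_token, end_token):
    """Return the text between start_token and the next end_token, or None."""
    # The lazy (.*?) stops at the first end_token after start_token.
    pattern = re.escape(start_token) + r"(.*?)" + re.escape(end_token)
    match = re.search(pattern, text)
    return match.group(1) if match else None


my_string = "A=abc;B=def_3%^123+-;C=123;"
print(get_inner_string(my_string, "B=", ";"))
```

Because re.search scans for the pattern anywhere in the string, there is no need for the leading and trailing .+ that the replace-based version requires.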