Regex Fine End of Search Field then remove rest of line - regex

Hi I have been looking for a Regex I can find most of what im after but not quite right.
Im trying to do a find a replace using regex, which i can get to work but not quite the way i want to.
An example of what i am searching is
10/01/14PUT/a/users/84335httpetcetcetcete
10/01/14GET/a/users/663/badges?thisisatest
10/01/14GET/a/users/8836:thisisatestetc
What im trying to do is and the end of the user digits as shown below by a % i have put in temporarily i want to remove the rest of the line.
10/01/14PUT/a/users/84335%httpetcetcetcete
10/01/14GET/a/users/663%/badges?thisisatest
10/01/14GET/a/users/8836%:thisisatestetc
I have been using s = s.regex.replace(s, "a/users/\d*", " ")
but this if obviously not working, so close yet so far.
Any assistance is gratefully received.
Many thanks, VBVirg

You were actually on the right track, the regex you came up with is almost what you need:
a/users/\d*
But what your call did was actually replace what you wanted to preserve with a space.
The regex you're looking for would be more like this:
(a\/users\/\d*).*$
And you would use it in the Replace() method as follows:
s = Regex.Replace(s, "(a\/users\/\d*).*$", "$1") />
The $1 is a backreference to the capture group (the part of the regex in parentheses). So what this would do is take whatever part of the string matches that regex, and replace it with only what is in the capture group.

How about: s = s.regex.replace(s, "(a/users/\d*).*", "\1")
This will save the "a/users/(digits)" string to a variable (\1), so it doesn't get deleted by the replace function.

I think the following will do what you want:
s = Regex.Replace(s, "^(.*\/users\/\d*).*$", "$1")
It works by capturing the part of the string you are interested in and replacing the whole string with just the part that was captured.

Related

Regex substring

I'm trying to select a substring using regex and I'm going round in circles. I need to select everything before the first "_".
exampale URL - GI_2013_JUNE_10_VOL3_LASTCHANCE
So the result Im looking for from the URL above would be "GI". The text before the first "_" can vary in length.
Any help would be much apprecited
The regex would be:
^[^_]+
and grab the whole regex match. But as a comment says, using a substring function is more efficient!
^[^_]*
...is the expression you're looking for.
It basically says: Select everything that is not an underscore, starting at the beginning of the string.
http://regexr.com?356in

regex match string replace in new string (notepad )

In Notepad++ I need to match "dog" (search) in
<tag>old-string/dog.swf">more-old-string</tag>
then use a back-reference (\1) to include it in another string (replace):
new-string_\1>more-new-string
to give the result
new-string_dog.swf">more-new-string
I'm new to regex, so please show me how to do this first by matching "dog", then by excluding all old-string in the result.
Edit: I realize this might be confusing, so I posted the actual problem here: regex find word in string, replace word in new string (using Notepad++). I hope it makes more sense.
I am totally going off a limb here trying to understand what you want, but if I have understood you correctly you want to:
match a certain word in one string
create a new string with data from the old one, under the old string which should be untouched
If so, this regular expression: /^(.*?/([^"]+)\.swf)(".*$)/i with this replacement: \1\3\r\n<test>\2</test> should do the trick. I have used <test></test> to show you where you put your "new-string" stuff.
I hope this helps!

regex optional part in prefix, but do not include it in matches if it present

Problem is easier to be seen in code then described I got following regex
(?<=First(Second)?)\w{5}
and following sample data
FirstSecondText1
FirstText2
I only want matches Text1 & Text2 , I get 3 though, Secon is added, and I don't want that.
Played around, cant seem to get it to work.
You need an additional negative lookahead:
(?<=First(Second)?)(?!Second)\w{5}
If you want to avoid using Second twice, you could do it without lookaround and take the result of the first capturing group:
First(?:Second)?(\w{5})
You can try this regex (?<=First(Second)?)\w{5}$. All you have to do is to add a $ in the end so that the regex would not match the text Secon. You can use this as long as you are sure of the pattern that comes at the end of the input text. In this case it is \w{5}$

Regex matching in ColdFusion OR condition

I am attempting to write a CF component that will parse wikiCreole text. I am having trouble getting the correct matches with some of my regular expression though. I feel like if I can just get my head around the first one the rest will just click. Here is an example:
The following is sample input:
You can make things **bold** or //italic// or **//both//** or //**both**//.
Character formatting extends across line breaks: **bold,
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
My first attempt was:
<cfset out = REreplace(out, "\*\*(.*?)\*\*", "<strong>\1</strong>", "all") />
Then I realized that it would not match where the ** is not given, and it should end where there are two carriage returns.
So I tried this:
<cfset out = REreplace(out, "\*\*(.*?)[(\*\*)|(\r\n\r\n)]", "<strong>\1</strong>", "all") />
and it is close but for some reason it gives you this:
You can make things <strong>bold</strong>* or //italic// or <strong>//both//</strong>* or //<strong>both</strong>*//.
Character formatting extends across line breaks: <strong>bold,</strong>
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
Any ideas?
PS: If anyone has any suggestions for better tags, or a better title for this post I am all ears.
The [...] represents a character class, so this:
[(\*\*)|(\r\n\r\n)]
Is effectively the same as this:
[*|\r\n]
i.e. it matches a single "*" and the "|" isn't an alternation.
Another problem is that you replace the double linefeed. Even if your match succeeded you would end up merging paragraphs. You need to either restore it or not consume it in the first place. I'd use a positive lookahead to do the latter.
In Perl I'd write it this way:
$string =~ s/\*\*(.*?)(?:\*\*|(?=\n\n))/<strong>$1<\/strong>/sg;
Taking a wild guess, the ColdFusion probably looks like this:
REreplace(out, "\*\*(.*?)(?:\*\*|(?=\r\n\r\n))", "<strong>\1</strong>", "all")
You really should change your
(.*?)
to something like
[^*]*?
to match any character except the *. I don't know if that is the problem, but it could be the any-character . is eating one of your stars. It also a generally accepted "best practice" when trying to balance matching characters like the double star or html start/end tags to explicitly exclude them from your match set for the inner text.
*Disclaimer, I didn't test this in ColdFusion for the nuances of the regex engine - but the idea should hold true.
I know this is an older question but in response to where Ryan Guill said "I tried the $1 but it put a literal $1 in there instead of the match" for ColdFusion you should use \1 instead of $1
I always use a regex web-page. It seems like I start from scratch every time I used regex.
Try using '$1' instead of \1 for this one - the replace is slightly different... but I think the pattern is what you need to get working.
Getting closer with this:
**(.?)**|//(.?)//
The tricky part is the //** or **//
Ok, first checking for //bold//
then //bold// then bold, then
//bold//
**//(.?)//**|//**(.?)**//|**(.?)**|//(.?)//
I find this app immensely helpful when I'm doing anything with regex:
http://www.gskinner.com/RegExr/desktop/
Still doesn't help with your actual issue, but could be useful going forward.

Need regexp to find substring between two tokens

I suspect this has already been answered somewhere, but I can't find it, so...
I need to extract a string from between two tokens in a larger string, in which the second token will probably appear again meaning... (pseudo code...)
myString = "A=abc;B=def_3%^123+-;C=123;" ;
myB = getInnerString(myString, "B=", ";" ) ;
method getInnerString(inStr, startToken, endToken){
return inStr.replace( EXPRESSION, "$1");
}
so, when I run this using expression ".+B=(.+);.+"
I get "def_3%^123+-;C=123;" presumably because it just looks for the LAST instance of ';' in the string, rather than stopping at the first one it comes to.
I've tried using (?=) in search of that first ';' but it gives me the same result.
I can't seem to find a regExp reference that explains how one can specify the "NEXT" token rather than the one at the end.
any and all help greatly appreciated.
Similar question on SO:
Regex: To pull out a sub-string between two tags in a string
Regex to replace all \n in a String, but no those inside [code] [/code] tag
Replace patterns that are inside delimiters using a regular expression call
RegEx matching HTML tags and extracting text
You're using a greedy pattern by not specifying the ? in it. Try this:
".+B=(.+?);.+"
Try this:
B=([^;]+);
This matches everything between B= and ; unless it is a ;. So it matches everything between B= and the first ; thereafter.
(This is a continuation of the conversation from the comments to Evan's answer.)
Here's what happens when your (corrected) regex is applied: First, the .+ matches the whole string. Then it backtracks, giving up most of the characters it just matched until it gets to the point where the B= can match. Then the (.+?) matches (and captures) everything it sees until the next part, the semicolon, can match. Then the final .+ gobbles up the remaining characters.
All you're really interested in is the "B=" and the ";" and whatever's between them, so why match the rest of the string? The only reason you have to do that is so you can replace the whole string with the contents of the capturing group. But why bother doing that if you can access contents of the group directly? Here's a demonstration (in Java, because I can't tell what language you're using):
String s = "A=abc;B=def_3%^123+-;C=123;";
Pattern p = Pattern.compile("B=(.*?);");
Matcher m = p.matcher(s);
if (m.find())
{
System.out.println(m.group(1));
}
Why do a 'replace' when a 'find' is so much more straightforward? Probably because your API makes it easier; that's why we do it in Java. Java has several regex-oriented convenience methods in its String class: replaceAll(), replaceFirst(), split(), and matches() (which returns true iff the regex matches the whole string), but not find(). And there's no convenience method for accessing capturing groups, either. We can't match the elegance of Perl one-liners like this:
print $1 if 'A=abc;B=def_3%^123+-;C=123;' =~ /B=(.*?);/;
...so we content ourselves with hacks like this:
System.out.println("A=abc;B=def_3%^123+-;C=123;"
.replaceFirst(".+B=(.*?);.+", "$1"));
Just to be clear, I'm not saying not to use these hacks, or that there's anything wrong with Evan's answer--there isn't. I just think we should understand why we use them, and what trade-offs we're making when we do.