regex match string replace in new string (Notepad++) - regex

In Notepad++ I need to match "dog" (search) in
<tag>old-string/dog.swf">more-old-string</tag>
then use a back-reference (\1) to include it in another string (replace):
new-string_\1>more-new-string
to give the result
new-string_dog.swf">more-new-string
I'm new to regex, so please show me how to do this first by matching "dog", then by excluding all old-string in the result.
Edit: I realize this might be confusing, so I posted the actual problem here: regex find word in string, replace word in new string (using Notepad++). I hope it makes more sense.

I am going out on a limb here trying to understand what you want, but if I have understood you correctly, you want to:
match a certain word in one string
create a new string with data from the old one, under the old string which should be untouched
If so, this regular expression: /^(.*?/([^"]+)\.swf)(".*$)/i with this replacement: \1\3\r\n<test>\2</test> should do the trick. I have used <test></test> to show you where you put your "new-string" stuff.
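For comparison, here is a minimal sketch of the replacement the question itself asks for (keep only the captured file name and drop all of the old string). It uses Python's re module purely for illustration, since the thread is about Notepad++; the variable names and the slightly different pattern are my own assumptions, not something taken from the answer above.

```python
import re

# Sample line from the question.
line = '<tag>old-string/dog.swf">more-old-string</tag>'

# Group 1 captures the file name that sits between the last "/" before
# the quote and the quote itself (here: dog.swf).
pattern = re.compile(r'^.*/([^"/]+\.swf)".*$')

# \1 in the replacement inserts whatever group 1 captured; everything
# else on the line is discarded because the pattern matched the whole line.
result = pattern.sub(r'new-string_\1">more-new-string', line)
print(result)
```

Because the pattern is anchored with ^ and $ and consumes the whole line, nothing of the old string survives except the captured group.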
I hope this helps!

Related

Regex expression to match word in a string

I've been banging my head with this issue as I can't get this expression to work.
I'm trying to match and output a specific word from a string, so for example, take this string:
<ANIMALS>
<value>DOG CAT COW</value>
</ANIMALS>
Now I want to match any one of them, let's say COW, and return that value, or NONE if it is not present.
I've tried a lot of varying expressions with no luck such as:
IF(VALUE == "/(^|\W)COW($|\W)/", "COWVALUE", "NONE")
This doesn't work, nor do any other variants I've tried. If I keep the original string as a single word and no actual calling expression, just a word, then it always works. As soon as I introduce a string of words then I can't make it happen.
Could anyone help please?
Thanks!
Get rid of (^|\W) and ($|\W), since that only matches at word boundaries, but you want to match COW when it's not by itself as a word. The regular expression should just be /COW/, it will match that string wherever it appears in the string.
BTW, to match word boundaries you can use \b rather than those alternations.
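A minimal sketch of the \b approach, written in Python because the language in the question is unknown. The helper name find_word and the "NONE" fallback are assumptions made for illustration only.

```python
import re

value = "DOG CAT COW"

def find_word(word, text):
    """Return word if it occurs as a whole word in text, else "NONE"."""
    # \b asserts a word boundary on each side; re.escape guards against
    # regex metacharacters in the word being searched for.
    match = re.search(r'\b' + re.escape(word) + r'\b', text)
    return match.group(0) if match else "NONE"

print(find_word("COW", value))
print(find_word("CO", value))
```

The second call finds nothing because CO is followed by W, so there is no word boundary after it. Dropping the two \b parts would turn this into the plain substring match described above.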
Not sure what programming language you're using, but the following is a simple example in JavaScript that achieves your goal.
regex demo

Regex expression that selects only specific word

Welcome guys, I am just new to this community!
Here is the case: I have some strings like these
thatisanappleaaa
thatisanappleaaa bad
thatisanappleaaa.bad
thatisanapplebadaaa
thatisanbadappleaaa
thatisanbadbadappleaaa
badthatisanappleaaa
and I am trying to use the Sublime Text 3 find-and-replace function to achieve the following (note that only the first line is replaced)
thatisanorangeaaa
thatisanappleaaa bad
thatisanappleaaa.bad
thatisanapplebadaaa
thatisanbadappleaaa
thatisanbadbadappleaaa
badthatisanappleaaa
Is there a regex that filters "apple" in "thatisanappleaaa"(which is line one) only without the presence of "bad" in any position (except between "apple") in the string, given that the string "bad" does not change every time it appears?
Try
(\w+)apple(\w+).*
This will select all the text wrapped around apple.
If you only want to select the text trailing after apple, use
apple(\w+).*
After reading your description I'm assuming you want to replace the word apple only in sentences which do not have any occurrences of the word bad.
I've used a regex which uses a negative lookahead and used parentheses to capture apple which can then be replaced with any word, in your case orange.
Regex: ^(?!.*bad).*(apple)
DEMO
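The negative lookahead idea can be checked outside the editor with a short sketch. This is Python rather than Sublime Text, and I have moved the parentheses so that the text before apple is captured and can be put back in the replacement; that variation is my assumption, not part of the answer above.

```python
import re

lines = [
    "thatisanappleaaa",
    "thatisanappleaaa bad",
    "thatisanappleaaa.bad",
    "thatisanapplebadaaa",
    "thatisanbadappleaaa",
    "thatisanbadbadappleaaa",
    "badthatisanappleaaa",
]

# (?!.*bad) fails at the start of any line that contains "bad" anywhere,
# so only lines without "bad" can match. Group 1 keeps the text that
# precedes "apple" so it can be restored in the replacement.
pattern = re.compile(r'^(?!.*bad)(.*)apple')

replaced = [pattern.sub(r'\1orange', line) for line in lines]
for line in replaced:
    print(line)
```

Only the first line changes; every other line contains bad somewhere, so the lookahead rejects it before anything is consumed.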

Regex Find End of Search Field then remove rest of line

Hi, I have been looking for a regex. I can find most of what I'm after, but not quite the right thing.
I'm trying to do a find and replace using regex, which I can get to work, but not quite the way I want it to.
An example of what I am searching is
10/01/14PUT/a/users/84335httpetcetcetcete
10/01/14GET/a/users/663/badges?thisisatest
10/01/14GET/a/users/8836:thisisatestetc
What I'm trying to do is remove the rest of the line after the end of the user digits. I have temporarily marked that position with a % below.
10/01/14PUT/a/users/84335%httpetcetcetcete
10/01/14GET/a/users/663%/badges?thisisatest
10/01/14GET/a/users/8836%:thisisatestetc
I have been using s = s.regex.replace(s, "a/users/\d*", " ")
but this is obviously not working. So close, yet so far.
Any assistance is gratefully received.
Many thanks, VBVirg
You were actually on the right track, the regex you came up with is almost what you need:
a/users/\d*
But what your call did was actually replace what you wanted to preserve with a space.
The regex you're looking for would be more like this:
(a\/users\/\d*).*$
And you would use it in the Replace() method as follows:
s = Regex.Replace(s, "(a\/users\/\d*).*$", "$1")
The $1 is a backreference to the capture group (the part of the regex in parentheses). So what this would do is take whatever part of the string matches that regex, and replace it with only what is in the capture group.
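Here is the same capture-and-replace idea as a small runnable sketch. It is in Python rather than .NET, so the backreference is written \1 instead of $1; the variable names are illustrative assumptions.

```python
import re

log_lines = [
    "10/01/14PUT/a/users/84335httpetcetcetcete",
    "10/01/14GET/a/users/663/badges?thisisatest",
    "10/01/14GET/a/users/8836:thisisatestetc",
]

# Group 1 holds "a/users/" plus the digits; the trailing .*$ matches the
# rest of the line so that it is dropped by the replacement.
pattern = re.compile(r'(a/users/\d*).*$')

# Replace the whole match with only the captured group.
trimmed = [pattern.sub(r'\1', line) for line in log_lines]
for line in trimmed:
    print(line)
```

Text before a/users/ is untouched because it is never part of the match.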
How about: s = s.regex.replace(s, "(a/users/\d*).*", "\1")
This will save the "a/users/(digits)" string in a capture group (\1), so it doesn't get deleted by the replace function.
I think the following will do what you want:
s = Regex.Replace(s, "^(.*\/users\/\d*).*$", "$1")
It works by capturing the part of the string you are interested in and replacing the whole string with just the part that was captured.

replacing all open tags with a string

Before somebody points me to that question, I know that one can't parse html with regex :) And this is not what I am trying to do.
What I need is:
Input: a string containing html.
Output: replace all opening tags <tag> with
***<tag>
So if I get
<a><b><c></a></b></c>, I want
***<a>***<b>***<c></a></b></c>
as output.
I've tried something like:
(<[~/].+>)
and replace it with
***$1
But it doesn't really seem to work the way I want it to. Any pointers?
Clarification: it's guaranteed that there are no self closing tags nor comments in the input.
You just have two problems: ^ is the character to exclude items from a character class, not ~; and the .+ is greedy, so will match as many characters as possible before the final >. Change it to the following (with .*? rather than .+?, so that one-letter tags such as <a> still match on their own):
(<[^/].*?>)
You can also probably drop the parentheses and replace with $0 or $&, depending on the language.
Try using: (<[^/].*?>) and replace it with ***$1
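A quick way to check the pattern against the sample input, sketched in Python (an assumption, since the question does not name a language). Here \g<0> stands for the whole match, which is the Python counterpart of $0 or $&. The sketch also runs a .+? variant to show why the quantifier matters for one-letter tags.

```python
import re

html = "<a><b><c></a></b></c>"

# "<", then any character except "/", then as few characters as possible
# up to the next ">".
opening_tag = re.compile(r'<[^/].*?>')
marked = opening_tag.sub(r'***\g<0>', html)
print(marked)

# With .+? at least one more character is required after the tag's first
# letter, so "<a>" cannot match alone and the match runs on into "<b>".
too_long = re.compile(r'<[^/].+?>').sub(r'***\g<0>', html)
print(too_long)
```

On input where every tag name has two or more characters the two variants behave the same, which is why the difference is easy to miss.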

Need regexp to find substring between two tokens

I suspect this has already been answered somewhere, but I can't find it, so...
I need to extract a string from between two tokens in a larger string, in which the second token will probably appear again meaning... (pseudo code...)
myString = "A=abc;B=def_3%^123+-;C=123;" ;
myB = getInnerString(myString, "B=", ";" ) ;
method getInnerString(inStr, startToken, endToken){
return inStr.replace( EXPRESSION, "$1");
}
so, when I run this using expression ".+B=(.+);.+"
I get "def_3%^123+-;C=123;" presumably because it just looks for the LAST instance of ';' in the string, rather than stopping at the first one it comes to.
I've tried using (?=) in search of that first ';' but it gives me the same result.
I can't seem to find a regExp reference that explains how one can specify the "NEXT" token rather than the one at the end.
any and all help greatly appreciated.
Similar question on SO:
Regex: To pull out a sub-string between two tags in a string
Regex to replace all \n in a String, but no those inside [code] [/code] tag
Replace patterns that are inside delimiters using a regular expression call
RegEx matching HTML tags and extracting text
You're using a greedy pattern by not specifying the ? in it. Try this:
".+B=(.+?);.+"
Try this:
B=([^;]+);
This matches everything between B= and ; unless it is a ;. So it matches everything between B= and the first ; thereafter.
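To see the difference between the greedy, lazy, and negated-class forms side by side, here is a small sketch in Python (chosen for illustration only). It uses a search for the shortened pattern B=(...); rather than the asker's full replace call, which is a simplification on my part.

```python
import re

s = "A=abc;B=def_3%^123+-;C=123;"

# Greedy: .+ grabs as much as it can, so the match ends at the LAST ";".
greedy = re.search(r'B=(.+);', s).group(1)

# Lazy: .+? grabs as little as it can, so the match ends at the FIRST ";".
lazy = re.search(r'B=(.+?);', s).group(1)

# Negated class: [^;]+ cannot cross a ";" at all, so no backtracking
# is needed to stop at the first one.
negated = re.search(r'B=([^;]+);', s).group(1)

print(greedy)
print(lazy)
print(negated)
```

The lazy and negated-class versions give the same result here; the negated class is often preferred because it states the intent directly.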
(This is a continuation of the conversation from the comments to Evan's answer.)
Here's what happens when your (corrected) regex is applied: First, the .+ matches the whole string. Then it backtracks, giving up most of the characters it just matched until it gets to the point where the B= can match. Then the (.+?) matches (and captures) everything it sees until the next part, the semicolon, can match. Then the final .+ gobbles up the remaining characters.
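The division of labour described above can be made visible by wrapping the leading and trailing .+ in their own groups. This is a Python sketch for illustration; the extra parentheses and the variable names are my additions.

```python
import re

s = "A=abc;B=def_3%^123+-;C=123;"

# Same shape as the corrected regex, with the two outer .+ parts captured
# so we can inspect what each piece ended up matching.
m = re.match(r'(.+)B=(.+?);(.+)', s)
prefix, captured, suffix = m.groups()

print(repr(prefix))    # what the first .+ kept after backtracking
print(repr(captured))  # what the lazy group stopped at
print(repr(suffix))    # what the final .+ gobbled up
```

The first group gives back everything from B= onward, the lazy group stops at the first semicolon, and the last group takes the remainder.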
All you're really interested in is the "B=" and the ";" and whatever's between them, so why match the rest of the string? The only reason you have to do that is so you can replace the whole string with the contents of the capturing group. But why bother doing that if you can access contents of the group directly? Here's a demonstration (in Java, because I can't tell what language you're using):
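import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "A=abc;B=def_3%^123+-;C=123;";
Pattern p = Pattern.compile("B=(.*?);");
Matcher m = p.matcher(s);
if (m.find())
{
    // group(1) is the text captured between "B=" and the first ";"
    System.out.println(m.group(1));
}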
Why do a 'replace' when a 'find' is so much more straightforward? Probably because your API makes it easier; that's why we do it in Java. Java has several regex-oriented convenience methods in its String class: replaceAll(), replaceFirst(), split(), and matches() (which returns true iff the regex matches the whole string), but not find(). And there's no convenience method for accessing capturing groups, either. We can't match the elegance of Perl one-liners like this:
print $1 if 'A=abc;B=def_3%^123+-;C=123;' =~ /B=(.*?);/;
...so we content ourselves with hacks like this:
System.out.println("A=abc;B=def_3%^123+-;C=123;"
    .replaceFirst(".+B=(.*?);.+", "$1"));
Just to be clear, I'm not saying not to use these hacks, or that there's anything wrong with Evan's answer--there isn't. I just think we should understand why we use them, and what trade-offs we're making when we do.
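For completeness, the asker's pseudo-code helper could be written with a find instead of a replace along these lines. This is a sketch in Python; the function name get_inner_string mirrors the pseudo-code, and returning None when nothing matches is my assumption.

```python
import re

def get_inner_string(text, start_token, end_token):
    """Return the text between start_token and the next end_token, or None."""
    # re.escape lets the tokens contain regex metacharacters safely;
    # the lazy (.*?) stops at the first end_token after start_token.
    pattern = re.escape(start_token) + r'(.*?)' + re.escape(end_token)
    match = re.search(pattern, text)
    return match.group(1) if match else None

my_string = "A=abc;B=def_3%^123+-;C=123;"
print(get_inner_string(my_string, "B=", ";"))
```

Reading the group directly avoids having to match, and then throw away, the rest of the string.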