Regex: If brackets, then set to empty, otherwise pass through

I have a string that may or may not have brackets in it. If there are brackets in the string, I want to set it to an empty string (or a null value). Otherwise, I want to pass it through unchanged (.*?).
This is what I have so far, but it's definitely not right:
(?(?=[\[^\]])("")|(.*?)).
One example string is: abcdef[12345].
I don't want to match on something like: mousecat.
What would be the best way to do this operation? There is a Grok template with regex statements that gets executed within a Java program.
Thanks!
Keren
Edit: How would I do this if the brackets have to contain numbers, rather than just anything?

The pattern:
^.*[\[\]].*$|^([^\[\]]*)$
The replacement:
$1
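Since the regex ends up running in Java, here is a minimal sketch of that pattern and replacement applied with String.replaceAll. Java expands $1 to an empty string when group 1 did not take part in the match, which is what turns bracketed input into "". The class and method names are made up for illustration, Grok may apply the pattern differently, and the second pattern is my own assumed variant for the edit (blank the string only when the brackets hold digits).

```java
public class BracketFilter {
    // Alternative 1 matches any string containing [ or ] without capturing,
    // so $1 expands to nothing; alternative 2 captures a bracket-free string whole.
    private static final String ANY_BRACKET = "^.*[\\[\\]].*$|^([^\\[\\]]*)$";

    // Assumed variant for the edit: blank the string only when it contains
    // a bracketed run of digits such as [12345]; anything else passes through.
    private static final String DIGITS_IN_BRACKETS = "^.*\\[\\d+\\].*$|^(.*)$";

    static String filter(String input) {
        return input.replaceAll(ANY_BRACKET, "$1");
    }

    static String filterDigits(String input) {
        return input.replaceAll(DIGITS_IN_BRACKETS, "$1");
    }

    public static void main(String[] args) {
        System.out.println("[" + filter("abcdef[12345]") + "]");   // []
        System.out.println("[" + filter("mousecat") + "]");        // [mousecat]
        System.out.println("[" + filterDigits("abc[xyz]") + "]");  // [abc[xyz]]
    }
}
```

The order of the alternatives matters: the "blank it" case is tried first, and only if it fails does the capturing alternative take the whole string.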

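Try this (JavaScript); exec() returns the bracketed part, or null if there are no brackets:
var str = "abcdef[xxxx]";
console.log(/\[.+\]/.exec(str)); // logs an array whose first element is "[xxxx]"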

Related

Regular expression and extracting a value

I need to get hold of the value below:
{"token":"<input name=\"__RequestVerificationToken\" type=\"hidden\" value=\"KhWUxVIL697p18Gm3T1b4pCmXjK7iQujsJieYiLOKcKmKbdvC55kgaqg4G-uGqeUzmV3x6EMAV_ejPHe-Ok2kFqnjzVmvZmHySMpwKzGvq01\" />"}
What kind of regular expression would match that?
I have tried using this:
.check(regex("input name='__RequestVerificationToken' type='hidden' value='([A-Za-z0-9+=/'-'_]+?)'").saveAs("token")))
But it does not match.
Trying it in a regex tester didn't get me anywhere either; please help.
I would use something like this:
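regex("""<input.+__RequestVerificationToken.+value=\\?["'](.+?)\\?["'].+>""")
It can be made shorter, but I wasn't sure what the actual string looks like (whether it contains the escape characters, and whether it uses single or double quotes). The triple-quoted string keeps \\? as "optional backslash" in the regex, and making the quotes a character class leaves the value in group 1.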
Assuming that the string in your question is exactly as it appears, with escaped double quotes (\") and so on, here is the code:
val regexGroupExtractor = """.*value=\\"(.*)\\".*""".r
val regexGroupExtractor(e) = s
// e == "KhWUxVIL697p18Gm3T1b4pCmXjK7iQujsJieYiLOKcKmKbdvC55kgaqg4G-uGqeUzmV3x6EMAV_ejPHe-Ok2kFqnjzVmvZmHySMpwKzGvq01"
In general with regex it is often helpful to think of the pattern in reverse: instead of specifying what is included, specify what is not. In your case there is no need to specify which characters are "in" inside the (); instead, focus on where the part you want starts and ends. In your example the quotes are outside the string you want; in fact they are exactly its edges, so my regex captures whatever is between them.
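The same "say where it ends" idea as a plain Java sketch (the question itself uses Gatling's Scala DSL, which is not reproduced here). It assumes the token never contains a quote or a backslash, and it accepts both the JSON-escaped form (\") and plain HTML quotes. TokenExtractor and extractToken are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenExtractor {
    // After value=, allow an optional backslash (the JSON-escaped \" form),
    // then a quote, then capture everything that is not a quote or backslash.
    // The negated class marks where the token ends instead of listing its characters.
    private static final Pattern TOKEN = Pattern.compile(
            "__RequestVerificationToken.*?value=\\\\?\"([^\"\\\\]+)");

    // Returns the token, or null if the input does not contain one.
    static String extractToken(String body) {
        Matcher m = TOKEN.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The JSON body from the question, with its escaped quotes.
        String body = "{\"token\":\"<input name=\\\"__RequestVerificationToken\\\" "
                + "type=\\\"hidden\\\" value=\\\"KhWUxVIL697p18Gm3T1b4pCmXjK7iQujsJieYiLOKcKmKbdvC55kgaqg4G"
                + "-uGqeUzmV3x6EMAV_ejPHe-Ok2kFqnjzVmvZmHySMpwKzGvq01\\\" />\"}";
        System.out.println(extractToken(body));
    }
}
```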

replacing all open tags with a string

Before somebody points me to that question, I know that one can't parse html with regex :) And this is not what I am trying to do.
What I need is:
Input: a string containing html.
Output: the same string with every opening tag <tag> replaced by ***<tag>
So if I get
<a><b><c></a></b></c>, I want
***<a>***<b>***<c></a></b></c>
as output.
I've tried something like:
(<[~/].+>)
and replace it with
***$1
But it doesn't seem to work the way I want it to. Any pointers?
Clarification: it's guaranteed that there are no self closing tags nor comments in the input.
You just have two problems: ^ is the character to exclude items from a character class, not ~; and .+ is greedy, so it will match as many characters as possible before the final >. Make the quantifier lazy, and use * rather than + so that one-letter tags such as <a> are matched on their own:
(<[^/].*?>)
You can also probably drop the parentheses and replace with $0 or $&, depending on the language.
Try using: (<[^/].*?>) and replace it with ***$1
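A minimal Java sketch of that replacement, using $0 (the whole match) as suggested above so no capturing group is needed. It relies on the question's guarantee of no self-closing tags or comments, and it would misbehave on a > inside an attribute value. OpenTagMarker and markOpenTags are hypothetical names.

```java
public class OpenTagMarker {
    // < followed by any character except /, then as few characters as possible
    // up to the first >. The lazy .*? keeps each match inside one tag, and
    // $0 (the whole match) removes the need for a capturing group.
    static String markOpenTags(String html) {
        return html.replaceAll("<[^/].*?>", "***$0");
    }

    public static void main(String[] args) {
        System.out.println(markOpenTags("<a><b><c></a></b></c>"));
        // ***<a>***<b>***<c></a></b></c>
    }
}
```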

Regular Expression Troubles

Given the following type of string:
"#First Thing# #Another One##No Space# Main String #After Main# #EndString#"
I would like to come up with a regular expression that returns all the text surrounded by # symbols as matches. One of the things giving me grief is that the # symbol is both the opening and the closing delimiter. All of my attempts at a regex have just returned the entire string. The other issue is that part of the string may not be surrounded by # symbols at all, as shown by the substring "Main String" above. Does anyone have any ideas? I have toyed around with negative lookbehind assertions a bit, but haven't been able to get them to work. There may or may not be a space between the groups of #s, but if there is, I want to ignore it (not match against it). The other option would be to write a string parser routine, which would be fairly easy, but I would prefer to use a regex if possible.
/((#[^#]+#)|([^#]+))/
Perhaps something like the above will match what you want.
This will match the space in between two hashes. Hmm.
/((#[^#]+#)|([^#]*[^#\s]+[^#]*))/
That will get rid of the nasty space, I think.
[Edit]
I think that this is what you need:
(?<=#)[^#]+?(?=#)
With input #First Thing# #Another One##No Space# Main String #After Main# it matches:
First Thing
(a single space)
Another One
No Space
Main String (including the spaces around it)
After Main
The second match is the space between Thing# and #Another.
[EDIT] To ignore space:
(?<=)(?!\s+)[^#]+?(?=#)
If you want to ignore trailing spaces:
(?<=)(?!\s+)[^#]+?(?=\s*#)
Try this. The first and last groups should not be captured and the .*? should be lazy
(?:#)(.*?)(?:#)
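A minimal Java sketch of that pattern; (?:#) around a single character is the same as a bare #, so it is written as #(.*?)#. Because both #s are consumed, the scan after #First Thing# resumes past the closing #, which is why the space and "Main String" are skipped. This assumes the #s always pair up from the left; one unpaired # would shift every later match. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HashDelimited {
    // #, then a lazy capture, then the next #. Both #s are consumed, so the
    // next match can only start at the following # in the string.
    private static final Pattern DELIMITED = Pattern.compile("#(.*?)#");

    static List<String> delimited(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = DELIMITED.matcher(s);
        while (m.find()) {
            out.add(m.group(1)); // text between the pair, without the #s
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "#First Thing# #Another One##No Space# Main String #After Main# #EndString#";
        System.out.println(delimited(s));
        // [First Thing, Another One, No Space, After Main, EndString]
    }
}
```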
I think this is what you really need:
((#[^#]+#)|([^#]*[^#\s]+[^#]*))
but it will not capture the #'s around Main String

Extract querystring value from url using regex

I need to pull a variable out of a URL or get an empty string if that variable is not present.
Pseudo code:
String foo = "http://abcdefg.hij.klmnop.com/a/b/c.file?foo=123&zoo=panda";
String bar = "http://abcdefg.hij.klmnop.com/a/b/c.file";
When I run my regex, I want to get 123 in the first case and an empty string in the second.
I'm using .*?foo=(.*?)&?.* as my replace pattern and replacing it with $1, but that doesn't work when foo= isn't present.
I can't just do a match; it has to be a replace.
You can try this:
[^?]+(?:\?foo=([^&]+).*)?
If there are parameters and the first parameter is named "foo", its value will be captured in group #1. If there are no parameters the regex will still succeed, but I can't predict what will happen when you access the capturing group. Some possibilities:
it will contain an empty string
it will contain a null reference, which will be automatically converted to an empty string
the word "null"
your app will throw an exception because group #1 didn't participate in the match.
This regex matches the sample strings you provided, but it won't work if there's a parameter list that doesn't include "foo", or if "foo" is not the first parameter. Those options can be accommodated too, assuming the capturing group thing works.
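In Java specifically, the first possibility is what happens: when group 1 did not participate, the replacement expands $1 to an empty string. A minimal sketch using the pattern above with replaceFirst (QueryReplace and fooValue are hypothetical names), with the same limitation that foo must be the first parameter:

```java
public class QueryReplace {
    // The answer's pattern. When "?foo=" is absent, the optional group does not
    // participate, and Java's replacement expands $1 to an empty string.
    private static final String FOO_FIRST = "[^?]+(?:\\?foo=([^&]+).*)?";

    static String fooValue(String url) {
        return url.replaceFirst(FOO_FIRST, "$1");
    }

    public static void main(String[] args) {
        System.out.println(fooValue("http://abcdefg.hij.klmnop.com/a/b/c.file?foo=123&zoo=panda")); // 123
        System.out.println("[" + fooValue("http://abcdefg.hij.klmnop.com/a/b/c.file") + "]");       // []
    }
}
```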
I think you need to do a match, then a replace. That way you can extract the value if it is present, and use "" if it is not. Something like this (Java):
String bar;
// matches() must cover the whole string, hence the leading and trailing .*
if (foo.matches(".*\\?foo=[^&]+.*")) {
    bar = foo.replaceFirst(".*\\?foo=([^&]+).*", "$1");
} else {
    bar = "";
}
I haven't tested the regex, so I don't know if it will work.
In perl you could use this:
s/[^?*]*\??(foo=)?([\d]*).*/$2/
This will get everything up to the ? to start, and then isolate the foo, grab the numbers in a group and let the rest fall where they may.
There's an important rule when using regular expressions: don't try to put unnecessary processing into them. Some things can't be done with a single regular expression, and sometimes it is more advisable to use the host programming language.
Marius' answer follows this rule: rather than finding a convoluted way of replacing-something-only-if-it-exists, it is better to use your programming language to check for the pattern's presence, and replace only if necessary.
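Taken one step further, the host language can do the find and decide what "absent" means. A minimal Java sketch, assuming foo may appear anywhere in the query string and that URLs carry no #fragment (it is not stripped here); QueryParam and fooValue are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryParam {
    // foo= preceded by ? or &, so it also works when foo is not the first
    // parameter; the value runs up to the next & (or the end of the string).
    private static final Pattern FOO = Pattern.compile("[?&]foo=([^&]*)");

    static String fooValue(String url) {
        Matcher m = FOO.matcher(url);
        // The host language decides what "absent" means: here, "".
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(fooValue("http://abcdefg.hij.klmnop.com/a/b/c.file?foo=123&zoo=panda")); // 123
        System.out.println(fooValue("http://abcdefg.hij.klmnop.com/a/b/c.file?zoo=panda&foo=7"));    // 7
    }
}
```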

Need regexp to find substring between two tokens

I suspect this has already been answered somewhere, but I can't find it, so...
I need to extract a string from between two tokens in a larger string, where the second token will probably appear again later in the string. In pseudocode:
myString = "A=abc;B=def_3%^123+-;C=123;" ;
myB = getInnerString(myString, "B=", ";" ) ;
method getInnerString(inStr, startToken, endToken){
return inStr.replace( EXPRESSION, "$1");
}
So when I run this using the expression ".+B=(.+);.+"
I get "def_3%^123+-;C=123;", presumably because it looks for the LAST instance of ';' in the string rather than stopping at the first one it comes to.
I've tried using a lookahead (?=...) to find that first ';', but it gives me the same result.
I can't seem to find a regex reference that explains how to specify the "NEXT" token rather than the one at the end.
Any and all help is greatly appreciated.
Similar questions on SO:
Regex: To pull out a sub-string between two tags in a string
Regex to replace all \n in a String, but no those inside [code] [/code] tag
Replace patterns that are inside delimiters using a regular expression call
RegEx matching HTML tags and extracting text
You're using a greedy pattern by not specifying the ? in it. Try this:
".+B=(.+?);.+"
Try this:
B=([^;]+);
This matches everything between B= and ; unless it is a ;. So it matches everything between B= and the first ; thereafter.
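A negated class like [^;] only works when the end token is a single character. For arbitrary tokens, a lazy (.*?) between the quoted tokens does the same job. This is a hypothetical Java version of the question's getInnerString(), a sketch rather than anything from the original post; it returns null when the tokens are not found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InnerString {
    // The tokens are quoted so characters like + or ^ in them are taken literally,
    // and the lazy (.*?) stops at the first endToken after startToken.
    // DOTALL lets the captured text span line breaks.
    static String getInnerString(String inStr, String startToken, String endToken) {
        Pattern p = Pattern.compile(
                Pattern.quote(startToken) + "(.*?)" + Pattern.quote(endToken), Pattern.DOTALL);
        Matcher m = p.matcher(inStr);
        return m.find() ? m.group(1) : null; // null when the tokens are not found
    }

    public static void main(String[] args) {
        String myString = "A=abc;B=def_3%^123+-;C=123;";
        System.out.println(getInnerString(myString, "B=", ";")); // def_3%^123+-
    }
}
```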
(This is a continuation of the conversation from the comments to Evan's answer.)
Here's what happens when your (corrected) regex is applied: First, the .+ matches the whole string. Then it backtracks, giving up most of the characters it just matched until it gets to the point where the B= can match. Then the (.+?) matches (and captures) everything it sees until the next part, the semicolon, can match. Then the final .+ gobbles up the remaining characters.
All you're really interested in is the "B=" and the ";" and whatever's between them, so why match the rest of the string? The only reason you have to do that is so you can replace the whole string with the contents of the capturing group. But why bother doing that if you can access contents of the group directly? Here's a demonstration (in Java, because I can't tell what language you're using):
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s = "A=abc;B=def_3%^123+-;C=123;";
Pattern p = Pattern.compile("B=(.*?);");
Matcher m = p.matcher(s);
if (m.find())
{
    System.out.println(m.group(1)); // prints def_3%^123+-
}
Why do a 'replace' when a 'find' is so much more straightforward? Probably because your API makes it easier; that's why we do it in Java. Java has several regex-oriented convenience methods in its String class: replaceAll(), replaceFirst(), split(), and matches() (which returns true iff the regex matches the whole string), but not find(). And there's no convenience method for accessing capturing groups, either. We can't match the elegance of Perl one-liners like this:
print $1 if 'A=abc;B=def_3%^123+-;C=123;' =~ /B=(.*?);/;
...so we content ourselves with hacks like this:
System.out.println("A=abc;B=def_3%^123+-;C=123;"
.replaceFirst(".+B=(.*?);.+", "$1"));
Just to be clear, I'm not saying not to use these hacks, or that there's anything wrong with Evan's answer--there isn't. I just think we should understand why we use them, and what trade-offs we're making when we do.
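One way to narrow the gap without the replaceFirst hack is a small helper of your own. This is a sketch only: Regexes.firstGroup is a made-up name, not part of the JDK, and it assumes the regex has at least one capturing group (otherwise group(1) throws).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regexes {
    // Hypothetical helper: group 1 of the first match of regex in input,
    // or Optional.empty() when there is no match. It compiles the pattern on
    // every call, so cache a Pattern instead if this runs in a hot loop.
    static Optional<String> firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? Optional.ofNullable(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Close in spirit to: print $1 if '...' =~ /B=(.*?);/;
        firstGroup("B=(.*?);", "A=abc;B=def_3%^123+-;C=123;").ifPresent(System.out::println);
    }
}
```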