Regular expression and extracting a value


I need to get hold of the value below:
{"token":"<input name=\"__RequestVerificationToken\" type=\"hidden\" value=\"KhWUxVIL697p18Gm3T1b4pCmXjK7iQujsJieYiLOKcKmKbdvC55kgaqg4G-uGqeUzmV3x6EMAV_ejPHe-Ok2kFqnjzVmvZmHySMpwKzGvq01\" />"}
What kind of regular expression would match that?
I have tried to use this:
.check(regex("input name='__RequestVerificationToken' type='hidden' value='([A-Za-z0-9+=/'-'_]+?)'").saveAs("token")))
But it does not match.
Using a regex tester did not get me anywhere either. Please help.

I would use something like this:
regex("""<input.+__RequestVerificationToken.+value=\\?["'](.+?)\\?["']""")
It can be made shorter, but I was not sure what the actual string looks like (whether the quotes are backslash-escaped, and whether it uses single or double quotes).

Assuming that the string in your question is exactly the way it appears, with escaped double quotes (\") and so on, here is the code:
// s holds the response body shown in the question
val regexGroupExtractor = """.*value=\\"(.*)\\".*""".r
val regexGroupExtractor(e) = s
// e == "KhWUxVIL697p18Gm3T1b4pCmXjK7iQujsJieYiLOKcKmKbdvC55kgaqg4G-uGqeUzmV3x6EMAV_ejPHe-Ok2kFqnjzVmvZmHySMpwKzGvq01"
In general with regex it is often helpful to think of the pattern in reverse: instead of specifying what is included, specify what is not. In your case there is no need to specify which characters are "in" inside the (), instead focus on where the part you want starts and ends. Specifically in your example - quotes are outside the string you want, in fact the quotes are exactly the edges, so in my regex I capture whatever it is between them.
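The same idea can be sketched in Python's re module, assuming the body contains backslash-escaped double quotes exactly as shown in the question. The names body and TOKEN are illustrative only. Note that the original attempt looked for single quotes, which never occur in this body.

```python
import re

# The response body as shown in the question, with backslash-escaped quotes.
body = (
    r'{"token":"<input name=\"__RequestVerificationToken\" type=\"hidden\" '
    r'value=\"KhWUxVIL697p18Gm3T1b4pCmXjK7iQujsJieYiLOKcKmKbdvC55kgaqg4G-uGqeUzmV3x6EMAV_ejPHe-Ok2kFqnjzVmvZmHySMpwKzGvq01\" />"}'
)

# Anchor on the edges (value=\" ... \") and capture whatever lies between them,
# excluding quotes and backslashes so the capture cannot run past the closing \".
TOKEN = re.compile(r'value=\\"([^"\\]+)\\"')

match = TOKEN.search(body)
token = match.group(1) if match else None
```

If the body turns out to contain unescaped quotes instead, the two `\\` parts of the pattern would have to be dropped.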

Related

Regex: If brackets, then set to empty, otherwise pass through

I have a string that may or may not have brackets in it. If there are brackets in the string, then I want to set it to an empty string (or a null value). Otherwise, I want to pass it through unchanged, i.e. match .*?.
This is what I have so far, but it's definitely not right:
(?(?=[\[^\]])("")|(.*?)).
One example string is: abcdef[12345].
I don't want to match on something like: mousecat.
What would be the best way to do this operation? There is a Grok template with regex statements that gets executed within a Java program.
Thanks!
Keren
Edit: How to do this if I have numbers within the brackets? Rather than just anything within the brackets.

The pattern:
^.*[\[\]].*$|^([^\[\]]*)$
The replacement:
$1
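A minimal Python sketch of the pattern and replacement above, assuming Python 3.5 or later, where an unmatched group in the replacement expands to an empty string (the `$1` replacement is written `\1` in Python). The DIGITS_IN_BRACKETS variant is a hypothetical adaptation for the follow-up edit about numbers inside the brackets, and pass_through is an illustrative name.

```python
import re

# Pattern from the answer: either the string contains a bracket (nothing is
# captured), or it contains none and is captured whole in group 1.
ANY_BRACKET = re.compile(r'^.*[\[\]].*$|^([^\[\]]*)$')

# Hypothetical variant: only blank the string when the brackets hold digits.
DIGITS_IN_BRACKETS = re.compile(r'^.*\[\d+\].*$|^(.*)$')

def pass_through(text, pattern=ANY_BRACKET):
    # Group 1 is unmatched when the bracket branch wins, so \1 becomes ''.
    return pattern.sub(r'\1', text)
```

With the example strings from the question, `pass_through('abcdef[12345]')` gives an empty string and `pass_through('mousecat')` gives the input back.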

Try this:
var str = "abcdef[xxxx]";
console.log(/\[.+\]/i.exec(str))

Regex - find all instances of words that begin with # but do not contain 'administrator'

I am having a hard time getting my head around this regex. What I am trying to do is as follows:
Match any occurrence of words that begin with #. So, for example, if the code finds the following tags #jon, #james, #jill, then it should hide the text.
But if the code finds occurrences of the following tag: #ADMINISTRATOR, then it should display the text
In addition, if the code finds no occurrences of any words tagged with #, it should also display the text.
Essentially, I want to hide any comments that are hashed tagged with a user name other than ADMINISTRATOR.
So far, I have the following code:
if (mb_ereg_match(".*(#[^ADMINISTRATOR]){1,}.*", $comment))
{
$hideComment = true;
}else
{
$hideComment = false;
}
The above code works for the most part, except for when the text being searched contains any one of the following:
#A, #AD, #ADM, #ADMI, #ADMIN, etc.
then the code does not hide the comment, which is not what I want. I only want an exact match to '#ADMINISTRATOR' to display the comments. Plus, any comment that contains no tags should also be displayed.
Any idea what I am doing wrong?

This is a negative lookahead based regex that will work for you:
(?i)#(?!ADMINISTRATOR)\w+

I've not used whatever program you're using to write your regex, but the syntax isn't doing what you think it is. A set of [] defines a character class, so [^ADMINISTRATOR] matches exactly one character that is not A, D, M, I, N, S, T, R or O. Your regular expression therefore says: find a # followed by a single character that is none of those letters. That is why #A, #AD, #ADMIN and so on are not caught.
What you want is a group rather than a class: () treats ADMINISTRATOR as a sequence instead of a set of characters. Since () also captures, the non-capturing form is (?:ADMINISTRATOR). To require that the sequence is absent, use a negative lookahead: (?!ADMINISTRATOR).
All put together, your call might look something like this (PHP, untested; the leading .* is there because mb_ereg_match only matches from the start of the string):
mb_ereg_match(".*#(?!ADMINISTRATOR\\b)\\w+", $comment)

A character class in a regex always matches a single character, whether negated or not. [ADMINISTRATOR] matches one A, D, M and so forth. [^ADMINISTRATOR] matches any one character that is not an A, D, M, etc.
If you want a regex that does not have a given string, I'd suggest using a negative lookahead instead, as anubhava suggested.
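The negative lookahead suggested above can be tried out with a short Python sketch. The pattern is the one from the first answer; should_hide and OTHER_USER_TAG are illustrative names, not part of the original code.

```python
import re

# A '#' tag whose name does not start with ADMINISTRATOR, case-insensitively.
OTHER_USER_TAG = re.compile(r'(?i)#(?!ADMINISTRATOR)\w+')

def should_hide(comment):
    # Hide the comment if it tags anyone other than ADMINISTRATOR.
    return OTHER_USER_TAG.search(comment) is not None
```

Under this pattern a tag such as #ADMINISTRATORS is also treated as the administrator; writing the lookahead as `(?!ADMINISTRATOR\b)` would restrict the exception to the exact name.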

Matching single or double quoted strings in Vim

I am having a hard time trying to match single or double quoted strings with Vim's
regular expression engine.
The problem is that I am assigning the regular expression to a variable and then using that
to play with matchlist.
For example, let's assume I know I am in a line that contains a quoted string and I want to match it:
let regex = '\v"(.*)"'
That would work to match anything that is double-quoted. Similarly, this would match single quoted strings:
let regex = "\v'(.*)'"
But if I try to use them both, like:
let regex = '\v['|"](.*)['|"]'
or
let regex = '\v[\'|\"](.*)[\'|\"]'
Then Vim doesn't know how to deal with it because it thinks that some quotes are not being closed in the actual variable definition and it messes up the regular expression.
What would be the best way to catch single or double quoted strings with a regular expression?
Maybe (probably!) I am missing something really simple to be able to use both quotes and not worry about the surrounding quotes for the actual regular expression.
Note that I prefer single quotes for regular expression because that way I do not need to double-backslash for escaping.

You need to use back references. Like so:
let regex = '\([''"]\)\(.\{-}\)\1'
Or with very-magic
let regex = '\v([''"])(.{-})\1'
Alternatively you could use (as it will not mess with your sub-matches):
let regex = '\%("\([^"]*\)"\|''\([^'']*\)''\)'
or with very magic:
let regex = '\v%("([^"]*)"|''([^'']*)'')'
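For readers more used to another engine, the back-reference version can be sketched in Python. This is an illustration of the same technique rather than Vim syntax: Vim's non-greedy `.{-}` corresponds to `.*?` here, and QUOTED and first_quoted are hypothetical names.

```python
import re

# Group 1 remembers which quote opened the string; \1 requires the same
# character to close it. Group 2 is the text between the quotes.
QUOTED = re.compile(r"""(['"])(.*?)\1""")

def first_quoted(line):
    match = QUOTED.search(line)
    return match.group(2) if match else None
```

Because the closing quote must equal the opening one, a single quote inside a double-quoted string (or the reverse) does not end the match early.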

Take a look at this post, which might help in some way:
Replacing quote marks around strings in Vim?

This is a working script I wrote for syntax-highlighting quoted strings.
syntax region myString start=/\v"/ skip=/\v(\\[\\"]){-1}/ end=/\v"/
syntax region myString start=/\v'/ end=/\v'/
You may use \v(\\[\\"]){-1} to skip escaped backslashes and quotes.

Extract querystring value from url using regex

I need to pull a variable out of a URL or get an empty string if that variable is not present.
Pseudo code:
String foo = "http://abcdefg.hij.klmnop.com/a/b/c.file?foo=123&zoo=panda";
String bar = "http://abcdefg.hij.klmnop.com/a/b/c.file";
When I run my regex I want to get 123 in the first case and an empty string in the second.
I'm trying .*?foo=(.*?)&?.* as my pattern and replacing with $1, but that's not working when foo= isn't present.
I can't just do a match, it has to be a replace.

You can try this:
[^?]+(?:\?foo=([^&]+).*)?
If there are parameters and the first parameter is named "foo", its value will be captured in group #1. If there are no parameters the regex will still succeed, but I can't predict what will happen when you access the capturing group. Some possibilities:
it will contain an empty string
it will contain a null reference, which will be automatically converted to an empty string
it will contain the word "null"
your app will throw an exception because group #1 didn't participate in the match.
This regex matches the sample strings you provided, but it won't work if there's a parameter list that doesn't include "foo", or if "foo" is not the first parameter. Those options can be accommodated too, assuming the capturing group thing works.
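One way to accommodate those options is sketched below in Python, assuming Python 3.5 or later, where an unmatched group in the replacement expands to an empty string (other engines may behave as in the list above). The pattern FOO_VALUE is a hypothetical extension of the one in this answer, and extract_foo is an illustrative name.

```python
import re

# Always matches the whole URL. Group 1 is the value of a parameter named
# exactly "foo", wherever it appears in the query string; otherwise unmatched.
FOO_VALUE = re.compile(r'^[^?]*(?:\?(?:[^&]*&)*?foo=([^&]*))?.*$')

def extract_foo(url):
    # Replace the entire URL with group 1 ('' when foo is absent).
    return FOO_VALUE.sub(r'\1', url)
```

Because the whole string is always matched, the replace never leaves the original URL in place when foo is missing, which is the case the question's pattern failed on.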

I think you need to do a match, then a replace. That way you can extract the value if it is present, and use "" if it is not. Something like this (assuming Java):
String bar = "";
// matches() must cover the whole string, hence the .* on both sides
if (foo.matches(".*\\?foo=([^&]+).*")) {
    bar = foo.replaceAll(".*\\?foo=([^&]+).*", "$1");
}
I haven't tested the regex, so I don't know if it will work.

In perl you could use this:
s/[^?*]*\??(foo=)?([\d]*).*/$2/
This will get everything up to the ? to start, and then isolate the foo, grab the numbers in a group and let the rest fall where they may.

There's an important rule when using regular expressions: don't try to put unnecessary processing into them. Sometimes things can't be done with a single regular expression, and sometimes it is more advisable to use the host programming language.
Marius' answer makes use of this rule: rather than finding a convoluted way of replacing-something-only-if-it-exists, it is better to use your programming language to check for the pattern's presence, and replace only if necessary.
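Where the host language is allowed to do the work, a URL parser avoids the regex altogether. A minimal Python sketch using the standard urllib.parse module; query_value is an illustrative name, and this only applies if the "it has to be a replace" constraint can be relaxed.

```python
from urllib.parse import urlparse, parse_qs

def query_value(url, name):
    # parse_qs maps each parameter name to a list of its values;
    # names that do not occur are simply absent from the dict.
    params = parse_qs(urlparse(url).query)
    return params.get(name, [''])[0]
```

For the two URLs in the question, `query_value(url, "foo")` yields "123" and an empty string respectively.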

Need regexp to find substring between two tokens

I suspect this has already been answered somewhere, but I can't find it, so...
I need to extract a string from between two tokens in a larger string, in which the second token will probably appear again meaning... (pseudo code...)
myString = "A=abc;B=def_3%^123+-;C=123;" ;
myB = getInnerString(myString, "B=", ";" ) ;
method getInnerString(inStr, startToken, endToken){
return inStr.replace( EXPRESSION, "$1");
}
So, when I run this using the expression ".+B=(.+);.+"
I get "def_3%^123+-;C=123;" presumably because it just looks for the LAST instance of ';' in the string, rather than stopping at the first one it comes to.
I've tried using (?=) in search of that first ';' but it gives me the same result.
I can't seem to find a regExp reference that explains how one can specify the "NEXT" token rather than the one at the end.
Any and all help greatly appreciated.
Similar questions on SO:
Regex: To pull out a sub-string between two tags in a string
Regex to replace all \n in a String, but no those inside [code] [/code] tag
Replace patterns that are inside delimiters using a regular expression call
RegEx matching HTML tags and extracting text

Your .+ is greedy because it has no ? after it, so it grabs as much as it can. Try the lazy form:
".+B=(.+?);.+"

Try this:
B=([^;]+);
[^;]+ matches one or more characters that are not a ;, so the capture runs from B= to the first ; that follows.
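The difference between the greedy, lazy and negated-class forms can be seen in a short Python sketch. The string named trailing is a hypothetical input with text after the last semicolon, chosen to make the greedy behaviour visible; original is the string from the question.

```python
import re

original = "A=abc;B=def_3%^123+-;C=123;"
# Hypothetical input with text after the last ';'.
trailing = "A=abc;B=def;C=123;D=4"

# Greedy: .+ runs to the end, then backs off only as far as the last ';'.
greedy = re.search(r'B=(.+);', trailing).group(1)
# Lazy: .+? grows one character at a time and stops at the first ';'.
lazy = re.search(r'B=(.+?);', trailing).group(1)
# Negated class: the capture cannot cross a ';' at all.
negated = re.search(r'B=([^;]+);', trailing).group(1)

from_original = re.search(r'B=([^;]+);', original).group(1)
```

The lazy and negated-class forms give the same result here; the negated class states the intent more directly and does not rely on backtracking.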

(This is a continuation of the conversation from the comments to Evan's answer.)
Here's what happens when your (corrected) regex is applied: First, the .+ matches the whole string. Then it backtracks, giving up most of the characters it just matched until it gets to the point where the B= can match. Then the (.+?) matches (and captures) everything it sees until the next part, the semicolon, can match. Then the final .+ gobbles up the remaining characters.
All you're really interested in is the "B=" and the ";" and whatever's between them, so why match the rest of the string? The only reason you have to do that is so you can replace the whole string with the contents of the capturing group. But why bother doing that if you can access contents of the group directly? Here's a demonstration (in Java, because I can't tell what language you're using):
String s = "A=abc;B=def_3%^123+-;C=123;";
Pattern p = Pattern.compile("B=(.*?);");
Matcher m = p.matcher(s);
if (m.find())
{
System.out.println(m.group(1));
}
Why do a 'replace' when a 'find' is so much more straightforward? Probably because your API makes it easier; that's why we do it in Java. Java has several regex-oriented convenience methods in its String class: replaceAll(), replaceFirst(), split(), and matches() (which returns true iff the regex matches the whole string), but not find(). And there's no convenience method for accessing capturing groups, either. We can't match the elegance of Perl one-liners like this:
print $1 if 'A=abc;B=def_3%^123+-;C=123;' =~ /B=(.*?);/;
...so we content ourselves with hacks like this:
System.out.println("A=abc;B=def_3%^123+-;C=123;"
.replaceFirst(".+B=(.*?);.+", "$1"));
Just to be clear, I'm not saying not to use these hacks, or that there's anything wrong with Evan's answer--there isn't. I just think we should understand why we use them, and what trade-offs we're making when we do.
