I have a string in this shape
State#Received#ID#e23d8926-1327-4fde-9ea7-d364af3325e0
I want to extract the State value via regex. In the example above, I only want to extract Received.
I have tried ([^State#])([A-Za-z]), which matches Received, but I am stuck at excluding the rest of the string: #ID#e23d8926-1327-4fde-9ea7-d364af3325e0
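Don't put parentheses around a part you don't want to capture. Note also that [^State#] is a character class: it excludes the single characters S, t, a, e and #, it does not skip the literal prefix State#. Match the prefix literally and capture only the value that follows it. My solution is: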
State#(?'state'[^#]+)#
Sample: https://regex101.com/r/vAr65j/1
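The (?'state'...) spelling of a named group is accepted by .NET and PCRE; JavaScript writes the same group as (?<state>...). A minimal sketch of the same pattern in JavaScript follows. The helper name extractState is made up for illustration.

```javascript
// Hypothetical helper: returns the value between "State#" and the next "#",
// or null when the input does not contain that shape.
function extractState(input) {
  // (?<state>[^#]+) captures one or more characters that are not "#".
  const match = /State#(?<state>[^#]+)#/.exec(input);
  return match ? match.groups.state : null;
}

console.log(extractState("State#Received#ID#e23d8926-1327-4fde-9ea7-d364af3325e0"));
```

Only the named group is read from the match, so the trailing #ID#... part never reaches the result.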
I am using a data analysis package that exposes a Regex function for string parsing. I am trying to parse a response from a website that is in the format...
key1=val1&key2=val2&key3=val3 ...
[The keys and values may be percent-encoded, but the current return values are not; they are tokens and other alphanumeric info.]
I understand this data to be www-form-urlencoded, or alternatively it might be known as query string format.
The goal is to extract the value for a given key when the order of the keys cannot be relied upon. For example, I might know that one of the keys I should receive is "token", so what regex pattern can I use to extract the value for the key "token"? I have searched for this but cannot find anything that does what I need. If there is a duplicate question, apologies in advance.
In Alteryx, you may use Tokenize with a regex containing a capturing group around the part you need to extract:
The Tokenize Method allows you to specify a regular expression to match on and that part of the string is parsed into separate columns (or rows). When using the Tokenize method, you want to match to the whole token, and if you have a marked group, only that part is returned.
The last sentence of that description is the important part: if the pattern contains a capturing group, only that group is returned rather than the whole match.
Thus, you may use
(?:^|[?&])token=([^&]*)
where token can be replaced with whichever key you want to extract the value for.
See the regex demo.
Details
(?:^|[?&]) - the start of a string, ? or & (if the string is just a plain key-value pair string, you may omit ? and use (?:^|&) or (?<![^&]))
token - the key
= - an equal sign
([^&]*) - Group 1 (this will get extracted): 0 or more chars other than & (if you do not want to extract empty values, replace * with + quantifier).
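The breakdown above can be sketched outside Alteryx as well. This is a minimal JavaScript sketch, assuming a plain key=value&key=value string; the helper name getValue and the sample strings are made up for illustration.

```javascript
// Hypothetical helper: returns the raw value for `key` in a
// key1=val1&key2=val2 string, or null when the key is absent.
function getValue(query, key) {
  // Escape regex metacharacters so the key is matched literally.
  const escapedKey = key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // (?:^|[?&]) requires the key to sit at the start or right after "?" or "&",
  // so "token" is not matched inside "mytoken".
  const pattern = new RegExp("(?:^|[?&])" + escapedKey + "=([^&]*)");
  const match = pattern.exec(query);
  return match ? match[1] : null;
}

console.log(getValue("key1=val1&token=abc123&key3=val3", "token"));
```

If the values were percent-encoded, the captured string could then be passed through decodeURIComponent.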
I am capturing a session id from a string, and I want to add a prefix before the extracted session id.
Sample input: key=this is sample input; MySessionId=hhjsfd436763jhjhfdjs87787.hghht77f54; key7=jhu8787; type=raw; oldkey=jkjf8787;
I have formed the below regex to capture the MySessionId.
MySessionId=([^.]*)
I want to add a word before the extracted string like below.
Expected output:
ABCD-1234-hhjsfd436763jhjhfdjs87787
Is there any way to achieve this with a regular expression?
It depends on the language you're using: you need a function that replaces text in a string (usually called replace). It looks like you're dealing with cookies, so here is an example in JavaScript:
// $1 refers to the first group captured by the regex.
// Many other languages also use $1, but check the documentation for yours.
string = string.replace(/MySessionId=([^.]*)/, "ABCD-1234-$1")
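Note that replace returns the whole input with only the matched part substituted. If just the prefixed id is wanted, as in the expected output, one option is to read the capture group and build the string directly. This is a sketch; the helper name prefixedSessionId is made up for illustration.

```javascript
// Hypothetical helper: takes the part of MySessionId before the first "."
// and returns it with a prefix, or null when no session id is present.
function prefixedSessionId(input, prefix) {
  const match = /MySessionId=([^.]*)/.exec(input);
  return match ? prefix + match[1] : null;
}

const sample =
  "key=this is sample input; MySessionId=hhjsfd436763jhjhfdjs87787.hghht77f54; key7=jhu8787; type=raw; oldkey=jkjf8787;";
console.log(prefixedSessionId(sample, "ABCD-1234-"));
```

For the sample input this yields ABCD-1234-hhjsfd436763jhjhfdjs87787.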
For example, if a value matches an email pattern, then the key associated with it should also be removed from the URL.
If url is
https://stackoverflow.com/?key1=test&key2=test#gmail.com&key3=something&key4=another#email.com,
then remove key2=test#gmail.com and key4=another#email.com, so that the new url will be
https://stackoverflow.com/?key1=test&key3=something
The key names are not fixed and can be anything, and the position of the keys is not fixed either.
So I want a regex that yields the entire string without those key-value pairs. I was able to write a regex that matches the unwanted key-value pairs, but I could not get the rest of the string, which does not match the regex.
I did this with a Java program, but I am looking for a regex that I can apply in the XML configuration so that I can avoid the Java program.
This is mainly for use in urlrewritefilter (Tuckey), where I want to remove certain query string parameters matching a regex.
Here is a simple solution in Java (I saw your question is tagged java). The pattern matches ? or &, followed by a word, then =, and then an email-like value. You can substitute the part [.\w]+#[\w]+\.\w+ with a stricter email regex. Matching every valid email address with a regex is tricky, but this is the basic idea.
public class HelloWorld {
    public static void main(String[] args) {
        String url = "https://stackoverflow.com/?key1=test&key2=test#gmail.com&key3=something&key4=another#email.com";
        System.out.println(url.replaceAll("[?&]\\w+=[.\\w]+#[\\w]+\\.\\w+", ""));
    }
}
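One edge case with the single replaceAll: when the first parameter is the one removed, the leading ? is removed with it and the next pair is left starting with &. A possible way around this is to split the query string, filter the pairs, and rejoin them. This is a JavaScript sketch, not a urlrewritefilter rule; the helper name removeEmailParams is made up, and the # separator simply follows the sample URL in the question.

```javascript
// Hypothetical helper: drops every key=value pair whose value looks like
// name#domain.tld and rebuilds the URL from the remaining pairs.
function removeEmailParams(url) {
  const queryStart = url.indexOf("?");
  if (queryStart === -1) return url; // no query string, nothing to remove

  const base = url.slice(0, queryStart);
  const emailLike = /^[.\w]+#\w+\.\w+$/;

  const kept = url
    .slice(queryStart + 1)
    .split("&")
    .filter((pair) => {
      const eq = pair.indexOf("=");
      const value = eq === -1 ? "" : pair.slice(eq + 1);
      return !emailLike.test(value);
    });

  // Re-attach "?" only when at least one pair is left.
  return kept.length > 0 ? base + "?" + kept.join("&") : base;
}

console.log(
  removeEmailParams(
    "https://stackoverflow.com/?key1=test&key2=test#gmail.com&key3=something&key4=another#email.com"
  )
);
```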
I am very new to grok syntax. I have lines of the form:
/app-name/version/code_suffix/sync
for example:
/my-app/v1/O03_ABCD/sync
/my-app/v1/O04/sync
and I need to parse the code, which always consists of 3 characters. I tried using:
http://grokconstructor.appspot.com/do/match
but with no success
This regex matches each part of your format and puts it in a named capturing group:
/(?<appName>[^/]*)/(?<version>[^/]*)/(?<code>[^\W_]{3})(?:_(?<suffix>[^/]*))?/sync
You can try it here, and it also works on grokConstructor.
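To see what each named group captures, the same pattern can be run against the two sample lines. This is a JavaScript sketch: the forward slashes are escaped for the regex literal, the ^ and $ anchors are an added assumption, and the helper name parseSyncPath is made up for illustration.

```javascript
// The pattern from the answer as a JavaScript regex literal.
const syncPath =
  /^\/(?<appName>[^/]*)\/(?<version>[^/]*)\/(?<code>[^\W_]{3})(?:_(?<suffix>[^/]*))?\/sync$/;

// Hypothetical helper: returns the named groups as a plain object,
// or null when the line does not match the expected format.
function parseSyncPath(line) {
  const match = syncPath.exec(line);
  return match ? { ...match.groups } : null;
}

console.log(parseSyncPath("/my-app/v1/O03_ABCD/sync"));
console.log(parseSyncPath("/my-app/v1/O04/sync"));
```

[^\W_]{3} accepts exactly three letters or digits, so the code is O03 for the first line and O04 for the second; the suffix group is left undefined when there is no underscore part.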
In OpenRefine, I want to run the regex below on a website's source to find email addresses in mailto links. My trouble is that when running value.match, I get this error:
Parsing error at offset 12: Bad regular expression (Unclosed character class near index 10 .*mailto:[^ ^)
I have tested the expression in another environment without value.match and it works.
value.match(/.*mailto:[^/"/']*.com.*/)
So if you have text like:
Blah blah mail me
To extract the email address using the match function in OpenRefine you need to use:
value.match(/.*mailto:([^\"\']*.com).*/)
This will give an array containing the email address, which is captured using a capture group. To extract the email address from the array (which is necessary if you want to store the mail address in an OpenRefine cell) you need to get the string value from the array. e.g.:
value.match(/.*mailto:([^\"\']*.com).*/)[0]
The difference between your original expression and this one is that the characters are escaped correctly and there is a capture group, which basically implements the advice from @LukStorms in the comments above.
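For comparison, here is how the same capture group behaves in plain JavaScript. This is only a sketch, not GREL: the HTML snippet and the address in it are invented sample data, the helper name extractMailto is made up, and the dot before com is escaped here so that it matches a literal dot.

```javascript
// Hypothetical helper: returns the address captured after "mailto:",
// or null when the text contains no matching link.
function extractMailto(text) {
  // ([^"']*\.com) captures everything up to the closing quote, ending in ".com".
  const match = /.*mailto:([^"']*\.com).*/.exec(text);
  // In JavaScript, index 0 is the whole match and capture groups start at 1.
  return match ? match[1] : null;
}

// Invented sample input for illustration only.
const html = 'Blah blah <a href="mailto:someone@example.com">mail me</a>';
console.log(extractMailto(html));
```

The indexing differs from GREL, where match() returns only the capture groups, so the first group sits at index 0 there.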
isNotNull(value.match(/.*mailto:[^\"\']*.com.*/))
As described on our reference page for the match() function, it returns an array of the capture groups in your regex pattern; isNotNull() then outputs true or false depending on whether your value matches that pattern:
https://github.com/OpenRefine/OpenRefine/wiki/GREL-String-Functions#matchstring-s-regexp-p
also described here: https://github.com/OpenRefine/OpenRefine/wiki/Understanding-Regular-Expressions#basic-examples
You can also use get(), as described in the Recipes on our wiki, but it will only work well if you have a single email address per cell. This is because get() without from and to numbers makes assumptions: it uses the length of the array to determine the last element and returns only that last element, not the first, or third, etc.:
https://github.com/OpenRefine/OpenRefine/wiki/Recipes#find-a-sub-pattern-that-exists-at-the-end-of-a-string
For example:
get(value.match(/.*(mailto:[^\"\']*.com).*/),0)