I have been trying to get this to work and I am nearly there, but I can't quite get the last match. This is the regex I'm using:
^`.*` (.*?)(\(.*?\))?\s
These are some examples of the patterns I'm trying to match
1.`asgKey` tinyblob
2.`is_asg` bit(1) DEFAULT NULL
3.`lastModified` datetime DEFAULT NULL
This regex will match 2 and 3 but not 1. I have tried adding ? and * to the space character, but then it doesn't match anything. I think I am misunderstanding the matching groups:
(.*?) - match any number of characters
(\(.*?\))? - if there are brackets match anything inside them else ignore
\s - space character
group 1 is the string group 2 is the contents of the brackets if they exist
You're matching them one at a time, right? Then what's the \s meant to match for #1?
`asgKey` tinyblob
^ ^ ^^ ^
| | || |
` .* ` (.*?)
There's nothing left, so \s can't match. Maybe you want (?:\s|$) to match a space or EOL.
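That said, consider using ([^\s(]+) instead of (.*?): it only matches characters that are neither whitespace nor an opening bracket, so it captures the same text with less backtracking. (A plain (\S+) would also swallow the (1) in bit(1), leaving group 2 empty.)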
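For what it's worth, here is a minimal Python sketch of the fix, run against the three sample lines from the question. The names LINES, COLUMN_RE and parse_type are made up for illustration, and using [^\s(]+ for the type name is an assumption of this sketch, chosen so that the optional bracket group can still capture (1).

```python
import re

# Column-definition lines taken from the question.
LINES = [
    "`asgKey` tinyblob",
    "`is_asg` bit(1) DEFAULT NULL",
    "`lastModified` datetime DEFAULT NULL",
]

# (?:\s|$) accepts either whitespace or the end of the line after the type.
# [^\s(]+ stops before whitespace or "(", leaving "(1)" for group 2.
COLUMN_RE = re.compile(r"^`.*` ([^\s(]+)(\(.*?\))?(?:\s|$)")


def parse_type(line):
    """Return (type_name, bracketed_part_or_None), or None if nothing matches."""
    m = COLUMN_RE.match(line)
    return (m.group(1), m.group(2)) if m else None


for line in LINES:
    print(line, "->", parse_type(line))
```

With the original trailing \s in place of (?:\s|$), the first line would not match at all, because nothing follows tinyblob.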
Related
How can I get only the middle part of a combined name with PCRE regex?
name: 211103_TV_storyname_TYPE
result: storyname
I have used this single line: .(\d)+.(_TV_) to remove the first part: 211103_TV_
Another idea is to use (_TYPE)$, but the problem is that not all variations of the names contain a space to separate a second word, so I can't use ^ for the first word and $ for the second.
The _TYPE and TV parts of the combined name are fixed.
The numbers are changing according to the date. And the storyname is variable.
Any ideas?
Thanks
With your shown samples, please try the following regex; it creates one capturing group that contains the matched value.
.*?_TV_([^_]*)(?=_TYPE)
Or, as a small variation of the above solution (following the fourth bird's nice suggestion), the same thing without the lazy match .*?:
_TV_([^_]*)(?=_TYPE)
Explanation: Adding detailed explanation for above.
.*?_ ##Using Lazy match to match till 1st occurrence of _ here.
TV_ ##Matching TV_ here.
([^_]*) ##Creating 1st capturing group which has everything before next occurrence of _ here.
(?=_TYPE) ##Making sure previous values are followed by _TYPE here.
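As a rough illustration, the shorter pattern can be tried in Python like this. STORY_RE and story_name are hypothetical names; the pattern itself is the one from the answer.

```python
import re

# _TV_ followed by a run of non-underscore characters (group 1),
# which must be directly followed by _TYPE.
STORY_RE = re.compile(r"_TV_([^_]*)(?=_TYPE)")


def story_name(name):
    """Return the captured middle part, or None when the name does not fit."""
    m = STORY_RE.search(name)
    return m.group(1) if m else None


print(story_name("211103_TV_storyname_TYPE"))
```

Because group 1 excludes underscores, a story name that itself contains an underscore would not match with this pattern.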
You could match as few chars as possible after _TV_ until you match _TYPE
\d_TV_\K.*?(?=_TYPE)
\d_TV_ Match a digit and _TV_
\K Forget what is matched until now
.*? Match as few characters as possible
(?=_TYPE) Assert _TYPE to the right
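If you want to try the same idea outside a PCRE engine: Python's built-in re module does not support \K, so the sketch below swaps it for a capture group. LAZY_RE and middle_part are names invented for this example.

```python
import re

# Python's re has no \K, so the wanted part is captured in group 1 instead.
LAZY_RE = re.compile(r"\d_TV_(.*?)(?=_TYPE)")


def middle_part(name):
    """Return the text between <digit>_TV_ and _TYPE, or None."""
    m = LAZY_RE.search(name)
    return m.group(1) if m else None


print(middle_part("211103_TV_storyname_TYPE"))
```

Unlike a [^_]* based pattern, the lazy .*? also accepts underscores inside the story name.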
Another option without a non greedy quantifier, and leaving out the digit at the start:
_TV_\K[^_]*+(?>_(?!TYPE)[^_]*)*(?=_TYPE)
_TV_ Match literally
\K[^_]*+ Forget what is matched until now and optionally match any char except _
(?>_(?!TYPE)[^_]*)* Only allow matching _ when not directly followed by TYPE
(?=_TYPE) Assert _TYPE to the right
Edit
If you want to replace the 2 parts, you can use an alternation and replace with an empty string.
If it should be at the start and the end of the string, you can prepend ^ and append $ to the pattern.
\b\d{6}_TV_|_TYPE\b
\b\d{6}_TV_ A word boundary, match 6 digits and _TV_
| Or
_TYPE\b Match _TYPE followed by a word boundary
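A small Python sketch of the replace-with-empty-string approach, assuming the date prefix is always six digits as in the sample (STRIP_RE and strip_affixes are hypothetical names):

```python
import re

# Either the leading "<6 digits>_TV_" or the trailing "_TYPE".
STRIP_RE = re.compile(r"\b\d{6}_TV_|_TYPE\b")


def strip_affixes(name):
    """Remove both fixed parts, leaving only the story name."""
    return STRIP_RE.sub("", name)


print(strip_affixes("211103_TV_storyname_TYPE"))
```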
Here I have added some screenshots to the post, along with the documentation that appears when you click the help button, so you can see the forms and what I see.
Documentation
The regular expressions we use are based on PCRE - Perl Compatible Regular Expressions. Full specification can be found here: http://www.pcere.org and http://perldoc.perl.org/perlre.html
Summary of some useful terms:
Metacharacters
\ Quote the next metacharacter
^ Match the beginning of the line
. Match any character (except newline)
$ Match the end of the line (or before newline at the end)
| Alternation
() Grouping
[] Character class
Quantifiers
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
Character Classes
\w Match a "word" character (alphanumeric plus "_")
\W Match a non-"word" character
\s Match a whitespace character
\S Match a non-whitespace character
\d Match a digit character
\D Match a non-digit character
Capture buffers
The bracketing construct (...) creates capture buffers. To refer to the current contents of a buffer later on, within the same pattern, use \1 for the first, \2 for the second, and so on. Outside the match use "$" instead of "\". The \<digit> notation works in certain circumstances outside the match. See the warning below about \1 vs $1 for details.
Referring back to another part of the match is called a backreference.
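To make the backreference idea concrete, here is a short sketch in Python rather than in the tool the documentation belongs to. Note that Python's re writes the replacement reference as \1 where Perl would use $1; DOUBLED_WORD and collapse_doubled are names made up for this example.

```python
import re

# Inside the pattern, \1 refers back to whatever group 1 matched,
# so this finds the same word twice in a row.
DOUBLED_WORD = re.compile(r"\b(\w+) \1\b")


def collapse_doubled(text):
    """Replace a doubled word with a single copy of it."""
    # In the replacement string, \1 inserts the contents of group 1.
    return DOUBLED_WORD.sub(r"\1", text)


print(collapse_doubled("the the story"))
```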
Examples
Replace story with certain prefix letters M N or E to have the prefix "AA":
`srcPattern "(M|N|E ) ([A-Za-z0-9\s]*)"`
`trgPattern "AA$2" `
`"N StoryWord1 StoryWord2" -> "AA StoryWord1 StoryWord2"`
`"E StoryWord1 StoryWord2" -> "AA StoryWord1 StoryWord2"`
`"M StoryWord1 StoryWord2" -> "AA StoryWord1 StoryWord2"`
"NoMatchWord StoryWord1 StoryWord2" -> "NoMatchWord StoryWord1 StoryWord2" (no match found, name remains the same)
I am trying to use replace in Sublime using regular expressions but I'm stuck. I tried various combinations but don't seem to be getting there.
This is the input and my desired output:
Input: N_BBP_c_46137_n
Output : BBP
I tried combinations of:
[^BBP]+\b
\*BBP*+\g
But none of the above (or the many others I tried) seems to work.
To turn N_BBP_c_46137_n into BBP (according to the comment, you "just want that entire long name such as N_BBP_ to be replaced by only BBP"), you might also use a capture group to keep BBP.
\bN_(BBP)_\S*
\bN_ Match N preceded by a word boundary
(BBP) Capture group 1, match BBP (or use [A-Z]+ to match 1+ uppercase chars)
_\S* Match _ followed by 0+ times a non whitespace char
In the replacement use the first capturing group $1
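Outside Sublime, the same replacement can be sketched in Python, where the first group is written \1 in the replacement instead of $1 (NAME_RE and shorten are hypothetical names):

```python
import re

# N_, then BBP captured in group 1, then _ and the rest of the name.
NAME_RE = re.compile(r"\bN_(BBP)_\S*")


def shorten(text):
    """Replace each long name with just the captured BBP part."""
    return NAME_RE.sub(r"\1", text)


print(shorten("N_BBP_c_46137_n"))
```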
You may use
(N_)[^_]*(_c_\d+_n)
Replace with ${1}some new value$2.
Details
(N_) - Group 1 ($1 or ${1} if the next char is a digit): N_
[^_]* - any 0 or more chars other than _
(_c_\d+_n) - Group 2 ($2): _c_, 1 or more digits and then _n.
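The ${1} form matters when the new value starts with a digit. A rough Python equivalent is \g<1>, shown in the sketch below; MIDDLE_RE and replace_middle are names invented for illustration.

```python
import re

# Group 1 keeps "N_", group 2 keeps "_c_<digits>_n"; the part between is replaced.
MIDDLE_RE = re.compile(r"(N_)[^_]*(_c_\d+_n)")


def replace_middle(text, new_value):
    """Swap the middle part of the name for new_value."""
    # \g<1> stays unambiguous even if new_value begins with a digit,
    # whereas "\1" followed by digits would be read as a different group number.
    # Backslashes in new_value are doubled so they are inserted literally.
    template = r"\g<1>" + new_value.replace("\\", r"\\") + r"\g<2>"
    return MIDDLE_RE.sub(template, text)


print(replace_middle("N_BBP_c_46137_n", "123"))
```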
I'm trying to take a query parameter and verify if the syntax provided by the user is correct. Regex seems like the best choice for this, but I'm having trouble making it so the pattern doesn't allow for repeating itself.
The pattern I came up with is:
(^(\w+)(=|!=|>=|>|<=|<|~)((')(.*)('))(\s(AND|OR)\s)(\w+)(=|!=|>=|>|<=|<|~)((')(.*)('))$)
The syntax provided by the user should be:
[field][predicate][single quote][value][single quote][white space][logical operator][white space][field][predicate][single quote][value][single quote]
Where:
field is [any word]
predicate is [= | != | >= | > | <= | < | ~]
logical operator is [AND | OR (with a space on both sides)]
value is [any word wrapped by single quotes]
An example looks like this: field1='value1' OR field2='value2'
The problem I am having is that the pattern I created allows for things like this:
field1='value1' OR field2='value2field1='value' OR field2='value2'' [This shouldn't work but does]
field1='value1' OR field2='value2 field1='value' OR field2='value2'' [This shouldn't work but does]
field1='value1' OR field2='value2' AND field3='value3' OR field4='value4'' [This shouldn't work but does]
Any help would be appreciated making it so the pattern doesn't match if it repeats.
You might use:
^\w+(?:<=|>=|!=|[~<>=])'\w+'(?: (?:OR|AND) \w+(?:<=|>=|!=|[~<>=])'\w+')*$
^ Start of string
\w+ Match 1 or more word chars
(?: Non capture group
<=|>=|!=|[~<>=] Match one of the alternatives
) Close group
'\w+' Match 1 or more word chars between single quotes
(?: Non capture group
 (?:OR|AND) \w+ Match a space, either OR or AND, another space and 1+ word chars
(?:<=|>=|!=|[~<>=]) Match one of the alternatives
'\w+' Match 1 or more word chars between single quotes
)* Close group and repeat 0+ times to also match without AND or OR
$ End of string
If there should be at least a single AND or OR the quantifier of the last group could be + instead of *
The single chars in the predicate could be added to a character class [~<>=] to take out a few alternations.
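Here is a minimal Python sketch of this kind of validation, checked against the samples from the question. It assumes the predicate list from the question (so >= rather than =>), and it uses fullmatch instead of ^ and $; PREDICATE, CONDITION, QUERY_RE and is_valid_query are names made up for this example.

```python
import re

# One predicate: the two-character operators first, then the single characters.
PREDICATE = r"(?:<=|>=|!=|[~<>=])"
# One condition: field, predicate, value in single quotes.
CONDITION = r"\w+" + PREDICATE + r"'\w+'"
# A condition, optionally followed by " OR <condition>" / " AND <condition>".
QUERY_RE = re.compile(CONDITION + r"(?: (?:OR|AND) " + CONDITION + r")*")


def is_valid_query(query):
    """True only if the whole string follows the expected syntax."""
    return QUERY_RE.fullmatch(query) is not None


print(is_valid_query("field1='value1' OR field2='value2'"))
print(is_valid_query("field1='value1' OR field2='value2field1='value' OR field2='value2''"))
```

Since the value is \w+ rather than .*, a quote or space can no longer be swallowed inside a value, which is what let the repeated input through in the original pattern.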
I would like to match text between two strings, although the last string/character might not always be available.
String1: 'www.mywebsite.com/search/keyword=toys'
String2: 'www.mywebsite.com/search/keyword=toys&lnk=hp1'
Here I want to match the value in keyword= that is 'toys' and I am using
(?<=keyword=)(.*)(?=&|$)
This works for String1, but for String2 it also matches the '&' and everything after it.
What am I doing wrong?
.* is greedy. It takes everything it can, therefore stops at the end of the string ($) and not at the & character.
Change it to its non-greedy version - .*?
with t as
(
select explode
(
array
(
'www.mywebsite.com/search/keyword=toys'
,'www.mywebsite.com/search/keyword=toys&lnk=hp1'
)
) as (val)
)
select regexp_extract(val,'(?<=keyword=)(.*?)(?=&|$)',0)
from t
;
+------+
| toys |
+------+
| toys |
+------+
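The difference is easy to reproduce outside Hive. Below is a small Python sketch comparing the greedy and the lazy version on the two sample strings; GREEDY, LAZY and extract are names invented for this example.

```python
import re

URLS = [
    "www.mywebsite.com/search/keyword=toys",
    "www.mywebsite.com/search/keyword=toys&lnk=hp1",
]

GREEDY = re.compile(r"(?<=keyword=)(.*)(?=&|$)")
LAZY = re.compile(r"(?<=keyword=)(.*?)(?=&|$)")


def extract(pattern, url):
    """Return the whole match (like index 0 in regexp_extract), or None."""
    m = pattern.search(url)
    return m.group(0) if m else None


for url in URLS:
    print(extract(GREEDY, url), "|", extract(LAZY, url))
```

The greedy .* runs to the end of the string, where the $ branch of the lookahead already succeeds, so it never has to back off to the &.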
You do not need to bother with greediness when you need to match zero or more occurrences of any characters but a specific character (or set of characters). All you need is to get rid of the lookahead and the dot pattern and use [^&]* (or, if the value you expect should not be an empty string, [^&]+):
(?<=keyword=)[^&]+
Code:
select regexp_extract(val,'(?<=keyword=)[^&]+', 0) from t
Note you do not even need a capturing group since the 0 argument instructs regexp_extract to retrieve the value of the whole match.
Pattern details
(?<=keyword=) - a positive lookbehind that matches a location that is immediately preceded with keyword=
[^&]+ - any 1+ chars other than & (if you use * instead of +, it will match 0 or more occurrences).
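For comparison, a quick Python sketch of the negated character class version (KEYWORD_RE and keyword_value are hypothetical names):

```python
import re

# Everything after "keyword=" up to, but not including, the next "&".
KEYWORD_RE = re.compile(r"(?<=keyword=)[^&]+")


def keyword_value(url):
    """Return the keyword value, or None if it is missing or empty."""
    m = KEYWORD_RE.search(url)
    return m.group(0) if m else None


print(keyword_value("www.mywebsite.com/search/keyword=toys&lnk=hp1"))
```

With + an empty value such as keyword=&lnk=hp1 gives no match at all; with * it would give an empty string instead.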
I have the following string
abc|ghy|33d
The regex below matches it fine
^([\d\w]{3}[|]{1})+[\d\w]{3}$
The string changes but the characters separated by the pipe are always in 3's ... so we can have
krr|455
we can also have
ddc
Here's where the problem happens: The regex explained above doesn't match the string if there is only one set of letters ... i.e. "ddc"
Let's do this step by step.
Your regex :
^([\d\w]{3}[|]{1})+[\d\w]{3}$
We can already see some changes. [|]{1} is equivalent to \|.
Then, we see that you require the first part (aaa|) at least once (the + operator matches one or more times), which is why a single group fails. Also, \w already matches digits, so [\d\w] can be shortened to \w.
The * operator matches 0 or more. So :
^(?:\w{3}\|)*\w{3}$
works.
Explanation
^ Matches beginning of string
(?:something)* matches something zero or more times. The group is non-capturing, as you won't need to capture it.
\w{3} matches 3 word characters (letters, digits or underscore)
\| matches |
$ matches end of string.
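A quick way to check the pattern against the strings from the question is a short Python sketch like this one (SEGMENTS_RE and is_valid are made-up names):

```python
import re

# Zero or more "xxx|" groups followed by one final "xxx".
SEGMENTS_RE = re.compile(r"^(?:\w{3}\|)*\w{3}$")


def is_valid(s):
    """True if s is one or more 3-character groups separated by pipes."""
    return SEGMENTS_RE.match(s) is not None


for sample in ["abc|ghy|33d", "krr|455", "ddc", "dd", "abc|"]:
    print(sample, "->", is_valid(sample))
```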
^[\d\w]{3}(?:[|][\d\w]{3}){0,2}$
You simply quantify the variable part. See demo.
https://regex101.com/r/tS1hW2/18
You can modify your regex as below:
^([\d\w]{3})(\|[\d\w]{3})*$
Here, first match 3 alphanumeric characters, and then any further groups of 3 alphanumeric characters prefixed with |.
Your description is a little awkward, but I'm guessing you want to be able to match
abc
abc|def
abc|def|ghi
You can do that with
/^\w{3}(?:\|\w{3}){0,2}$/
Explanation
^ — match beginning of string
\w{3} — match any 3 of [A-Za-z0-9_]
(?: ... ){0,2} — non-capturing group, repeated 0 to 2 times
\| — literal | character
$ — end of string
If the goal is to match any number of 3-character segments, you can use
/^(?:\w{3}(?:\||$))+$/
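To see how the two patterns differ, here is a small Python sketch (the JavaScript-style slashes are dropped; UP_TO_THREE and ANY_NUMBER are names invented for this example):

```python
import re

# One segment plus at most two more "|xxx" segments.
UP_TO_THREE = re.compile(r"^\w{3}(?:\|\w{3}){0,2}$")
# Any number of segments, each followed by "|" or the end of the string.
ANY_NUMBER = re.compile(r"^(?:\w{3}(?:\||$))+$")

for sample in ["abc", "abc|def|ghi", "abc|def|ghi|jkl", "abc|"]:
    print(sample, bool(UP_TO_THREE.match(sample)), bool(ANY_NUMBER.match(sample)))
```

One thing worth knowing about the second pattern: since each segment may end in | or in the end of the string, it also accepts a trailing pipe such as abc|. If that should be rejected, a pattern of the form ^\w{3}(?:\|\w{3})*$ is one alternative.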