Regex: match a non nested code block - regex

I am currently writing a small text editor. With this editor, users can create small scripts for a very simple scripting engine.
For a better overview, I want to highlight code blocks that consist of the same command, like GoTo(x,y) or Draw(x,y).
To achieve this I want to use regular expressions (I am already using them to highlight other things, like variables).
Here is my expression (I know it's very ugly):
/(?<!GoTo|Draw|Example)(^(?:GoTo|Draw|Example)\(.+\)*?$)+(?!GoTo|Draw|Example)/gm
The "logic":
(?< !GoTo|Draw|Example) : Negative Lookbehind. No GoTo/Draw/Example command in the line before (inserted a space to avoid rendering problems)
(^(?:GoTo|Draw|Example)(.+)*?$) now matches GoTo/Draw/Example() until the line end (it even matches a comment)
"+" : match the previous pattern at least once
UNTIL the next line does not contain GoTo/Draw/Example (negative lookahead)
(I am testing at regex101.com; in the end I need this for vb.net)
It matches the following:
-Code- -result- -expected result-
GoTo(5656) -> MATCH 1 -> MATCH 1
sdsd
GoTo(sdsd) --comment -> MATCH 2 -> MATCH 2
GoTo(23329); -> MATCH 3 -> MATCH 2
Test()
GoTo(12) -> MATCH 4 -> MATCH 3
LALA
Draw(23) -> MATCH 5 -> MATCH 4
Draw(24) -> MATCH 6 -> MATCH 4
Draw(25) -> MATCH 7 -> MATCH 4
But what I want to achieve is that complete "blocks" of the same command are matched. In this case, matches 2 & 3 should be one match, and matches 5, 6 & 7 should be one match.
Tested with http://regex101.com/; the programming language is vb.net.

The expression you are looking for is
((?:^GoTo.+\n?)+)|((?:^Draw.+\n?)+)
You can see this at work here
The key here is the use of \n to match "everything including the end of the line" so you can match across multiple lines. Also note the use of ?: to get non-capturing inner groups (which are repeated) so we don't end up with inner and outer matches (we only want "the whole block" matched). Finally, the | separates an entire block of Draw from an entire block of GoTo. Obviously if you have other keywords, you can repeat for those as well.
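As a rough illustration, here is a minimal Python sketch of the same idea (the question targets vb.net, so Python's re module is an assumption made only for the demo). It anchors both alternatives at the line start and relies on multiline mode. The sample script and the helper name find_blocks are hypothetical.

```python
import re

# Hypothetical sample script, modelled on the test input in the question.
SCRIPT = """GoTo(5656)
sdsd
GoTo(sdsd) --comment
GoTo(23329);
Test()
GoTo(12)
LALA
Draw(23)
Draw(24)
Draw(25)
"""

# One alternative per keyword; each repeats "keyword line plus optional newline".
# re.MULTILINE lets ^ match at the start of every line.
BLOCK = re.compile(r"(?:^GoTo.+\n?)+|(?:^Draw.+\n?)+", re.MULTILINE)


def find_blocks(text):
    """Return each matched block as a list of its lines."""
    return [m.group(0).splitlines() for m in BLOCK.finditer(text)]


blocks = find_blocks(SCRIPT)
for number, block in enumerate(blocks, start=1):
    print(number, block)
```

Consecutive lines with the same command end up in one match, while any other line in between starts a new block.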

Related

REGEX to find match with any whitespace plus a special character plus a single whitespace plus anything with exceptions:

Intro:
I am looking to make a code hinter in Javascript for Vue i18n localizations.
Details:
Using Node readline to read line by line through a Vue file I want to find one pattern using REGEX (of the many patterns I am looking for) which story-wise is as follows:
For a single string,
find any amount of whitespace (spaces or indents)
PLUS
exactly one closing parenthesis
PLUS
exactly one space (for now, this might change)
PLUS (tricky part, bear with me)
any amount of letters, numbers, or special characters, except {{$t('[anything here]')}} or {{ $t('[anything here]') }}. If there is nothing at all after the closing parenthesis, the line should also fail to match the pattern.
1 | )
2 | )
3 | ) {
4 | ) {{
5 | ) Cancel
6 | ) .[];\`\'.;l][
7 | ) {{ $t('common.cancel') }}
8 | ) {{$t('common.cancel')}}
Lines 1-2 and lines 7-8 should not match. Only lines 3-6 should match.
Attempted Solution:
So far my REGEX pattern is this:
\s+\)\s{1}(.*) which does not match Lines 1 and 2 (good thing) because of the lack of a single whitespace after the closing parenthesis.
Problem:
It allows Lines 7 and 8 to pass. I can't figure out how to say anything is allowed BUT the three exception scenarios mentioned in the story of what I am trying to achieve.
My brain now:
Thinking baby steps, I want to negate a { after the single whitespace portion. If I try \s+\)\s{1}(.*)[^\{], the not block would negate any of the lines with an opening curly bracket from passing the match. But that's not the case because I am assuming the (.*) portion renders the negate block useless. Can't seem to even make this baby step. Please help.
Following your requirements, I came up with this pattern:
^\s+\)\s(?!.*{{ ?\$t\('[^']*'\) ?}}).+$
It's using the common 'Everything but..' approach (here & here)
with the extra bits added in the front and at the end.
Online Test
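To check the pattern against the eight sample lines, here is a small Python sketch (the question uses JavaScript, so Python is an assumption for the demo; the braces are escaped for readability, and the indentation of the sample lines is assumed because the original whitespace was lost). The helper name needs_translation is hypothetical.

```python
import re

# Same pattern as above, with the curly braces escaped.
PATTERN = re.compile(r"^\s+\)\s(?!.*\{\{ ?\$t\('[^']*'\) ?\}\}).+$")

# Sample lines from the question; the leading indentation is an assumption.
LINES = [
    "    )",
    "    ) ",
    "    ) {",
    "    ) {{",
    "    ) Cancel",
    "    ) .[];`'.;l][",
    "    ) {{ $t('common.cancel') }}",
    "    ) {{$t('common.cancel')}}",
]


def needs_translation(line):
    """True when text follows ') ' and it is not wrapped in {{ $t('...') }}."""
    return PATTERN.match(line) is not None


flags = [needs_translation(line) for line in LINES]
for number, (line, flag) in enumerate(zip(LINES, flags), start=1):
    print(number, repr(line), flag)
```

The negative lookahead is evaluated right after the single whitespace character, so it rejects the line before .+ gets a chance to consume the localized call.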

Regex number pipe

I got this regular expression:
192\.168\.[1|2|5|20]\.[0-9]{1,3}
192.168.2.123 -> OK
192.168.5.123 -> OK
192.168.20.123 -> Error
I want to accept only the values 1, 2, 5 and 20 for X in 192.168.X.122
(the rest of the regular expression is correct; I only have a problem when I try to match the value 20)
I can't reproduce your observations, but here is a pattern which should meet your requirements:
192\.168\.(?:1|2|5|20)\.122
Demo
It looks like you were confounding character classes, which are characters inside square brackets, with an alternation, which are different patterns of text, one of which needs to match.
This
[1|2|5|20]
actually says to match exactly one character: 0, 1, 2, 5 or a pipe. That is why 20, which is two characters long, fails. If you want to match any of these numbers, then use an alternation:
(1|2|5|20)
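The difference can be checked with a short Python sketch (Python is an assumption here, since the question does not name a language; the helper name accepts is hypothetical). It keeps the original [0-9]{1,3} for the last octet.

```python
import re

# Character class: matches ONE character out of 0, 1, 2, 5 or "|".
CHAR_CLASS = re.compile(r"192\.168\.[1|2|5|20]\.[0-9]{1,3}")

# Alternation: matches one of the complete alternatives 1, 2, 5 or 20.
ALTERNATION = re.compile(r"192\.168\.(?:1|2|5|20)\.[0-9]{1,3}")


def accepts(pattern, address):
    """True when the whole address matches the pattern."""
    return pattern.fullmatch(address) is not None


for address in ["192.168.2.123", "192.168.20.123", "192.168.0.123", "192.168.|.123"]:
    print(address, accepts(CHAR_CLASS, address), accepts(ALTERNATION, address))
```

The character class rejects 20 but accepts a lone 0 or a pipe; the alternation behaves the other way round.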

Convert a regex expression to erlang's re syntax?

I am having a hard time converting the following regular expression to Erlang's re syntax.
What I have is a test string like this:
1,2 ==> 3 #SUP: 1 #CONF: 1.0
And the regex that I created with regex101 is this (see below):
([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)
But I am getting weird match results if I convert it to erlang - here is my attempt:
{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
Also, I get more than four matches. What am I doing wrong?
Here is the regex101 version:
https://regex101.com/r/xJ9fP2/1
I don't know much about erlang, but I will try to explain. With your regex
>{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
>re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
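{match,[{0, 28},{0,3},{8,1},{16,1},{25,3}]}
In each tuple, the first number is the starting index of the match and the second number is the total number of matched characters from that starting index. For example, {0, 28} starts at index 0 and covers 28 characters.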
Reason for more than four groups
The first match always refers to the entire string matched by the complete regex; the rest are the four captured groups you want. So there are five groups in total:
Zeroth group (the entire match): ([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)
First group: ([\\d,]+)
Second group: (\\d+)
Third group: (\\d)
Fourth group: (\\d+.\\d+)
How to find desired answer
Here we want anything except the first group (which is entire match by regex). So we can use all_but_first to avoid the first group
> re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M, [{capture, all_but_first, list}]).
{match,["1,2","3","1","1.0"]}
More info can be found here
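The same group numbering shows up in other engines too. As a comparison only, here is a Python sketch (an assumption for illustration; the question itself is about Erlang) that rebuilds the start/length pairs and then drops group 0. The helper name start_length_pairs is hypothetical.

```python
import re

SUBJECT = "1,2 ==> 3 #SUP: 1 #CONF: 1.0"
RULE = re.compile(r"([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)")


def start_length_pairs(match):
    """Return (start, length) for group 0 and every capturing group."""
    pairs = []
    for index in range(match.re.groups + 1):
        start, end = match.span(index)
        pairs.append((start, end - start))
    return pairs


match = RULE.search(SUBJECT)
pairs = start_length_pairs(match)   # group 0 first, then the four captures
captured = list(match.groups())     # groups() skips group 0, like all_but_first

print(pairs)
print(captured)
```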
If you are in doubt about what the string actually contains, you can print it and check:
1> RE = "([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)".
"([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)"
2> io:format("RE: /~s/~n", [RE]).
RE: /([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)/
For the rest of the issue, there is a great answer by rock321987.

Regular Expressions in R

I found somewhat similar questions
R - Select string text between two values, regex for n characters or at least m characters,
but I'm still having trouble
Say I have a string in R:
testing_String <- "AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7"
And I need to be able to pull anything between the first element in the string that contains 2 characters (AK) and PADK,ADK. PADK and ADK will change in character but will always be 4 and 3 characters in length respectively.
So I would need to pull
ADAK NAS
I came up with this, but it's picking up everything from AK to ADK:
^[A-Za-z0_9_]{2}(.*?) +[A-Za-z0_9_]{4}|[A-Za-z0_9_]{3,}
If I understood your question correctly, this should do the trick:
\b[A-Z]{2}\s+(.+?)\s+[A-Z]{4}\s+[A-Z]{3}\b
Demo
You'll have to switch on the perl = TRUE option (to use a decent regex engine).
\b means word boundary. So this pattern looks for a match starting with a 2-letter word and ending with a 4 letter word followed by a 3 letter word. Your value will be in the first group.
Alternatively, you can write the following to avoid using the capturing group:
\b[A-Z]{2}\s+\K.+?(?=\s+[A-Z]{4}\s+[A-Z]{3}\b)
But I'd prefer the first method because it's easier to read.
Lookbehind is supported for perl=TRUE, so this regex will do what you want:
(?<=\w{2}\s).*?(?=\s+[^\s]{4}\s[^\s]{2})
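For a quick cross-check outside R, here is a Python sketch (an assumption for illustration; Python's re module has no \K, so only the capturing-group pattern and the lookaround pattern are tried). The helper names are hypothetical.

```python
import re

TESTING_STRING = "AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7"

# Capturing-group version: the wanted text is in group 1.
GROUP_PATTERN = re.compile(r"\b[A-Z]{2}\s+(.+?)\s+[A-Z]{4}\s+[A-Z]{3}\b")

# Lookaround version: the wanted text is the whole match.
LOOKAROUND_PATTERN = re.compile(r"(?<=\w{2}\s).*?(?=\s+[^\s]{4}\s[^\s]{2})")


def extract_with_group(text):
    match = GROUP_PATTERN.search(text)
    return match.group(1) if match else None


def extract_with_lookaround(text):
    match = LOOKAROUND_PATTERN.search(text)
    return match.group(0) if match else None


print(extract_with_group(TESTING_STRING))
print(extract_with_lookaround(TESTING_STRING))
```

Both versions return the text between the leading 2-letter code and the 4-letter/3-letter pair for this sample string.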

Complete Regex Pattern- String Exclusion, Optional End Brackets, Multiple Matches

I'm parsing a bunch of line items on an inventory list and, while each line describes something similar, the text format was not standardized. I've been working on a regex pattern for the past few days, but I'm not having much luck getting a pattern that matches all of my test scenarios. I'm hoping that someone with a lot more regex experience might be able to point out a few errors in the pattern.
Pattern To Match the palette number: \([Pp]alette [No\.\s]?#?(.*?)\),
1. Warehouse A, (Palette #91L41)
# Match Result Correct: 91L41
2. Warehouse B Palette No. 214
# Match Result Incorrect: no match
3. Warehouse Lot Storage C (Palette No. 9),
# Match Result Incorrect: o. 9 //I don't quite understand why it matches the o
4. Store Location D of Palette (Palette #1),
# Match Result Correct: 1
5. Store Location E of Palette, Empty, lot #45,
# Match Result Incorrect: no match
I've also tried to make the parenthesis optional so that it will match examples 2 and 5 but it's too greedy and included the previously mentioned lot word
Anything in brackets causes the engine to look for ONE of the provided characters. Your pattern successfully matches, for example, strings like: Palette Nabcdefg
To indicate one of several options, you'll need to use parentheses. What you're actually looking for should look something like this: [Pp]alette (No\.?\s?|#)?(\d+?)
Though it seems highly ineffective to not standardize the pattern. Your last case for example could be completely incompatible since it seems to be capable of containing possibly any kind of input.
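This also explains the stray "o" in the third example: the bracket expression consumes only the "N". A minimal Python sketch of that effect follows (Python is an assumption, as the question does not name a language). The GROUPED pattern and the helper name capture are hypothetical and only meant to show the class-versus-group difference.

```python
import re

# The pattern from the question: [No\.\s]? consumes at most ONE character.
ORIGINAL = re.compile(r"\([Pp]alette [No\.\s]?#?(.*?)\),")

# Hypothetical variant: a non-capturing group consumes the literal "No. " as a unit.
GROUPED = re.compile(r"\([Pp]alette (?:No\.\s)?#?(.*?)\),")

LINE_3 = "3. Warehouse Lot Storage C (Palette No. 9),"
LINE_4 = "4. Store Location D of Palette (Palette #1),"


def capture(pattern, text):
    """Return group 1 of the first match, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(capture(ORIGINAL, LINE_3))
print(capture(GROUPED, LINE_3))
print(capture(GROUPED, LINE_4))
```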
A little bit of explanation on matching your patterns with regular expressions. You really don't need to look for and match your parentheses ( .. ) in this case.
Let's say we want to just find any string with the word Palette that is followed with whitespace and the # symbol and capture the Palette sequence from it.
You could simply just use the following:
[Pp]alette\s+#([A-Z0-9]+)
This will result in capturing 91L41 and 1 from the matched patterns
1. Warehouse A, (Palette #91L41)
4. Store Location D of Palette (Palette #1)
Now say we want to find any string that has Palette, followed by whitespace and either a # symbol or No.
We can use a Non-capturing group for this. Non-capturing parentheses group the regex so you can apply regex operators, but do not capture anything.
So we could do something like:
[Pp]alette\s+(?:No[ .]+|#)([A-Z0-9]+)
Now this results in matching the following strings and capturing 91L41, 214, 9 and 1
1. Warehouse A, (Palette #91L41)
2. Warehouse B Palette No. 214
3. Warehouse Lot Storage C (Palette No. 9)
4. Store Location D of Palette (Palette #1)
And lastly, if you want to match all five strings, including the one where other text sits between Palette and the # symbol, and capture the Palette sequence:
[Pp]alette[\w, ]+(?:No[ .]+|#)([A-Z0-9]+)
See working demo and an explanation on this regular expression.
Everyone has a different way of using regular expressions; this is just one of many ways to understand and accomplish this.
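To see how each step widens the set of matched lines, here is a small Python sketch (Python is an assumption, since the question does not name a language). The test lines are taken from the question; the constant and helper names are hypothetical.

```python
import re

LINES = [
    "1. Warehouse A, (Palette #91L41)",
    "2. Warehouse B Palette No. 214",
    "3. Warehouse Lot Storage C (Palette No. 9),",
    "4. Store Location D of Palette (Palette #1),",
    "5. Store Location E of Palette, Empty, lot #45,",
]

# The three patterns from this answer, from narrowest to broadest.
HASH_ONLY = re.compile(r"[Pp]alette\s+#([A-Z0-9]+)")
HASH_OR_NO = re.compile(r"[Pp]alette\s+(?:No[ .]+|#)([A-Z0-9]+)")
WITH_FILLER = re.compile(r"[Pp]alette[\w, ]+(?:No[ .]+|#)([A-Z0-9]+)")


def captures(pattern):
    """Return the captured sequence per line, or None where nothing matches."""
    results = []
    for line in LINES:
        match = pattern.search(line)
        results.append(match.group(1) if match else None)
    return results


for name, pattern in [("hash only", HASH_ONLY),
                      ("hash or No.", HASH_OR_NO),
                      ("with filler", WITH_FILLER)]:
    print(name, captures(pattern))
```

Only the last pattern picks up line 5, because [\w, ]+ is what allows ", Empty, lot " between Palette and the # symbol.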
This should work for your case:
[Pp]alette.*?(?:No\.?|#)\s*(\w+)
This will search following types of patterns:
[Pp]alette{any_characters}No.{optional_spaces}(alphanumeric)
[Pp]alette{any_characters}No{optional_spaces}(alphanumeric)
[Pp]alette{any_characters}#{optional_spaces}(alphanumeric)
Check it in action here
MATCH 1
1. [26-31] `91L41`
MATCH 2
1. [60-63] `214`
MATCH 3
1. [104-105] `9`
MATCH 4
1. [148-149] `1`
MATCH 5
1. [195-197] `45`