How to use regex with cson - regex

I want to capture logical operators from ooRexx with a regex in a .cson file, because I want to support syntax highlighting of ooRexx in the Atom editor. These are the operators I am trying to cover:
>= <= \> \< \= >< <> == \== // && || ** ¬> ¬< ¬= ¬== >> << >>= \<< ¬<< \>> ¬>> <<=
And this is the regex part in the cson file:
'match': '\\+ | - | [\\\\] | \\/ | % | \\* | \\| | & |=|¬|>|<|
>= | <= | ([\\\\]>) | ([\\\\]<) | ([\\\\]=) | >< | <> | == | ([\\\\]==) |
\\/\\/ | && | \\|\\| | \\*\\* | ¬> | ¬< | ¬= | ¬== | >> | << | >>= | ([\\\\]<<) | ¬<< |
([\\\\]>>) | ¬>> | <<='
I'm struggling with the slashes (forward and backward) and also with the double asterisk (**). My knowledge of regex is very basic, to put it nicely. Can somebody help me with that?

You have spaces around the pipe bars: these spaces are counted in the regular expression. So when you write something like | \*\* |, the double asterisks get caught, but only if they are surrounded by a space on each side, and not if they're affixed to a word or at the beginning/end of a line. Same issue with the slashes — I have tested it, and it does seem to catch them for me, but only as long as your slashes (or asterisks) are between two spaces.
A few other things to keep in mind:
You shouldn't need the square brackets around backslashes; they're useful to provide classes of possible characters to match. For instance, [<>]= will catch both >= and <=. Writing [\\] is equivalent to writing \\ directly because \\ counts as a single character, due to the first escaping backslash. Similarly, your parentheses here are not being used; see grouping.
Also think of using repetition operators like + and *. So \\>+ will catch both \> and \>>.
Finally, the question mark will help you avoid repetition by marking the previous character (or a character class in square brackets, or a group in parentheses) as optional. ==? will match both = and ==.
You can group together a LOT of your statements with these three tricks combined… I'll leave that exercise to you!
Just another hint when developing long regular expressions — use a tester like Regex101 or similar with a test file to see your changes in real time, and debuggers like Regexper will help you understand how your regular expression is parsed.
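As a rough illustration of the points above, here is a minimal sketch in Python. Python's re module is used only for demonstration; the engine behind Atom grammars may differ in details, and the sample strings are made up. It shows that spaces around | become part of each alternative, and how a character class, a repetition operator and an optional character each fold several operators into one pattern. In the .cson file every backslash would be doubled, as in the question.

```python
import re

# Spaces around '|' are literal, so each alternative carries its spaces.
spaced = re.compile(r"\+ | \*\* | //")
# The same alternatives with the stray spaces removed.
tight = re.compile(r"\+|\*\*|//")

print(spaced.findall("a**b"))    # no spaces around the operator, so no match
print(spaced.findall("a ** b"))  # the match includes the surrounding spaces
print(tight.findall("a**b"))

# A character class covers both '>=' and '<='.
print(re.findall(r"[<>]=", "a >= b <= c"))
# '+' after '>' covers a backslash followed by one or more '>'.
print(re.findall(r"\\>+", r"a \> b \>> c"))
# '?' makes the second '=' optional.
print(re.findall(r"==?", "a = b == c"))
```

When combining such patterns into one alternation, it is usually safer to list longer operators before shorter ones so that, for example, >>= is not consumed as >> followed by =.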

Related

Regex inside VIM to reuse part of the string and replace the other

Honestly bemused, and I didn't expect to be asking this, but here goes.
Inside Vim, and only Vim, I want to perform a global search and replace. My target text is:
51099 analgesic
43045 analgesic
70145 analgesic
52338 analgesic
41214
55309
34373
47003
50659
51327
This goes on for several thousand lines. For all those lines that do not end with "\tanalgesic" (notice the tab), I would like to retain the number and append "\tanalgesic". I've tried several ways, none of which works (obviously).
Outside of Vim (in a general regex checker), [0-9]+$ finds all instances of "one or more digits at the end of the line". Within Vim, this does not work (typing : to enter command mode, then / to search). I'm baffled as to why this should be the case.
Although this does not work, I expect the solution is going to look similar:
:%s/[0-9]+$/(1)\tanalgesic/g
You could use, for example, Vim's very magic mode (\v), in which ( ) and + keep their special meaning without being escaped.
Use \1 as a backreference to group 1.
:%s/\v(\d+$)/\1\tanalgesic/g
The command without very magic mode, with escaped parenthesis for the group, and the escaped plus sign for the quantifier:
%s/\([0-9]\+$\)/\1\tanalgesic/g
|  |      |     |             |
|  |      |     |             ^ All occurrences
|  |      |     ^^ Backreference to group 1
|  |      ^^ Escaped plus sign
|  ^^ Escaped parenthesis for group 1
^ Substitute in whole file
Output
51099 analgesic
43045 analgesic
70145 analgesic
52338 analgesic
41214 analgesic
55309 analgesic
34373 analgesic
47003 analgesic
50659 analgesic
51327 analgesic
See http://vimregex.com/ for more information.
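For cross-checking the result outside Vim, the same substitution can be sketched in Python. This is only an illustration under assumptions: the function name add_label and the three-line sample are made up, and Python's syntax differs from Vim's in that + and ( ) need no escaping.

```python
import re

def add_label(text: str, label: str = "analgesic") -> str:
    # With MULTILINE, (\d+)$ matches a run of digits that ends a line.
    # Lines already ending in the label do not end in a digit, so they are untouched.
    return re.sub(
        r"(\d+)$",
        lambda m: f"{m.group(1)}\t{label}",
        text,
        flags=re.MULTILINE,
    )

sample = "51099\tanalgesic\n41214\n55309"
print(add_label(sample))
```

The key point is the same in both tools: the digits are captured in a group and reused in the replacement, followed by a tab and the label.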

Match a word in a list of words regex

I want the user to only be able to enter the values in the following regex:
^[AB | BC | MB | NB | NL | NS | NT | NU | ON |QC | PE | SK | YT]{2}$
My problem is that words like PP, AA and QQ are accepted.
I am not sure how I can prevent that. Thank you.
Site I use to verify the expression: https://regex101.com/
In most RegExp flavors, square brackets [] denote character classes; that is, a set of individual characters, any one of which can be matched at a given position.
Because P is included in this character class (along with a quantifier of {2}), PP is matched.
Instead, you seem to want a group with alternatives; for that, you'd use parentheses () (while also removing the whitespace, which does not appear to have been intentional on your part):
^(AB|BC|MB|NB|NL|NS|NT|NU|ON|QC|PE|SK|YT){2}$
RegEx101
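Note that because the {2} quantifier is still present, this matches two codes in a row, such as ABBC, ABAB, NLBC, etc. To accept exactly one code, drop the quantifier:
^(AB|BC|MB|NB|NL|NS|NT|NU|ON|QC|PE|SK|YT)$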
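The difference between the character class and the group can be checked with a short Python sketch. The names CODES, char_class, one_code and two_codes are made up for this illustration, and Python's re is assumed to behave like the flavor you are targeting for these simple patterns.

```python
import re

CODES = "AB|BC|MB|NB|NL|NS|NT|NU|ON|QC|PE|SK|YT"

# The pattern from the question: a character class, so any two listed characters pass.
char_class = re.compile(
    r"^[AB | BC | MB | NB | NL | NS | NT | NU | ON |QC | PE | SK | YT]{2}$"
)
# A group of alternatives with no quantifier: exactly one code.
one_code = re.compile(rf"^(?:{CODES})$")
# The same group with {2}: two codes back to back.
two_codes = re.compile(rf"^(?:{CODES}){{2}}$")

for candidate in ["QC", "PP", "ABBC"]:
    print(
        candidate,
        bool(char_class.match(candidate)),
        bool(one_code.match(candidate)),
        bool(two_codes.match(candidate)),
    )
```

Only the group without a quantifier accepts QC while rejecting both PP and ABBC.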

REGEX Replacing with exception

My first problem here is my nemesis: regex.
I need a regex that replaces every , in a text with "," without replacing the existing ,".
It looks like this:
Before:
abcd,efgh,ijkl,"","",mnop
After:
abcd","efgh","ijkl","","","mnop
I hope you can help me.
Solving a problem using regular expressions is nice but now you have two problems.
A simple solution that does not involve regular expressions is to do three simple string replacements: first replace , with "," then replace ","" with "," and finally replace ""," with ",".
Let's see why this works:
          | after 1st   | after 2nd   | after 3rd
original  | replacement | replacement | replacement
----------+-------------+-------------+-------------
a,b       | a","b       | a","b       | a","b
m",n      | m"","n      | m"","n      | m","n
x,"y      | x",""y      | x","y       | x","y
See it in action:
const input = 'abcd,efgh,ijkl,"","",mnop';
const output = input.replace(/,/g, '","').replace(/",""/g, '","').replace(/"","/g, '","');
console.log(output);
N.B. The code snippet above uses regular expressions because this is how JavaScript implements the "replace all" functionality. When the first argument of String.replace() is a string it replaces only its first occurrence.
I could use String.replaceAll() instead (it works with strings) but it is not widely supported by browsers yet.
Crudely, I think you are after something like:
(?:(?<!"),"|(?<=")",(?!")|(?<!"),(?!"))
Note: As mentioned by @WiktorStribiżew in the comments, you could get rid of the outer non-capturing group: (?<!"),"|(?<=")",(?!")|(?<!"),(?!")
See the online Demo
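To compare the two approaches on the sample from the question, here is a minimal Python sketch. The function names are hypothetical, and Python's re is assumed to support these fixed-width lookbehinds (it does for single-character ones like these). The two approaches agree on the sample line, but they are not equivalent in general: for an input such as m",n the three replacements give m","n while the lookaround pattern leaves it unchanged.

```python
import re

# The lookaround pattern from the answer above, without the outer group.
PATTERN = re.compile(r'(?<!"),"|(?<=")",(?!")|(?<!"),(?!")')

def quote_with_regex(text: str) -> str:
    # Every match is replaced by the three characters ","
    return PATTERN.sub('","', text)

def quote_with_replacements(text: str) -> str:
    # The three plain string replacements, applied in order.
    return text.replace(',', '","').replace('",""', '","').replace('"","', '","')

sample = 'abcd,efgh,ijkl,"","",mnop'
print(quote_with_regex(sample))
print(quote_with_replacements(sample))
```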

Rematch same or part of previous matched group

I'm looking for a way to match part of (or the whole of) a previously matched group. For instance, assume we have the following text:
this is a very long text "with" some quoted strings I "need" to extract in their own context
A regex like (.{1,20})(".*?")(.{1,20}) gives the following output:
# | 1st group            | 2nd group | 3rd group
------------------------------------------------------------------
1 | is a very long text  | "with"    | some quoted strings
2 | I                    | "need"    | to extract in their
The goal is to force the regex, when it matches the 2nd quoted string, to re-match part of the 3rd group from the 1st match (or the whole of it when the quoted strings are close together). Basically I'd like to have the following output instead:
# | 1st group            | 2nd group | 3rd group
------------------------------------------------------------------
1 | is a very long text  | "with"    | some quoted strings
2 | me quoted strings I  | "need"    | to extract in their
Probably backreference support would do the trick, but Go's regex engine lacks it.
If you go back to the original problem, you need to extract the quotes in context.
Since you don't have lookahead, you could use regexp just to match the quotes (or even just strings.Index) to get byte ranges, then widen each range yourself to include the context (this may require more work with multi-byte UTF-8 strings, since the offsets are bytes, not characters).
Something like:
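package main

import (
	"fmt"
	"regexp"
)

func main() {
	input := `this is a very long text "with" some quoted strings I "need" to extract in their own context`
	re := regexp.MustCompile(`(".*?")`)
	// Each match is a [start, end) pair of byte offsets into input.
	matches := re.FindAllStringIndex(input, -1)
	for _, m := range matches {
		// Widen the range by up to 20 bytes of context on each side.
		s := m[0] - 20
		e := m[1] + 20
		if s < 0 {
			s = 0
		}
		if e > len(input) {
			e = len(input) // clamp; a negative end index would panic
		}
		fmt.Printf("%s\n", input[s:e])
	}
}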
https://play.golang.org/p/brH8v6OM-Fx
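The same range-widening idea can also return the three pieces from the question's table separately. Below is a minimal sketch in Python rather than Go, chosen only for brevity; the function name quotes_in_context and the width parameter are made up. Python strings are indexed by code point, so the byte-offset caveat does not arise here.

```python
import re

def quotes_in_context(text: str, width: int = 20) -> list:
    """Return (before, quoted, after) for every quoted string in text."""
    rows = []
    for m in re.finditer(r'".*?"', text):
        # Clamp the widened range to the bounds of the text.
        start = max(m.start() - width, 0)
        end = min(m.end() + width, len(text))
        rows.append((text[start:m.start()], m.group(), text[m.end():end]))
    return rows

sample = ('this is a very long text "with" some quoted strings '
          'I "need" to extract in their own context')
for before, quoted, after in quotes_in_context(sample):
    print(repr(before), quoted, repr(after))
```

Because each match is widened independently, the context of the second quote can overlap the context of the first, which is what a single non-overlapping regex match cannot give you.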

Regex to select NOT and operand

I am trying to break a string into an array using regex in C#.
I have for example the string
{([Field] = '100' OR [LaneDescription] LIKE '%DENTINPALEUW%'
OR [LaneDescription] = 'asdf' OR ([ObjectID] = 1) AND [ITEM_HEIGHT] >=
10 AND [SENDER_COMPANY] NOT LIKE '%DHL%'}
(Generated from Telerik RadFilter)
and I need it broken up so I can pass it to a custom object with types: open parenthesis, field, comparator, value, close parenthesis.
So far, with the help of http://regexr.com, I have got as far as
\[([^\[\]]*)\]+|[\w'%]+|[()=]
but I need to get '>=' and 'NOT LIKE' as single tokens (and similar operators like <>, != etc.)
You can see my late night attempts at http://regexr.com/39g6b
Any help would be much appreciated.
(PS: There are no newline characters in the string)
Try
\(|\)|\[[a-zA-Z0-9_]+\]|'.*?'|\d+|NOT LIKE|\w+|[=><!]+
Demo.
Explanation:
\(                  // match "(" literally
|                   // or
\)                  // ")"
|                   // or
\[[a-zA-Z0-9_]+\]   // any words inside square braces []
|
'.*?'               // strings enclosed in single quotes '' (escape sequences can easily trip this up though)
|
\d+                 // digits
|
NOT LIKE            // "NOT LIKE", because this is the only token that can contain whitespace
|
\w+                 // words like "NOT", "AND", etc
|
[=><!]+             // operators like ">", "!=", etc
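To see the tokens this pattern produces, here is a minimal sketch in Python rather than C#, used only for illustration (in C# the same pattern would be passed to Regex.Matches). The function name tokenize and the shortened sample string are made up. Note that NOT LIKE has to come before \w+ in the alternation, otherwise NOT and LIKE would be returned as two separate words.

```python
import re

TOKEN = re.compile(r"\(|\)|\[[a-zA-Z0-9_]+\]|'.*?'|\d+|NOT LIKE|\w+|[=><!]+")

def tokenize(expression: str) -> list:
    # findall returns whole matches because the pattern has no capture groups;
    # whitespace between tokens matches no alternative and is skipped.
    return TOKEN.findall(expression)

sample = "([ObjectID] = 1) AND [ITEM_HEIGHT] >= 10 AND [SENDER_COMPANY] NOT LIKE '%DHL%'"
for token in tokenize(sample):
    print(token)
```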