Using regex, how do I stop identifiers matching strings?

Let's say I have this "code" I want to lex.
var text = 'hello'
Here's my regex.
String: ([a-z\s]+)
Identifier: [a-z]+
Now when I put my code into regexr.com and use the identifier regex, it matches the string as an identifier. How would I stop it from matching strings as identifiers?

What identifies a string? Quotation marks. In your case: single quotes.
Therefore, we want to match the content between quotes as a string. To do so, we can use the following lazy regex:
'.*?'
To allow both quotes, you could use: '.*?'|".*?" or the same with a backreference (['"]).*?\1.
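As a quick check of the backreference variant, here is a minimal Python sketch. The sample line and the variable names are made up for illustration, and it assumes strings contain no escaped quotes.

```python
import re

# Hypothetical sample input containing both quote styles
source = "a = \"abc dsfsd\", b = ' abc dsfsd'"

# (['"]) captures the opening quote; \1 requires the same quote to close it.
# .*? is lazy, so each match stops at the first matching closing quote.
string_pattern = re.compile(r'''(['"]).*?\1''')

# finditer + group(0) returns the whole match rather than the captured quote
strings = [m.group(0) for m in string_pattern.finditer(source)]
print(strings)
```

Using finditer with group(0) avoids a common surprise: re.findall would return only the captured quote character because the pattern contains a group.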
If escaped quotes are allowed inside strings, it gets even more complicated. I suggest using a recursive regex (PCRE-style syntax) to do so:
((['"])(?>[^'"\\]++|\\.|(?1))*+\2)
Samples matched:
a = "abc dsfsd", b= ' abc dsfsd'
c ="abc\" dsfsd"
d= "abc\\"
To match any identifiers but the strings you could use:
[a-z]+(?=([^']*['][^']*['])*[^']*$)
(Or here a version that matches both types of quotes: [a-z]+(?=([^'"]*(["'])[^"']*\2)*[^"']*$))
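The single-quote lookahead can be tried on the code from the question with a short Python sketch. The inner group is written as non-capturing here, which is a small deviation from the pattern above, and the variable names are illustrative.

```python
import re

code = "var text = 'hello'"

# The lookahead succeeds only when an even number of single quotes follows
# the candidate, i.e. when the candidate sits outside a quoted string.
identifier = re.compile(r"[a-z]+(?=(?:[^']*'[^']*')*[^']*$)")

identifiers = [m.group(0) for m in identifier.finditer(code)]
print(identifiers)
```

Note that the lookahead rescans the rest of the input for every candidate, so on long inputs a tokenizer that consumes strings first may be a better fit.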
Again, it gets more involved if you want to account for escaped quotes:
[a-z]+(?=([^"'\\]*(\\.|(["'])([^"'\\]*\\.)*[^"'\\]*\3))*[^"']*$)
I hope this helps.

Related

Regex in Flutter to find double quotes enclosed words and escaped single quotes

I am using this Regex in my Flutter App to find words enclosed by single-quotes that end with a .tr:
r"'[^'\\]*(?:\\.[^'\\]*)*'\s*\.tr\b"
Now I need another expression that is almost the same but looks for words enclosed by double quotes, ending with .tr, that might contain escaped single quotes.
I tried simply changing the single quotes to double quotes in the first expression, but Flutter is giving me errors... I need to escape some characters, but I cannot make it work. Any idea?
An edge case it should match is:
"Hello, I\'m Chris".tr
You may use this regex for double quoted text that can have any escaped character followed by .tr and word boundary:
r""""[^"\\]*(?:\\.[^"\\]*)*"\s*\.tr\b"""
You can put a \ before every " in your RegExp's source and delimit the raw string with single quotes. Try this:
RegExp regExp = new RegExp(r'\"[^\"\\]*(?:\\.[^\"\\]*)*\"\s*\.tr\b');
print("${regExp.hasMatch('"Hello, I\'m Chris".tr')}"); // result = true
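The pattern itself does not depend on Dart, so it can be sanity-checked in another engine. Below is a minimal Python sketch, assuming the source text being scanned contains the backslash literally; the names tr_pattern and sample are illustrative.

```python
import re

# Same double-quote pattern, written as a Python raw string in single quotes
tr_pattern = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"\s*\.tr\b')

# The edge case from the question, with a literal backslash before the quote
sample = r'''"Hello, I\'m Chris".tr'''

match = tr_pattern.search(sample)
print(match.group(0) if match else None)
```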

How to exclude part of a string using regex and add this part at the end of the string?

I've got a little problem with regex.
I've got a few strings in one file that look like this:
TEST.SYSCOP01.D%%ODATE
TEST.SYSCOP02.D%%ODATE
TEST.SYSCOP03.D%%ODATE
...
What I need is to define the correct regex and rename those strings to:
TEST.D%%ODATE.SYSCOP.#01
TEST.D%%ODATE.SYSCOP.#02
TEST.D%%ODATE.SYSCOP.#03
So far, I have this regex:
r".SYSCOP[0-9]{2}.D%%ODATE" - for finding this in the file
But what should the replacement look like? I need to have the numbers from the string at the end of the new string name.
.D%%ODATE.SYSCOP.# - this is just a string, not a regex, and it didn't work
Any idea?
Find: (SYSCOP)(\d+)\.(D%%ODATE)
Replace: $3.$1.#$2 or \3.\1.#\2 for Python
You may use capturing groups with backreferences in the replacement part:
s = re.sub(r'(\.SYSCOP)([0-9]{2})(\.D%%ODATE)', r'\3\1.#\2', s)
Each \N in the replacement pattern refers to the Nth capturing group (pair of parentheses) in the pattern, so you may rearrange the matched value as needed.
Note that . must be escaped to match a literal dot.
Please mind the raw string literal, the r prefix before the string literals helps you avoid excessive backslashes. '\3\1.#\2' is not the same as r'\3\1.#\2', you may print the string literals and see for yourself. In short, inside raw string literals, string escape sequences like \a, \f, \n or \r are not recognized, and the backslash is treated as a literal backslash, just the one that is used to build regex escape sequences (note that r'\n' and '\n' both match a newline since the first one is a regex escape sequence matching a newline and the second is a literal LF symbol.)
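A minimal runnable sketch of the substitution applied to the three sample names, followed by a check of the raw-string point. The list and variable names are illustrative.

```python
import re

names = [
    "TEST.SYSCOP01.D%%ODATE",
    "TEST.SYSCOP02.D%%ODATE",
    "TEST.SYSCOP03.D%%ODATE",
]

# Group 1: ".SYSCOP", group 2: the two digits, group 3: ".D%%ODATE"
pattern = re.compile(r"(\.SYSCOP)([0-9]{2})(\.D%%ODATE)")

# Reorder the groups: date part first, then SYSCOP, then ".#" and the digits
renamed = [pattern.sub(r"\3\1.#\2", name) for name in names]
print(renamed)

# Without the r prefix, '\3' is a single control character, not a backreference
plain, raw = '\3', r'\3'
print(len(plain), len(raw))
```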

Groovy : RegEx for matching Alphanumeric and underscore and dashes

I am working on Grails 1.3.6 application. I need to use Regular Expressions to find matching strings.
It needs to find whether a string has anything other than Alphanumeric characters or "-" or "_" or "*"
An example string looks like:
SDD884MMKG_JJGH1222
What I came up with so far is:
String regEx = "^[a-zA-Z0-9*-_]+\$"
The problem with above is it doesn't search for special characters at the end or beginning of the string.
I had to add a "\" before the "$", or else it gives a compilation error.
- Groovy:illegal string body character after dollar sign;
Can anyone suggest a better RegEx to use in Groovy/Grails?
The problem is the unescaped hyphen in the middle of the character class. Fix it by using:
String regEx = "^[a-zA-Z0-9*_-]+\$";
Or even shorter:
String regEx = "^[\\w*-]+\$";
An unescaped - in the middle of a character class acts as a range operator, here between * (ASCII 42) and _ (ASCII 95), so the class matches every character in that range.
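The effect of the accidental range can be seen in a small sketch. It is written in Python rather than Groovy, on the assumption that character classes behave the same way for this case, and the sample string is made up.

```python
import re

buggy = re.compile(r"^[a-zA-Z0-9*-_]+$")  # *-_ is read as the range '*' .. '_'
fixed = re.compile(r"^[a-zA-Z0-9*_-]+$")  # a trailing - is a literal hyphen

# '+' (ASCII 43) falls inside the accidental range 42..95
sample = "SDD884+MMKG"

buggy_accepts = buggy.match(sample) is not None
fixed_accepts = fixed.match(sample) is not None
print(buggy_accepts, fixed_accepts)
```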
In Groovy, the $ character in a string is used for interpolation (e.g. Hello ${name}). These so-called GStrings are only processed when the string is delimited by " characters, which is why you have to escape the $ there.
Groovy also lets you write strings without that feature by delimiting them with ' (single quotes). Yet the easiest way to write a regexp is the slashy syntax with /.
assert "SDD884MMKG_JJGH1222" ==~ /^[a-zA-Z0-9*_-]+$/
See Regular Expressions for further "shortcuts".
The other points from @anubhava remain valid!
It's easier to reverse it:
String regEx = "[^a-zA-Z0-9*_-]" /* Matches any single character that is _not_ alnum, *, - or _; use it with a find (=~), not a full match (==~) */
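The reversed check can be sketched as a search for a single disallowed character. This is a Python illustration rather than Groovy, and the helper name has_disallowed is hypothetical.

```python
import re

# Any one character outside letters, digits, '*', '_' and '-'
disallowed = re.compile(r"[^a-zA-Z0-9*_-]")

def has_disallowed(text: str) -> bool:
    """Return True if text contains at least one disallowed character."""
    return disallowed.search(text) is not None

print(has_disallowed("SDD884MMKG_JJGH1222"))
print(has_disallowed("SDD884 MMKG"))
```

Because the search is not anchored, a disallowed character is found wherever it occurs, including at the beginning or end of the string.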

Regex - Match string between quotes (") but do not match (\") before the string

I need to match a string that is in quotations, but make sure the first quotation is not escaped.
For example: First \"string\" is "Hello \"World\"!"
Should match only Hello \"World\"!
I am trying to modify (")(?:(?=(\\?))\2.)*?"
I tried adding [^\\"] to ("), and that kinda works, but it matches either only (") or every other letter that isn't (\") and I can't figure out a way to modify ([\\"]") to only match (") if it is not (\")
This is what I have so far ([^\\"]")(?:(?=(\\?))\2.)*?"
I've been trying to figure it out using these two pages, but still cannot get it.
Can Regex be used for this particular string manipulation?
RegEx: Grabbing values between quotation marks
Thanks
You can use negative look behind like this:
(?<!\\)"(.*?)(?<!\\)"
See it in action on regex101.
The first match group contains:
Hello \"World\"!
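A minimal Python sketch of the lookbehind pattern on the example from the question. It assumes a quote is never preceded by an escaped backslash (\\"), which this pattern would misread as an escaped quote; the variable names are illustrative.

```python
import re

text = r'First \"string\" is "Hello \"World\"!"'

# (?<!\\)" matches a quote only when no backslash comes directly before it
unescaped_quoted = re.compile(r'(?<!\\)"(.*?)(?<!\\)"')

match = unescaped_quoted.search(text)
inner = match.group(1) if match else None
print(inner)
```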

Regular expression to replace spaces with dashes within a sub string.

I've been struggling to find a way to replace spaces with dashes in a string but only spaces that are within a particular part of the string.
Source:
ABC *This is a sub string* DEF
My attempt at a regular expression:
/\s/g
If I use the regular expression to match spaces and replace I get the following result:
ABC-*This-is-a-sub-string*-DEF
But I only want to replace spaces within the text surrounded by the two asterisks.
Here is what I'm trying to achieve:
ABC *This-is-a-sub-string* DEF
I'm not sure what type of regular expressions I'm using, as I'm using the find and replace in TextMate with the Regular Expressions option enabled.
It's important to note that the strings that I will be running this regular expression search and replace on will have different text but it's just the spaces within the asterisks that I want to match.
Any help will be appreciated.
To identify spaces that are surrounded by asterisks, the key observation is that, if asterisks appear only in pairs, the spaces you are looking for are always followed by an odd number of asterisks.
The regex
\ (?=[^*]*\*([^*]*\*[^*]*\*)*[^*]*$)
will match the ones that should be replaced. TextMate would have to support look-ahead assertions for this to work.
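The odd-number-of-asterisks lookahead can be checked with a short Python sketch. The inner group is made non-capturing here, the space is written as a plain space instead of "\ ", and the second sample string is made up.

```python
import re

source = "ABC *This is a sub string* DEF"

# A space qualifies only if an odd number of asterisks follows it:
# one asterisk, then any number of asterisk pairs, then no more asterisks.
inside_space = re.compile(r" (?=[^*]*\*(?:[^*]*\*[^*]*\*)*[^*]*$)")

result = inside_space.sub("-", source)
print(result)
```

This relies on asterisks always appearing in pairs, as the answer states; a stray asterisk would flip which spaces count as inside.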
s/(?<!\*)\s(?!\*)(?!$)/-/g
If TextMate supports Perl-style regex commands (I have no experience with it at all, sorry), this is a one-liner that should work.
Try this one:
/(?<=\*.*)\s(?=.*\*)/g
Note that it also uses a lookbehind, so it won't work in JavaScript engines that predate ES2018, which do not support lookbehind.
Try this: \s(\*[^*]*\*)\s. It will match *This is a sub string* in group 1. Then replace to -$1-.
Use this regexp to get spaces from within asterisks
(.)(*(.(\ ).)*)(.)
Take 4th element of the array provided by regex {4} and replace it with dashes.
I find this site very good for creating regular expressions.
It depends on your programming language but in many of them you can use lambda functions with your regular expression replacement statements and thereby perform further replacement on substrings.
Here's an example in Python:
import re
string = "ABC *This is a sub string* DEF"
# Replace spaces only inside each *...* group; the raw string keeps \* as a regex escape
new_string = re.sub(r"\*(.*?)\*", lambda m: '*' + m.group(1).replace(" ", "-") + '*', string)
That should give you ABC *This-is-a-sub-string* DEF.