Exclude special characters with Regex - regex

I am trying to get the version number of my assembly. I use a regex for it, and here is my pattern.
$pattern = '\[assembly: AssemblyVersion\("(.*)"\)\]'
It works well, but AssemblyInfo.cs and AssemblyInfo.vb contain lines that start with special characters, for example
in cs file
// by using the '*' as shown below:
// [assembly: AssemblyVersion("1.0.*")]
[assembly: AssemblyVersion("3.7.1.0")]
in .vb file
' <Assembly: AssemblyVersion("1.0.*")>
<Assembly: AssemblyVersion("3.2.0.0")>
<Assembly: AssemblyFileVersion("1.0.0.0")>
So I want to exclude lines prefixed with the // and ' characters in my pattern. I tried to exclude them with [^//] but it does not work. I tried other things, but they did not work either.
My second question:
the two files use a different opening syntax. In the .vb file the line is
<Assembly: AssemblyVersion("3.2.0.0")>
and in the C# file it is
[assembly: AssemblyVersion("3.7.1.0")]
How can I also include the VB version in my pattern?
Where is the problem?

You can use a negative lookbehind if your library supports it:
(?<!\/\/\s|\'\s)\[assembly: AssemblyVersion\("(.*)"\)\]
Edit:
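To match the different brackets, use a character class such as [<\[], which matches any one of the listed characters (for alternatives longer than one character, use a group like (variant1|variant2) instead):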
(?<!\/\/\s|\'\s)[<\[][Aa]ssembly: AssemblyVersion\("(.*)"\)[>\]]
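As a rough illustration, here is how the combined pattern might be tried out in Python. One assumption to flag: Python's re module only accepts fixed-width lookbehinds, so the single alternation is split into two separate lookbehinds in this sketch; engines such as .NET accept the original form. The sample strings are the lines from the question, and the variable names are made up for the example.

```python
import re

# Two fixed-width lookbehinds stand in for (?<!\/\/\s|\'\s), because Python's
# re module rejects a lookbehind whose branches have different lengths.
VERSION_RE = re.compile(
    r"""(?<!//\s)(?<!'\s)[<\[][Aa]ssembly: AssemblyVersion\("(.*)"\)[>\]]"""
)

cs_text = "\n".join([
    "// by using the '*' as shown below:",
    '// [assembly: AssemblyVersion("1.0.*")]',
    '[assembly: AssemblyVersion("3.7.1.0")]',
])
vb_text = "\n".join([
    "' <Assembly: AssemblyVersion(\"1.0.*\")>",
    '<Assembly: AssemblyVersion("3.2.0.0")>',
    '<Assembly: AssemblyFileVersion("1.0.0.0")>',
])

# findall returns the contents of the single capture group for each match.
cs_versions = VERSION_RE.findall(cs_text)
vb_versions = VERSION_RE.findall(vb_text)
print(cs_versions, vb_versions)
```

The commented-out lines are skipped because the bracket is preceded by a comment marker plus one whitespace character, and the AssemblyFileVersion line is skipped because the attribute name differs.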

You want to exclude lines which start with either / or '.
Start by setting the m (multi-line) flag in your regex.
It ensures that ^ matches the start of each line (not only the start of the whole string).
Then begin the regex with:
^ - acting now (in multi-line mode) as a line-start marker,
(?!\/|') - a negative lookahead group with two alternatives (the line must not start with either / or '),
\s* - an optional sequence of whitespace characters,
and then your regex follows.
So the whole regex should look like below:
^(?!\/|')\s*\[assembly: AssemblyVersion\("(.*)"\)\]
(remember about the m flag).
The negative lookbehind solution mentioned in the other answer has a flaw:
if a line starts with ' or // but has no space after it,
that regex fails to exclude the line.
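A minimal Python sketch of the line-anchored approach, assuming re.MULTILINE as the equivalent of the m flag. The sample text is invented for the example and includes a comment with no space after // to exercise the case described above.

```python
import re

# re.MULTILINE plays the role of the m flag: ^ matches at every line start.
VERSION_RE = re.compile(
    r'''^(?!/|')\s*\[assembly: AssemblyVersion\("(.*)"\)\]''',
    re.MULTILINE,
)

text = "\n".join([
    '//[assembly: AssemblyVersion("1.0.*")]',      # comment, no space after //
    '// [assembly: AssemblyVersion("1.0.*")]',     # comment with a space
    '    [assembly: AssemblyVersion("3.7.1.0")]',  # indented real attribute
])

versions = VERSION_RE.findall(text)
print(versions)
```

Only the uncommented attribute is captured, regardless of whether a space follows the comment marker.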

Related

Regex to match(extract) string between dot(.)

I want to select some string combinations (words joined by dots) from a very long string (SQL). The full string could be a single line or multiple lines separated by newlines, and a combination could appear on the first line, on a later line, or both.
I need help in writing a regex for it.
Examples:
String s = I am testing something like test.test.test in sentence.
Expected output: test.test.test
Example2 (real usecase):
UPDATE test.table
SET access = 01
WHERE access IN (
SELECT name FROM project.dataset.tablename WHERE name = 'test' GROUP BY 1 )
Expected output: test.table and project.dataset.tablename
Also, can I add prefix or suffix words (or a space) that must be present wherever this logic is checked? In the above case, if the statement is an update, the regex should pick up test.table, but if the statement is like select test.table, the regex should not pick up this combination. The same applies to suffixes.
Example3: This is to illustrate the above theory.
INS INTO test.table
SEL 'abcscsc', wu_id.Item_Nbr ,1
FROM test.table as_t
WHERE as_t.old <> 0 AND as_t.date = 11
AND (as_t.numb IN ('11') )
Expected Output: test.table, test.table (Key words are INTO and FROM)
Things not needed in the selection: as_t.numb, as_t.old, as_t.date
If I get the regex I can use in program to extract this word.
Note: Before and after string words to the combination could be anything like update, select { or(, so we have to find the occurrence of words which are joined together with .(dot) and all the number of such occurrence.
I tried something like this:
(?<=.)(.?)(?=.)(.?) -: This only selected the word between two .dot and not all.
.(?<=.)(.?)(?=.)(.?). - This everything before and after.
To solve your initial problem, we can just use some negation. Here's the pattern I came up with:
[^\s]+\.[^\s]+
[^ ... ] Means to make a character class including everything except for what's between the brackets. In this case, I put \s in there, which matches any whitespace. So [^\s] matches anything that isn't whitespace.
+ Is a quantifier. It means to find as many of the preceding construct as you can without breaking the match. This would happily match everything that's not whitespace, but I follow it with a \., which matches a literal .. The \ is necessary because . means to match any character in regex, so we need to escape it so it only has its literal meaning. This means there has to be a . in this group of non-whitespace characters.
I end the pattern with another [^\s]+, which matches everything after the . until the next whitespace.
Now, to solve your secondary problem, you want to make this match only work if it is preceded by a given keyword. Luckily, regex has a construct almost specifically for this case. It's called a lookbehind. The syntax is (?<= ... ) where the ... is the pattern you want to look for. Using your example, this will only match after the keywords INTO and FROM:
(?<=(?:INTO|FROM)\s)[^\s]+\.[^\s]+
Here (?:INTO|FROM) means to match either the text INTO or the text FROM. I then specify that it should be followed by a whitespace character with \s. One possible problem here is that it will only match if the keywords are written in all upper case. You can change this behavior by specifying the case insensitive flag i to your regex parser. If your regex parser doesn't have a way to specify flags, you can usually still specify it inline by putting (?i) in front of the pattern, like so:
(?i)(?<=(?:INTO|FROM)\s)[^\s]+\.[^\s]+
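To make the two patterns concrete, here is a short Python sketch run against the sentence and the INTO/FROM example from the question. It passes re.IGNORECASE instead of the inline (?i), which is only a stylistic choice. One caveat that is specific to Python: its lookbehind must be fixed-width, which happens to hold here because INTO and FROM have the same length; adding a keyword of a different length would need a separate lookbehind.

```python
import re

# Plain pattern: a run of non-whitespace, a literal dot, more non-whitespace.
DOTTED = re.compile(r"[^\s]+\.[^\s]+")

# Keyword-restricted pattern; re.IGNORECASE stands in for the i flag.
AFTER_KEYWORD = re.compile(r"(?<=(?:INTO|FROM)\s)[^\s]+\.[^\s]+", re.IGNORECASE)

sentence = "I am testing something like test.test.test in sentence."
sql = "\n".join([
    "INS INTO test.table",
    "SEL 'abcscsc', wu_id.Item_Nbr ,1",
    "FROM test.table as_t",
    "WHERE as_t.old <> 0 AND as_t.date = 11",
    "AND (as_t.numb IN ('11') )",
])

plain_matches = DOTTED.findall(sentence)
keyword_matches = AFTER_KEYWORD.findall(sql)
print(plain_matches, keyword_matches)
```

The trailing "sentence." is not matched by the plain pattern because nothing follows its dot, and the as_t.* and wu_id.* names are skipped by the second pattern because they are not preceded by a keyword.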
If you are new to regex, I highly recommend using the www.regex101.com website to generate regex and learn how it works. Don't forget to check out the code generator part for getting the regex code based on the programming language you are using, that's a cool feature.
For your question, you need a regex that matches any word character \w between 0 and unlimited times, followed by a dot, followed by another series of word characters that repeats between 0 and unlimited times.
So here is my solution to your question:
Your regex in JavaScript:
const regex = /([\w]*[.][\w]*)+/gm;
in Java:
final String regex = "([\\w]*[.][\\w]*)+";
in Python:
regex = r"([\w]*[.][\w]*)+"
in PHP:
$re = '/([\w]*[.][\w]*)+/m';
Note that this solution is written for your use case (SQL strings): it will also catch something like '.word' or 'word..word', which I assume your input does not contain.
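A small Python check of the pattern as the prose describes it (zero or more word characters on each side of a dot), run against the UPDATE example from the question. As an assumption for this sketch, the group is written as non-capturing so that findall returns whole matches rather than the last repetition of the group.

```python
import re

# Quantified form from the description: zero or more word characters on
# each side of a dot, with the whole unit repeated for chains like a.b.c.
DOTTED_NAME = re.compile(r"(?:\w*[.]\w*)+")

sql = "\n".join([
    "UPDATE test.table",
    "SET access = 01",
    "WHERE access IN (",
    "SELECT name FROM project.dataset.tablename WHERE name = 'test' GROUP BY 1 )",
])

names = DOTTED_NAME.findall(sql)
# The loose quantifiers also accept the edge cases mentioned in the note.
edge_cases = DOTTED_NAME.findall(".word word..word")
print(names, edge_cases)
```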

Regex - match up to first literal

I have some lines of code from which I am trying to remove some leading text. They look like this:
Line 1: myApp.name;
Line 2: myApp.version
Line 3: myApp.defaults, myApp.numbers;
I am trying and trying to find a regex that will remove anything up to (but excluding) myApp.
I have tried various regular expressions, but they all seem to fail when it comes to line 3 (because myApp appears twice).
The closest I have come so far is:
.*?myApp
Pretty simple - but that matches both occurrences of myApp in Line 3 - whereas I'd like it to match only the first.
There's a few hundred lines - otherwise I'd have deleted them all manually by now.
Can somebody help me? Thanks.
You need to add the anchor ^, which matches the start of a line:
^.*?(myApp)
Use the above regex and replace the matched characters with $1 or \1, so that the string myApp remains in the final result after replacement.
Pattern explanation:
^ Start of a line.
.*?(myApp) Shortest possible match up to the first myApp. The string myApp is captured and stored in group 1.
All matched characters are replaced with the chars present inside the group 1.
Your regular expression works in Perl if you add the ^ to ensure that you only match the beginnings of lines:
cat /tmp/test.txt | perl -pe 's/^.*?myApp/myApp/g'
myApp.name;
myApp.version
myApp.defaults, myApp.numbers;
If you wanted to get fancy, you could put the "myApp" into a positive lookahead, written (?=...), which checks that the text is there without making it part of the match. That way it doesn't have to be put back in by the replacement.
cat /tmp/test.txt | perl -pe 's/^.*?(?=myApp)//g'
myApp.name;
myApp.version
myApp.defaults, myApp.numbers;
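For comparison, the same two variants can be sketched in Python with re.sub, using re.MULTILINE so that ^ anchors at each line. The input is the three sample lines from the question; the variable names are illustrative only.

```python
import re

text = "\n".join([
    "Line 1: myApp.name;",
    "Line 2: myApp.version",
    "Line 3: myApp.defaults, myApp.numbers;",
])

# Variant 1: capture myApp and put it back with \1.
with_group = re.sub(r"^.*?(myApp)", r"\1", text, flags=re.MULTILINE)

# Variant 2: a lookahead checks for myApp without consuming it.
with_lookahead = re.sub(r"^.*?(?=myApp)", "", text, flags=re.MULTILINE)

print(with_group)
```

Because the pattern is anchored at the line start and the quantifier is lazy, only the text before the first myApp on each line is removed; the second myApp on line 3 is left alone.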

find a single quote at the end of a line starting with "mySqlQueryToArray"

I'm trying to use regex to find single quotes (so I can turn them all into double quotes) anywhere in a line that starts with mySqlQueryToArray (a function that makes a query to a SQL DB). I'm doing the regex in Sublime Text 3 which I'm pretty sure uses Perl Regex. I would like to have my regex match with every single quote in a line so for example I might have the line:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name'");
I want the regex to match in that line both of the quotes around $name but no other characters in that line. I've been trying to use (?<=mySqlQueryToArray.*)' but it tells me that the look behind assertion is invalid. I also tried (?<=mySqlQueryToArray)(?<=.*)' but that's also invalid. Can someone guide me to a regex that will accomplish what I need?
To find any number of single quotes in a line starting with your keyword you can use the \G anchor ("end of last match") by replacing:
(^\h*mySqlQueryToArray|(?!^)\G)([^\n\r']*)'
with \1\2<replacement>.
Explanation
( ^\h*mySqlQueryToArray # beginning of line: check the keyword is here
| (?!^)\G ) # if not at the BOL, check we did match sth on this line
( [^\n\r']* ) ' # capture everything until the next single quote
The general idea is to match everything until the next single quote with ([^\n\r']*)' in order to replace it with \2<replacement>, but do so only if this everything is:
right after the beginning keyword (^mySqlQueryToArray), or
after the end of the last match ((?!^)\G): in that case we know we have the keyword and are on a relevant line.
\h* accounts for any leading indentation, as suggested by Xælias (\h being a shortcut for any kind of horizontal whitespace).
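Python's built-in re module has neither \G nor \h, so the sketch below takes a different route to the same result rather than porting the pattern: it matches each whole line that starts with the keyword and replaces the quotes inside that line with a callback. The helper name requote_calls and its parameters are hypothetical, chosen only for this example.

```python
import re

def requote_calls(source: str, replacement: str = '"',
                  keyword: str = "mySqlQueryToArray") -> str:
    """Replace every single quote on lines that start with `keyword`."""
    # [ \t]* allows for indentation, similar to \h* in the pattern above.
    line_re = re.compile(rf"^[ \t]*{re.escape(keyword)}.*$", re.MULTILINE)
    # The callback rewrites only the matched line; other lines are untouched.
    return line_re.sub(lambda m: m.group(0).replace("'", replacement), source)

sample = "\n".join([
    "mySqlQueryToArray($con, \"SELECT * FROM Template WHERE Name='$name'\");",
    "$label = 'untouched';",
])
result = requote_calls(sample)
print(result)
```

Single quotes on lines that do not start with the keyword are left as they are.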
https://stackoverflow.com/a/25331428/3933728 is a better answer.
I'm not good enough with RegEx nor ST to do this in one step. But I can do it in two:
1/ Search for all mySqlQueryToArray strings
Open the search panel: ⌘F or Find->Find...
Make sure you have the Regex (.*) button selected (bottom left) and the wrap selector (all others should be off)
Search for: ^\s*mySqlQueryToArray.*$
^ beginning of line
\s* any indentation
mySqlQueryToArray your call
.* whatever is behind
$ end of line
Click on Find All
This will select every occurrence of what you want to modify.
2/ Enter the replace mode
⌥⌘F or Find->Replace...
This time, make sure that wrap, Regex AND In selection are active.
Then search for '([^']*)' and replace with "\1".
' are your single quotes
(...) is the capturing block, referenced by \1 in the replace field
[^']* is for any character that is not a single quote, repeated
Then hit Replace All
I know this is a little more complex than the other answer, but this one tackles cases where your line contains several single-quoted strings. Like this:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name' and Value='1234'");
If this is too much, I guess something like find: (?<=mySqlQueryToArray)(.*?)'([^']*)'(.*?) and replace it with \1"\2"\3 will be enough.
You can use a regex like this:
(mySqlQueryToArray.*?)'(.*?)'(.*)
You can use \K, see this regex:
mySqlQueryToArray[^']*\K'(.*?)'

Hierarchical path RegExp

I have to remove a known "level" from a hierarchical path using a regular expression.
In other terms, I want to go from 'a/b/X/c/d' to 'a/b/c/d', where X can be at any level of the path.
Using Javascript as an example, I have crafted the following:
str = str.replace(/^(?:(.+\/)|)X(?:$|\/(.+$))/, "$1$2")
which works fine when X is either the root or is in the middle of the path, but leaves a trailing slash when X comes last in the path. I could make a subsequent replace to handle those instances, but would it be possible to create a better RegEx that matches all the cases?
Thanks.
Edit: To clarify, all levels of the path might contain any number of characters and I'm only interested in removing a level only if it matches X exactly.
Search: \bX/|/X(?=$)
Replace: Empty String
Input
a/b/X/c/d
X/a/b/c/d
a/b/c/d/X
Output
a/b/c/d
a/b/c/d
a/b/c/d
Explanation
\b assert word boundary
X/ match X/
OR |
Match /X, if the lookahead (?=$) can assert that what follows is the end of the string
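The search-and-replace above can be checked with a few lines of Python. This is only a sketch: X is the placeholder level name from the question, and drop_level is a hypothetical helper name.

```python
import re

# "X" is the placeholder level name from the question.
LEVEL_RE = re.compile(r"\bX/|/X(?=$)")

def drop_level(path: str) -> str:
    """Remove the level named X, wherever it sits in the path."""
    return LEVEL_RE.sub("", path)

paths = ["a/b/X/c/d", "X/a/b/c/d", "a/b/c/d/X"]
cleaned = [drop_level(p) for p in paths]
# A level that merely ends in X is not touched, thanks to \b.
untouched = drop_level("a/bX/c/d")
print(cleaned, untouched)
```

Note that \b only guards against word characters directly before X, so a level such as "a-X" would still be affected; a stricter pattern would be needed if level names can contain non-word characters.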

replaceregexp to trim not working properly in Netbeans

I'm trying to trim some values using replaceregexp. Everything looks great when I try it in software like EditPad Pro.
Here's a sample of what I want to accomplish:
mf.version.impl = 2.01.00
mf.version.spec= 2.01.00
Notice the extra spaces after the last digit.
Then I'm using this pattern:
[0-9]+.[0-9]+.[0-9]+[ ]*
But it doesn't work in Netbeans.
Here's my ant command for it:
<!--If postfix is empty, remove the empty space-->
<replaceregexp file="../Xinco/nbproject/project.properties"
match="mf.version.spec?=?[0-9]+.[0-9]+.[0-9]+[ ]*"
replace="mf.version.spec = ${version_high}.${version_mid}.${version_low}"
byline="false"/>
<replaceregexp file="../Xinco/nbproject/project.properties"
match="mf.version.impl?=?[0-9]+.[0-9]+.[0-9]+[ ]*"
replace="mf.version.impl = ${version_high}.${version_mid}.${version_low}"
byline="true"/>
${version_high}.${version_mid}.${version_low} are variables already defined that correspond to 2.01.00 respectively.
It results in
mf.version.impl = 2.01.00
mf.version.spec = 2.01.00
Notice one extra space after the last digit.
I debugged the ant calls, and it seems the above command does nothing, as if no match occurred.
Any idea?
Since you don't care about the value, you don't have to match it explicitly. Try:
^mf\.version\.impl\s*=.*$
Meaning:
^ - start of the line (in multiline mode)
mf\.version\.impl - the string "mf.version.impl" literally, with the dots escaped.
\s* - zero or more spaces
.* - anything else (we can ignore the version, since you change it with a constant), all the way through to the...
$ - end of the line
Bonus track:
Looking at the specs, it looks like you can catch both lines with a single regex (not sure it works though):
^(mf\.version\.(impl|spec))\s*=.*$
and the replace rule:
replace="\1 = ${version_high}.${version_mid}.${version_low}"
This will replace \1 with the value it captured before, so again, you only need a single rule. (for trivia, usually $1 is used in replaces, but not here)
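The single-regex idea can be sanity-checked outside Ant. The Python sketch below only exercises the pattern and the backreference logic, not replaceregexp itself; set_version is a hypothetical helper, and the sample property lines (with trailing spaces) are made up to mirror the question.

```python
import re

LINE_RE = re.compile(r"^(mf\.version\.(impl|spec))\s*=.*$", re.MULTILINE)

def set_version(properties: str, version: str) -> str:
    """Rewrite both version lines, keeping the captured key (group 1)."""
    # A function replacement avoids any escaping issues in the version text.
    return LINE_RE.sub(lambda m: f"{m.group(1)} = {version}", properties)

properties = "\n".join([
    "mf.version.impl = 2.01.00   ",
    "mf.version.spec= 2.01.00  ",
    "other.key = 1",
])
updated = set_version(properties, "2.01.00")
print(updated)
```

Because .* runs to the end of the line, any trailing spaces are swallowed by the match and do not survive the replacement.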
You should probably escape your .'s and use a capture group
e.g. (Perl regex for example)
s/([0-9]+\.[0-9]+\.[0-9]+)[ ]*/$1/