I am converting some of my code from MATLAB to Julia, so I need to replace the parentheses used for indexing: MATLAB uses () and Julia uses []. Function-call parentheses are the same in both, i.e. ().
I thought the fastest way to do this was to use Notepad++: find all of the parentheses and replace them with brackets where needed.
However, it does not work as expected.
I won't copy the whole function I am converting, but here are some parts as an example:
x= coord(:,1);
y= coord(:,2);
natG_coord(1,1)= sqrt(1/3);
natG_coord(2,1)= -sqrt(1/3);
natG_coord(3,1)= -sqrt(1/3);
natG_coord(4,1)= sqrt(1/3);
for i=1:4
dNG(1,i)= (1+etaG(i))/4 + csiG(i)*(1+etaG(i))/2 - (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2)/4;
dNG(2,i)= -(1+etaG(i))/4 + csiG(i)*(1+etaG(i))/2 + (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2)/4;
dNG(3,i)= -(1-etaG(i))/4 + csiG(i)*(1-etaG(i))/2 + (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2)/4;
dNG(4,i)= (1-etaG(i))/4 + csiG(i)*(1-etaG(i))/2 - (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2)/4;
end
I tried finding \((.*)\) and replacing with [$1], but it does not get all of the parentheses. For instance, it gets the ones in the declarations of x and y and the sqrt argument, but not the natG_coord indexes. In the for loop, it only gets the last expression of each line, i.e. (1-etaG(i)^2), and there it takes the outer parentheses, not the etaG index (which is what I actually need to replace).
I cannot see a pattern in what gets matched and thus cannot come up with a solution.
Any other approach that saves me from going mad doing this parenthesis by parenthesis is welcome!
Thank you all for your help.
edit
@stribizhev: the final result should be this:
x= coord[:,1]
y= coord[:,2]
natG_coord[1,1]= sqrt(1/3)
natG_coord[2,1]= -sqrt(1/3)
natG_coord[3,1]= -sqrt(1/3)
natG_coord[4,1]= sqrt(1/3)
for i=1:4
dNG[1,i]= (1+etaG[i])/4 + csiG[i]*(1+etaG[i])/2 - (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4
dNG[2,i]= -(1+etaG[i])/4 + csiG[i]*(1+etaG[i])/2 + (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4
dNG[3,i]= -(1-etaG[i])/4 + csiG[i]*(1-etaG[i])/2 + (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4
dNG[4,i]= (1-etaG[i])/4 + csiG[i]*(1-etaG[i])/2 - (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4
end
What I get finding \((.*)\) and replacing with [$1] one time is:
x= coord[:,1];
y= coord[:,2];
natG_coord[1,1)= sqrt(1/3];
natG_coord[2,1)= -sqrt(1/3];
natG_coord[3,1)= -sqrt(1/3];
natG_coord[4,1)= sqrt(1/3];
for i=1:4
dNG[1,i)= (1+etaG(i))/4 + csiG(i)*(1+etaG(i))/2 - (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2]/4;
dNG[2,i)= -(1+etaG(i))/4 + csiG(i)*(1+etaG(i))/2 + (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2]/4;
dNG[3,i)= -(1-etaG(i))/4 + csiG(i)*(1-etaG(i))/2 + (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2]/4;
dNG[4,i)= (1-etaG(i))/4 + csiG(i)*(1-etaG(i))/2 - (1-etaG(i)^2)/4 - 2*csiG(i)*(1-etaG(i)^2]/4;
end
What I get finding \(((?>[^()]|(?R))*)\) and replacing all with [$1] one time is (I know you said to run it several times; if I do, it ends up replacing every matching pair of parentheses):
x= coord[:,1];
y= coord[:,2];
natG_coord[1,1]= sqrt[1/3];
natG_coord[2,1]= -sqrt[1/3];
natG_coord[3,1]= -sqrt[1/3];
natG_coord[4,1]= sqrt[1/3];
for i=1:4
dNG[1,i]= [1+etaG(i)]/4 + csiG[i]*[1+etaG(i)]/2 - [1-etaG(i)^2]/4 - 2*csiG[i]*[1-etaG(i)^2]/4;
dNG[2,i]= -[1+etaG(i)]/4 + csiG[i]*[1+etaG(i)]/2 + [1-etaG(i)^2]/4 - 2*csiG[i]*[1-etaG(i)^2]/4;
dNG[3,i]= -[1-etaG(i)]/4 + csiG[i]*[1-etaG(i)]/2 + [1-etaG(i)^2]/4 - 2*csiG[i]*[1-etaG(i)^2]/4;
dNG[4,i]= [1-etaG(i)]/4 + csiG[i]*[1-etaG(i)]/2 - [1-etaG(i)^2]/4 - 2*csiG[i]*[1-etaG(i)^2]/4;
end
What I get finding \(([^()]*)\) and replacing all with [$1] one time is:
x= coord[:,1];
y= coord[:,2];
natG_coord[1,1]= sqrt[1/3];
natG_coord[2,1]= -sqrt[1/3];
natG_coord[3,1]= -sqrt[1/3];
natG_coord[4,1]= sqrt[1/3];
for i=1:4
dNG[1,i]= (1+etaG[i])/4 + csiG[i]*(1+etaG[i])/2 - (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4;
dNG[2,i]= -(1+etaG[i])/4 + csiG[i]*(1+etaG[i])/2 + (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4;
dNG[3,i]= -(1-etaG[i])/4 + csiG[i]*(1-etaG[i])/2 + (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4;
dNG[4,i]= (1-etaG[i])/4 + csiG[i]*(1-etaG[i])/2 - (1-etaG[i]^2)/4 - 2*csiG[i]*(1-etaG[i]^2)/4;
end
So the last one is exactly what I was looking for. Stepping through with the "Find Next" command, I can decide for each match whether it is an indexing parenthesis and replace it or skip it (skipping the sqrt function argument, for instance).
Thank you very much for your help.
Since the \(([^()]*)\) (to replace with [$1]) worked for you, here is the explanation:
\(([^()]*)\)
Matches:
\( - an opening round bracket
([^()]*) - Capture group 1 matches zero or more characters other than ( and ) (with [^()]*)
\) - a closing round bracket
The regex above matches all innermost parentheses, i.e. pairs that do not have any parentheses inside them.
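As a quick check outside Notepad++, the same innermost-parentheses replacement can be sketched with Python's re module. Python and the helper name replace_innermost are assumptions made for illustration; the sample line comes from the question.

```python
import re

# Innermost parentheses only: "(" + anything except parentheses + ")".
INNERMOST = re.compile(r"\(([^()]*)\)")

def replace_innermost(line):
    """Turn every innermost (...) into [...], leaving enclosing pairs alone."""
    return INNERMOST.sub(r"[\1]", line)

print(replace_innermost("dNG(1,i)= (1+etaG(i))/4 + csiG(i)*(1+etaG(i))/2;"))
print(replace_innermost("natG_coord(1,1)= sqrt(1/3);"))
```

Note that the sqrt argument is converted as well, exactly as in the question's output, so function calls still have to be skipped by hand or filtered separately.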
Answering Aaron's remark about replacing the parentheses inside the quoted strings, it is great that Notepad++ supports Boost conditional replacement patterns. We can match what we do not need to modify and replace with self, and use another replacement for the other matches.
(?<o1>"[^"\\]*(?:\\.[^"\\]*)*")|(?<o2>\(([^()]*)\))
And replace with (?{o1}$+{o1}:[$3]).
Note that "[^"\\]*(?:\\.[^"\\]*)*" matches C strings with escaped entities correctly and efficiently. The replacement pattern means to replace with the quoted string (if o1 group matched) or with [+Group 3 value+] (if the other group matched).
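Outside Notepad++, the "match what you want to keep, replace it with itself" idea can be sketched with a replacement callback. This is a Python approximation of the Boost conditional replacement, not the same syntax; the function name replace_outside_strings is hypothetical.

```python
import re

# Branch 1 (group 1): a double-quoted string, allowing backslash escapes.
# Branch 2 (group 2): the content of an innermost pair of parentheses.
PATTERN = re.compile(r'("[^"\\]*(?:\\.[^"\\]*)*")|\(([^()]*)\)')

def replace_outside_strings(text):
    def repl(m):
        if m.group(1) is not None:
            return m.group(1)            # quoted string: put it back unchanged
        return "[" + m.group(2) + "]"    # parentheses outside strings: use brackets
    return PATTERN.sub(repl, text)

print(replace_outside_strings('d = "r(string here)"; y = a(1)'))
```

Because the string branch comes first in the alternation, any parentheses inside a quoted string are consumed as part of the string and never reach the second branch.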
If you need to replace outer balanced parentheses, use
\(((?>[^()]|(?R))*)\)
And replace with [$1] (see demo). If you need to replace the overlapping parenthetical substrings, you will need to hit Replace All several times.
Regex explanation:
\( # an outer literal opening round bracket
( # start group 1
(?> # start of atomic group
[^()] # any character other than ( and )
| # OR
(?R) # recursively match the whole pattern
)* # end atomic group and repeat zero or more times
) # end of group 1
\) # match a literal closing round bracket
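Python's built-in re module has no (?R) recursion, so as a rough equivalent here is a depth-counting sketch that rewrites only the outermost balanced pairs. The function name replace_outer is hypothetical, and unlike the regex it does not check that every opening parenthesis is eventually closed.

```python
def replace_outer(text):
    """Replace only the outermost ( ) pairs with [ ], keeping nested pairs."""
    out = []
    depth = 0
    for ch in text:
        if ch == "(":
            out.append("[" if depth == 0 else ch)
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            out.append("]" if depth == 0 else ch)
        else:
            out.append(ch)
    return "".join(out)

print(replace_outer("dNG(1,i)= (1+etaG(i))/4"))
```

A single pass gives the same kind of result as one Replace All with the recursive pattern: the outer pairs change and the nested etaG(i) stays as it is.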
If the parentheses you need to replace must be preceded by word characters, use
(\w+)(\(((?>[^()]|(?2))*)\))
And replace with $1[$3]. See demo
This regex uses a (?2) subroutine that just repeats the second capture group subpattern.
Now, to avoid matching these inside quoted strings: assume we have var d = "r(string here)" and we do not want to turn the () into [] here. Instead of (\w+)(\(((?>[^()]|(?2))*)\)) (with $1[$3] replacement), use
(?<o1>"[^"\\]*(?:\\.[^"\\]*)*")|(?<o2>(\w+)(\(((?>[^()]|(?4))*)\)))
And (?{o1}$+{o1}:$3[$5]) as the replacement. This will keep var d = "r(string here)" string intact, and will turn var f = a(fg()g) into var f = a[fg()g].
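For the original MATLAB-to-Julia case, a small scanner may be easier to control than a regex. The sketch below converts every pair of parentheses that directly follows an identifier, at any nesting depth, unless the identifier is in a set of known function names. The function name convert_indexing and the functions set are assumptions for illustration (the set has to be filled in by hand), and quoted strings are not handled.

```python
def convert_indexing(line, functions=frozenset({"sqrt"})):
    """Turn name(...) into name[...] unless name is a known function."""
    out = list(line)
    stack = []  # one flag per open parenthesis: True if it is an indexing one
    for i, ch in enumerate(line):
        if ch == "(":
            # Walk back over the identifier that directly precedes the "(".
            j = i
            while j > 0 and (line[j - 1].isalnum() or line[j - 1] == "_"):
                j -= 1
            name = line[j:i]
            is_index = bool(name) and not name[0].isdigit() and name not in functions
            stack.append(is_index)
            if is_index:
                out[i] = "["
        elif ch == ")" and stack:
            if stack.pop():
                out[i] = "]"
    return "".join(out)

print(convert_indexing("natG_coord(1,1)= sqrt(1/3);"))
print(convert_indexing("dNG(1,i)= (1+etaG(i))/4 + csiG(i)*(1+etaG(i))/2;"))
```

Grouping parentheses such as (1+etaG(i)) are left alone because nothing identifier-like precedes them, and sqrt(1/3) is left alone because sqrt is listed as a function.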
I have a regular expression; it is basically meant to update log4j syntax to log4j2 syntax, removing the string replacement. The regular expression is as follows:
(?:^\(\s*|\s*\+\s*|,\s*)(?:[\w\(\)\.\d+]*|\([\w\(\)\.\d+]*\s*(?:\+|-)\s*[\w\(\)\.\d+]*\))(?:\s\+\s*|\s*\);)
This will successfully match the variables in the following strings
("Unable to retrieve things associated with this='" + thingId + "' in " + (endTime - startTime) + " ms");
("Persisting " + things.size() + " new or updated thing(s)");
("Count in use for thing=" + secondThingId + " is " + countInUse);
("Unable to check thing state '" + otherThingId + "' using '" + address + "'", e);
But not '+ thingCollection.get(0).getMyId()' in
("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);
I am getting better with regular expressions, but this one has me a bit stumped. Thanks!
For some reason, when people write a regex pattern, they often forget that the whole of the Perl language is still available.
I would just delete all the strings and find the remaining substrings that look like variable names:
use strict;
use warnings 'all';
use feature qw/ say fc /;
use List::Util 'uniq';
my @variables;
while ( <DATA> ) {
    s/"[^"]*"//g;                         # delete every double-quoted string
    push @variables, /\b[a-z]\w*/ig;      # collect what looks like an identifier
}
say for sort { fc $a cmp fc $b } uniq @variables;
__DATA__
("Unable to retrieve things associated with this='" + thingId + "' in " + (endTime - startTime) + " ms");
("Persisting " + things.size() + " new or updated thing(s)");
("Count in use for thing=" + secondThingId + " is " + countInUse);
("Unable to check thing state '" + otherThingId + "' using '" + address + "'", e);
("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);
output
address
countInUse
e
endTime
get
getMyId
otherThingId
secondThingId
size
startTime
thingCollection
thingId
things
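The same "delete the strings, then collect identifiers" idea can be sketched in Python for readers who do not use Perl. This is a port under the same simplifying assumption as the Perl version (no escaped quotes inside the strings); the function name variable_names is hypothetical.

```python
import re

def variable_names(lines):
    """Strip double-quoted strings, then collect identifier-like tokens."""
    names = set()
    for line in lines:
        stripped = re.sub(r'"[^"]*"', "", line)
        names.update(re.findall(r"\b[A-Za-z]\w*", stripped))
    # Case-insensitive sort, like the fc comparison in the Perl version.
    return sorted(names, key=str.casefold)

sample = ['("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);']
print(variable_names(sample))
```

As in the Perl output, method names such as get and getMyId are reported alongside the variables, so the result still needs a manual look.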
You should be able to simplify your regex to match things in between '+' signs.
(?:\+)([^"]*?)(?:[\+,])
Working Example
(Note the ? after the *: it makes the * lazy, so it matches as little as possible and catches all occurrences.)
If you want just the variable you could access the first capture group from that expression or ignore the capture group to get the full match.
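To see what the first capture group holds, here is a small Python sketch that runs the simplified pattern on two of the question's lines. The helper name between_plus and the whitespace stripping are assumptions added for illustration.

```python
import re

# A "+", then lazily anything without a double quote, then "+" or ",".
BETWEEN_PLUS = re.compile(r'(?:\+)([^"]*?)(?:[\+,])')

def between_plus(line):
    """Return the contents of capture group 1, with surrounding spaces removed."""
    return [chunk.strip() for chunk in BETWEEN_PLUS.findall(line)]

print(between_plus('("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);'))
print(between_plus('("Count in use for thing=" + secondThingId + " is " + countInUse);'))
```

The second line yields only secondThingId: the trailing countInUse is followed by ");" rather than "+" or ",", so this first version does not pick it up.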
Updated version:
(?:\+)([^"]*?)(?:[\+,])|\s([^"+]*?)\);
Working Example
Note that with the new version the variable might be placed into capture group 2 instead of 1.
You might be able to pare it down to this:
(?:^\(\s*|\s*\+\s*|,\s*)(?:[\w().\s+]+|\([\w().\s+-]*\))(?:(?=,)|\s*\+\s*|\s*\);)
101 regex
It consolidates some constructs.
To fix the immediate problem, I added a comma in some classes.
Note that this kind of regex is fraught with a problematic type of flow.
(?:
^ \( \s*
| \s* \+ \s*
| , \s*
)
(?:
[\w().\s+]+
| \( [\w().\s+-]* \)
)
(?:
(?= , )
| \s* \+ \s*
| \s* \);
)
I want to match expressions of the form var * var, var * num, num * var and num * num separately, i.e. using four different regular expressions.
My var could be s1,s2,...,S1,S2,...,v1,v2,...,V1,V2,...
My num could be any float number.
for var*var, I use:
[vVsS][0-9]+\s*[*/]\s*[vVsS][0-9]+
and it works well
for var*num and num*var, I use:
[vVsS][0-9]+\s*[*/]\s*[0-9]+[.]?[0-9]*
and
[0-9]+[.]?[0-9]*\s*[*/]\s*(vVsS)[0-9]+
but it returns nothing when I try the input:
2*4 + s1* 7 + v3 * 2 + s3 * V2 + 5*v1
UPDATE: I got it working.
For example, for the case of var * num,
[vVsS][0-9]+\s*[*/]\s*[0-9]+(?:[.][0-9]+)? works well, as Wiktor Stribiżew suggests in a comment.
But I couldn't find an explanation of the (?:) syntax online. Does anyone have an idea about that?
You may use
[vVsS][0-9]+\s*[*/]\s*[0-9]+(?:[.][0-9]+)?
The pattern matches:
[vVsS][0-9]+ - a letter from the character class (either v, V, s or S) followed with one or more digits
\s*[*/]\s* - a / or * enclosed with zero or more whitespaces
[0-9]+ - one or more digits
(?:[.][0-9]+)? - an optional non-capturing group matching a dot and one or more digits.
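To make the role of (?:...) concrete, here is a short Python sketch (Python is an assumption; the pattern and input come from the question). A non-capturing group only groups, so findall returns the whole match; with a plain capturing group, findall returns the group's content instead.

```python
import re

expr = "2*4 + s1* 7 + v3 * 2 + s3 * V2 + 5*v1"

# Non-capturing group: the optional fraction is grouped but not captured.
VAR_NUM = r"[vVsS][0-9]+\s*[*/]\s*[0-9]+(?:[.][0-9]+)?"
whole_matches = re.findall(VAR_NUM, expr)

# Same pattern with a capturing group around the optional fraction.
VAR_NUM_CAPTURING = r"[vVsS][0-9]+\s*[*/]\s*[0-9]+([.][0-9]+)?"
group_contents = re.findall(VAR_NUM_CAPTURING, expr)

print(whole_matches)   # the var * num terms themselves
print(group_contents)  # only the (here empty) fraction parts
```

With this input the first call returns the two var * num terms, s1* 7 and v3 * 2, while the second returns two empty strings because no number in those terms has a fractional part.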
I have a vector of variable names, var:
> var
[1] "a1" "a2" "a3" "a4"
Here is what I want to achieve: use a regex to change strings such as this:
3*a1 + a1*a2 + 4*a3*a4 + a1*a3
to
3a1 + a1*a2 + 4a3*a4 + a1*a3
Basically, I want to remove every "*" that is not between two values in var. Thank you in advance.
You can find (?<![\da-z])(\d+)\* and replace with $1
(?<! [\da-z] )
( \d+ ) # (1)
\*
Or, ((?:[^\da-z]|^)\d+)\* (again replacing with $1) for assertion-impaired engines
( # (1 start)
(?: [^\da-z] | ^ )
\d+
) # (1 end)
\*
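Both variants can be tried side by side in Python (an assumption; the thread itself is about R and a generic regex engine). In each case the replacement is just group 1, so the asterisk after a standalone number is dropped.

```python
import re

expr = "3*a1 + a1*a2 + 4*a3*a4 + a1*a3"

# Variant 1: the lookbehind rejects digits that belong to a name such as a1.
with_lookbehind = re.sub(r"(?<![\da-z])(\d+)\*", r"\1", expr)

# Variant 2: no lookbehind; the preceding character (or the string start) is
# captured together with the number and written back by the replacement.
without_lookbehind = re.sub(r"((?:[^\da-z]|^)\d+)\*", r"\1", expr)

print(with_lookbehind)
print(without_lookbehind)
```

On this input both calls give the string asked for in the question.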
Leading assertions are bad anyways.
Benchmark
Regex1: (?<![\da-z])(\d+)\*
Options: < none >
Completed iterations: 100 / 100 ( x 1000 )
Matches found per iteration: 2
Elapsed Time: 1.09 s, 1087.84 ms, 1087844 µs
Regex2: ((?:[^\da-z]|^)\d+)\*
Options: < none >
Completed iterations: 100 / 100 ( x 1000 )
Matches found per iteration: 2
Elapsed Time: 0.77 s, 767.04 ms, 767042 µs
You can build a dynamic regex from var that matches and captures the *s sitting between two of your variables, reinserts them with a backreference in gsub, and removes all other asterisks:
var <- c("a1","a2","a3","a4")
s = "3*a1 + a1*a2 + 4*a3*a4 + a1*a3"
block = paste(var, collapse="|")
pat = paste0("\\b((?:", block, ")\\*)(?=\\b(?:", block, ")\\b)|\\*")
gsub(pat, "\\1", s, perl=T)
## "3a1 + a1*a2 + 4a3*a4 + a1*a3"
See the IDEONE demo
Here is the regex:
\b((?:a1|a2|a3|a4)\*)(?=\b(?:a1|a2|a3|a4)\b)|\*
Details:
\b - leading word boundary
((?:a1|a2|a3|a4)\*) - Group 1 matching
(?:a1|a2|a3|a4) - either one of your variables
\* - asterisk
(?=\b(?:a1|a2|a3|a4)\b) - a lookahead check that there must be one of your variables (otherwise, no match is returned, the * is matched with the second branch of the alternation)
| - or
\* - a "wild" literal asterisk to be removed.
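The same dynamic pattern can be sketched in Python to show how the two branches interact. The function name strip_coefficient_stars is hypothetical, and a callback is used instead of a "\1" template so that the unmatched group in the second branch is handled explicitly.

```python
import re

def strip_coefficient_stars(expr, variables):
    """Keep '*' between two known variables; remove every other '*'."""
    block = "|".join(re.escape(v) for v in variables)
    pattern = rf"\b((?:{block})\*)(?=\b(?:{block})\b)|\*"
    # Branch 1 sets group 1 (variable plus asterisk) and is written back;
    # branch 2 matches a lone asterisk, leaving group 1 unset, so it vanishes.
    return re.sub(pattern, lambda m: m.group(1) or "", expr)

print(strip_coefficient_stars("3*a1 + a1*a2 + 4*a3*a4 + a1*a3", ["a1", "a2", "a3", "a4"]))
```

re.escape is there only as a precaution in case a variable name contains regex metacharacters.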
Taking the equation as a string, one option is
gsub('((?:^| )\\d)\\*(\\w)', '\\1\\2', '3*a1 + a1*a2 + 4*a3*a4 + a1*a3')
# [1] "3a1 + a1*a2 + 4a3*a4 + a1*a3"
which looks for
a captured group of characters, ( ... )
containing a non-capturing group, (?: ... )
containing the beginning of the line ^
or, |
a space (or \\s)
followed by a digit 0-9, \\d.
The capturing group is followed by an asterisk, \\*,
followed by another capturing group ( ... )
containing an alphanumeric character \\w.
It replaces the above with
the first captured group, \\1,
followed by the second captured group, \\2.
Adjust as necessary.
Thanks to @alistaire for offering a solution with a non-capturing group. However, that solution relies on there being a space between the coefficient and the "+" in front of it. Here is my modified solution based on his suggestion:
> ss <- "3*a1 + a1*a2+4*a3*a4 +2*a1*a3+ 4*a2*a3"
# my modified version
> gsub('((?:^|\\s|\\+|\\-)\\d)\\*(\\w)', '\\1\\2', ss)
[1] "3a1 + a1*a2+4a3*a4 +2a1*a3+ 4a2*a3"
# alistaire's
> gsub('((?:^| )\\d)\\*(\\w)', '\\1\\2', ss)
[1] "3a1 + a1*a2+4*a3*a4 +2*a1*a3+ 4a2*a3"
I have to match multiple instances of either "int(" or "der("
So the expression must match these strings
VVEH + int(ACC_X) + der(FL_WSP)
VVEH + int(ACC_X) + int(FL_WSP)
VVEH + der(ACC_X) + der(FL_WSP)
and not these
VVEH + int(ACC_X) + log(FL_WSP)
VVEH + der(ACC_X) + log(FL_WSP)
VVEH( \+ (int|der)\([^)]+\)){2,}
VVEH #Initial string
(
\+ #Escape the 'plus'
(int|der) #Either of your function names
\( #Escape the bracket
[^)]+ #Match anything inside the brackets
\) #Escape the bracket
){2,} #All of that stuff above at least twice
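A quick way to check the pattern against the five sample strings is the Python sketch below. Using fullmatch, so that the whole string must consist of VVEH followed by int/der terms, is an assumption about the intent; the helper name only_int_or_der is hypothetical.

```python
import re

# VVEH followed by at least two " + int(...)" or " + der(...)" terms.
PATTERN = re.compile(r"VVEH( \+ (int|der)\([^)]+\)){2,}")

def only_int_or_der(expression):
    """True if the entire string matches the pattern."""
    return PATTERN.fullmatch(expression) is not None

for text in [
    "VVEH + int(ACC_X) + der(FL_WSP)",
    "VVEH + int(ACC_X) + int(FL_WSP)",
    "VVEH + der(ACC_X) + der(FL_WSP)",
    "VVEH + int(ACC_X) + log(FL_WSP)",
    "VVEH + der(ACC_X) + log(FL_WSP)",
]:
    print(text, only_int_or_der(text))
```

The first three strings are accepted and the two containing log( are rejected, since the repeated group can be matched only once before log( breaks it.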