Balanced brackets in regexp - regex

How can I find the maximal pattern in a string in Matlab that matches some expression? An example will clarify what I mean:
str = 'tan(sin*cos)';
str = 'tan(sin(exp)*cos(exp))';
I want to find the patterns that look like tan(\w*), but with the brackets inside tan balanced. Is there a way to do this?

It's not possible without recursive regular expressions. For example, this string:
str = 'tan(tan(tan(x) + 4) + cos(x))'
would have to be regex'ed "from the inside out", something only recursion can do.
Instead, I'd just use a more practical solution:
regexprep(str, 'tan', '')
and/or split further when necessary. Or, as Ruud already suggested, just use a loop:
str{1} = 'tan(x)';
str{2} = 'tan(sin(exp)*cos(exp)) + tan(tan(x) + 4)';
S = regexp(str, 'tan\(');
match = cell(size(str));
[match{:}] = deal({});
for ii = 1:numel(str)
    if ~isempty(S{ii})
        for jj = 1:numel(S{ii})
            depth = 0;                  % nesting level inside this tan(...)
            start = S{ii}(jj)+4;        % first character after 'tan('
            for kk = start : numel(str{ii})
                switch str{ii}(kk)
                    case '('
                        depth = depth + 1;
                    case ')'
                        if depth > 0
                            depth = depth - 1;
                        else            % this ')' closes the tan(
                            match{ii}{end+1} = str{ii}(start:kk-1);
                            break;
                        end
                end
            end
        end
    end
end
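The same scan can be sketched in JavaScript with a depth counter, which keeps working when the parentheses are nested more than one level deep. The helper name `balancedArgs` is made up for this illustration, and it uses a plain substring search, so it would also pick up names that merely end in the given name (for example `atan(`).

```javascript
// Hypothetical helper: returns the argument text of every call to `name`
// found in `str`, with parentheses balanced by a depth counter.
function balancedArgs(str, name) {
  const results = [];
  const opener = name + "(";
  let from = 0;
  while (true) {
    const at = str.indexOf(opener, from);
    if (at === -1) break;
    const start = at + opener.length;
    let depth = 1; // we are already inside the opening parenthesis
    let end = -1;
    for (let k = start; k < str.length; k++) {
      if (str[k] === "(") {
        depth++;
      } else if (str[k] === ")" && --depth === 0) {
        end = k; // this ")" closes the call we started from
        break;
      }
    }
    if (end !== -1) results.push(str.slice(start, end));
    from = start; // continue inside the match so nested calls are found too
  }
  return results;
}

console.log(balancedArgs("tan(sin(exp)*cos(exp)) + tan(tan(x) + 4)", "tan"));
```

Unbalanced calls (a missing closing parenthesis) are skipped rather than reported.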

Related

Lua: How do I place something between two or more repeating characters in a string?

This question is somewhat similar to this, but my task is to place something, in my case the dash, between the repeating characters, for example the question marks, using the gsub function.
Example:
"?" = "?"
"??" = "?-?"
"???" = "?-?-?"
Try this:
function test(s)
    local t=s:gsub("%?%?","?-?"):gsub("%?%?","?-?")
    print(#s,s,t)
end
for n=0,10 do
    test(string.rep("?",n))
end
A possible solution using LPeg:
local lpeg = require 'lpeg'
local head = lpeg.C(lpeg.P'?')
local tail = (lpeg.P'?' / function() return '-?' end) ^ 0
local str = lpeg.Cs((head * tail + lpeg.P(1)) ^ 1)
for n=0,10 do
print(str:match(string.rep("?",n)))
end
print(str:match("?????foobar???foo?bar???"))
This is what I came up with, scanning the string letter by letter:
function test(str)
    local output = ""
    local tab = {}
    for let in string.gmatch(str, ".") do
        table.insert(tab, let)
    end
    local i = 1
    while i <= #tab do
        if tab[i - 1] == tab[i] then
            output = output.."-"..tab[i]
        else
            output = output..tab[i]
        end
        i = i + 1
    end
    return output
end
for n=0,10 do
    print(test(string.rep("?",n)))
end
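For comparison, here is a sketch in JavaScript, whose regex flavor has lookahead. Lua patterns have no lookahead, which is why the gsub answer above applies the substitution twice. The function name `dashBetween` is made up for this illustration.

```javascript
// Insert a dash between every pair of adjacent question marks.
// The lookahead (?=\?) checks for a following "?" without consuming it,
// so runs such as "???" are handled in a single pass.
function dashBetween(s) {
  return s.replace(/\?(?=\?)/g, "?-");
}

for (let n = 0; n <= 4; n++) {
  console.log(dashBetween("?".repeat(n)));
}
```

Only question marks are affected; other repeated characters are left alone, unlike the letter-by-letter version above.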

Regex for strings not starting with "My" or "By"

I need a regex that matches when my string does not start with "MY" or "BY".
I have tried something like:
r = /^my&&^by/
but it doesn't work for me.
For example:
mycountry = false ; byyou = false ; xyz = true ;
You could test if the string does not start with by or my, case insensitive.
var r = /^(?!by|my)/i;
console.log(r.test('My try'));
console.log(r.test('Banana'));
Without a negative lookahead (this version needs at least two characters to match):
var r = /^([^bm][^y]|[bm][^y]|[^bm][y])/i;
console.log(r.test('My try'));
console.log(r.test('Banana'));
console.log(r.test('xyz'));
If you are only concerned with specific text at the start of the string, then you can use the newer JavaScript string method startsWith (note that it is case-sensitive):
let str = "mylove";
if(str.startsWith('my') || str.startsWith('by')) {
// handle this case
}
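Since startsWith is case-sensitive while the regex answer above uses the i flag, a small wrapper can bring the two in line. This is only a sketch; `startsWithNone` is a hypothetical helper name, not a built-in.

```javascript
// Hypothetical helper: true when `s` does NOT start with any of the given
// prefixes, ignoring case. Both sides are lower-cased before comparing.
function startsWithNone(s, prefixes) {
  const lower = s.toLowerCase();
  return !prefixes.some((p) => lower.startsWith(p.toLowerCase()));
}

console.log(startsWithNone("mycountry", ["my", "by"]));
console.log(startsWithNone("Byyou", ["my", "by"]));
console.log(startsWithNone("xyz", ["my", "by"]));
```

Unlike the two-character regex alternative, this also gives a sensible answer for empty and one-character strings.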
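Try this (the regex is NOT case-sensitive). Note that it only matches strings whose second character is "y", so it gives the expected results for the examples below but returns false for a string such as "abc":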
var r = /^([^bm][y])/i; //remove 'i' for case sensitive("by" or "my")
console.log('mycountry = '+r.test('mycountry'));
console.log('byyou= '+r.test('byyou'));
console.log('xyz= '+r.test('xyz'));
console.log('Mycountry = '+r.test('Mycountry '));
console.log('Byyou= '+r.test('Byyou'));
console.log('MYcountry = '+r.test('MYcountry '));
console.log('BYyou= '+r.test('BYyou'));

Finding the shortest repetitive pattern in a string

I was wondering if there is a way to do pattern matching in Octave / Matlab. I know Maple 10 has commands to do this, but I'm not sure what I need to do in Octave / Matlab. So if the number was 12341234123412341234, the pattern match would be 1234. I'm trying to find the shortest pattern that upon repetition generates the whole string.
Please note: the numbers (only numbers will be used) won't be this simple. Also, I won't know the pattern ahead of time (that's what I'm trying to find). Please see the Maple 10 example below which shows that the pattern isn't known ahead of time but the command finds the pattern.
Example of Maple 10 pattern matching:
ns:=convert(12341234123412341234,string);
ns := "12341234123412341234"
StringTools:-PrimitiveRoot(ns);
"1234"
How can I do this in Octave / Matlab?
Ps: I'm using Octave 3.8.1
To find the shortest pattern that upon repetition generates the whole string, you can use regular expressions as follows:
result = regexp(str, '^(.+?)(?=\1*$)', 'match');
Some examples:
>> str = '12341234123412341234';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result =
'1234'
>> str = '1234123412341234123';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result =
'1234123412341234123'
>> str = 'lullabylullaby';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result =
'lullaby'
>> str = 'lullaby1lullaby2lullaby1lullaby2';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result =
'lullaby1lullaby2'
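The same expression carries over to other regex flavors that support lazy quantifiers, lookahead and backreferences. As a sketch, here it is in JavaScript; `shortestRepeat` is a name made up for this illustration.

```javascript
// The lazy group (.+?) tries the shortest prefix first; the lookahead
// (?=\1*$) accepts it only if repeating that prefix reaches the end
// of the string exactly.
function shortestRepeat(str) {
  const m = /^(.+?)(?=\1*$)/.exec(str);
  return m ? m[1] : "";
}

console.log(shortestRepeat("12341234123412341234"));
console.log(shortestRepeat("1234123412341234123"));
```

When no shorter prefix tiles the string, the group grows until it is the whole string, which matches the second Matlab example above.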
I'm not sure if this can be accomplished with regular expressions. Here is a script that will do what you need in the case of a repeated word called pattern.
It loops through the characters of a string called str, trying to match against another string called pattern. If matching fails, the pattern string is extended as needed.
EDIT: I made the code more compact.
str = 'lullabylullabylullaby';
pattern = str(1);
matchingState = false;
sPtr = 1;
pPtr = 1;
while sPtr <= length(str)
    if str(sPtr) == pattern(pPtr) %// if match succeeds, keep looping through pattern string
        matchingState = true;
        pPtr = pPtr + 1;
        pPtr = mod(pPtr-1,length(pattern)) + 1;
    else %// if match fails, extend pattern string and start again
        if matchingState
            sPtr = sPtr - 1; %// don't change str index when transitioning out of matching state
        end
        matchingState = false;
        pattern = str(1:sPtr);
        pPtr = 1;
    end
    sPtr = sPtr + 1;
end
display(pattern);
The output is:
pattern =
lullaby
Note:
This doesn't allow arbitrary delimiters between occurrences of the pattern string. For example, if str = 'lullaby1lullaby2lullaby1lullaby2';, then
pattern =
lullaby1lullaby2
This also allows the pattern to end mid-way through a cycle without changing the result. For example, str = 'lullaby1lullaby2lullaby1'; would still result in
pattern =
lullaby1lullaby2
To fix this, you could add the following lines after the loop (pPtr wraps back to 1 only when the string ends on a complete cycle):
if pPtr ~= 1
    pattern = str;
end
Another approach is as follows:
1. Determine the length of the string and find all possible factors of that length.
2. For each possible factor length, reshape the string and check for a repeated substring.
To find all possible factors, see this solution on SO. The next step can be performed in many ways, but I implement it in a simple loop, starting with the smallest factor length.
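function repeat = repeats_in_string(str)
ns = numel(str);
nf = find(rem(ns, 1:ns) == 0);   % all factors of the string length
for ii = 1:numel(nf)
    repeat = str(1:nf(ii));
    % compare each chunk of length nf(ii) with the candidate as a whole row
    if all(ismember(reshape(str, nf(ii), [])', repeat, 'rows'))
        break;
    end
end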
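The factor-based approach can be sketched in JavaScript as follows. Instead of reshaping, this version rebuilds the string from the candidate and compares; `shortestRepeatByFactors` is a name made up for this illustration.

```javascript
// Try every prefix length that divides the string length, shortest first,
// and return the first prefix whose repetition reproduces the string.
function shortestRepeatByFactors(str) {
  const n = str.length;
  for (let len = 1; len <= n; len++) {
    if (n % len !== 0) continue; // only divisors of n can tile the string
    const candidate = str.slice(0, len);
    if (candidate.repeat(n / len) === str) return candidate;
  }
  return str; // only reached for the empty string
}

console.log(shortestRepeatByFactors("lullabylullabylullaby"));
```

Comparing the rebuilt string checks the order of the characters as well as their presence, so an input such as "abba" correctly comes back unchanged.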
This problem is a great Rorschach test for your approach to problem solving. I'll add a signal engineering solution, which should be simple since the signal is expected to be perfectly repetitive, assuming this holds: find the shortest pattern that upon repetition generates the whole string.
In the following, the str fed to the function is actually a column vector of floats, not a string; the original string has been converted with str2num(str2mat(str)'):
function res=findshortestrepel(str);
[~,ii] = max(fft(str-mean(str)));
res = str(1:round(numel(str)/(ii-1)));
I performed a small test, comparing this to the regexp solution and found it to be faster overall (blue squares), although somewhat inconsistently, and only if you don't consider the time required to convert the string into a vector of floats (green squares). However I did not pursue this further (not breaking records with this):
[Figure: timing comparison of the two solutions; times in seconds.]

Recursive tricks with regexp in Matlab

I tried to use regexprep to solve a problem. I'm given a string that represents a function; it contains patterns like 'sin(arcsin(f))', where f is any substring, and I need to replace each one with just f. regexprep worked until I ran into a string like this:
str = 'sin(arcsin(sin(arcsin(f_2))))*x^2';
str = regexprep(str, 'sin\(arcsin\((\w*)\)\)','$1');
it returns
str =
sin(arcsin(f_2))*x^2
But I want it to be
str =
f_2*x^2
Is there any way to solve this (other than the obvious solution with for-loops)?
I was not able to test this, but I think I found an expression that you can call multiple times to do what you asked for; each call "strips" one sin(arcsin()) pair out of your equation. Once the string stops changing, you're done.
(.*)sin\(arcsin\((.*(\(.*?\))*)\)\)
Here is some Matlab code that shows how this might work:
str = 'sin(arcsin(sin(arcsin(f_2))))*x^2';
regex = '(.*)sin\(arcsin\((.*(\(.*?\))*)\)\)';
oldlength = 0;
newlength = length(str);
while (newlength ~= oldlength)
    oldlength = newlength;
    str = regexprep(str, regex, '$1$2');
    newlength = length(str);
end
As I said - I could not test this. Let me know if you have any problems with this.
Demo of the regular expression:
http://regex101.com/r/bR9gC7
Change your pattern to search for 1 or more (+) nested sin(arcsin( occurrences:
str = 'sin(arcsin(sin(arcsin(f_2))))*x^2';
str2 = regexprep(str, '(sin\(arcsin\()+(\w*)(\)\))+','$2')
str2 =
f_2*x^2
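The repeat-until-stable idea from the first answer also works with the simple pattern from the question. Below is a sketch in JavaScript; `stripSinArcsin` is a name made up for this illustration, and the `\b` is an addition so that the pattern cannot start in the middle of an "arcsin".

```javascript
// Repeatedly replace sin(arcsin(f)) with f, innermost pair first,
// until a pass leaves the string unchanged.
function stripSinArcsin(expr) {
  const pair = /\bsin\(arcsin\((\w*)\)\)/g;
  let previous;
  do {
    previous = expr;
    expr = expr.replace(pair, "$1");
  } while (expr !== previous);
  return expr;
}

console.log(stripSinArcsin("sin(arcsin(sin(arcsin(f_2))))*x^2"));
```

Because `\w*` cannot cross a parenthesis, each pass only removes pairs whose argument is a plain identifier, so arguments such as `cos(x)` are left untouched.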

Replace each RegExp match with different text in ActionScript 3

I'd like to know how to replace each match with a different text?
Let's say the source text is:
var strSource:String = "find it and replace what you find.";
..and we have a regex such as:
var re:RegExp = /\bfind\b/g;
Now, I need to replace each match with different text (for example):
var replacement:String = "replacement_" + increment.toString();
So the output would be something like:
output = "replacement_1 it and replace what you replacement_2";
Any help is appreciated..
You could also use a replacement function, something like this:
var increment : int = -1; // start at -1 so the first replacement will be 0
strSource = strSource.replace( /(\b_)(.*?_ID\b)/gim , function() {
    return arguments[1] + "replacement_" + (++increment).toString();
} );
I came up with a solution finally..
Here it is, if anyone needs:
var re:RegExp = /(\b_)(.*?_ID\b)/gim;
var increment:int = 0;
var output:Object = re.exec(strSource);
while (output != null)
{
var replacement:String = output[1] + "replacement_" + increment.toString();
strSource = strSource.substring(0, output.index) + replacement + strSource.substring(re.lastIndex, strSource.length);
output = re.exec(strSource);
increment++;
}
Thanks anyway...
Leave off the g (global) flag and repeat the search with the appropriate replacement string. Loop until the search fails.
Not sure about ActionScript, but in many other regex implementations you can usually pass a callback function that will execute logic for each match and return the replacement.
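As an illustration of the callback approach in another implementation, here is a sketch in JavaScript using the regex and source text from the question. `numberMatches` is a hypothetical helper name; `String.prototype.replace` calls the function once per match and uses its return value as the replacement.

```javascript
// Replace each match of `pattern` with `prefix` followed by a running number.
// `pattern` should carry the g flag so that every match is visited.
function numberMatches(source, pattern, prefix) {
  let counter = 0;
  return source.replace(pattern, () => prefix + String(++counter));
}

const output = numberMatches(
  "find it and replace what you find.",
  /\bfind\b/g,
  "replacement_"
);
console.log(output);
```

The counter lives inside the helper, so each call starts numbering from 1 again.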