create string in Matlab using only some parameters - regex

In Matlab, is it possible to create a string like:
f1-*f2-*f3-*f4-*f5-*f6
giving only as parameters:
f, 1:6 and -* ?
I tried:
for i=1:6; str = strcat(str, sprintf('f%d %s',i,'-* ')); end
but it doesn't work very well and seems inefficient for a larger number of files. Perhaps a regexp would be more suitable here?

This gives you the string with extra trailing separator:
str = sprintf('f%d-*', 1:6)
You can then remove the trailing separator, e.g. with str(end-1:end) = [] for a two-character separator. In general, a single sprintf call with an array input is quite efficient.

Use strjoin with -* as the delimiter, and strcat to combine the numbers with f (note that sprintfc is an undocumented function):
>> strjoin(strcat('f',sprintfc('%d',1:6)),'-*')
ans =
f1-*f2-*f3-*f4-*f5-*f6
Because strcat accepts cell arrays, no loop is needed.

%// Data:
letter = 'f';
numbers = 1:6;
separator = '-*';
%// Let's go:
num = mat2cell(num2str(numbers(:)), ones(1,numel(numbers))); %// cell array
%// of strings from the numbers. Those strings may contain spaces.
%// Those will be removed later
s = strcat(letter,num,separator); %// concatenate letter and separator to each number
s = [s{:}]; %// concatenate all
s = s(1:end-numel(separator)); %// remove last separator
s(s==' ') = []; %// remove spaces (in case of several-digit numbers)
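For comparison, the "join with a delimiter" idea used in the answers above can be sketched in Python. This is only an illustration of the technique, not MATLAB code; the function name join_labels is made up for this example.

```python
def join_labels(prefix, numbers, separator):
    """Build e.g. 'f1-*f2-*f3' from a prefix, a sequence of numbers and a separator."""
    # Joining puts the separator only *between* items, so there is no
    # trailing separator to strip afterwards.
    return separator.join(f"{prefix}{n}" for n in numbers)


print(join_labels("f", range(1, 7), "-*"))
```

Because the separator is inserted between items rather than appended to each one, the "remove the last separator" step from the sprintf and strcat variants is not needed.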

Related

Matlab: using regexp to get a string that has a whitespace in between

I want to use Regex to acquire some ID's in a cellstring array, the array looks like this:
myString = '([''US04650Y1001'', ''US90274P3029'', ''HON WI'', ''US41165F1012''])';
My pattern for regex is as follows:
pattern = '[A-Za-z0-9.^_]+';
newArr = regexp(myString, pattern,'match');
I'd like to get the ID called 'HON WI', but with my current pattern, its splitting it into two because my pattern can't deal with the whitespace properly. I would like to get the whole "HON WI", as well as my other strings, everything that's in '', these might have special characters like ^, . or _, but I don't know how to add the whitespace.
I already tried stuff like this, without success:
pattern = '[A-Za-z0-9.^_\s]+';
My new array should have, in each cell, the strings/ID's contained in myString (US04650Y1001, US90274P3029, HON WI and US41165F1012) with dimensions 1x4.
Another approach that seems to work but not entirely sure:
myString = strrep(myString,'([','');
myString = strrep(myString,'])','');
myString = regexp(myString,',','split');
myString = strrep(myString,'''','');
This seems to get me what I want, but I would like to know how can I alter the regex on my first approach.
Many thanks in advance.
You may use a simple '([^']+)' regex with the 'tokens' option to get the captures:
myString = '([''US04650Y1001'', ''US90274P3029'', ''HON WI'', ''US41165F1012''])';
pattern = '''([^'']+)''';
tokens = regexp(myString, pattern, 'tokens'); %// one nested cell per match
newArr = [tokens{:}]; %// flatten into a 1x4 cell array of strings
The newArr will look like
{
[1,1] = 'US04650Y1001'
[1,2] = 'US90274P3029'
[1,3] = 'HON WI'
[1,4] = 'US41165F1012'
}
Another option is to use lookaround assertions. The following will match any string made of alphanumeric characters or underscores (\w), spaces (' ') or the characters . or ^, that is located between quotes. This specifically excludes the blank space next to the comma in the separation between tokens, i.e. ', ' does not give a match.
Note that \s would match any whitespace character (including tab and newline), which is why a literal space is preferred here:
pattern2='(?<='')[\w.^ ]+(?='')';
pattern2 =
(?<=')[\w.^ ]+(?=')
newArr = regexp(myString, pattern2,'match');
newArr'
ans =
'US04650Y1001'
'US90274P3029'
'HON WI'
'US41165F1012'
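The two patterns from the answers above (a capture group between quotes, and lookarounds around a character class) behave the same way in other regex engines. Here is a small Python sketch of both, using the input string from the question; the variable names are chosen for this example only.

```python
import re

my_string = "(['US04650Y1001', 'US90274P3029', 'HON WI', 'US41165F1012'])"

# Capture group: keep whatever sits between a pair of single quotes.
ids_by_capture = re.findall(r"'([^']+)'", my_string)

# Lookarounds: match word characters, '.', '^' or a plain space, but only
# when the run is directly preceded and followed by a single quote.
ids_by_lookaround = re.findall(r"(?<=')[\w.^ ]+(?=')", my_string)

print(ids_by_capture)
print(ids_by_lookaround)
```

The ", " between two IDs is not returned by the lookaround version because the comma is not in the character class, so no match can start right after a closing quote.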

Allow user to pass a separator character by doubling it in C++

I have a C++ function that accepts strings in below format:
<WORD>: [VALUE]; <ANOTHER WORD>: [VALUE]; ...
This is the function:
std::wstring ExtractSubStringFromString(const std::wstring String, const std::wstring SubString) {
std::wstring S = std::wstring(String), SS = std::wstring(SubString), NS;
size_t ColonCount = NULL, SeparatorCount = NULL; WCHAR Separator = L';';
ColonCount = std::count(S.begin(), S.end(), L':');
SeparatorCount = std::count(S.begin(), S.end(), Separator);
if ((SS.find(Separator) != std::wstring::npos) || (SeparatorCount > ColonCount))
{
// SEPARATOR NEED TO BE ESCAPED, BUT DON'T KNOW TO DO THIS.
}
if (S.find(SS) != std::wstring::npos)
{
NS = S.substr(S.find(SS) + SS.length() + 1);
if (NS.find(Separator) != std::wstring::npos) { NS = NS.substr(NULL, NS.find(Separator)); }
if (NS[NS.length() - 1] == L']') { NS.pop_back(); }
return NS;
}
return L"";
}
Above function correctly outputs MANGO if I use it like:
ExtractSubStringFromString(L"[VALUE: MANGO; DATA: NOTHING]", L"VALUE")
However, if the value itself contains separators, as in the following string where I tried doubling them as ;;, I still get MANGO instead of ;MANGO;:
ExtractSubStringFromString(L"[VALUE: ;;MANGO;;; DATA: NOTHING]", L"VALUE")
Here, value assigner is colon and separator is semicolon. I want to allow users to pass colons and semicolons to my function by doubling extra ones. Just like we escape double quotes, single quotes and many others in many scripting languages and programming languages, also in parameters in many commands of programs.
I thought hard but couldn't think of a way to do it. Can anyone please help me with this?
Thanks in advance.
You should search the string for ;; and replace it with a temporary filler character or string, which can later be located and replaced with the actual value.
So basically:
1) Search through the string and replace all instances of ;; with \tempFill. It would be best to pick a combination of characters that is highly unlikely to be in the original string.
2) Parse the string
3) Replace all instances of \tempFill with ;
Note: It would be wise to run an assert on your string to ensure that \tempFill (or whatever you choose as the filler) is not in the original string, to prevent a bug/fault/error. You could use a character such as \n and make sure there are none in the original string.
Disclaimer:
I can almost guarantee there are cleaner and more efficient ways to do this but this is the simplest way to do it.
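The three steps above can be sketched as follows. This is a Python illustration rather than the asker's C++, and it makes some assumptions that are not in the original: the filler is a NUL character, only the semicolon is escaped by doubling, and the function name extract_value is hypothetical.

```python
PLACEHOLDER = "\x00"  # assumed filler; must never occur in real input


def extract_value(text, key, sep=";", assign=":"):
    """Return the value stored under `key`, treating a doubled `sep` as a literal one."""
    if PLACEHOLDER in text:
        raise ValueError("input already contains the placeholder character")
    body = text.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]  # drop the enclosing brackets
    # Step 1: hide escaped separators behind the placeholder.
    masked = body.replace(sep * 2, PLACEHOLDER)
    # Step 2: parse; every remaining separator is a real one.
    for field in masked.split(sep):
        name, found, value = field.partition(assign)
        if found and name.strip() == key:
            # Step 3: turn each placeholder back into a single separator.
            return value.strip().replace(PLACEHOLDER, sep)
    return ""


print(extract_value("[VALUE: ;;MANGO;;; DATA: NOTHING]", "VALUE"))
```

Pairs are consumed from left to right, so in ";;;" the first two characters form an escaped semicolon and the third one ends the field.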
First, as the substring (the key) does not need to be split, I assume that it does not need to be pre-processed to filter escaped separators.
Then, on the main string, the simplest way IMHO is to filter the escaped separators while you search for them in the string. Pseudocode (assuming the enclosing [] have been removed):
last_index = begin_of_string
index_of_current_substring = begin_of_string
loop: search a separator starting at last_index - if not found exit loop
    ok: found one at ix
    if char at ix+1 is a separator (meaning we have an escaped separator)
        remove character at ix from string by copying all characters after it one step to the left
        last_index = ix+1
        continue loop
    else this is a true separator
        search a colon in [ index_of_current_substring, ix [
        if not found: error incorrect string
        say found at c
        compare key_string with string[ index_of_current_substring, c [
        if equal - ok we found the key
            value is string[ c+2 (skip a space after the colon), ix [
            return value - search is finished
        else - it is not our key, just continue searching
            index_of_current_substring = ix+1
            last_index = index_of_current_substring
            continue loop
It should now be easy to convert that to C++.
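As a rough illustration of the scanning idea in the pseudocode above, here is a Python sketch. It departs from the pseudocode in two ways that should be read as assumptions of this example: it builds the fields in a new list instead of shifting characters in place, and it parses every field into a dict instead of stopping at the first matching key. The name parse_fields is hypothetical.

```python
def parse_fields(text, sep=";", assign=":"):
    """Parse 'KEY: value; KEY2: value2' into a dict; a doubled `sep` is a literal one."""
    fields, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == sep:
            if i + 1 < len(text) and text[i + 1] == sep:
                current.append(sep)  # escaped separator: keep one copy
                i += 2
                continue
            fields.append("".join(current))  # true separator: close the field
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))  # the last field has no trailing separator

    result = {}
    for field in fields:
        name, found, value = field.partition(assign)
        if found:  # skip empty or malformed fields
            result[name.strip()] = value.strip()
    return result


print(parse_fields("VALUE: ;;MANGO;;; DATA: NOTHING"))
```

Handling the escape during the scan avoids the need for a filler character that must not appear in the input.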

Split string with specified delimiter in lua

I'm trying to create a split() function in Lua with a delimiter of choice, where the default is a space.
The default works fine. The problem starts when I give a delimiter to the function: for some reason it doesn't return the last substring.
The function:
function split(str,sep)
if sep == nil then
words = {}
for word in str:gmatch("%w+") do table.insert(words, word) end
return words
end
return {str:match((str:gsub("[^"..sep.."]*"..sep, "([^"..sep.."]*)"..sep)))} -- BUG!! doesnt return last value
end
I try to run this:
local str = "a,b,c,d,e,f,g"
local sep = ","
t = split(str,sep)
for i,j in ipairs(t) do
print(i,j)
end
and I get:
1 a
2 b
3 c
4 d
5 e
6 f
Can't figure out where the bug is...
When splitting strings, the easiest way to avoid corner cases is to append the delimiter to the string, when you know the string cannot end with the delimiter:
str = "a,b,c,d,e,f,g"
str = str .. ','
for w in str:gmatch("(.-),") do print(w) end
Alternatively, you can use a pattern with an optional delimiter:
str = "a,b,c,d,e,f,g"
for w in str:gmatch("([^,]+),?") do print(w) end
Actually, we don't need the optional delimiter since we're capturing non-delimiters:
str = "a,b,c,d,e,f,g"
for w in str:gmatch("([^,]+)") do print(w) end
Here's my go-to split() function:
-- split("a,b,c", ",") => {"a", "b", "c"}
function split(s, sep)
local fields = {}
local sep = sep or " "
local pattern = string.format("([^%s]+)", sep)
string.gsub(s, pattern, function(c) fields[#fields + 1] = c end)
return fields
end
"[^"..sep.."]*"..sep This is what causes the problem. You are matching a string of characters which are not the separator followed by the separator. However, the last substring you want to match (g) is not followed by the separator character.
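A quick fix that is sometimes suggested is to also treat \0 as a separator ("[^"..sep.."\0]*"..sep), on the assumption that it marks the end of the string, so that g, which is followed by the end of the string rather than by a separator, would still match. Note that standard Lua patterns do not treat the end of the string as a character, so this may not work as written; appending the separator to the string before matching is a more reliable variant of the same idea.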
I'd say your approach is overly complicated in general. First of all, you can just match individual substrings that do not contain the separator; secondly, you can do this in a for-loop using the gmatch function:
local result = {}
for field in your_string:gmatch(("[^%s]+"):format(your_separator)) do
table.insert(result, field)
end
return result
EDIT: The above code made a bit more simple:
local pattern = "[^%" .. your_separator .. "]+"
for field in string.gmatch(your_string, pattern) do
-- ...and so on (The rest should be easy enough to understand)
EDIT2: Keep in mind that you should also escape your separators. A separator like % could cause problems if you don't escape it as %%:
function escape(str)
-- the extra parentheses drop gsub's second return value (the match count)
return (str:gsub("([%^%$%(%)%%%.%[%]%*%+%-%?])", "%%%1"))
end
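The same two ideas (match runs of non-separator characters, and escape the separator before building the pattern) can be sketched in Python for comparison. This is not Lua: it uses Python's re module, and split_fields is a name made up for this example.

```python
import re


def split_fields(text, sep=" "):
    """Return the non-empty runs of characters that are not in `sep`."""
    # re.escape keeps characters such as '^', ']' or '-' literal inside the class,
    # which plays the role of the escape() helper above.
    pattern = "[^" + re.escape(sep) + "]+"
    return re.findall(pattern, text)


print(split_fields("a,b,c,d,e,f,g", ","))
```

As with the Lua versions based on "[^sep]+", empty fields (two separators in a row) are skipped rather than returned as empty strings.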

Dynamic regexprep in MATLAB

I have the following strings in a long string:
a=b=c=d;
a=b;
a=b=c=d=e=f;
I want to first search for above mentioned pattern (X=Y=...=Z) and then output like the following for each of the above mentioned strings:
a=d;
b=d;
c=d;
a=b;
a=f;
b=f;
c=f;
d=f;
e=f;
In general, I want every variable to be assigned the last variable on the far right of the string. Is there a way to do this using regexprep in MATLAB? I am able to do it for a fixed-length string, but for a variable length I have no idea how to achieve this. Any help is appreciated.
My attempt for the case of two equal signs is as follows:
funstr = regexprep(funstr, '([^;])+\s*=\s*+(\w+)+\s*=\s*([^;])+;', '$1 = $3; \n $2 = $3;\n');
Not a regexp, but if you stick to MATLAB you can use the cellfun function to avoid writing an explicit loop:
str = 'a=b=c=d=e=f;' ; %// input string
list = strsplit(str,'=') ;
strout = cellfun( @(a) [a,'=',list{end}] , list(1:end-1), 'uni', 0).' %'// Horchler simplification of the previous solution below
%// this does the same as above but in a more convoluted way
%// strout = cellfun( @(a,b) cat(2,a,'=',b) , list(1:end-1) , repmat(list(end),1,length(list)-1) , 'uni',0 ).'
Will give you:
strout =
'a=f;'
'b=f;'
'c=f;'
'd=f;'
'e=f;'
Note: As Horchler rightly pointed out in a comment, although cellfun lets you compact your code, it is just a disguised loop. Moreover, since it operates on cell arrays, it is notoriously slow. You won't see the difference on such simple inputs, but reserve this approach for cases where performance is not a major concern.
Now, if you like regex you must like black-magic code. If all your strings are in a cell array from the start, there is a way to (over)abuse the cellfun capabilities to obscure your code and do it all in one line.
Consider:
strlist = {
'a=b=c=d;'
'a=b;'
'a=b=c=d=e=f;'
};
Then you can have all your substring with:
strout = cellfun( @(s)cellfun(@(a,b)cat(2,a,'=',b),s(1:end-1),repmat(s(end),1,length(s)-1),'uni',0).' , cellfun(@(s) strsplit(s,'=') , strlist , 'uni',0 ) ,'uni',0)
>> strout{:}
ans =
'a=d;'
'b=d;'
'c=d;'
ans =
'a=b;'
ans =
'a=f;'
'b=f;'
'c=f;'
'd=f;'
'e=f;'
This gives you a 3x1 cell array, with one cell for each group of substrings. Each group is a column, so if you want to concatenate them all into a single column, simply use: strall = cat(1,strout{:});
I haven't had much experience with MATLAB, but your problem can be solved by a simple string split function.
parts = strsplit( funstr, {' ', '='}, 'CollapseDelimiters', true );
Now take the last element of parts, and iterate over the remaining elements:
len = length( parts );
for i = 1:len-1
fprintf( '%s = %s\n', parts{i}, parts{len} );
end
Here fprintf prints each assignment on its own line; use sprintf instead if you want to collect the results as strings.
There isn't a single regex that you can write that will cover all the cases, as explained in this answer:
https://stackoverflow.com/a/5019658/3393095
However, you have a few alternatives to achieve your final result:
1) Get all the values in the line with regexp, pick the last value, then use a for loop over the other values to generate the output. The regex to get the values would be this:
matchStr = regexp(str,'[^=;\s]+','match')
2) If you want to use regexprep by all means, write a pattern generator and a replace-expression generator, based on the number of '=' in the input string, and pass their outputs as arguments to regexprep.
3) Forget about regex and split the input, then generate the output by looping over the values (similarly to alternative #1).
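The split-and-loop alternative can be sketched in a few lines. The following is a Python illustration of the idea rather than MATLAB code, and the function name fan_out is made up for this example.

```python
def fan_out(statement):
    """Turn 'a=b=c=d;' into ['a=d;', 'b=d;', 'c=d;']."""
    # Drop the trailing ';', split on '=', and trim blanks around each name.
    names = [part.strip() for part in statement.strip().rstrip(";").split("=")]
    *targets, source = names  # the rightmost name is the value to assign
    return [f"{target}={source};" for target in targets]


for line in ["a=b=c=d;", "a=b;", "a=b=c=d=e=f;"]:
    print(fan_out(line))
```

Since the number of output statements depends on the number of '=' signs, a plain loop over the split parts handles any chain length without generating a pattern per length.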

Finding the shortest repetitive pattern in a string

I was wondering if there is a way to do pattern matching in Octave / MATLAB. I know Maple 10 has commands to do this, but I'm not sure what I need to do in Octave / MATLAB. So if a number was 12341234123412341234, the pattern match would be 1234. I'm trying to find the shortest pattern that upon repetition generates the whole string.
Please note: the numbers (only numbers will be used) won't be this simple. Also, I won't know the pattern ahead of time (that's what I'm trying to find). Please see the Maple 10 example below which shows that the pattern isn't known ahead of time but the command finds the pattern.
Example of Maple 10 pattern matching:
ns:=convert(12341234123412341234,string);
ns := "12341234123412341234"
StringTools:-PrimitiveRoot(ns);
"1234"
How can I do this in Octave / Matlab?
Ps: I'm using Octave 3.8.1
To find the shortest pattern that upon repetition generates the whole string, you can use regular expressions as follows:
result = regexp(str, '^(.+?)(?=\1*$)', 'match');
Some examples:
>> str = '12341234123412341234';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result =
'1234'
>> str = '1234123412341234123';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result =
'1234123412341234123'
>> str = 'lullabylullaby';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result =
'lullaby'
>> str = 'lullaby1lullaby2lullaby1lullaby2';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result =
'lullaby1lullaby2'
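The same regular expression can be tried in other engines that support backreferences inside a lookahead. Here is a Python sketch using the examples above; the function name primitive_root is borrowed from the Maple command in the question and is otherwise arbitrary.

```python
import re


def primitive_root(text):
    """Shortest prefix whose repetition gives `text`, via a lazy group plus lookahead."""
    # (.+?) grows one character at a time; the lookahead accepts the first
    # prefix for which the rest of the string is zero or more copies of it.
    match = re.match(r"^(.+?)(?=\1*$)", text)
    return match.group(1) if match else ""


for s in ["12341234123412341234", "1234123412341234123", "lullabylullaby"]:
    print(primitive_root(s))
```

When no shorter prefix tiles the string, the group ends up spanning the whole input, which is why the second example returns the string itself.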
I'm not sure if this can be accomplished with regular expressions. Here is a script that will do what you need in the case of a repeated word called pattern.
It loops through the characters of a string called str, trying to match against another string called pattern. If matching fails, the pattern string is extended as needed.
EDIT: I made the code more compact.
str = 'lullabylullabylullaby';
pattern = str(1);
matchingState = false;
sPtr = 1;
pPtr = 1;
while sPtr <= length(str)
if str(sPtr) == pattern(pPtr) %// if match succeeds, keep looping through pattern string
matchingState = true;
pPtr = pPtr + 1;
pPtr = mod(pPtr-1,length(pattern)) + 1;
else %// if match fails, extend pattern string and start again
if matchingState
sPtr = sPtr - 1; %// don't change str index when transitioning out of matching state
end
matchingState = false;
pattern = str(1:sPtr);
pPtr = 1;
end
sPtr = sPtr + 1;
end
display(pattern);
The output is:
pattern =
lullaby
Note:
This doesn't allow arbitrary delimiters between occurrences of the pattern string. For example, if str = 'lullaby1lullaby2lullaby1lullaby2';, then
pattern =
lullaby1lullaby2
This also allows the pattern to end mid-way through a cycle without changing the result. For example, str = 'lullaby1lullaby2lullaby1'; would still result in
pattern =
lullaby1lullaby2
To fix this you could add the following lines after the loop (pPtr wraps back to 1 only when the string ends on a complete cycle):
if pPtr ~= 1
pattern = str;
end
Another approach is as follows:
1) determine the length of the string, and find all possible factors of the string length value
2) for each possible factor length, reshape the string and check for a repeated substring
To find all possible factors, see this solution on SO. The next step can be performed in many ways, but I implement it in a simple loop, starting with the smallest factor length.
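function repeat = repeats_in_string(str)
ns = numel(str);
nf = find(rem(ns, 1:ns) == 0); %// all factors of the string length
for ii=1:numel(nf)
repeat = str(1:nf(ii)); %// candidate pattern
%// every row of the reshaped string must equal the candidate
if all(ismember(reshape(str,nf(ii),[])',repeat,'rows'))
break;
end
end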
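For reference, the factor-based approach above can be written compactly in Python. This is a sketch of the same idea, not a translation of the MATLAB function; it checks each candidate by repeating it and comparing with the input instead of reshaping, and the name shortest_repeating_unit is made up for this example.

```python
def shortest_repeating_unit(text):
    """Return the shortest prefix that, repeated a whole number of times, gives `text`."""
    n = len(text)
    for length in range(1, n + 1):
        # Only lengths that divide n can tile the string exactly.
        if n % length == 0 and text[:length] * (n // length) == text:
            return text[:length]
    return text  # only reached for the empty string


print(shortest_repeating_unit("12341234123412341234"))
```

Trying the factors in increasing order guarantees that the first hit is the shortest pattern, and the full length always succeeds as a fallback.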
This problem is a great Rorschach test for your approach to problem solving. I'll add a signal engineering solution, which should be simple since the signal is expected to be perfectly repetitive, assuming this holds: find the shortest pattern that upon repetition generates the whole string.
In the following, the str fed to the function is actually a column vector of floats, not a string; the original string has been converted with str2num(str2mat(str)'):
function res=findshortestrepel(str);
[~,ii] = max(fft(str-mean(str)));
res = str(1:round(numel(str)/(ii-1)));
I performed a small test, comparing this to the regexp solution and found it to be faster overall (blue squares), although somewhat inconsistently, and only if you don't consider the time required to convert the string into a vector of floats (green squares). However I did not pursue this further (not breaking records with this):
[Figure: timing comparison of the two solutions; times in seconds.]