Decompose an ISO8601 format time stamp with regular expressions

I want to extract the minutes and seconds from a time stamp in ISO8601 format. I have made a few attempts with regexp, but I have no experience with it.
Could you please help me with this?
Examples:
PT1M46S --> 1 minute, 46 seconds
PT36S --> 36 seconds
Thanks!

getPart = @(str, c) str2double(['0' regexp(str, ['\d*(?=' c ')'], 'match', 'once')]);
str = 'PT36S';
seconds = getPart(str, 'S');
minutes = getPart(str, 'M');
hours = getPart(str, 'H');
This looks for the character c, takes the digits immediately before it, and converts them to a double. The '0' is prepended because regexp returns an empty string when it cannot find a match; with the prefix, an empty match becomes zero while other numbers are unaffected. If you want to restrict the search to the part after PT, you can first strip everything up to and including PT from the original string using
str = regexprep(str, '^.*PT', '');
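For comparison, the same lookahead idea can be sketched in Python. This is only an illustration under assumptions: get_part is a hypothetical name mirroring getPart above, and the sketch keeps only the text after the "T" so that "M" is read as minutes rather than months.

```python
import re

def get_part(stamp, unit):
    """Return the number directly before `unit` in the time part, or 0.0 if absent."""
    # Keep only what follows the 'T' (assumption: 'M' should mean minutes here).
    time_part = stamp.split("T", 1)[-1]
    # Digits (optionally with a fraction) that are immediately followed by the unit letter.
    match = re.search(r"\d+(?:\.\d+)?(?=" + re.escape(unit) + ")", time_part)
    return float(match.group()) if match else 0.0

print(get_part("PT1M46S", "M"), get_part("PT1M46S", "S"))  # 1.0 46.0
print(get_part("PT36S", "M"), get_part("PT36S", "S"))      # 0.0 36.0
```

Returning 0.0 when there is no match plays the same role as prepending '0' in the MATLAB one-liner.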

Use datevec to turn strings representing hours, minutes etc. into the corresponding numeric values. See "help datestr" for the rules governing the symbols used in the second input to datevec, i.e. the format string. Here's how you can convert the two examples given; I leave it to you to extend it to cover the entire format.
str = 'PT36S';
str = strrep(str, 'PT', ''); % PT have to go.
if ~ismember('M', str)
% To use a single format string, we must write zero minutes if none are there already
str = ['00M', str];
end
% Date string cannot contain characters y,m,d,H,M or S, so remove these
str = strrep(str, 'S', ' ');
str = strrep(str, 'M', ' ');
% Call datevec with appropriate format string
[~, ~, ~, ~, min, sec] = datevec(str, 'MM SS')
You can extend this to handle hours, days etc. by adding further if statements similar to the one above. I am not familiar with this standard beyond the examples given, so let me know if it's not as simple as that.

Related

How can I get several characters from a std::string

I want to name a file according to the date and the time it was created. I'm using this code to get the date and time:
auto time = std::chrono::system_clock::now();
std::time_t end_time = std::chrono::system_clock::to_time_t(time);
std::string finaltime = std::ctime(&end_time);
I now want to eliminate all spaces and information that I don't need from the finaltime string. For this purpose I worked out which characters I need. I need all except 11, 14, 17 (counting from 0 for the first character).
In Python there is a very simple way to do something like that: if you need all characters from 2 to 5 you can say mystring[2:5]. Is there something similar in C++, or is there another way to delete the characters I don't need?
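Use the substr(pos, len) member function. Note that the second argument is a length, not an end index.
std::string str2 = finaltime.substr(11, 5); // the "hh:mm" part of the ctime output, e.g. 12:00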
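Since the question compares this with Python slicing, here is a small Python sketch of how the two notations relate. The sample string is an assumed example in the fixed layout that std::ctime produces ("Www Mmm dd hh:mm:ss yyyy\n"), and substr is a hypothetical helper that mimics the C++ member function.

```python
# Assumed sample in the layout std::ctime produces: "Www Mmm dd hh:mm:ss yyyy\n"
finaltime = "Wed Jun 30 21:49:08 1993\n"

def substr(s, pos, count):
    """Hypothetical helper mimicking std::string::substr(pos, count)."""
    return s[pos:pos + count]

# substr(11, 5) selects the same characters as the slice [11:16].
hh_mm = substr(finaltime, 11, 5)

# Keep only letters and digits, dropping spaces, colons and the trailing newline.
file_stem = "".join(ch for ch in finaltime if ch.isalnum())

print(hh_mm)      # 21:49
print(file_stem)  # WedJun302149081993
```

In other words, a Python slice [a:b] corresponds to substr(a, b - a) in C++.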

Allow user to pass a separator character by doubling it in C++

I have a C++ function that accepts strings in below format:
<WORD>: [VALUE]; <ANOTHER WORD>: [VALUE]; ...
This is the function:
std::wstring ExtractSubStringFromString(const std::wstring String, const std::wstring SubString) {
std::wstring S = std::wstring(String), SS = std::wstring(SubString), NS;
size_t ColonCount = NULL, SeparatorCount = NULL; WCHAR Separator = L';';
ColonCount = std::count(S.begin(), S.end(), L':');
SeparatorCount = std::count(S.begin(), S.end(), Separator);
if ((SS.find(Separator) != std::wstring::npos) || (SeparatorCount > ColonCount))
{
// SEPARATOR NEED TO BE ESCAPED, BUT DON'T KNOW TO DO THIS.
}
if (S.find(SS) != std::wstring::npos)
{
NS = S.substr(S.find(SS) + SS.length() + 1);
if (NS.find(Separator) != std::wstring::npos) { NS = NS.substr(NULL, NS.find(Separator)); }
if (NS[NS.length() - 1] == L']') { NS.pop_back(); }
return NS;
}
return L"";
}
Above function correctly outputs MANGO if I use it like:
ExtractSubStringFromString(L"[VALUE: MANGO; DATA: NOTHING]", L"VALUE")
However, if I have two escaped separators in the following string (I tried doubling them, like ;;), I still get MANGO instead of ;MANGO;:
ExtractSubStringFromString(L"[VALUE: ;;MANGO;;; DATA: NOTHING]", L"VALUE")
Here, the value assigner is the colon and the separator is the semicolon. I want to allow users to pass literal colons and semicolons to my function by doubling them, just as double quotes, single quotes and other characters are escaped in many scripting and programming languages, and in the parameters of many commands.
I thought hard but couldn't think of a way to do it. Can anyone please help me with this?
Thanks in advance.
You should search in the string for ;; and replace it with either a temporary filler char or string which can later be referenced and replaced with the value.
So basically:
1) Search through the string and replace all instances of ;; with \tempFill. It would be best to pick a combination of characters that is highly unlikely to appear in the original string.
2) Parse the string
3) Replace all instances of \tempFill with ;
Note: It would be wise to run an assert on your string to ensure that \tempFill (or whatever you choose as the filler) is not in the original string, to prevent a bug/fault/error. You could use a character such as \n and make sure there are none in the original string.
Disclaimer:
I can almost guarantee there are cleaner and more efficient ways to do this but this is the simplest way to do it.
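The three steps above can be sketched in Python as follows. This is a minimal illustration, not a drop-in replacement for the C++ function: the function name is hypothetical, and the choice of the NUL character as the filler is an assumption that is checked with an assert, as the note suggests.

```python
PLACEHOLDER = "\x00"  # assumption: NUL never occurs in real input

def extract_value_via_placeholder(text, key):
    # Guard against the filler already being present in the input.
    assert PLACEHOLDER not in text
    # Step 1: hide doubled separators (and drop the enclosing brackets).
    protected = text.strip("[]").replace(";;", PLACEHOLDER)
    # Step 2: parse with the plain separator.
    for field in protected.split(";"):
        name, colon, value = field.partition(":")
        if colon and name.strip() == key:
            # Step 3: turn each filler back into a single literal ';'.
            return value.strip().replace(PLACEHOLDER, ";")
    return ""

print(extract_value_via_placeholder("[VALUE: ;;MANGO;;; DATA: NOTHING]", "VALUE"))  # ;MANGO;
```

Because the replacement works left to right, a run of three semicolons is read as one escaped semicolon followed by a real separator.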
First, as the substring does not need to be split, I assume that it does not need to be pre-processed to filter escaped separators.
Then on the main string, the simplest way IMHO is to filter the escaped separators when you search them in the string. Pseudo code (assuming the enclosing [] have been removed):
last_index = begin_of_string
index_of_current_substring = begin_of_string
loop: search for a separator starting at last_index - if not found, exit loop
    ok: found one at ix
    if char at ix+1 is a separator (meaning we have an escaped separator)
        remove the character at ix from the string by copying all characters after it one step to the left
        last_index = ix+1
        continue loop
    else this is a true separator
        search for a colon in [ index_of_current_substring, ix [
        if not found: error, incorrect string
        say found at c
        compare key_string with string[ index_of_current_substring, c [
        if equal - ok, we found the key
            value is string[ c+2 (skip a space after the colon), ix [
            return value - search is finished
        else - it is not our key, just continue searching
            index_of_current_substring = ix+1
            last_index = index_of_current_substring
            continue loop
It should now be easy to convert that to C++
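As a rough illustration of the scanning idea, here is a Python sketch rather than the C++ conversion itself. It differs slightly from the pseudocode: it builds new field strings instead of removing characters in place, and it also handles the last field, which has no trailing separator. The names split_fields and extract_value are hypothetical, and only doubled semicolons are handled; doubled colons are left out.

```python
def split_fields(body, sep=";"):
    """Split on lone separators; a doubled separator yields one literal copy."""
    fields, current, i = [], [], 0
    while i < len(body):
        ch = body[i]
        if ch == sep and body[i + 1:i + 2] == sep:
            current.append(sep)              # escaped separator: keep one copy
            i += 2
        elif ch == sep:
            fields.append("".join(current))  # true separator: the field ends here
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))          # last field has no trailing separator
    return fields

def extract_value(text, key):
    body = text.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    for field in split_fields(body):
        name, colon, value = field.partition(":")  # split at the first colon
        if colon and name.strip() == key:
            return value.strip()
    return ""

print(extract_value("[VALUE: ;;MANGO;;; DATA: NOTHING]", "VALUE"))  # ;MANGO;
```

The scan reads a run of separators left to right, so ";;;" is treated as an escaped semicolon followed by a true separator.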

String masking - inserting dashes

I am writing a function to format a string. I receive a string of numbers, sometimes with dashes, sometimes not. I need to produce an output string of 14 digits, so if the input string contains fewer than 14, I need to pad it with zeros. Then I need to mask the string of numbers by inserting dashes in the appropriate places. Here is what I have so far:
strTemp = strTemp.Replace("-", "")
If IsNumeric(strTemp) Then
If strTemp.Length < 14 Then
strTemp = strTemp.PadRight(14 - strTemp.Length)
End If
output = String.Format(strTemp, "{00-000-0-0000-00-00}")
End If
The above works fine, except it just returns a string of numbers without putting in the dashes. I know I am doing something wrong with String.Format but so far I've only worked with pre-defined formats. Can anyone help? How can I use Regex for string formatting in this case?
This function should do the trick:
Public Function MaskFormat(input As String) As String
input = input.Replace("-", String.Empty)
If IsNumeric(input) Then
If input.Length < 14 Then
' PadRight takes the total width; pad with "0" rather than the default space
input = input.PadRight(14, "0"c)
End If
Return String.Format("{0:00-000-0-0000-00-00}", CLng(input))
Else
Return String.Empty
End If
End Function
You can find more on string formatting in the .NET documentation on custom numeric format strings.
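To make the masking step concrete, here is a small Python sketch of the same idea done with plain string slicing instead of a numeric format. The name mask_format is hypothetical, the group widths are read off the mask 00-000-0-0000-00-00, and two choices are assumptions: padding on the right (as the question's PadRight suggests) and rejecting inputs longer than 14 digits.

```python
def mask_format(raw):
    digits = raw.replace("-", "")
    # Assumption: anything non-numeric or longer than 14 digits is rejected.
    if not digits.isdigit() or len(digits) > 14:
        return ""
    digits = digits.ljust(14, "0")   # pad on the right with zeros up to 14 characters
    widths = (2, 3, 1, 4, 2, 2)      # group sizes of the mask 00-000-0-0000-00-00
    groups, start = [], 0
    for width in widths:
        groups.append(digits[start:start + width])
        start += width
    return "-".join(groups)

print(mask_format("12345678901234"))  # 12-345-6-7890-12-34
print(mask_format("12-345"))          # 12-345-0-0000-00-00
```

Working on the string directly avoids converting to a number, so leading zeros in the input are kept as they are.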

Verify and cut a string using regexp in matlab

I have the following string:
{'output',{'variable','VGRG_Pos_Var1/Parameters/D_foo'},'date',734704.60904050921}
I would like to verify the format of the string, namely that the word 'variable' is the second word, and I would like to retrieve the string after the last '/' in the 3rd string (in this example 'D_foo').
How can I verify this and retrieve the string I am looking for?
I tried the following:
regexp(str,'{''\w+'',{''variable'',''([(a-z)|(A-Z)|/|_])+')
without success
REMARK
The string to analyze is not split after the comma; the line break is only due to the length of the string.
EDIT
my string is:
'{''output'',{''variable'',''VGRG_Pos_Var1/Parameters/D_foo''},''date'',734704.60904050921}';
and not a cell, as could have been understood. I added the symbol ' at the start and end of the string to show that it is a string.
I realise that you mention using regexp in the question, but I'm not sure if this is a requirement? If other solutions are acceptable you could try this:
str='{''output'',{''variable'',''VGRG_Pos_Var1/Parameters/D_foo''},''date'',734704.60904050921}';
parts1=textscan( str, '%s','delimiter',{',','{','}'},'MultipleDelimsAsOne',1);
parts2=textscan( parts1{1}{3}, '%s','delimiter',{'/',''''},'MultipleDelimsAsOne',1);
string=parts2{1}{end}
match=strcmp(parts1{1}{2},'''variable''') % the token still includes its surrounding quote characters
To answer the first part of your question, you can write this:
str = {'output',{'variable','VGRG_Pos_Var1/Parameters/D_foo'},'date',734704.60904050921};
temp = str(2); %this holds the cell containing the two strings
if strcmp(temp{1}(1), 'variable')
%do stuff
end
For the second part you can do this:
str = {'output',{'variable','VGRG_Pos_Var1/Parameters/D_foo'},'date',734704.60904050921};
temp = str(2); %like before, this contains the cell
temp = temp{1}(2); %this picks out the second string in the cell
temp = char(temp); %turns the item from a cell to a string
res = strsplit(temp, '/'); %splits the string where '/' are found, res is an array of strings
string = res{end}; %takes the last part, however many '/'s there are
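Since the question asked for a regular expression, here is a hedged sketch of one pattern that does both the check and the extraction, written in Python for illustration. It assumes the input is the plain string form from the EDIT, and last_path_part is a hypothetical name. Whether the pattern carries over unchanged to MATLAB's regexp (with the quotes doubled and the 'tokens' option) is an assumption, not something verified here.

```python
import re

# '{' + first word + ',{' + the literal word 'variable' + ',' + a quoted path;
# the capture group keeps only what follows the last '/'.
PATTERN = re.compile(r"^\{'\w+',\{'variable','(?:[^'/]*/)*([^'/]+)'\}")

def last_path_part(text):
    """Return the text after the last '/', or None if the format check fails."""
    match = PATTERN.match(text)
    return match.group(1) if match else None

text = "{'output',{'variable','VGRG_Pos_Var1/Parameters/D_foo'},'date',734704.60904050921}"
print(last_path_part(text))  # D_foo
```

A None result means that 'variable' was not the second word or that the overall layout did not match.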

How to separate a line of input into multiple variables?

I have a file that contains rows and columns of information like:
104857 Big Screen TV 567.95
573823 Blender 45.25
I need to parse this information into three separate items, a string containing the identification number on the left, a string containing the item name, and a double variable containing the price. The information is always found in the same columns, i.e. in the same order.
I am having trouble accomplishing this. Even when not reading from the file and just using a sample string, my attempt just outputs a jumbled mess:
string input = "104857 Big Screen TV 567.95";
string tempone = "";
string temptwo = input.substr(0,1);
tempone += temptwo;
for(int i=1 ; temptwo != " " && i < input.length() ; i++)
{
temptwo = input.substr(j,j);
tempone += temp2;
}
cout << tempone;
I've tried tweaking the above code for quite some time, but no luck, and I can't think of any other way to do it at the moment.
You can find the first space and the last space using std::string::find_first_of and std::string::find_last_of (or find and rfind). You can use these to split the string into three parts: the first space comes after the first variable and the last space comes before the third variable; everything in between is the second variable.
How about the following pseudocode:
string input = "104857 Big Screen TV 567.95";
string[] parsed_output = input.split(" "); // split input string with 'space' as delimiter
// parsed_output[0] = 104857
// parsed_output[1] = Big
// parsed_output[2] = Screen
// parsed_output[3] = TV
// parsed_output[4] = 567.95
int id = stringToInt(parsed_output[0]);
string product = concat(parsed_output[1], parsed_output[2], ... ,parsed_output[length-2]);
double price = stringToDouble(parsed_output[length-1]);
I hope, that's clear.
Well, try breaking down the file's components:
You know a number always comes first, and we also know a number has no whitespace.
The string following the number CAN have whitespace, but won't contain any numbers (I would assume).
After this title, you're going to have more numbers (with no whitespace).
From these components, you can deduce:
Grabbing the first number is as simple as reading it in using the file stream's >> operator.
Getting the string requires you to check until you reach a number, grabbing one character at a time and appending it to a string. The last number is read just like the first, using the file stream's >> operator.
This seems like homework, so I'll let you put the rest together.
I would try a regular expression, something along these lines:
^([0-9]+)\s+(.+)\s+([0-9]+\.[0-9]+)$
I am not very good at regex syntax, but ([0-9]+) corresponds to a sequence of digits (this is the id), ([0-9]+\.[0-9]+) is the floating point number (the price) and (.+) is the item name, which is separated from the two numbers by sequences of "space" characters: \s+.
The next step would be to check if you need this to work with prices like ".50" or "10".
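As a sketch of that next step, the pattern below (shown in Python for illustration) also accepts prices such as ".50" and "10". The name parse_line and the exact price sub-pattern are assumptions, not part of the original suggestion.

```python
import re

# id, then the item name (as short as possible), then a price that may be
# "567.95", "10" or ".50"; surrounding whitespace is tolerated.
LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\d+(?:\.\d*)?|\.\d+)\s*$")

def parse_line(line):
    """Return (id, name, price) or None if the line does not fit the layout."""
    match = LINE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2), float(match.group(3))

print(parse_line("104857 Big Screen TV 567.95"))  # ('104857', 'Big Screen TV', 567.95)
print(parse_line("573823 Blender .50"))           # ('573823', 'Blender', 0.5)
```

Because the price is anchored to the end of the line, an item name that itself contains digits still ends up in the middle group.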