String tokenisation, split by token not separator


I see how to tokenise a string in the traditional manner (e.g. the answers to "How do I tokenize a string in C++?"), but how can I split a string by its tokens and keep the separators in the result as well?
For example given a date/time picture such as yyyy\MMM\dd HH:mm:ss, I would like to split into an array with the following:
"yyyy", "\", "MMM", "\", "dd", " " , "HH", ":", "mm", ":", "ss"
The "tokens" are yyyy, MMM, dd, HH, mm, ss in this example. I don't know what the separators are, only what the tokens are. The separators need to appear in the final result however. The complete list of tokens is:
"yyyy" // – four-digit year, e.g. 1996
"yy" // – two-digit year, e.g. 96
"MMMM" // – month spelled out in full, e.g. April
"MMM" // – three-letter abbreviation for month, e.g. Apr
"MM" // – two-digit month, e.g. 04
"M" // – one-digit month for months below 10, e.g. 4
"dd" // – two-digit day, e.g. 02
"d" // – one-digit day for days below 10, e.g. 2
"ss" // - two-digit second
"s" // - one-digit second for seconds below 10
"mm" // - two-digit minute
"m" // - one-digit minute for minutes below 10
"tt" // - AM/PM designator
"t" // - first character of AM/PM designator
"hh" // - 12-hour clock, two-digit hour
"h" // - 12-hour clock, one-digit hour for hours below 10
"HH" // - 24-hour clock, two-digit hour
"H" // - 24-hour clock, one-digit hour for hours below 10
I've noticed the standard library std::string isn't very strong on parsing and tokenising and I can't use boost. Is there a tight, idiomatic solution? I'd hate to break out a C-style algorithm for doing this. Performance isn't a consideration.

Perhaps http://www.cplusplus.com/reference/cstring/strtok/ is what you're looking for, with a useful example.
However, strtok discards the delimiters, and you have to tell it what they are. You can recover each delimiter's position by taking the token's offset from the base pointer and adding the token's length.
#include <cstring>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    char data[] = "yyyy\\MMM\\dd HH:mm:ss"; // strtok modifies this buffer in place
    std::vector<std::string> tokens;
    char* pch = std::strtok(data, "\\:"); // pch points at "yyyy"
    while (pch != NULL)
    {
        tokens.push_back(pch);
        // Index of the delimiter that ended this token: 4, 8, 14, ...
        int delimiterIndex = static_cast<int>(pch - data + std::strlen(pch));
        std::stringstream ss;
        ss << delimiterIndex;
        tokens.push_back(ss.str());
        pch = std::strtok(NULL, "\\:"); // pch points at "MMM", "dd HH", ...
    }
    for (const auto& token : tokens)
    {
        std::cout << token << ", ";
    }
}
This gives output of:
yyyy, 4, MMM, 8, dd HH, 14, mm, 17, ss, 20,
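The strtok approach above needs the separators up front, while the question only knows the tokens. A minimal sketch of an alternative, assuming a greedy longest-match scan over the token list from the question, could look like this. The name split_picture is hypothetical, and runs of adjacent separator characters are kept together as one element, which may or may not be what you want.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): splits a date/time
// picture into known tokens and the literal text found between them.
std::vector<std::string> split_picture(const std::string& picture)
{
    // Token list from the question. For each letter the longer forms come
    // first, so the first match found is also the longest one.
    static const std::vector<std::string> tokens = {
        "yyyy", "yy", "MMMM", "MMM", "MM", "M", "dd", "d", "ss", "s",
        "mm", "m", "tt", "t", "hh", "h", "HH", "H"};

    std::vector<std::string> parts;
    std::string literal; // separator characters collected so far
    std::size_t pos = 0;
    while (pos < picture.size()) {
        const std::string* hit = nullptr;
        for (const auto& tok : tokens) {
            // compare() clamps the substring to the end of the string, so a
            // token longer than the remaining text simply does not match.
            if (picture.compare(pos, tok.size(), tok) == 0) {
                hit = &tok;
                break;
            }
        }
        if (hit != nullptr) {
            if (!literal.empty()) { // flush the pending separator first
                parts.push_back(literal);
                literal.clear();
            }
            parts.push_back(*hit);
            pos += hit->size();
        } else {
            literal += picture[pos]; // not a token: part of a separator
            ++pos;
        }
    }
    if (!literal.empty()) {
        parts.push_back(literal);
    }
    return parts;
}
```

Called with "yyyy\\MMM\\dd HH:mm:ss", this yields the eleven elements listed in the question, separators included.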

Related

01 number input for date not giving output on system

I can't enter 01 into my C++ program's input: it returns an empty result, but when I type other months such as 12 and 11 the results do show in the system.
string month;
cin.ignore(1, '\n');
cout << "Please Enter Month For Example 01 for January:";
getline(cin, month);
string search_query = "SELECT DATE(OrderDate), SUM(TotalPrice) FROM order1 WHERE MONTH(OrderDate) like '%" + month + "%' GROUP BY DATE(OrderDate)";
const char* q = search_query.c_str();
qstate = mysql_query(conn, q);
(Screenshots not shown: the database table, the empty result for "01", and the results displayed for "11" and "12".)

The problem is that you are using the LIKE operator in MySQL, which checks to see if the pattern specified on the right occurs in the string specified on the left. The pattern "01" probably doesn't occur in the value on the left, since the string on the left should be "1" for January orders, and that doesn't have a "0" in it.
I also imagine that if you type "1" you would actually see all orders from January, October, November, and December since all those months have "1" in them.
Try using something like MONTH(OrderDate) = 1 instead.
To be more explicit: try changing the line in your program that defined search_query to this:
std::string search_query = "SELECT DATE(OrderDate), SUM(TotalPrice) FROM order1 WHERE MONTH(OrderDate) = " + month + " GROUP BY DATE(OrderDate)";
By the way, for a more robust program, you should also make sure you turn the user-supplied month number into an integer before adding it to the query string; right now you are letting users insert arbitrary strings into your query, which could be dangerous.
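As a rough illustration of that last point, here is a minimal sketch that validates the input before it reaches the query string. The helper names parse_month and build_query are hypothetical; the table and column names are taken from the question. A prepared statement would be the more general fix, but this shows the idea with the string-building approach already in use.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: returns the month as an integer in 1..12, or nothing
// if the input is not a one- or two-digit month number.
std::optional<int> parse_month(const std::string& input)
{
    if (input.empty() || input.size() > 2) {
        return std::nullopt;
    }
    int value = 0;
    for (char c : input) {
        if (c < '0' || c > '9') {
            return std::nullopt; // rejects anything that is not a digit
        }
        value = value * 10 + (c - '0');
    }
    if (value < 1 || value > 12) {
        return std::nullopt;
    }
    return value;
}

// Hypothetical helper: builds the query from an already validated month, so
// "01" and "1" both end up as "= 1" and no user text reaches the SQL.
std::string build_query(int month)
{
    assert(month >= 1 && month <= 12);
    return "SELECT DATE(OrderDate), SUM(TotalPrice) FROM order1 "
           "WHERE MONTH(OrderDate) = " + std::to_string(month) +
           " GROUP BY DATE(OrderDate)";
}
```

In the question's program, the string read by getline would go through parse_month first, and the query would only be sent if a value came back.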

How to get a substring from a found string to a character in C++?

For example I have a string:
int random_int = 123; // let's pretend this integer could have various values!!
std::string str = "part1 Hello : part2 " + std::to_string(random_int) + " : part3 World ";
All parts are separated by the character :
Let's say I want to find a substring from "part2" to the next character :, which would return part2 123 in this case.
I know how to find the pos of "part2" by str.find("part2"), but I don't know how to determine the length to the next : from that "part2", because the length can be of various length.
For example, I know that part3 substring could be extracted with str.substr(str.find("part3"));, but only because it's at the end...
So, is there a subtle way to get the substring part2 123 from that string?
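One possible approach, sketched here under the assumption that the part ends at the next ':' and that trailing spaces should be dropped: std::string::find accepts a starting position, so the second search can begin where "part2" was found. The helper name extract_from is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the text from the first occurrence of `start`
// up to (not including) the next `delim`, or to the end of the string if no
// delimiter follows. Trailing spaces are trimmed. Returns an empty string if
// `start` is not found.
std::string extract_from(const std::string& str, const std::string& start, char delim)
{
    const std::size_t begin = str.find(start);
    if (begin == std::string::npos) {
        return {};
    }
    // Search for the delimiter starting at `begin`, not at the string start.
    const std::size_t end = str.find(delim, begin);
    std::string part = (end == std::string::npos)
                           ? str.substr(begin)
                           : str.substr(begin, end - begin);
    // Drop trailing spaces such as the one before " : ".
    const std::size_t last = part.find_last_not_of(' ');
    part.erase(last == std::string::npos ? 0 : last + 1);
    return part;
}
```

With the string from the question, extract_from(str, "part2", ':') gives "part2 123", and the same call with "part3" gives "part3 World" because no further ':' follows.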

Validating integer part of an string

I have a text file in which I need to convert each line into an integer.
A line can begin with '#' to indicate a comment. There may also be an inline comment after the data, again introduced by '#'.
So I have the example below:
QString time = "5000 #this is 5 seconds"; // OK
QString time = " 5000 # this is 5 seconds"; // OK..free space is allowed at start
QString time = "5000.00 #this is 5 seconds"; // invalid...no decimal
QString time = "s5000 # this is 5 seconds"; // invalid...does not start with numerical character
How can I handle these cases? Of the four examples above, I need to extract "5000" from all but the last two. How can I detect that the last one is invalid?
In other words, what is the most robust way to handle this task?

Another example using std::regex. Converting QString to a string_view is left as an exercise for the reader.
#include <regex>
#include <string_view>
#include <iostream>
#include <string>
#include <optional>
std::optional<std::string> extract_number(std::string_view input)
{
static constexpr char expression[] = R"xx(^\s*(\d+)\s*(#.*)?$)xx";
static const auto re = std::regex(expression);
auto result = std::optional<std::string>();
auto match = std::cmatch();
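const auto matched = std::regex_match(input.data(), input.data() + input.size(), match, re); // raw pointers: string_view iterators are not guaranteed to be const char*, which std::cmatch requires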
if (matched)
{
result.emplace(match[1].first, match[1].second);
}
return result;
}
void emit(std::string_view candidate, std::optional<std::string> result)
{
std::cout << "offered: " << candidate << " - result : " << result.value_or("no match") << '\n';
}
int main()
{
const std::string_view candidates[] =
{
"5000 #this is 5 seconds",
" 5000 # this is 5 seconds",
"5000.00 #this is 5 seconds",
"s5000 # this is 5 seconds"
};
for(auto candidate : candidates)
{
emit(candidate, extract_number(candidate));
}
}
expected output:
offered: 5000 #this is 5 seconds - result : 5000
offered:  5000 # this is 5 seconds - result : 5000
offered: 5000.00 #this is 5 seconds - result : no match
offered: s5000 # this is 5 seconds - result : no match
https://coliru.stacked-crooked.com/a/2b0e088e6ed0576b

You can use this regex to validate the line and extract the number from the first capture group:
^\s*(\d+)\b(?!\.)
Explanation:
^ - Start of string
\s* - Allows optional space before the number
(\d+) - Captures the number and places it in first grouping pattern
\b - Word boundary; stops the engine from backtracking to a shorter run of digits in order to satisfy the negative lookahead that follows
(?!\.) - Rejects the match if there is a decimal following the number
In case only last one is invalid, you can use this regex to capture number from first three entries,
^\s*(\d+)
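To try the first pattern from C++, a minimal sketch with std::regex (whose default ECMAScript grammar supports \b and negative lookahead) might look like the following. The function name leading_integer is hypothetical, and a QString would first need converting, for example with toStdString(). Note that this pattern does not require the rest of the line to be a comment; it only checks how the line starts.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: returns the leading integer of `line`, or nothing if
// the line does not start with optional whitespace followed by a whole number.
std::optional<std::string> leading_integer(const std::string& line)
{
    // Same pattern as in the answer: digits, a word boundary, and no '.' next.
    static const std::regex re(R"re(^\s*(\d+)\b(?!\.))re");
    std::smatch match;
    if (std::regex_search(line, match, re)) {
        return match[1].str(); // first capture group holds the digits
    }
    return std::nullopt;
}
```

Against the four lines from the question, the first two return "5000" and the last two return no value.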

Decompose an ISO8601 format time stamp with regular expressions

I want to extract the minutes and seconds from a time stamp in ISO8601 format. I have made a few attempts with regexp, but I have no experience with it.
Could you please help me on this?
Examples:
PT1M46S --> 1 minute, 46 seconds
PT36S --> 36 seconds
Thanks!

getPart = @(str, c) str2double(['0' regexp(str, ['\d*(?=' c ')'], 'match', 'once')]);
str = 'PT36S';
seconds = getPart(str, 'S');
minutes = getPart(str, 'M');
hours = getPart(str, 'H');
This looks for the character c, takes the digits immediately before it, and converts them to a double. A '0' is prepended because regexp returns an empty string when it finds no match; the prefix turns an empty result into zero without affecting other numbers. If you want to restrict the search to the part after PT, you can strip everything up to it from the original string using
str = regexprep(str, '^.*PT', '');
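For readers working in C++ rather than MATLAB, the same idea can be sketched with std::regex. This is not a translation of the answer's exact expression: it uses a capture group instead of a lookahead, only searches the part after 'T', and assumes whole-number components (no fractional seconds). The function name duration_part is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <regex>
#include <string>

// Hypothetical helper mirroring getPart: returns the number that precedes
// `designator` ('H', 'M' or 'S') in the time portion of an ISO 8601 duration,
// or 0 if that component is absent.
int duration_part(const std::string& stamp, char designator)
{
    // The designator is spliced into the pattern, so it must be a plain letter.
    assert(std::isalpha(static_cast<unsigned char>(designator)));

    // Only look after 'T', so that 'M' means minutes rather than months.
    const std::size_t t = stamp.find('T');
    const std::string time_part =
        (t == std::string::npos) ? std::string() : stamp.substr(t + 1);

    // Pattern such as (\d+)S: one or more digits directly before the letter.
    const std::regex re(std::string("(\\d+)") + designator);
    std::smatch match;
    if (!std::regex_search(time_part, match, re)) {
        return 0; // component not present
    }
    return std::stoi(match[1].str());
}
```

For "PT1M46S" this gives 1 for 'M' and 46 for 'S'; for "PT36S" it gives 0 minutes and 36 seconds.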

Use datevec to turn strings representing hours, minutes etc into corresponding numeric values. See "help datestr" to understand rules for symbols used in second input to datevec, ie. the format string. Here's how you can convert the two examples given, I leave it to you to extend it to cover the entire format.
str = 'PT36S';
str = strrep(str, 'PT', ''); % PT have to go.
if ~ismember('M', str)
% To use a single format string, we must write zero minutes if none are there already
str = ['00M', str];
end
% Date string cannot contain characters y,m,d,H,M or S, so remove these
str = strrep(str, 'S', ' ');
str = strrep(str, 'M', ' ');
% Call datevec with appropriate format string
[~, ~, ~, ~, min, sec] = datevec(str, 'MM SS')
You can extend this to handle hours, days, etc. by adding further if blocks like the one above. I am not familiar with this standard beyond the examples given, so let me know if it's not as simple as that.

How to achieve this specific datetime format using boost?

I want to format a datetime like this:
YYYYMMDD_HHMMSS
i.e. a 4-digit year, followed by a 2-digit month, followed by a 2-digit day, an underscore, the 24-hour hour, 2-digit minutes and 2-digit seconds.
e.g.: 16th of February 2011, 8:05 am and 2 seconds would be:
20110216_080502
What format string should I use in the following code to achieve this? (And, if necessary, what code changes are needed):
//...#includes, namespace usings...
ptime now = second_clock::universal_time();
wstringstream ss;
time_facet *facet = new time_facet("???"); //what goes here?
ss.imbue(locale(cout.getloc(), facet));
ss << now;
wstring datetimestring = ss.str();
Here are some strings I've tried so far:
%Y%m%d_%H%M%S : "2011-Feb-16 16:51:16"
%Y%m%d : "2011-Feb-16"
%H%M%S : "16:51:16"
Here's another one:
%Y : "2011-Feb-16 16:51:16" huh??

I believe you need to use wtime_facet, not time_facet. See the working program I posted on your other question.

From date/time facet format flags:
"%Y%m%d_%H%M%S"

time_facet *facet = new time_facet("%Y%m%d_%H%M%S");
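To see what these format flags produce without a Boost build at hand, here is a small sketch that uses the standard library's std::put_time instead of Boost's facet. It is a swapped-in technique, not the Boost API; it relies on the flags %Y, %m, %d, %H, %M and %S having the same strftime-style meaning there. The function name format_stamp is hypothetical.

```cpp
#include <cassert>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats a broken-down time as YYYYMMDD_HHMMSS using
// the same format string as the answer, but via std::put_time.
std::string format_stamp(const std::tm& when)
{
    std::ostringstream out;
    out << std::put_time(&when, "%Y%m%d_%H%M%S");
    return out.str();
}
```

Filling a std::tm with 16 February 2011, 08:05:02 (tm_year = 111, tm_mon = 1, tm_mday = 16, tm_hour = 8, tm_min = 5, tm_sec = 2) gives 20110216_080502, the example from the question.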
