Validating integer part of an string - c++

I have a text file that I need to convert each line into integer.
The lines can begin with '#' to indicate a comment. Also, after the data it might be inline comment too...again indicated by '#'
So I have the example below:
QString time = "5000 #this is 5 seconds"; // OK
QString time = " 5000 # this is 5 seconds"; // OK..free spaceis allowed at start
QString time = "5000.00 #this is 5 seconds"; // invalid...no decimal
QString time = "s5000 # this is 5 seconds"; // invalid...does not start with numerical character
How can I take care of these cases? I mean in all the 4 example above except the last two I need to extract "5000". How to find out the last one is invalid?
So I mean what is the best fail-proof code to handle this task?

Another example using std::regex. Converting QString to a string_view is left as an exercise for the reader.
#include <regex>
#include <string_view>
#include <iostream>
#include <string>
#include <optional>
std::optional<std::string> extract_number(std::string_view input)
{
static constexpr char expression[] = R"xx(^\s*(\d+)\s*(#.*)?$)xx";
static const auto re = std::regex(expression);
auto result = std::optional<std::string>();
auto match = std::cmatch();
const auto matched = std::regex_match(input.begin(), input.end(), match, re);
if (matched)
{
result.emplace(match[1].first, match[1].second);
}
return result;
}
void emit(std::string_view candidate, std::optional<std::string> result)
{
std::cout << "offered: " << candidate << " - result : " << result.value_or("no match") << '\n';
}
int main()
{
const std::string_view candidates[] =
{
"5000 #this is 5 seconds",
" 5000 # this is 5 seconds",
"5000.00 #this is 5 seconds",
"s5000 # this is 5 seconds"
};
for(auto candidate : candidates)
{
emit(candidate, extract_number(candidate));
}
}
expected output:
offered: 5000 #this is 5 seconds - result : 5000
offered: 5000 # this is 5 seconds - result : 5000
offered: 5000.00 #this is 5 seconds - result : no match
offered: s5000 # this is 5 seconds - result : no match
https://coliru.stacked-crooked.com/a/2b0e088e6ed0576b

You can use this regex to validate and extract the digit from first grouping pattern that will capture your number,
^\s*(\d+)\b(?!\.)
Explanation:
^ - Start of string
\s* - Allows optional space before the number
(\d+) - Captures the number and places it in first grouping pattern
\b - Ensures the number does not match partially in a larger text because of the negative look ahead present ahead
(?!\.) - Rejects the match if there is a decimal following the number
Demo1
In case only last one is invalid, you can use this regex to capture number from first three entries,
^\s*(\d+)
Demo2

Related

C++11 Regex search - Exclude empty submatches

From the following text I want to extract the number and the unit of measurement.
I have 2 possible cases:
This is some text 14.56 kg and some other text
or
This is some text kg 14.56 and some other text
I used | to match the both cases.
My problem is that it produces empty submatches, and thus giving me an incorrect number of matches.
This is my code:
std::smatch m;
std::string myString = "This is some text kg 14.56 and some other text";
const std::regex myRegex(
R"(([\d]{0,4}[\.,]*[\d]{1,6})\s+(kilograms?|kg|kilos?)|s+(kilograms?|kg|kilos?)(\s+[\d]{0,4}[\.,]*[\d]{1,6}))",
std::regex_constants::icase
);
if( std::regex_search(myString, m, myRegex) ){
std::cout << "Size: " << m.size() << endl;
for(int i=0; i<m.size(); i++)
std::cout << m[i].str() << std::endl;
}
else
std::cout << "Not found!\n";
OUTPUT:
Size: 5
kg 14.56
kg
14.56
I want an easy way to extract those 2 values, so my guess is that I want the following output:
WANTED OUTPUT:
Size: 3
kg 14.56
kg
14.56
This way I can always directly extract 2nd and 3th, but in this case I would also need to check which one is the number. I know how to do it with 2 separate searches, but I want to do it the right way, with a single search without using c++ to check if a submatch is an empty string.
Using this regex, you just need the contents of Group 1 and Group 2
((?:kilograms?|kilos?|kg)|(?:\d{0,4}(?:\.\d{1,6})))\s*((?:kilograms?|kilos?|kg)|(?:\d{0,4}(?:\.\d{1,6})))
Click for Demo
Explanation:
((?:kilograms?|kilos?|kg)|(?:\d{0,4}(?:\.\d{1,6})))
(?:kilograms?|kilos?|kg) - matches kilograms or kilogram or kilos or kilo or kg
| - OR
(?:\d{0,4}(?:\.\d{1,6})) - matches 0 to 4 digits followed by 1 to 6 digits of decimal part
\s* - matches 0+ whitespaces
You can try this out:
((?:(?<!\d)(\d{1,4}(?:[\.,]\d{1,6})?)\s+((?:kilogram|kilos|kg)))|(?:((?:kilogram|kilos|kg))\s+(\d{1,4}(?:[\.,]\d{1,6})?)))
As shown here: https://regex101.com/r/9O99Fz/3
USAGE -
As I've shown in the 'substitution' section, to reference the numeral part of the quantity, you have to write $2$5, and for the unit, write: $3$4
Explanation -
There are two capturing groups we could possibly need: the first one here (?:(?<!\d)(\d{1,4}(?:[\.,]\d{1,6})?)\s+((?:kilogram|kilos|kg))) is to match the number followed by the unit,
and the other (?:((?:kilogram|kilos|kg))\s+(\d{1,4}(?:[\.,]\d{1,6})?)) to match the unit followed by the number

c++11 (MSVS2012) regex looking for file names in multiple line std::string

I have been trying to search for a clear answer on this one, but not been able to find it.
So lets say I have the string (where \n could be \r\n - I want to handle both - not sure if that is relevant or not)
"4345t435\ng54t a_file_123.xml rk\ngreg a_file_j34.xml fger 43t54"
Then I want to get matches:
a_file_123.xml
a_file_j34.xml
Here is my test code:
const str::string s = "4345t435\ng54t a_file_123.xml rk\ngreg a_file_j34.xml fger 43t54";
std::smatch matches;
if (std::regex_search(s, matches, std::regex("a_file_(.*)\\.xml")))
{
std::cout << "total: " << matches.size() << std::endl;
for (unsigned int i = 0; i < matches.size(); i++)
{
std::cout << "match: " << matches[i] << std::endl;
}
}
Output is:
total: 2
match: a_file_123.xml
match: 123
I don't quite understand why match 2 is just "123"...
You only have one match, not two, as the regex_search method returns a single match. What you printed is two group values, Group 0 (the whole match, a_file_123.xml here) and Group 1 (the capturing group value, here, 123 that is a substring captured with a capturing group you defined as (.*) in the pattern).
If you want to match multiple strings, you need to use the regex iterator, not just a regex_search that only returns the first match.
Besides, .* is too greedy and will return weird results if you have more than 1 match on the same line. It seems you want to match letter or digits, so .* can be replaced with \w+. Well, if there can really be anything, just use .*?.
Use
const std::string s = "4345t435\ng54t a_file_123.xml rk\ngreg a_file_j34.xml fger 43t54";
const std::regex rx("a_file_\\w+\\.xml");
std::vector<std::string> results(std::sregex_token_iterator(s.begin(), s.end(), rx),
std::sregex_token_iterator());
std::cout << "Number of matches: " << results.size() << std::endl;
for (auto result : results)
{
std::cout << result << std::endl;
}
See the C++ demo yielding
Number of matches: 2
a_file_123.xml
a_file_j34.xml
Notes on regex
a_file_ - a literal substring
\\w+ - 1+ word chars (letters, digits, _) (note you may use [^.]*? here instead of \\w+ if you want to match any char, 0 or more repetitions, as few as possible, up to the first .xml)
\\. - a dot (if you do not escape it, it will match any char except line break chars)
xml - a literal substring.
See the regex demo

C++ regex finds only 1 sub match [duplicate]

This question already has answers here:
How to match multiple results using std::regex
(6 answers)
Closed 5 years ago.
// Example program
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string strr("1.0.0.0029.443");
std::regex rgx("([0-9])");
std::smatch match;
if (std::regex_search(strr, match, rgx)) {
for(int i=0;i<match.size();i++)
std::cout << match[i] << std::endl;
}
}
this program should write
1
0
0
0
0
2
9
4
4
3
but it writes
1
1
checked it here http://cpp.sh/ and on visual studio, both same results.
Why does it find only 2 matches and why are they same?
As I understand from answers here, regex search stops at first match and match variable holds the necessary (sub?)string value to continue(by repeating) for other matches. Also since it stops at first match, () charachters are used only for sub-matches within the result.
Being called once, regex_search returns only the first match in the match variable. The collection in match comprises the match itself and capture groups if there are any.
In order to get all matches call regex_search in a loop:
while(regex_search(strr, match, rgx))
{
std::cout << match[0] << std::endl;
strr = match.suffix();
}
Note that in your case the first capture group is the same as the whole match so there is no need in the group and you may define the regex simply as [0-9] (without parentheses.)
Demo: https://ideone.com/pQ6IsO
Problems:
Using if only gives you one match. You need to use a while loop to find all the matches. You need to search past the previous match in the next iteration of the loop.
std::smatch::size() returns 1 + number of matches. See its documentation. std::smatch can contain sub-matches. To get the entire text, use match[0].
Here's an updated version of your program:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string strr("1.0.0.0029.443");
std::regex rgx("([0-9])");
std::smatch match;
while (std::regex_search(strr, match, rgx)) {
std::cout << match[0] << std::endl;
strr = match.suffix();
}
}

C++11 regex matching capturing group multiple times

Could someone please help me to extract the text between the : and the ^ symbols using a JavaScript (ECMAScript) regular expression in C++11. I do not need to capture the hw-descriptor itself - but it does have to be present in the line in order for the rest of the line to be considered for a match. Also the :p....^, :m....^ and :u....^ can arrive in any order and there has to be at least 1 present.
I tried using the following regular expression:
static const std::regex gRegex("(?:hw-descriptor)(:[pmu](.*?)\\^)+", std::regex::icase);
against the following text line:
"hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^"
Here is the code which posted on a live coliru. It shows how I attempted to solve this problem, however I am only getting 1 match. I need to see how to extract each of the potential 3 matches corresponding to the p m or u characters described earlier.
#include <iostream>
#include <string>
#include <vector>
#include <regex>
int main()
{
static const std::regex gRegex("(?:hw-descriptor)(:[pmu](.*?)\\^)+", std::regex::icase);
std::string foo = "hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^";
// I seem to only get 1 match here, I was expecting
// to loop through each of the matches, looks like I need something like
// a pcre global option but I don't know how.
std::for_each(std::sregex_iterator(foo.cbegin(), foo.cend(), gRegex), std::sregex_iterator(),
[&](const auto& rMatch) {
for (int i=0; i< static_cast<int>(rMatch.size()); ++i) {
std::cout << rMatch[i] << std::endl;
}
});
}
The above program gives the following output:
g++ -std=c++14 -O2 -Wall -pedantic -pthread main.cpp && ./a.out
hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^
:uTEXT3^
TEXT3
With std::regex, you cannot keep mutliple repeated captures when matching a certain string with consecutive repeated patterns.
What you may do is to match the overall texts containing the prefix and the repeated chunks, capture the latter into a separate group, and then use a second smaller regex to grab all the occurrences of the substrings you want separately.
The first regex here may be
hw-descriptor((?::[pmu][^^]*\\^)+)
See the online demo. It will match hw-descriptor and ((?::[pmu][^^]*\\^)+) will capture into Group 1 one or more repetitions of :[pmu][^^]*\^ pattern: :, p/m/u, 0 or more chars other than ^ and then ^. Upon finding a match, use :[pmu][^^]*\^ regex to return all the real "matches".
C++ demo:
static const std::regex gRegex("hw-descriptor((?::[pmu][^^]*\\^)+)", std::regex::icase);
static const std::regex lRegex(":[pmu][^^]*\\^", std::regex::icase);
std::string foo = "hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^ hw-descriptor:pTEXT8^:mTEXT8^:uTEXT83^";
std::smatch smtch;
for(std::sregex_iterator i = std::sregex_iterator(foo.begin(), foo.end(), gRegex);
i != std::sregex_iterator();
++i)
{
std::smatch m = *i;
std::cout << "Match value: " << m.str() << std::endl;
std::string x = m.str(1);
for(std::sregex_iterator j = std::sregex_iterator(x.begin(), x.end(), lRegex);
j != std::sregex_iterator();
++j)
{
std::cout << "Element value: " << (*j).str() << std::endl;
}
}
Output:
Match value: hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^
Element value: :pTEXT1^
Element value: :mTEXT2^
Element value: :uTEXT3^
Match value: hw-descriptor:pTEXT8^:mTEXT8^:uTEXT83^
Element value: :pTEXT8^
Element value: :mTEXT8^
Element value: :uTEXT83^

Get String Between 2 Strings

How can I get a string that is between two other declared strings, for example:
String 1 = "[STRING1]"
String 2 = "[STRING2]"
Source:
"832h0ufhu0sdf4[STRING1]I need this text here[STRING2]afyh0fhdfosdfndsf"
How can I get the "I need this text here"?
Since this is homework, only clues:
Find index1 of occurrence of String1
Find index2 of occurrence of String2
Substring from index1+lengthOf(String1) (inclusive) to index2 (exclusive) is what you need
Copy this to a result buffer if necessary (don't forget to null-terminate)
Might be a good case for std::regex, which is part of C++11.
#include <iostream>
#include <string>
#include <regex>
int main()
{
using namespace std::string_literals;
auto start = "\\[STRING1\\]"s;
auto end = "\\[STRING2\\]"s;
std::regex base_regex(start + "(.*)" + end);
auto example = "832h0ufhu0sdf4[STRING1]I need this text here[STRING2]afyh0fhdfosdfndsf"s;
std::smatch base_match;
std::string matched;
if (std::regex_search(example, base_match, base_regex)) {
// The first sub_match is the whole string; the next
// sub_match is the first parenthesized expression.
if (base_match.size() == 2) {
matched = base_match[1].str();
}
}
std::cout << "example: \""<<example << "\"\n";
std::cout << "matched: \""<<matched << "\"\n";
}
Prints:
example: "832h0ufhu0sdf4[STRING1]I need this text here[STRING2]afyh0fhdfosdfndsf"
matched: "I need this text here"
What I did was create a program that creates two strings, start and end that serve as my start and end matches. I then use a regular expression string that will look for those, and match against anything in-between (including nothing). Then I use regex_match to find the matching part of the expression, and set matched as the matched string.
For more info, see http://en.cppreference.com/w/cpp/regex and http://en.cppreference.com/w/cpp/regex/regex_search
Use strstr http://www.cplusplus.com/reference/clibrary/cstring/strstr/ , with that function you will get 2 pointers, now you should compare them (if pointer1 < pointer2) if so, read all chars between them.