I feel like this is a pretty basic question but I did not find a post for it. If you know one please link it below.
So what I'm trying to do is look through a string and extract the numbers in groups of 2.
Here is my code:
int main() {
string line = "P112233";
boost::regex e ("P([0-9]{2}[0-9]{2}[0-9]{2})");
boost::smatch match;
if (boost::regex_search(line, match, e))
{
boost::regex f("([0-9]{2})"); //finds 11
boost::smatch match2;
line = match[0];
if (boost::regex_search(line, match2, f))
{
float number1 = boost::lexical_cast<float>(match2[0]);
cout << number1 << endl; // this works and prints out 11.
}
boost::regex g(" "); // here I want it to find the 22
boost::smatch match3;
if (boost::regex_search(line, match3, g))
{
float number2 = boost::lexical_cast<float>(match3[0]);
cout << number2 << endl;
}
boost::regex h(" "); // here I want it to find the 33
boost::smatch match4;
if (boost::regex_search(line, match4, h))
{
float number3 = boost::lexical_cast<float>(match4[0]);
cout << number3 << endl;
}
}
else
cout << "found nothing"<< endl;
return 0;
}
I was able to get the first number, but I have no idea how to get the second (22) and third (33).
What's the proper expression I need to use?
As @Cornstalks mentioned, you need to use 3 capture groups, and then you access them like this:
int main()
{
std::string line = "P112233";
boost::regex e("P([0-9]{2})([0-9]{2})([0-9]{2})");
boost::smatch match;
if (boost::regex_search(line, match, e))
{
std::cout << match[0] << std::endl; // prints the whole string
std::cout << match[1] << ", " << match[2] << ", " << match[3] << std::endl;
}
return 0;
}
Output:
P112233
11, 22, 33
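If the groups are needed as numbers rather than text, each capture can be converted after the match. The following is a minimal sketch, assuming std::regex (C++11) as a stand-in for boost::regex; the helper name parse_pairs is hypothetical and not part of the answer above.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the three two-digit groups as ints,
// or an empty vector when the pattern does not match.
std::vector<int> parse_pairs(const std::string& line)
{
    static const std::regex e("P([0-9]{2})([0-9]{2})([0-9]{2})");
    std::vector<int> numbers;
    std::smatch match;
    if (std::regex_search(line, match, e)) {
        // match[0] is the whole match; the capture groups start at index 1.
        for (std::size_t i = 1; i < match.size(); ++i)
            numbers.push_back(std::stoi(match[i].str()));
    }
    return numbers;
}
```

Calling parse_pairs("P112233") should give a vector holding 11, 22 and 33, while a string without the pattern gives an empty vector.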
I don't favour regular expressions for this kind of parsing. The key point being that the numbers are still strings when you're done with that hairy regex episode.
I'd use Boost Spirit here instead, which parses into the numbers all at once, and you don't even have to link to the Boost Regex library either, because Spirit is header-only.
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
static qi::int_parser<int, 10, 2, 2> two_digits;
int main() {
std::string const s = "P112233";
std::vector<int> nums;
if (qi::parse(s.begin(), s.end(), "P" >> *two_digits, nums))
{
std::cout << "Parsed " << nums.size() << " pairs of digits:\n";
for(auto i : nums)
std::cout << " * " << i << "\n";
}
}
Parsed 3 pairs of digits:
* 11
* 22
* 33
I need to convert letters into a dictionary of characters.
Here's an example:
letter
l: 1
e: 2
t: 2
r: 1
I did some research and found this helpful answer, but that was using getline() and separating words by spaces. Since I am trying to split by character I don't think I can use getline() since '' isn't a valid split character. I could convert to a char* array but I wasn't sure where that would get me.
This is fairly easy in other languages so I thought it wouldn't be too bad in C++. I was hoping there would be something like a my_map[key]++ or something. In Go I would write this as
// Word map of string: int values
var wordMap = make(map[string]int)
// For each letter, add to that key
for i := 0; i < len(word); i++ {
wordMap[string(word[i])]++
}
// In the end you have a map of each letter.
How could I apply this in C++?
It could look rather similar to your Go code.
// Word map of char: int values
// (strings would be overkill, since you know they are a single character)
auto wordMap = std::map<char,int>{};
// For each letter, add to that key
for ( char c : word ) {
wordMap[c]++;
}
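Here is the Unicode version of Drew Dormann's answer:
#include <codecvt>
#include <iostream>
#include <locale>
#include <map>
#include <string>
std::string word = "some unicode: こんにちは世界";
std::map<char32_t, unsigned int> wordMap; // uint is not a standard type
// Note: std::wstring_convert and <codecvt> are deprecated since C++17
std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;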
for (auto c : converter.from_bytes(word)) {
wordMap[c]++;
}
for (const auto [c, v] : wordMap) {
std::cout << converter.to_bytes(c) << " : " << v << std::endl;
}
I wrote an article about this which can be checked out here. Below I have given two versions of the program. Version 1 lists the character counts in alphabetical order (more precisely, sorted by character value, since std::map keeps its keys ordered). If you want the counts in insertion order instead, use Version 2.
Version 1: Get character count in alphabetical order
#include <iostream> //needed for std::cout, std::cin
#include <map> //needed for std::map
#include <iomanip> //needed for formatting the output (std::setw)
#include <string> //needed for std::string, std::getline
int main()
{
std::string inputString; //user input will be read into this string variable
std::cout << "Enter a string: " << std::endl;
std::getline(std::cin, inputString);
//this map maps the char to their respective count
std::map < char, int > charCount;
//iterate through the inputString
for (char & c: inputString)
{
charCount[c]++;//increment the count for character c
}
std::cout << "Total unique characters are: " << charCount.size() << std::endl;
std::cout << "------------------------------------" << std::endl;
std::cout << "Character" << std::setw(10) << "Count" << std::endl;
std::cout << "------------------------------------" << std::endl;
for (std::pair < char, int > pairElement: charCount)
{
std::cout << std::setw(4) << pairElement.first << std::setw(13) << pairElement.second << std::endl;
}
return 0;
}
Version 2: Get character count in insertion order
#include <iostream>
#include <map>
#include <iomanip>
#include <string>
int main()
{
std::string inputString;
std::cout << "Enter a string: " << std::endl;
std::getline(std::cin, inputString);
std::map < char, int > charCount;
for (char & c: inputString)
{
charCount[c]++;
}
std::cout << "Total unique characters are: " << charCount.size() << std::endl;
std::cout << "------------------------------------" << std::endl;
std::cout << "Character" << std::setw(10) << "Count" << std::endl;
std::cout << "------------------------------------" << std::endl;
std::size_t i = 0;
//just go through the inputString instead of map
for(char &c: inputString)
{
std::size_t index = inputString.find(c);
if(index != inputString.npos && (index == i)){
std::cout << std::setw(4) << c << std::setw(13) << charCount.at(c)<<std::endl;
}
++i;
}
return 0;
}
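Version 2 calls find() once per character, so its cost grows quadratically with the input length. As a hedged alternative, the sketch below records each character the first time it is seen and reports the counts in that order. The function name count_in_insertion_order is hypothetical, and the sketch assumes single-byte characters, like the versions above.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (character, count) pairs in order of first appearance.
std::vector<std::pair<char, int>> count_in_insertion_order(const std::string& input)
{
    std::map<char, int> counts;
    std::vector<char> order;          // characters in the order they first appear
    for (char c : input) {
        if (counts[c]++ == 0)         // the count was 0, so this is the first occurrence
            order.push_back(c);
    }
    std::vector<std::pair<char, int>> result;
    for (char c : order)
        result.emplace_back(c, counts[c]);
    return result;
}
```

For the input "letter" this should yield l: 1, e: 2, t: 2, r: 1, in that order.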
How do I extract the letter and digit values?
std::regex legit_command("^\\([A-Z]+[0-9]+\\-[A-Z]+[0-9]+\\)$");
std::string input;
Let's say the user keys in
(AA11-BB22)
I want to get
first_character = "aa"
first_number = 11
second_character = "bb"
second_number = 22
You could use capture groups. In the example below I replaced (AA11+BB22) with (AA11-BB22) to match the regex you posted. Note that regex_match only succeeds if the entire string matches the pattern so the beginning/end of line assertions (^ and $) are not required.
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
const string input = "(AA11-BB22)";
const regex legit_command("\\(([A-Z]+)([0-9]+)-([A-Z]+)([0-9]+)\\)");
smatch matches;
if(regex_match(input, matches, legit_command)) {
cout << "first_character " << matches[1] << endl;
cout << "first_number " << matches[2] << endl;
cout << "second_character " << matches[3] << endl;
cout << "second_number " << matches[4] << endl;
}
}
Output:
$ c++ main.cpp && ./a.out
first_character AA
first_number 11
second_character BB
second_number 22
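The question asks for lowercase letters and numeric values, whereas the captures above are uppercase strings. A minimal sketch of the extra conversion step follows; the Command struct and the parse_command and to_lower helpers are hypothetical names introduced for illustration only.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>

struct Command {                       // hypothetical result type
    std::string first_character;
    int first_number;
    std::string second_character;
    int second_number;
};

// Lowercases an ASCII string; the cast to unsigned char avoids undefined
// behaviour when std::tolower receives a negative char value.
static std::string to_lower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns true and fills 'out' when the whole input matches the pattern.
bool parse_command(const std::string& input, Command& out)
{
    static const std::regex legit_command("\\(([A-Z]+)([0-9]+)-([A-Z]+)([0-9]+)\\)");
    std::smatch m;
    if (!std::regex_match(input, m, legit_command))
        return false;
    out.first_character = to_lower(m[1].str());
    out.first_number = std::stoi(m[2].str());   // throws std::out_of_range on huge values
    out.second_character = to_lower(m[3].str());
    out.second_number = std::stoi(m[4].str());
    return true;
}
```

With the input (AA11-BB22) this should fill the struct with "aa", 11, "bb" and 22. Input without the surrounding parentheses is rejected because regex_match requires the entire string to match.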
I am trying to parse a file of locations with the following code. However, I am getting an odd regex_error, and when I call the .what() function it simply gives "regex_error" with code 5. I can't seem to find the problem.
Code:
std::string line;
std::ifstream loc_file(argv[1]);
std::regex line_regex(R"(\S+)\s+([0-9\.]+) ([NS])\s+([0-9\.]+) ([EW])");
while (std::getline(loc_file, line)) {
std::smatch m;
std::regex_search(line, m, line_regex);
std::cout << "Location Matches:" << m.length() << std::endl;
std::cout << "Loc:" << m[1];
std::cout << " Lat:" << (m[3] == "S") ? -std::stod(m[2]) : std::stod(m[2]);
std::cout << " Lon:" << (m[5] == "W") ? -std::stod(m[4]) : std::stod(m[4]) << endl;
}
File Format:
Loc1 0.67408 N 23.47297 E
Loc2 3.01239 S 23.42157 W
OtherPlace 3.64530 S 17.47136 W
SecondPlace 26.13222 N 3.63386 E
I developed my regex on regex101.com; you can test it out there.
Also, if it matters, I am using VS2015.
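As it turns out, the problem was my use of a raw string literal. Its syntax is R"(...)", so the outermost pair of parentheses belongs to the literal's delimiters rather than to the pattern. My original pattern therefore lost its first opening and its last closing parenthesis, which left the regex with unbalanced parentheses. The fixed code, with an extra pair of parentheses, is here: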
std::string line;
std::ifstream loc_file(argv[1]);
std::regex line_regex(R"((\S+)\s+([0-9\.]+) ([NS])\s+([0-9\.]+) ([EW]))");
while (std::getline(loc_file, line)) {
std::smatch m;
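if (!std::regex_search(line, m, line_regex))
continue; // skip lines that do not match the pattern
std::cout << "Location Matches:" << m.length() << std::endl;
std::cout << "Loc:" << m[1];
// << binds tighter than ?:, so each conditional expression needs its own parentheses
std::cout << " Lat:" << ((m[3] == "S") ? -std::stod(m[2]) : std::stod(m[2]));
std::cout << " Lon:" << ((m[5] == "W") ? -std::stod(m[4]) : std::stod(m[4])) << std::endl;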
}
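One way to make the literal's delimiters harder to confuse with the pattern's own parentheses is a custom raw-string delimiter. The sketch below uses the delimiter re, which is an arbitrary choice, together with a hypothetical Location struct and parse_location helper. It also computes the sign separately so that no conditional expression is mixed into a chain of << operators.

```cpp
#include <regex>
#include <string>

struct Location {          // hypothetical result type
    std::string name;
    double lat;
    double lon;
};

// Returns true and fills 'out' when a line such as
// "Loc2 3.01239 S 23.42157 W" contains the expected fields.
bool parse_location(const std::string& line, Location& out)
{
    // The pattern is everything between R"re( and )re", so the pattern's
    // own parentheses can never be mistaken for the literal's delimiters.
    // Inside [] the dot needs no escape.
    static const std::regex line_regex(
        R"re((\S+)\s+([0-9.]+) ([NS])\s+([0-9.]+) ([EW]))re");
    std::smatch m;
    if (!std::regex_search(line, m, line_regex))
        return false;
    const double lat = std::stod(m[2].str());
    const double lon = std::stod(m[4].str());
    out.name = m[1].str();
    out.lat = (m[3] == "S") ? -lat : lat;   // southern latitudes are negative
    out.lon = (m[5] == "W") ? -lon : lon;   // western longitudes are negative
    return true;
}
```

For the sample file above, the line starting with Loc2 should produce a negative latitude and a negative longitude, and the line starting with Loc1 positive values for both.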
I am trying to extract values from myString1 using std::stringstream like shown below:
// Example program
#include <iostream>
#include <string>
#include <sstream>
using namespace std;
int main()
{
string myString1 = "+50years";
string myString2 = "+50years-4months+3weeks+5minutes";
stringstream ss (myString1);
char mathOperator;
int value;
string timeUnit;
ss >> mathOperator >> value >> timeUnit;
cout << "mathOperator: " << mathOperator << endl;
cout << "value: " << value << endl;
cout << "timeUnit: " << timeUnit << endl;
}
Output:
mathOperator: +
value: 50
timeUnit: years
In the output you can see me successfully extract the values I need, the math operator, the value and the time unit.
Is there a way to do the same with myString2? Perhaps in a loop? I can extract the math operator, the value, but the time unit simply extracts everything else, and I cannot think of a way to get around that. Much appreciated.
The problem is that timeUnit is a string, so >> extracts everything up to the next whitespace character, and your string contains none.
Alternatives:
you could extract parts using getline(), which extracts strings until it finds a separator. Unfortunately, you don't have one potential separator, but 2 (+ and -).
you could opt for using regex directly on the string
you could finally split the strings using find_first_of() and substr().
As an illustration, here is the example with regex:
regex rg("([\\+-][0-9]+[A-Za-z]+)", regex::extended);
smatch sm;
while (regex_search(myString2, sm, rg)) {
cout <<"Found:"<<sm[0]<<endl;
myString2 = sm.suffix().str();
//... process string sm[0]
}
Here is a live demo applying your code to extract all the elements.
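The find_first_of() and substr() alternative listed above could be sketched as follows. The Term struct and the split_terms function are hypothetical names; the sketch assumes every term starts with + or - and that units contain neither of those characters.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct Term {              // hypothetical result type
    char mathOperator;
    int value;
    std::string timeUnit;
};

// Splits a string such as "+50years-4months" at each '+' or '-' and parses
// every chunk with the same stream extraction used for a single term.
std::vector<Term> split_terms(const std::string& s)
{
    std::vector<Term> terms;
    std::size_t start = 0;
    while (start < s.size()) {
        // The next operator after the current one marks the end of this term.
        const std::size_t next = s.find_first_of("+-", start + 1);
        // When no further operator exists, take the rest of the string.
        const std::size_t count =
            (next == std::string::npos) ? std::string::npos : next - start;
        std::istringstream ss(s.substr(start, count));
        Term t;
        if (ss >> t.mathOperator >> t.value >> t.timeUnit)
            terms.push_back(t);
        start = next;      // npos is larger than any size, so the loop ends
    }
    return terms;
}
```

Applied to myString2, this should return four terms, starting with + 50 years and ending with + 5 minutes.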
You could use something more robust, such as <regex>, as in the example below:
#include <iostream>
#include <regex>
#include <string>
int main () {
std::regex e ("(\\+|\\-)((\\d)+)(years|months|weeks|minutes|seconds)");
std::string str("+50years-4months+3weeks+5minutes");
std::sregex_iterator next(str.begin(), str.end(), e);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << "Expression: " << match.str() << "\n";
std::cout << " mathOperator : " << match[1] << std::endl;
std::cout << " value : " << match[2] << std::endl;
std::cout << " timeUnit : " << match[4] << std::endl;
++next;
}
}
Output:
Expression: +50years
mathOperator : +
value : 50
timeUnit : years
Expression: -4months
mathOperator : -
value : 4
timeUnit : months
Expression: +3weeks
mathOperator : +
value : 3
timeUnit : weeks
Expression: +5minutes
mathOperator : +
value : 5
timeUnit : minutes
I'd use getline for the timeUnit, but since getline can take only one delimiter, I'd search the string separately for mathOperator and use that:
string myString2 = "+50years-4months+3weeks+5minutes";
stringstream ss (myString2);
size_t pos=0;
ss >> mathOperator;
do
{
cout << "mathOperator: " << mathOperator << endl;
ss >> value;
cout << "value: " << value << endl;
pos = myString2.find_first_of("+-", pos+1);
if (pos != string::npos) // only index the string when another operator exists
mathOperator = myString2[pos];
getline(ss, timeUnit, mathOperator);
cout << "timeUnit: " << timeUnit << endl;
}
while(pos!=string::npos);
Hi, I wish to get the values of the following expression:
POLYGON(100 20, 30 40, 20 10, 21 21)
When I execute the following code I obtain this result:
Searching POLYGON(100 20, 30 40, 20 10, 21 21)
POLYGON(100 20, 30 40, 20 10, 21 21)
result = 100 20
r2 = 100
r2 = 20
r2 = , 21 21
r2 = 21
size = 7
I don't know why I do not obtain the middle values.
Thanks for your help.
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
void testMatch(const boost::regex &ex, const string st) {
cout << "Matching " << st << endl;
if (boost::regex_match(st, ex)) {
cout << " matches" << endl;
}
else {
cout << " doesn't match" << endl;
}
}
void testSearch(const boost::regex &ex, const string st) {
cout << "Searching " << st << endl;
string::const_iterator start, end;
start = st.begin();
end = st.end();
boost::match_results<std::string::const_iterator> what;
boost::match_flag_type flags = boost::match_default;
while(boost::regex_search(start, end, what, ex, flags))
{
cout << " " << what.str() << endl;
cout << " result = " << what[1] << endl;
cout << " r2 = " << what[2] << endl;
cout << " r2 = " << what[3] << endl;
cout << " r2 = " << what[4] << endl;
cout << " r2 = " << what[5] << endl;
cout << "size = " << what.size() << endl;
start = what[0].second;
}
}
int main(int argc, char *argv[])
{
static const boost::regex ex("POLYGON\\(((\\-?\\d+) (\\-?\\d+))(\\, (\\-?\\d+) (\\-?\\d+))*\\)");
testSearch(ex, "POLYGON(1 2)");
testSearch(ex, "POLYGON(-1 2, 3 4)");
testSearch(ex, "POLYGON(100 20, 30 40, 20 10, 21 21)");
return 0;
}
I am not a regex expert, but I read your regular expression and it seems to be correct.
This forum post appears to be talking about exactly the same thing, where Boost.Regex only returns the last result of a regular expression. Apparently by default Boost only keeps track of the last match of a repetition of matches. However, there is an experimental feature that allows you to change this. More info here, under "Repeated Captures".
There are 2 other "solutions" though:
Use a regex to track the first pair of numbers, then get the substring with that pair removed and do another regex on that substring, until you've got all input.
Use Boost.Spirit, it's probably more suited for parsing input than Boost.Regex.
I got the answer from an IRC channel.
The regular expression is:
static const boost::regex ex("[\\d\\s]+");
or, to also accept negative numbers:
static const boost::regex ex("[\\-\\d\\s]+");
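Another way to reach every pair, in the spirit of matching one pair at a time, is to iterate over all matches of a small per-pair pattern. The sketch below is a variant rather than the IRC answer itself. It assumes std::regex as a stand-in for Boost.Regex, the function name polygon_points is hypothetical, and it does not validate the surrounding POLYGON(...) wrapper.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects every "x y" integer pair found in the text.
std::vector<std::pair<int, int>> polygon_points(const std::string& text)
{
    // One match per coordinate pair; each number may carry a leading minus sign.
    static const std::regex pair_re("(-?\\d+) (-?\\d+)");
    std::vector<std::pair<int, int>> points;
    for (std::sregex_iterator it(text.begin(), text.end(), pair_re), end;
         it != end; ++it) {
        points.emplace_back(std::stoi((*it)[1].str()),
                            std::stoi((*it)[2].str()));
    }
    return points;
}
```

For POLYGON(100 20, 30 40, 20 10, 21 21) this should return four pairs, including the middle ones that the repeated capture group dropped.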