I'm new to regex and C++.
My problem is that '=' matches when I search for [a-zA-Z]. But that class only covers the letters a-z and A-Z, not '='.
Can anyone help me, please?
string string1 = "s=s;";
enum states state = s1;
regex statement("[a-zA-Z]+[=][a-zA-Z0-9]+[;]");
regex rg_left_letter("[a-zA-Z]");
regex rg_equal("[=]");
regex rg_right_letter("[a-zA-Z0-9]");
regex rg_semicolon("[;]");
for (const auto &s : string1) {
cout << "Current Value: " << s << endl;
// step(&state, s);
if (regex_search(&s, rg_left_letter)) {
cout << "matching: " << s << endl;
} else {
cout << "not matching: " << s << endl;
}
// cout << "Step Executed with sate: " << state << endl;
}
This outputs:
Current Value: s
matching: s
Current Value: =
matching: =
Current Value: s
matching: s
Current Value: ;
not matching: ;
When you write
regex_search(&s, rg_left_letter)
you basically search the C-String &s for a match character-wise, beginning at the character s. Therefore, your loop will search for a match in the remaining sub-strings
s=s;
=s;
s;
;
Each of these searches succeeds except the last one, because every remaining substring but the last still contains a letter that fits your regex. Note that this relies on the string's buffer being 0-terminated. Since C++11 (which <regex> requires anyway) that is guaranteed, so the code is well defined, but searching from a pointer into the middle of a std::string is still not what you intended.
What you really want to use is the function regex_match, together with your original regex just as simple as:
#include <iostream>
#include <regex>
int main()
{
std::regex statement("[a-zA-Z]+[=][a-zA-Z0-9]+[;]");
if(std::regex_match("s=s;", statement)) { std::cout << "Hooray!\n"; }
}
This is working for me:
int main(void) {
string string1 = "s=s;";
enum states state = s1;
regex statement("[a-zA-Z]+[=][a-zA-Z0-9]+[;]");
regex rg_left_letter("[a-zA-Z]");
regex rg_equal("[=]");
regex rg_right_letter("[a-zA-Z0-9]");
regex rg_semicolon("[;]");
//for (const auto &s : string1) {
for (int i = 0; i < string1.size(); i++) {
cout << "Current Value: " << string1[i] << endl;
// step(&state, s);
if (regex_match(string1.substr(i, 1), rg_left_letter)) {
cout << "matching: " << string1[i] << endl;
} else {
cout << "not matching: " << string1[i] << endl;
}
// cout << "Step Executed with sate: " << state << endl;
}
cout << endl;
return 0;
}
I need to convert letters into a dictionary of characters.
Here's an example:
letter
l: 1
e: 2
t: 2
r: 1
I did some research and found this helpful answer, but that was using getline() and separating words by spaces. Since I am trying to split by character I don't think I can use getline() since '' isn't a valid split character. I could convert to a char* array but I wasn't sure where that would get me.
This is fairly easy in other languages so I thought it wouldn't be too bad in C++. I was hoping there would be something like a my_map[key]++ or something. In Go I would write this as
// Word map of string: int values
var wordMap = make(map[string]int)
// For each letter, add to that key
for i := 0; i < len(word); i++ {
wordMap[string(word[i])]++
}
// In the end you have a map of each letter.
How could I apply this in C++?
How could I apply this in C++?
It could look rather similar to your Go code.
// Word map of char: int values
// (strings would be overkill, since you know they are a single character)
auto wordMap = std::map<char,int>{};
// For each letter, add to that key
for ( char c : word ) {
    wordMap[c]++;
}
Here is the unicode version of Drew Dormann's answer:
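#include <codecvt>
#include <iostream>
#include <locale>
#include <map>
#include <string>
int main() {
    std::string word = "some unicode: こんにちは世界";
    // unsigned int instead of uint, which is not a standard C++ type
    std::map<char32_t, unsigned int> wordMap;
    // note: std::wstring_convert and std::codecvt_utf8 are deprecated since C++17
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
    for (auto c : converter.from_bytes(word)) {
        wordMap[c]++;
    }
    // structured bindings require C++17
    for (const auto& [c, v] : wordMap) {
        std::cout << converter.to_bytes(c) << " : " << v << std::endl;
    }
}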
I wrote an article about this which can be checked out here. Below are two versions of the program. Version 1 reports the character counts in alphabetical order. If you want the counts in insertion order instead, use Version 2.
Version 1: Get character count in alphabetical order
#include <iostream> //needed for std::cout, std::cin
#include <map> //needed for std::map
#include <string> //needed for std::string, std::getline
#include <iomanip> //needed for formatting the output (std::setw)
int main()
{
std::string inputString; //user input will be read into this string variable
std::cout << "Enter a string: " << std::endl;
std::getline(std::cin, inputString);
//this map maps the char to their respective count
std::map < char, int > charCount;
//iterate through the inputString
for (char & c: inputString)
{
charCount[c]++;//increment the count for character c
}
std::cout << "Total unique characters are: " << charCount.size() << std::endl;
std::cout << "------------------------------------" << std::endl;
std::cout << "Character" << std::setw(10) << "Count" << std::endl;
std::cout << "------------------------------------" << std::endl;
for (std::pair < char, int > pairElement: charCount)
{
std::cout << std::setw(4) << pairElement.first << std::setw(13) << pairElement.second << std::endl;
}
return 0;
}
Version 2: Get character count in insertion order
#include <iostream>
#include <map>
#include <string>
#include <iomanip>
int main()
{
std::string inputString;
std::cout << "Enter a string: " << std::endl;
std::getline(std::cin, inputString);
std::map < char, int > charCount;
for (char & c: inputString)
{
charCount[c]++;
}
std::cout << "Total unique characters are: " << charCount.size() << std::endl;
std::cout << "------------------------------------" << std::endl;
std::cout << "Character" << std::setw(10) << "Count" << std::endl;
std::cout << "------------------------------------" << std::endl;
std::size_t i = 0;
//just go through the inputString instead of map
for(char &c: inputString)
{
std::size_t index = inputString.find(c);
if(index != inputString.npos && (index == i)){
std::cout << std::setw(4) << c << std::setw(13) << charCount.at(c)<<std::endl;
}
++i;
}
return 0;
}
I'm trying to use regex to match units like "long tonne" while ignoring any whitespace between "long" and "tonne". With the code below, every conversion is printed, but I only want one result. For example, if I enter 12 kg, I only want the line converted to lb to be printed.
So far, I've tried:
removing the semicolon after the regex
adding { } braces before the if and after the regex
running the code section by section by copying and pasting it into another project (which makes me think the regex is the problem)
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main()
{
cout << "Enter your mass with unit: ";
double earthMass;
string unit;
string tolower(unit);
cin >> earthMass >> unit;
regex kg("kg|kgs");
if (regex_match("kg", kg))
{
double dKgToLb;
dKgToLb = (earthMass * 2.20462);
cout << "your converted mass is : " << dKgToLb << " lb" << endl;
}
regex pound("lb|lbs");
if (regex_match("lb", pound))
{
double dLbToKg;
dLbToKg = (earthMass * 0.453592);
cout << "your converted mass is : " << dLbToKg << " kg" << endl;
}
regex longTonne("long\\s*tonne|lg\\s*tn");
if (regex_match("long tonne", longTonne))
{
double dLongToShort;
dLongToShort = (earthMass * 1.12);
cout << "your converted mass is : " << dLongToShort << " sh tn" << endl;
}
regex shortTonne("short\\s*tonne|sh\\s*tn");
if (regex_match("short tonne", shortTonne))
{
double dShortToLong;
dShortToLong = (earthMass * 0.892857);
cout << "your converted mass is : " << dShortToLong << " lg tn" << endl;}
}
else if (!cin.good())
{
cerr << "your input is invalid\n";
return EXIT_FAILURE;
}
}
In this program I'm trying to go through the input word by word and keep only lowercase letters, with no whitespace or anything else. However, my string "temp" isn't holding anything. Is it because of the way I'm trying to modify it? Maybe I should try using a char * instead? Sorry if this is a stupid question; I'm brand new to C++, but I've been trying to debug it for hours and searching hasn't turned up much.
#include <string>
#include <iostream>
#include <fstream>
#include <ctype.h>
using namespace std;
int main(int argc, char* argv[]) {
/*if (argc != 3) {
cout << "Error: wrong number of arguments." << endl;
}*/
ifstream infile(argv[1]);
//infile.open(argv[1]);
string content((std::istreambuf_iterator<char>(infile)),
(std::istreambuf_iterator<char>()));
string final;
string temp;
string distinct[5000];
int distinctnum[5000] = { 0 };
int numdist = 0;
int wordcount = 0;
int i = 0;
int j = 0;
int k = 0;
int isdistinct = 0;
int len = content.length();
//cout << "test 1" << endl;
cout << "length of string: " << len << endl;
cout << "content entered: " << content << endl;
while (i < len) {
temp.clear();
//cout << "test 2" << endl;
if (isalpha(content[i])) {
//cout << "test 3" << endl;
if (isupper(content[i])) {
//cout << "test 4" << endl;
temp[j] = tolower(content[i]);
++j;
}
else {
//cout << "test 5" << endl;
temp[j] = content[i];
++j;
}
}
else {
cout << temp << endl;
//cout << "test 6" << endl;
++wordcount;
final = final + temp;
j = 0;
for (k = 0;k < numdist;k++) {
//cout << "test 7" << endl;
if (distinct[k] == temp) {
++distinctnum[k];
isdistinct = 1;
break;
}
}
if (isdistinct == 0) {
//cout << "test 8" << endl;
distinct[numdist] = temp;
++numdist;
}
}
//cout << temp << endl;
++i;
}
cout << wordcount+1 << " words total." << endl << numdist << " distinct words." << endl;
cout << "New output: " << final << endl;
return 0;
}
You can't add to a string with operator[]. You can only modify what's already there. Since temp is created empty and routinely cleared, using [] is undefined. The string length is zero, so any indexing is out of bounds. There may be nothing there at all. Even if the program manages to survive this abuse, the string length is likely to still be zero, and operations on the string will result in nothing happening.
In keeping with what OP currently has, I see two easy options:
Treat the string the same way you would a std::vector and push_back
temp.push_back(tolower(content[i]));
or
Build up a std::stringstream
stream << static_cast<char>(tolower(content[i])); // tolower returns int; without the cast the stream receives a number
and convert the result into a string when finished
string temp = stream.str();
Either approach eliminates the need for a j counter as strings know how long they are.
However, OP can do an end run around this whole problem and use std::transform
std::transform(content.begin(), content.end(), content.begin(), ::tolower);
to convert the whole string in one shot and then concentrate on splitting the lowercase string with substr. The colons in front of ::tolower are there to prevent confusion with other overloads of tolower, since proper namespacing of the standard library has been switched off with using namespace std;
Off topic, it looks like OP is performing a frequency count on words. Look into std::map<string, int> distinct;. You can reduce the gathering and comparison testing to
distinct[temp]++;
I am trying to extract values from myString1 using std::stringstream like shown below:
// Example program
#include <iostream>
#include <string>
#include <sstream>
using namespace std;
int main()
{
string myString1 = "+50years";
string myString2 = "+50years-4months+3weeks+5minutes";
stringstream ss (myString1);
char mathOperator;
int value;
string timeUnit;
ss >> mathOperator >> value >> timeUnit;
cout << "mathOperator: " << mathOperator << endl;
cout << "value: " << value << endl;
cout << "timeUnit: " << timeUnit << endl;
}
Output:
mathOperator: +
value: 50
timeUnit: years
In the output you can see me successfully extract the values I need, the math operator, the value and the time unit.
Is there a way to do the same with myString2? Perhaps in a loop? I can extract the math operator, the value, but the time unit simply extracts everything else, and I cannot think of a way to get around that. Much appreciated.
The problem is that timeUnit is a string, so >> extracts everything up to the first whitespace, and your string contains none.
Alternatives:
you could extract parts using getline(), which extracts strings until it finds a separator. Unfortunately, you don't have one potential separator, but 2 (+ and -).
you could opt for using regex directly on the string
you could finally split the strings using find_first_of() and substr().
As an illustration, here is the example with regex:
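regex rg("([+-][0-9]+[A-Za-z]+)", regex::extended); // requires #include <regex>; + needs no escape inside brackets
smatch sm;
while (regex_search(myString2, sm, rg)) {
    cout << "Found:" << sm[0] << endl;
    //... process sm[0] here, while myString2 is still intact
    myString2 = sm.suffix().str(); // invalidates sm, so do this last
}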
Here a live demo applying your code to extract ALL the elements.
You could use something more robust, such as <regex>, as in the example below:
#include <iostream>
#include <regex>
#include <string>
int main () {
std::regex e ("(\\+|\\-)((\\d)+)(years|months|weeks|minutes|seconds)");
std::string str("+50years-4months+3weeks+5minutes");
std::sregex_iterator next(str.begin(), str.end(), e);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << "Expression: " << match.str() << "\n";
std::cout << " mathOperator : " << match[1] << std::endl;
std::cout << " value : " << match[2] << std::endl;
std::cout << " timeUnit : " << match[4] << std::endl;
++next;
}
}
Output:
Expression: +50years
mathOperator : +
value : 50
timeUnit : years
Expression: -4months
mathOperator : -
value : 4
timeUnit : months
Expression: +3weeks
mathOperator : +
value : 3
timeUnit : weeks
Expression: +5minutes
mathOperator : +
value : 5
timeUnit : minutes
I'd use getline for the timeUnit, but since getline can take only one delimiter, I'd search the string separately for mathOperator and use that:
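string myString2 = "+50years-4months+3weeks+5minutes";
stringstream ss (myString2);
char mathOperator;
int value;
string timeUnit;
size_t pos=0;
ss >> mathOperator;
do
{
    cout << "mathOperator: " << mathOperator << endl;
    ss >> value;
    cout << "value: " << value << endl;
    pos = myString2.find_first_of("+-", pos+1);
    if (pos != string::npos) {
        // read the unit up to the next operator, which getline consumes
        mathOperator = myString2[pos];
        getline(ss, timeUnit, mathOperator);
    } else {
        // no further operator: the rest of the stream is the unit
        getline(ss, timeUnit);
    }
    cout << "timeUnit: " << timeUnit << endl;
}
while(pos!=string::npos);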
# include <iostream>
# include <ctime>
using namespace std;
int stripWhite(char *str);
int main ()
{
char str[50];
cout << "Enter a sentence . " << endl;
cin >>str;
cout << "Your sentence without spaces is : " << endl;
cout << (str) << endl; // This is my problem. The sentence only prints the first word
stripWhite(str);
cout << "There were " << stripWhite(str) << " spaces." << endl;
return 0;
}
int stripWhite(char *str)
{
char *p = str;
int count = 0;
while (*p)
{
if (*p != ' ')
count++;
{
*p++;
}
}
return count;
}
If you don't want to replace your function with the C++ string type, you can use cin.getline to get a c string (char array)
cin.getline(str, 50);
With operator>>, std::cin stops reading at the first whitespace character, so only the first word ends up in str.
In order to get the full sentence use std::getline. since this expects a std::string as one of its parameters, you will have to adjust your stripWhite-function accordingly:
# include <iostream>
# include <string>
using namespace std;
int stripWhite(string str); //change the formal parameter's type
int main ()
{
string str;
cout << "Enter a sentence . " << endl;
getline(cin, str,'\n'); //use getline to read everything that has been entered till the press of enter
cout << "Your sentence without spaces is : " << endl;
cout << (str) << endl; // This is my problem. The sentence only prints the first word
stripWhite(str);
cout << "There were " << stripWhite(str) << " spaces." << endl;
system("pause");
return 0;
}
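int stripWhite(string str)
{
    int count = 0;
    const char* p = str.c_str(); // c_str() is a member function and returns a const char*
    while (*p)
    {
        // as in the question, this counts the characters that are not spaces
        if (*p != ' ')
            count++;
        ++p;
    }
    return count;
}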
As pointed out by others, you should use std::getline instead of cin >> str.
However, there are multiple other problems in the code you provided.
Why use a char array when you could use std::string? Why are you so sure that 50 characters will be enough?
Your stripWhite function doesn't strip anything: it counts the non-space characters but never removes any. Note that if you switch to std::string instead of plain char arrays, you could use a standard algorithm to do the job (off the top of my head, std::remove would be appropriate).
Assuming that stripWhite did actually modify the input string, why would you call it twice from main? If the goal is to strip the string first and then print the number of removed spaces, make stripWhite return the number of removed spaces and store that result in main.
For example:
const int nbSpacesStripped = stripWhite(str);
cout << "There were " << nbSpacesStripped << " spaces." << endl;
Behold Boost String Algorithms and more particularly the replace/erase routines.
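# include <iostream>
# include <string>
# include <cstdlib> // system
# include <boost/algorithm/string/erase.hpp> // boost::erase_all
size_t stripWhiteSpaces(std::string& str)
{
size_t const originalSize = str.size();
boost::erase_all(str, " "); // the search argument is a string, not a char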
return originalSize - str.size();
}
int main ()
{
std::string str;
std::cout << "Enter a sentence . \n";
getline(std::cin, str);
size_t const removed = stripWhiteSpaces(str);
std::cout << "Your sentence without spaces is :\n";
std::cout << (str) << '\n';
std::cout << "There were " << removed << " spaces.\n";
system("pause");
}