Using regex_match() on iterator with string in c++ - regex

I'm working on a C++ project and need to iterate over a string (or a char*, depending on the solution you can suggest). Basically, I'm doing this:
void Pile::evalExpress(char* expchar){
string express = expchar;
regex number {"[+-*/]"};
for(string::iterator it = express.begin(); it!=express.end(); ++it){
if(regex_match(*it,number)){
cout<<*it<<endl;
}
}
}
char expchar[]="234*+";
Pile calcTest;
calcTest.evalExpress(expchar);
The iterator works well (I can put a cout<<*it<<endl above the if statement and get the correct output),
but when I try to compile I get:
error: no matching function for call to 'regex_match(char&, std::__cxx11::regex&)'
if(regex_match(*it,number)){
^
I have no idea why this is happening. I also tried dropping the iterator and indexing expchar[i] directly, but I get the same error from regex_match()...
Regards
Vincent

Read the error message! It tells you that you are passing a single char to regex_match, which is not possible: it requires a string (or another sequence of characters), not a single character.
You could do if (std::regex_match(it, it+1, number)) instead. That matches the sequence of characters from it to it+1 (i.e. a sequence of length one) against the pattern.
You can also avoid creating a string and iterate over the char* directly:
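void Pile::evalExpress(const char* expchar) {
    // '-' goes first in the class: "[+-*/]" would be parsed as the invalid range '+' to '*'
    std::regex number {"[-+*/]"};
    for (const char* p = expchar; *p != '\0'; ++p) {
        if (std::regex_match(p, p+1, number)) {
            std::cout << *p << std::endl;
        }
    }
}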
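As a minimal sketch of the iterator-range form on a std::string, assuming a free function that returns the matched characters instead of printing them (the name collect_operators is hypothetical and not part of the question's Pile class):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns every arithmetic operator found in `expr`.
std::string collect_operators(const std::string& expr) {
    // '-' comes first in the class so it is read as a literal, not a range.
    static const std::regex op{"[-+*/]"};
    std::string found;
    for (auto it = expr.begin(); it != expr.end(); ++it) {
        // Match the one-character range [it, it + 1) against the pattern.
        if (std::regex_match(it, it + 1, op)) {
            found += *it;
        }
    }
    return found;
}
```

For a single character, a plain lookup such as std::string("+-*/").find(c) != std::string::npos would give the same answer without involving the regex engine.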

Related

How to convert a string to array of strings made of characters in c++?

How to split a string into an array of strings for every character? Example:
INPUT:
string text = "String.";
OUTPUT:
["S" , "t" , "r" , "i" , "n" , "g" , "."]
I know that char variables exist, but in this case, I really need an array of strings because of the type of software I'm working on.
When I try to do this, the compiler returns the following error:
Severity Code Description Project File Line Suppression State
Error (active) E0413 no suitable conversion function from "std::string" to "char" exists
This is because C++ treats stringName[index] as a char, and since the array is a string array, the two are incompatible.
Here's my code:
string text = "Sample text";
string process[10000000];
for (int i = 0; i < sizeof(text); i++) {
text[i] = process[i];
}
Is there any way to do this properly?
If you are going to make strings, you should look at the string constructors. There's one that is suitable for you: the one that takes a count and a character (#2 in the list I linked to).
for (std::size_t i = 0; i < text.size(); i++) {
process[i] = string(1, text[i]); // create a string with 1 copy of text[i]
}
You should also realise that sizeof does not get you the size of a string! Use the size() or length() method for that.
You also need to get text and process the right way around, but I guess that was just a typo.
std::string is a container in the first place, so anything that you can do to a container, you can do to an instance of std::string. I would make use of std::transform here:
const std::string str { "String." };
std::vector<std::string> result(str.size());
std::transform(str.cbegin(), str.cend(), result.begin(), [](auto character) {
return std::string(1, character);
});
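The same idea can also be sketched with a range-based for loop and a std::vector that grows as needed, which avoids the fixed-size array from the question. The function name to_char_strings is an assumption made for this example:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: one single-character string per character of `text`.
std::vector<std::string> to_char_strings(const std::string& text) {
    std::vector<std::string> parts;
    parts.reserve(text.size());
    for (char c : text) {
        parts.emplace_back(1, c);  // uses the std::string(count, ch) constructor
    }
    return parts;
}
```

Calling to_char_strings("String.") yields seven elements, from "S" through ".".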

Suggestions that could improve my string splitting function

I'm new to C++ and I'm trying to write some basic functions to get the hang of it. I decided to make a custom function that splits a string into tokens every time a specific delimiter is reached.
I've made it work successfully, but since I'm new, I'd like to hear from more experienced programmers on if there is a better way to go about it. This is my code:
vector<string> split(string const str, string const separator=" ") {
int str_len = str.length();
int sep_len = separator.length();
int current_index {0};
vector<string> strings {};
for(int i {0}; i < str_len; ++i) {
if(str.substr(i, sep_len) == separator) {
strings.push_back(str.substr(current_index, i-current_index));
current_index = i + sep_len;
}
}
strings.push_back(str.substr(current_index, str_len-current_index));
return strings;
}
One thing I will say is, I don't like how I had to put
strings.push_back(str.substr(current_index, str_len-current_index));
this after the entire iteration to get the final part of the string. I just can't think of any different methods.
Use std::string::find() to find separators in the string, which is probably much more efficient than your loop that checks for each possible position if the substring at that position matches the separator. Once you have that, you can make use of the fact that if the separator is not found, find() returns std::string::npos, which is the largest possible value of std::string::size_type, so just pass this to substr() to get everything from the current position to the end of the string. This way you can avoid the second push_back().
vector<string> split(string const &str, string const &separator=" ") {
string::size_type current_index {};
vector<string> strings;
while (true) {
auto separator_index = str.find(separator, current_index);
strings.push_back(str.substr(current_index, separator_index - current_index));
if (separator_index == str.npos)
break;
else
current_index = separator_index + separator.size();
}
return strings;
}
Note: ensure you pass the input parameters by reference to avoid unnecessary copies being made.
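One edge case worth noting: with an empty separator, find() reports a match at the current position and the index never advances, so the loop would not terminate. A minimal sketch of a guarded variant follows; the name split_guarded and the choice to return the whole string unsplit are assumptions for illustration only:

```cpp
#include <string>
#include <vector>

// Hypothetical variant of split() that tolerates an empty separator.
std::vector<std::string> split_guarded(const std::string& str,
                                       const std::string& separator = " ") {
    // An empty separator matches everywhere without advancing,
    // so treat it as "do not split" (an assumption of this sketch).
    if (separator.empty()) {
        return {str};
    }
    std::vector<std::string> parts;
    std::string::size_type current = 0;
    while (true) {
        const auto hit = str.find(separator, current);
        // When hit is npos, substr() takes everything up to the end.
        parts.push_back(str.substr(current, hit - current));
        if (hit == std::string::npos) {
            break;
        }
        current = hit + separator.size();
    }
    return parts;
}
```

Adjacent separators produce empty tokens, so splitting "a,,b" on "," gives "a", "" and "b".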

How to use strtok in string?

Edit: Sorry, this should be C++. How do I use strtok on a std::string?
FQ_ID_line[0]="1,26665;TUK.006.8955.FQ;TUK;400 BB 2 FQ;400 BB 2;899;FQ;Z_SCCFG1;Z_BSCFG1;333";
FQ_ID_line[1]="2,26223;TUK.002.8955.FQ;TUK;400 BB 2 FQ;400 BB 2;;FQ;Z_SCCFG1;Z_BSCFG1;333";
for(int FQ_i=0;FQ_i<FQ_Number;FQ_i++)
{
printf( "FQ_ID_line[FQ_i]=%u\n", FQ_ID_line[FQ_i] );
char * FQ_array=strdup(FQ_ID_line[FQ_i].c_str());
char *chars_array=strtok(FQ_array,seps);
chars_array=strtok(NULL,seps);
strcpy(DataLine[FQ_i].analog_comp_id,chars_array);
chars_array=strtok(NULL,seps);
strcpy(DataLine[FQ_i].RTU_abbr,chars_array);
chars_array=strtok(NULL,seps);
chars_array=strtok(NULL,seps);
chars_array=strtok(NULL,seps);
chars_array=strtok(NULL,seps);
chars_array=strtok(NULL,seps);
strcpy(DataLine[FQ_i].analog_scc_fep_group,chars_array);
chars_array=strtok(NULL,seps);
strcpy(DataLine[FQ_i].analog_bsc_fep_group,chars_array);
chars_array=strtok(NULL,seps);
strcpy(DataLine[FQ_i].RTU_number,chars_array);
DataLine[FQ_i].float_RTU_number=atof(chars_array);
free(FQ_array);
}
The output is:
DataLine[0].analog_comp_id=TUK.006.8955.FQ
DataLine[0].RTU_abbr=TUK
DataLine[0].analog_scc_fep_group=Z_SCCFG1
DataLine[0].analog_bsc_fep_group=Z_BSCFG1
DataLine[0].float_RTU_number=333
DataLine[1].analog_comp_id=TUK.002.8955.FQ
DataLine[1].RTU_abbr=TUK
DataLine[1].analog_scc_fep_group=Z_BSCFG1
DataLine[1].analog_bsc_fep_group=333
DataLine[1].float_RTU_number=
I want the output to be:
DataLine[0].analog_comp_id=TUK.006.8955.FQ
DataLine[0].RTU_abbr=TUK
DataLine[0].analog_scc_fep_group=Z_SCCFG1
DataLine[0].analog_bsc_fep_group=Z_BSCFG1
DataLine[0].float_RTU_number=333
DataLine[1].analog_comp_id=TUK.002.8955.FQ
DataLine[1].RTU_abbr=TUK
DataLine[1].analog_scc_fep_group=Z_SCCFG1
DataLine[1].analog_bsc_fep_group=Z_BSCFG1
DataLine[1].float_RTU_number=333
The cause of the problem:
The function strtok() has many problems, due to the fact that subsequent calls depend on previous calls, and this dependency is managed in an unsafe manner:
it's not thread safe (see Robert's comment, and C++ standard section 21.8 pt 14)
if one function you call would use strtok() without you knowing, your next call to strtok() would return a lot of surprises.
Now your problem comes from the input string part: ...400 BB 2;;FQ;..., and the definition of strtok() : In subsequent calls, the function (...) uses the position right after the end of last token as the new starting location for scanning. To determine the beginning and the end of a token, the function first scans from the starting location for the first character not contained in delimiters (which becomes the beginning of the token)
So everything works well until it returns "400 BB 2". According to this algorithm the next ";" is skipped, and your code jumps over the empty field (;;) as if it didn't exist. Not only do the following fields shift by one, but your last call to strtok() returns NULL, so the strcpy() that follows may even cause a segmentation fault.
Solution:
Best avoid strtok(). If you like c-style, you may consider instead the use of strpbrk() with some adaptation in your code. For example:
char* get_field(char*p, char*& next, const char* s) // by ref as it's c++
{
if ((next = strpbrk(p, s)) != NULL)
*next++ = '\0';
return p;
}
with the following usage to replace strtok():
char* next;
char *chars_array = get_field(FQ_array, next, seps);
...
chars_array = get_field(next, next, seps); // instead of strtok(NULL, seps)
...
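To illustrate that this approach keeps empty fields, here is a minimal sketch that wraps get_field() in a loop. The helper split_fields and its use of std::vector are assumptions for the example and are not part of the original code:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Same helper as above: terminates the current field in place and
// stores the start of the next field (or nullptr) in `next`.
char* get_field(char* p, char*& next, const char* seps) {
    if ((next = std::strpbrk(p, seps)) != nullptr)
        *next++ = '\0';
    return p;
}

// Hypothetical wrapper: collects every field, including empty ones.
// `line` is taken by value because get_field() modifies the buffer.
std::vector<std::string> split_fields(std::string line, const char* seps) {
    std::vector<std::string> fields;
    char* next = &line[0];
    while (next != nullptr) {
        char* field = get_field(next, next, seps);
        fields.emplace_back(field);
    }
    return fields;
}
```

With the input "400 BB 2;;FQ" and ";" as the separator, this returns three fields, the middle one being empty, whereas strtok() would report only two tokens.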
My personal recommendation, with C++, would be to consider the regular expressions provided in the standard library (or in Boost), which would also allow a consistency check on your input data.
The full code would then look like:
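regex fmt(R"(([0-9]*,[0-9]*);(.*);(.*);(.*);(.*);(.*);(.*);(.*);(.*);([0-9]*\.*[0-9]*))"); // raw string literal, so the backslash reaches the regex
for (int FQ_i = 0; ...)
{
smatch sm;
printf("FQ_ID_line[FQ_i]=%s\n", FQ_ID_line[FQ_i].c_str()); // ok, a cout would be better
if (regex_match(FQ_ID_line[FQ_i], sm, fmt)) { // the assignments below assume the DataLine text fields are std::string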
DataLine[FQ_i].analog_comp_id = sm[2];
DataLine[FQ_i].RTU_abbr = sm[3];
DataLine[FQ_i].analog_scc_fep_group = sm[8];
DataLine[FQ_i].analog_bsc_fep_group = sm[9];
DataLine[FQ_i].RTU_number = sm[10];
DataLine[FQ_i].float_RTU_number = stof(sm[10]);
}
else
cout << " ** Non matching line ignored !!\n";
}
By fine tuning the regex, you could then check even more for consistency before assigning (Here I just did the minimum for the sake of the example).
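As one possible way of tightening the pattern, the sketch below uses [^;]* for each field so that a field can never swallow a separator, and requires at least one digit in the last field so that std::stof() always gets a number. The Record struct and parse_line function are hypothetical stand-ins, since the original DataLine type is not shown:

```cpp
#include <regex>
#include <string>

// Hypothetical record type standing in for the DataLine entries.
struct Record {
    std::string analog_comp_id;
    std::string RTU_abbr;
    std::string analog_scc_fep_group;
    std::string analog_bsc_fep_group;
    float float_RTU_number = 0.0f;
};

// Returns false (leaving `out` untouched) when the line does not have
// exactly ten ';'-separated fields in the expected shape.
bool parse_line(const std::string& line, Record& out) {
    static const std::regex fmt(
        R"(([0-9]*,[0-9]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([0-9]+\.?[0-9]*))");
    std::smatch sm;
    if (!std::regex_match(line, sm, fmt)) {
        return false;
    }
    out.analog_comp_id = sm[2].str();
    out.RTU_abbr = sm[3].str();
    out.analog_scc_fep_group = sm[8].str();
    out.analog_bsc_fep_group = sm[9].str();
    out.float_RTU_number = std::stof(sm[10].str());
    return true;
}
```

Because empty fields are matched explicitly, the second sample line (with ";;" in the middle) fills the same members as the first one.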

Replace whole words in a string list without using external libraries

I want to replace some words without using external libraries.
My first attempt was to make a copy of the string, but it was not efficient, so this is another attempt where I use addresses:
void ReplaceString(std::string &subject, const std::string &search, const std::string &replace)
{
size_t position = 0;
while ((position = subject.find(search, position)) != std::string::npos) //if something messes up --> failure
{
subject.replace(position, search.length(), replace);
position = position + replace.length();
}
}
Because this is not very efficient either, I want to use another thing, but I got stuck; I want to use a function like replace_stuff(std::string & a); with a single parameter using string.replace() and string.find() (parsing it with a for loop or something) and then make use of std::map <std::string,std::string>; which is very convenient for me.
I want to use it for a large number of input words. (let's say replacing many bad words with some harmless ones)
The problem with your question is the lack of the necessary components in the Standard library. If you want an efficient implementation, you'd probably need a trie for efficient lookups. Writing one as part of the answer would be way too much code.
If you use a std::map or, if C++11 is available in your environment, a std::unordered_map, you will need to utilize additional information about the input string and the search-replace pairs from the map. You'd then tokenize the string and check each token to see if it has to be replaced. Using positions pointing into the input string is a good idea since it avoids copying data. Which brings us to:
Efficiency will depend on memory access (reads and writes), so you should not modify the input string. Create the output by starting with an empty string and by appending pieces from the input. Check each part of the input: If it is a word, check if it needs to be replaced or if it is appended to the output unmodified. If it is not part of a word, append it unmodified.
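The approach described above could be sketched as follows. This is only an illustration under stated assumptions: a "word" is taken to be a run of alphanumeric characters, the lookup table is a std::unordered_map, and the function name replace_words is hypothetical:

```cpp
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical helper: builds a new string, swapping whole words that
// appear as keys in `replacements` and copying everything else as is.
std::string replace_words(
    const std::string& input,
    const std::unordered_map<std::string, std::string>& replacements) {
    std::string output;
    output.reserve(input.size());
    std::size_t i = 0;
    while (i < input.size()) {
        if (std::isalnum(static_cast<unsigned char>(input[i]))) {
            // Scan to the end of the current word.
            const std::size_t start = i;
            while (i < input.size() &&
                   std::isalnum(static_cast<unsigned char>(input[i]))) {
                ++i;
            }
            const std::string word = input.substr(start, i - start);
            const auto it = replacements.find(word);
            if (it != replacements.end()) {
                output += it->second;  // replaced word
            } else {
                output += word;        // unmodified word
            }
        } else {
            output += input[i++];      // punctuation, spaces, etc.
        }
    }
    return output;
}
```

Each word costs one hash lookup, so the running time depends on the length of the input rather than on the number of search-replace pairs.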
It sounds like you want to replace all the "bad" words in a string with harmless ones, but your current implementation is inefficient because the list of bad words is much larger than the length of your input string (subject). Is this correct?
If so, the following code should make it more efficient. As you can see, I had to pass the map as a parameter, but if your function is going to be part of a class, you don't need to do so.
void ReplaceString(std::string &subject, const std::map<std::string, std::string>& replace_map)
{
    size_t startofword = 0;
    while (startofword < subject.size())
    {
        // get next word in string (words are separated by single spaces)
        size_t endofword = subject.find_first_of(" ", startofword);
        if (endofword == std::string::npos)
            endofword = subject.size();
        size_t length = endofword - startofword;
        std::string search = subject.substr(startofword, length);
        // try to find this word in the map; operator[] cannot be used on a const map
        auto found = replace_map.find(search);
        if (found != replace_map.end())
        {
            // if found, replace the word with a new word
            subject.replace(startofword, length, found->second);
            length = found->second.length();
        }
        // move past the word and the space that follows it
        startofword += length + 1;
    }
}
I use the following functions, hope it helps:
//=============================================================================
//replaces each occurrence of the phrase in sWhat with sReplacement
std::string& sReplaceAll(std::string& sS, const std::string& sWhat, const std::string& sReplacement)
{
size_t pos = 0, fpos;
while ((fpos = sS.find(sWhat, pos)) != std::string::npos)
{
sS.replace(fpos, sWhat.size(), sReplacement);
pos = fpos + sReplacement.length();
}
return sS;
}
//=============================================================================
// replaces each single char from sCharList that is found within sS with entire sReplacement
std::string& sReplaceChars(std::string& sS, const std::string& sCharList, const std::string& sReplacement)
{
size_t pos=0;
while (pos < sS.length())
{
if (sCharList.find(sS.at(pos),0)!=std::string::npos) //pos is where a charlist-char was found
{
sS.replace(pos, 1, sReplacement);
pos += sReplacement.length()-1;
}
pos++;
}
return sS;
}
You might create a class, say Replacer:
class Replacer
{
std::map<std::string, std::string> replacement;
public:
Replacer()
{
// init the map here
replacement.insert ( std::pair<std::string,std::string>("C#","C++") );
//...
}
void replace_stuff(std::string & a);
};
Then the replace_stuff definition would be very similar to your original ReplaceString (it would use map entries instead of the passed parameters).
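A minimal sketch of what that could look like, assuming the map is filled in the constructor as in the outline above. The loop body is adapted from the question's ReplaceString, and the single "C#" entry is only sample data:

```cpp
#include <map>
#include <string>

class Replacer {
    std::map<std::string, std::string> replacement;

public:
    Replacer() {
        // Sample data only; a real table would be loaded here.
        replacement["C#"] = "C++";
    }

    // Applies every search-replace pair from the map to `a` in place.
    void replace_stuff(std::string& a) const {
        for (const auto& entry : replacement) {
            if (entry.first.empty()) {
                continue;  // an empty key would match everywhere
            }
            std::size_t position = 0;
            while ((position = a.find(entry.first, position)) != std::string::npos) {
                a.replace(position, entry.first.length(), entry.second);
                position += entry.second.length();
            }
        }
    }
};
```

Note that this still scans the string once per map entry, and a replacement text that contains a later key will itself be replaced, so it is a convenience wrapper rather than a faster algorithm.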

Instead of having different size_t variables, can I use just one for searching a std::string multiple times?

I am wondering if it is possible to cut down how many size_t variables I use here. Here is what I have:
std::size_t found1, found2, found3, found4 /* etc */;
Each has its own string to find:
found1 = msg.find("string1");
found2 = msg.find("string2");
found3 = msg.find("string3");
found4 = msg.find("string4");
// etc
If the word is found, then it will discard the message and prevent it from being shown:
if (found1 != std::string::npos)
{
SendMsg("You cannot say that word!");
}
I have else if statements all the way up to found21. I'd like to cut everything down in size so it is clean, but I don't have a clue how to do it. I would also like to lowercase the message. I have never used tolower either, so I would appreciate it if someone could help me.
To lowercase a string, you can do
std::transform(msg.begin(), msg.end(), msg.begin(), [](unsigned char c) { return std::tolower(c); }); // a lambda avoids the ambiguity between the tolower overloads
Transform takes begin and end iterators as its first and second arguments. For each element in that range it applies the fourth argument (a function) and assigns the result to whatever the third argument (an output iterator) points to, then increments it. By passing msg.begin() as both the first and third arguments, each result is written back over the element it was computed from. So transform will basically do this:
for (auto src = begin(msg), dst = begin(msg); src != end(msg); ++src, ++dst)
*dst = tolower(*src);
but using transform is so much nicer.
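Wrapped up as a small function, a sketch of the lowercase step might look like this. The name to_lower is an assumption, and the cast to unsigned char is there because passing a negative char value to std::tolower is undefined behaviour:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a lowercase copy of `msg`.
std::string to_lower(std::string msg) {
    std::transform(msg.begin(), msg.end(), msg.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return msg;
}
```

Taking the argument by value leaves the caller's string untouched, which is handy if the original message still needs to be displayed.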
To check whether a string contains any of a list of substrings, you can use a for loop with a vector:
vector<string> bad_strings { "bad word 1", "bad word 2", "etc" };
for (auto i = begin(bad_strings); i != end(bad_strings); ++i)
if (msg.find(*i) != string::npos) { // find() returns npos when there is no match, and 0 for a match at the start
SendMsg("You cannot say that word!");
break; // stop when you find it matches even one bad string
}
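The same check can be sketched with std::any_of, which replaces the whole chain of found variables with a single boolean. The function name contains_any is hypothetical:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: true if `msg` contains at least one of `bad_strings`.
bool contains_any(const std::string& msg,
                  const std::vector<std::string>& bad_strings) {
    return std::any_of(bad_strings.begin(), bad_strings.end(),
                       [&msg](const std::string& bad) {
                           // Compare against npos: find() returns 0 for a
                           // match at the very start of the message.
                           return msg.find(bad) != std::string::npos;
                       });
}
```

A caller would lowercase the message first and then write something like if (contains_any(msg, bad_strings)) SendMsg("You cannot say that word!");.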