Extract a specific text pattern from a string

I have a string as follows,
"0/41/9/71.94 PC:0x82cc (add)"
The desired output is the text between the brackets ( )
Ex: output = add,
for the string specified above
How is this done using sscanf?
Is there a better way to do it in C++?

With string operations exclusively:
std::string text = "0/41/9/71.94 PC:0x82cc (add)";
auto pos = text.find('(') + 1;
auto opcode = text.substr(pos, text.find(')', pos) - pos);
With sscanf it would look something like this:
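std::string opcode(5, '\0'); // Some suitable maximum size
// The field width (4) leaves room for the terminating '\0' that sscanf writes
sscanf(text.c_str(), "%*[^(](%4[^)]", &opcode[0]);
opcode.resize(opcode.find('\0')); // drop the unused trailing '\0' characters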
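Neither snippet above checks whether the brackets are actually present. A minimal sketch of the string-operations approach with those checks added could look like this. The helper name between_brackets is made up for illustration, and returning an empty string on malformed input is an assumption, not something the question asked for.

```cpp
#include <string>

// Hypothetical helper: returns the text between the first '(' and the
// next ')' after it, or an empty string if either bracket is missing.
std::string between_brackets(const std::string& text)
{
    const auto open = text.find('(');
    if (open == std::string::npos)
        return "";
    const auto close = text.find(')', open + 1);
    if (close == std::string::npos)
        return "";
    // Skip the '(' itself and stop just before the ')'.
    return text.substr(open + 1, close - open - 1);
}
```

Calling between_brackets("0/41/9/71.94 PC:0x82cc (add)") gives "add".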

It's very easy, and you should try it yourself first: think about how to search through an array, then about how to compare its contents, and everything else follows. As a programmer you have to come up with ideas. That said, if I were asked to write a program like this, I would do it as follows:
int i = 0, p = 0;
char string[] = "0/41/9/71.94 PC:0x82cc (add)", nstr[100];
// skip ahead to the opening bracket (or to the end of the string)
while (string[i] != '\0' && string[i] != '(')
    i++;
if (string[i] == '(')
    i++; // step past the '('
// copy characters until the closing bracket (or the end of the string)
while (string[i] != ')' && string[i] != '\0')
{
    nstr[p] = string[i];
    p++;
    i++;
}
nstr[p] = '\0';
cout << "Output = " << nstr << "\n";
I know this is long, but it will give you a deeper understanding of parsing or splitting a string. Hope this helps.

Related

How can I use the C++ regex library to find a match and then replace it?

I am writing what amounts to a tiny DSL in which each script is read from a single string, like this:
"func1;func2;func1;4*func3;func1"
I need to expand the loops, so that the expanded script is:
"func1;func2;func1;func3;func3;func3;func3;func1"
I have used the C++ standard regex library with the following regex to find those loops:
regex REGEX_SIMPLE_LOOP(":?([0-9]+)\\*([_a-zA-Z][_a-zA-Z0-9]*);");
smatch match;
bool found = std::regex_search(*this, match, std::regex(REGEX_SIMPLE_LOOP));
Now, it's not too difficult to read out the loop multiplier and print the function N times, but how do I then replace the original match with this string? I want to do this:
if (found) match[0].replace(new_string);
But I don't see that the library can do this.
My backup plan is to regex_search, then construct the new string, and then use regex_replace, but it seems clunky and inefficient to essentially do two full searches like that. Is there a cleaner way?
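One possible way to avoid the second search, sketched under a few assumptions: a std::smatch records where the match sits in the searched string (position() and length()), so std::string::replace can overwrite exactly that span. The function name expand_loops is invented for this sketch, it works on a plain std::string rather than the class from the question, and the pattern assumes every loop ends with a ';'.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: expands every "N*name;" loop in a script.
std::string expand_loops(std::string script)
{
    // Assumed pattern: a repeat count, '*', an identifier, and the trailing ';'.
    static const std::regex loop("([0-9]+)\\*([_a-zA-Z][_a-zA-Z0-9]*);");
    std::smatch match;
    while (std::regex_search(script, match, loop))
    {
        // Build "name;" repeated N times from the two capture groups.
        std::string expanded;
        const int count = std::stoi(match[1].str());
        for (int n = 0; n < count; ++n)
            expanded += match[2].str() + ";";
        // position() and length() locate the whole match inside script,
        // so it can be replaced without searching for it again.
        const auto pos = static_cast<std::size_t>(match.position(0));
        const auto len = static_cast<std::size_t>(match.length(0));
        script.replace(pos, len, expanded);
    }
    return script;
}
```

Each pass restarts the search from the beginning of the string, which is fine for short scripts. For long inputs, building the result in a single pass with std::sregex_iterator would probably be the better choice.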

You can also skip regex entirely: the parsing isn't too difficult, so regex might be overkill.
Demo here: https://onlinegdb.com/RXLqLtrUQ-
(and yes, my output has an extra ; at the end)
#include <cctype>
#include <string>
#include <sstream>
#include <iostream>
int main()
{
std::istringstream is{ "func1;func2;func1;4*func3;func1" };
std::string split;
// use getline to split
while (std::getline(is, split, ';'))
{
// assume 1 repeat
std::size_t count = 1;
// if split part starts with a digit
if (!split.empty() && std::isdigit(static_cast<unsigned char>(split.front())))
{
// look for a *
auto pos = split.find('*');
// the first part of the string contains the repeat count
auto count_str = split.substr(0, pos);
// convert that to a value
count = std::stoi(count_str);
// and keep the rest ("funcn")
split = split.substr(pos + 1, split.size() - pos - 1);
}
// now use the repeat count to build the output string
for (std::size_t n = 0; n < count; ++n)
{
std::cout << split << ";";
}
}
// TODO invalid input string handling.
return 0;
}

Split a string in C++ after a space, if more than 1 space leave it in the string

I need to split a string on single spaces and store the pieces in an array of strings. I can do this with the boost::split function, but what I have not been able to achieve is this:
If there is more than one space, I want to keep the extra space in the vector element that follows it.
For example:
(underscore denotes space)
This_is_a_string. gets split into: A[0]=This A[1]=is A[2]=a A[3]=string.
This__is_a_string. gets split into: A[0]=This A[1]=_is A[2]=a A[3]=string.
How can I implement this?
Thanks

For this, you can use a combination of the find and substr functions for string parsing.
Suppose there was just a single space everywhere, then the code would be:
while (str.find(" ") != string::npos)
{
    string temp = str.substr(0, str.find(" "));
    ans.push_back(temp);
    str = str.substr(str.find(" ") + 1);
}
ans.push_back(str); // whatever is left after the last space is the final word
The additional request you have raised suggests that we call the find function after we are sure that it is not looking at leading spaces. For this, we can iterate over the leading spaces to count how many there are, and then call the find function to search from thereon. To use the find function from say after x positions (because there are x leading spaces), the call would be str.find(" ",x).
You should also take care of corner cases such as when the entire string is composed of spaces at any point. In that case the while condition in the current form will not terminate. Add the x parameter there as well.
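The idea described above (skip the leading spaces, then call find from position x) might be sketched as follows. The function name split_keep_extra_spaces is made up, and dropping a trailing run of spaces is an assumption, since the question does not say what should happen to them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits on single spaces, but keeps any additional
// spaces attached to the front of the word that follows them.
std::vector<std::string> split_keep_extra_spaces(const std::string& str)
{
    std::vector<std::string> ans;
    std::size_t pos = 0; // start of the current piece
    while (pos < str.size())
    {
        // x skips the leading spaces that belong to this piece
        std::size_t x = pos;
        while (x < str.size() && str[x] == ' ')
            ++x;
        if (x == str.size())
            break; // only spaces are left (assumption: drop them)
        // search for the separating space after the leading spaces
        const std::size_t end = str.find(' ', x);
        if (end == std::string::npos)
        {
            ans.push_back(str.substr(pos)); // last piece
            break;
        }
        ans.push_back(str.substr(pos, end - pos));
        pos = end + 1; // step over the single separating space
    }
    return ans;
}
```

With the input "This  is a string." (two spaces after "This") the result is "This", " is", "a", "string.".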

This is by no means the most elegant solution, but it will get the job done:
void bizarre_string_split(const std::string& input,
std::vector<std::string>& output)
{
std::size_t begin_break = 0;
std::size_t end_break = 0;
// count how many spaces we need to add onto the start of the next substring
std::size_t append = 0;
while (end_break != std::string::npos)
{
std::string temp;
end_break = input.find(' ', begin_break);
temp = input.substr(begin_break, end_break - begin_break);
// if the string is empty it is because end_break == begin_break
// this happens because the first char of the substring is whitespace
if (!temp.empty())
{
std::string temp2;
while (append)
{
temp2 += ' ';
--append;
}
temp2 += temp;
output.push_back(temp2);
}
else
{
++append;
}
begin_break = end_break + 1;
}
}

Split string and get values before different delimiters

Given the code:
procedure example {
x=3;
y = z +c ;
while p {
b = a+c ;
}
}
I would like to split the code by using the delimiters {, ;, and }.
After splitting, I would like to get the information before it together with the delimiter.
So for example, I would like to get procedure example {, x=3;, y=z+c;, }. Then I would like to push it into a list<pair<int, string>> sList. Could someone explain how this can be done in c++?
I tried following this example: Parse (split) a string in C++ using string delimiter (standard C++), but I could only get one token. I want the entire line. I am new to c++, and the list, splitting, etc. is confusing.
Edit: So I have implemented it, and this is the code:
size_t openCurlyBracket = lines.find("{");
size_t closeCurlyBracket = lines.find("}");
size_t semiColon = lines.find(";");
if (semiColon != string::npos) {
cout << lines.substr(0, semiColon + 1) + "\n";
}
However, it only separates at the first semicolon and does not handle the open and close curly brackets. Does anyone know how to separate on each of these characters individually?
2nd Edit:
I have done this (code below). It separates correctly on semicolons, and I have a similar loop for the open curly bracket. I was planning to add the value to the list in the commented area below. However, when I think about it, doing that will mess up the order of the information in the list, because another while loop separates on the open curly bracket. Any idea how I can add the information in order?
Example:
1. procedure example {
2. x=3;
3. y = z+c
4. while p{
and so on.
while (semiColon != string::npos) {
semiColon++;
//add list here
semiColon = lines.find(';',semiColon);
}

I think that you should read about the std::string::find_first_of function.
Searches the string for the first character that matches any of the characters specified in its arguments.
I have trouble understanding what you really want to achieve, but here is an example of how find_first_of can be used:
list<string> split(string lines)
{
list<string> result;
size_t position = 0;
while((position = lines.find_first_of("{};\n")) != string::npos)
{
if(lines[position] != '\n')
{
result.push_back(lines.substr(0, position+1));
}
lines = lines.substr(position+1);
}
return result;
}
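To get the numbered list<pair<int, string>> from the question while keeping the pieces in source order, the same find_first_of loop can carry a counter. This is only a sketch: the name split_statements is invented, and it assumes that leading whitespace should be trimmed while spacing inside a statement is left untouched.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>

// Hypothetical helper: cuts the code after every '{', ';' or '}' and
// numbers the pieces in the order they appear.
std::list<std::pair<int, std::string>> split_statements(const std::string& code)
{
    std::list<std::pair<int, std::string>> sList;
    std::size_t start = 0;
    std::size_t position;
    int number = 0;
    while ((position = code.find_first_of("{};", start)) != std::string::npos)
    {
        // the piece runs up to and including the delimiter
        const std::string piece = code.substr(start, position - start + 1);
        // trim leading whitespace; the delimiter itself guarantees that
        // at least one non-whitespace character exists
        const std::size_t first = piece.find_first_not_of(" \t\r\n");
        sList.emplace_back(++number, piece.substr(first));
        start = position + 1;
    }
    return sList;
}
```

Because a single loop handles all three delimiters, there is no second loop whose results would have to be merged back in order.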

Parsing a string by a delimiter in C++

OK, so I need some info parsed and I would like to know the best way to do it.
Here is the string that I need to parse. The delimiter is the "^":
John Doe^Male^20
I need to parse the string into name, gender, and age variables. What would be the best way to do it in C++? I was thinking about looping with the condition while(!string.empty()),
assigning all characters up until the '^' to a string, and then erasing what I have already assigned. Is there a better way of doing this?

You can use getline on a C++ stream.
istream& getline(istream& is, string& str, char delimiter = '\n')
Change the delimiter to '^'.
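As a rough sketch of that suggestion, the line can be wrapped in a std::istringstream and read field by field with '^' as the delimiter. The Person struct and the parse_person name are assumptions made up for this example.

```cpp
#include <sstream>
#include <string>

// Hypothetical record type for the three fields in the question.
struct Person
{
    std::string name;
    std::string gender;
    int age = 0;
};

// Returns false if a field is missing or the age is not a number.
bool parse_person(const std::string& line, Person& out)
{
    std::istringstream is(line);
    std::string age_text;
    // each getline call reads up to (and discards) the next '^'
    if (!std::getline(is, out.name, '^') ||
        !std::getline(is, out.gender, '^') ||
        !std::getline(is, age_text, '^'))
        return false;
    // convert the last field through a second stream
    std::istringstream age_stream(age_text);
    return static_cast<bool>(age_stream >> out.age);
}
```

For "John Doe^Male^20" this fills name with "John Doe", gender with "Male" and age with 20.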

You have a few options. One good option, if you can use Boost, is the split algorithm provided in its string library. You can check out this Stack Overflow question to see the Boost answer in action: How to split a string in c
If you cannot use boost, you can use string::find to get the index of a character:
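string str = "John Doe^Male^20";
size_t last = 0;
size_t cPos;
while ((cPos = str.find('^', last)) != string::npos)
{
    string sub = str.substr(last, cPos - last);
    // Do something with the string
    last = cPos + 1;
}
// The final field has no trailing '^', so pick it up after the loop
string tail = str.substr(last);
// Do something with the last field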

#include <stdio.h>
#include <string.h>
int main ()
{
char str[] = "This is a sample string";
char * pch;
printf ("Looking for the 's' character in \"%s\"...\n",str);
pch=strchr(str,'s');
while (pch!=NULL)
{
printf ("found at %d\n",(int)(pch-str+1));
pch=strchr(pch+1,'s');
}
return 0;
}
Do something like this with your string in a char array, searching for '^' instead of 's'.

You have a number of choices but I would use strtok(), myself. It would make short work of this.
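For reference, a strtok-based split might look like the sketch below. The wrapper name split_with_strtok is made up. Two caveats apply: strtok modifies the buffer it scans and keeps hidden global state, so it is not reentrant, and it skips empty fields (for example "a^^b" yields only two tokens).

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wrapper around std::strtok. The text is taken by value
// because strtok overwrites the delimiters with '\0' as it goes.
std::vector<std::string> split_with_strtok(std::string text, const char* delimiters)
{
    std::vector<std::string> fields;
    for (char* token = std::strtok(&text[0], delimiters);
         token != nullptr;
         token = std::strtok(nullptr, delimiters)) // nullptr continues the same scan
    {
        fields.emplace_back(token);
    }
    return fields;
}
```

split_with_strtok("John Doe^Male^20", "^") returns the three fields "John Doe", "Male" and "20".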

Cleaning a string of punctuation in C++

Ok so before I even ask my question I want to make one thing clear. I am currently a student at NIU for Computer Science and this does relate to one of my assignments for a class there. So if anyone has a problem read no further and just go on about your business.
Now, for anyone who is willing to help, here's the situation. For my current assignment we have to read a file that is just a block of text. For each word in the file we are to clear any punctuation in the word (ex: "can't" would end up as "can" and "that--to" would end up as "that", obviously without the quotes; the quotes were used just to mark the examples).
The problem I've run into is that I can clean the string fine and then insert it into the map that we are using, but for some reason the code I have written allows an empty string to be inserted into the map. I've tried everything that I can come up with to stop this from happening, and the only thing I've come up with is to use the erase method of the map itself.
So what I am looking for is two things: any suggestions about how I could a) fix this without simply erasing the entry and b) improve the code I have already written.
Here are the functions I have written to read in from the file and then the one that cleans it.
Note: the function that reads in from the file calls the clean_entry function to get rid of punctuation before anything is inserted into the map.
Edit: Thank you Chris. Numbers are allowed :). If anyone has any improvements to the code I've written or any criticisms of something I did, I'll listen. At school we really don't get feedback on the correct, proper, or most efficient way to do things.
int get_words(map<string, int>& mapz)
{
int cnt = 0; //set out counter to zero
map<string, int>::const_iterator mapzIter;
ifstream input; //declare instream
input.open( "prog2.d" ); //open instream
assert( input ); //assure it is open
string s; //temp strings to read into
string not_s;
input >> s;
while(!input.eof()) //read in until EOF
{
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() == 0)
{
input >> s;
clean_entry(s, not_s);
}
mapz[not_s]++; //increment occurence
input >>s;
}
input.close(); //close instream
for(mapzIter = mapz.begin(); mapzIter != mapz.end(); mapzIter++)
cnt = cnt + mapzIter->second;
return cnt; //return number of words in instream
}
void clean_entry(const string& non_clean, string& clean)
{
int i, j, begin, end;
for(i = 0; isalnum(non_clean[i]) == 0 && non_clean[i] != '\0'; i++);
begin = i;
if(begin ==(int)non_clean.length())
return;
for(j = begin; isalnum(non_clean[j]) != 0 && non_clean[j] != '\0'; j++);
end = j;
clean = non_clean.substr(begin, (end-begin));
for(i = 0; i < (int)clean.size(); i++)
clean[i] = tolower(clean[i]);
}

The problem with empty entries is in your while loop. If you get an empty string, you clean the next one, and add it without checking. Try changing:
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() == 0)
{
input >> s;
clean_entry(s, not_s);
}
mapz[not_s]++; //increment occurence
input >>s;
to
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() > 0)
{
mapz[not_s]++; //increment occurence
}
input >>s;
EDIT: I notice you are checking if the characters are alphanumeric. If numbers are not allowed, you may need to revisit that area as well.

Further improvements would be to
declare variables only when you use them, and in the innermost scope
use c++-style casts instead of the c-style (int) casts
use empty() instead of length() == 0 comparisons
use the prefix increment operator for the iterators (i.e. ++mapzIter)

A blank string is a valid instance of the string class, so there's nothing special about adding it into the map. What you could do is first check whether it's empty, and only increment if it isn't:
if (!not_s.empty())
mapz[not_s]++;
Style-wise, there's a few things I'd change, one would be to return clean from clean_entry instead of modifying it:
string not_s = clean_entry(s);
...
string clean_entry(const string &non_clean)
{
string clean;
... // as before
if(begin ==(int)non_clean.length())
return clean;
... // as before
return clean;
}
This makes it clearer what the function is doing (taking a string, and returning something based on that string).

The function 'get_words' is doing a lot of distinct actions that could be split out into other functions. There's a good chance that by splitting it up into its individual parts, you would have found the bug yourself.
From the basic structure, I think you could split the code into (at least):
getNextWord: Return the next (non blank) word from the stream (returns false if none left)
clean_entry: What you have now
getNextCleanWord: Calls getNextWord, and if 'true' calls clean_entry. Returns 'false' if no words left.
The signatures of 'getNextWord' and 'getNextCleanWord' might look something like:
bool getNextWord (std::ifstream & input, std::string & str);
bool getNextCleanWord (std::ifstream & input, std::string & str);
The idea is that each function does a smaller more distinct part of the problem. For example, 'getNextWord' does nothing but get the next non blank word (if there is one). This smaller piece therefore becomes an easier part of the problem to solve and debug if necessary.
The main component of 'get_words' then can be simplified down to:
std::string nextCleanWord;
while (getNextCleanWord (input, nextCleanWord))
{
++mapz[nextCleanWord];
}
An important aspect to development, IMHO, is to try to Divide and Conquer the problem. Split it up into the individual tasks that need to take place. These sub-tasks will be easier to complete and should also be easier to maintain.
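Putting the pieces together, the decomposition might be sketched like this. It is one possible reading of the outline above rather than the answerer's actual code: the helpers take std::istream& instead of std::ifstream& so they also work on in-memory streams, clean_entry returns its result, and count_words is a made-up name for the simplified main loop.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Returns the first run of alphanumeric characters, lower-cased
// (the same rule as the clean_entry in the question).
std::string clean_entry(const std::string& non_clean)
{
    std::string clean;
    std::size_t i = 0;
    while (i < non_clean.size() && !std::isalnum(static_cast<unsigned char>(non_clean[i])))
        ++i;
    for (; i < non_clean.size() && std::isalnum(static_cast<unsigned char>(non_clean[i])); ++i)
        clean += static_cast<char>(std::tolower(static_cast<unsigned char>(non_clean[i])));
    return clean;
}

// Reads the next whitespace-separated word; false when the stream is exhausted.
bool getNextWord(std::istream& input, std::string& str)
{
    return static_cast<bool>(input >> str);
}

// Keeps reading until a word is non-empty after cleaning.
bool getNextCleanWord(std::istream& input, std::string& str)
{
    std::string raw;
    while (getNextWord(input, raw))
    {
        str = clean_entry(raw);
        if (!str.empty())
            return true;
    }
    return false;
}

// Hypothetical replacement for the body of get_words: counts each clean
// word and returns the total number of words counted.
int count_words(std::istream& input, std::map<std::string, int>& counts)
{
    int total = 0;
    std::string word;
    while (getNextCleanWord(input, word))
    {
        ++counts[word];
        ++total;
    }
    return total;
}
```

Since the loop only ever sees non-empty words, the empty-string entry cannot reach the map, and the whole thing can be tried on a std::istringstream before pointing it at the real file.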
