C++ code for taking substrings

C++ code for taking substrings - c++

I have a string as "1.0.0" and I want to extract the "1", "0", and "0". If the last zero is not present, the string must store 0 by default:
verstr.substr(0,verstr.find(".");
The above statement can find the first digit that is "1", however, I am not able to think of a solution for extracting the remainder of the string.
After this i convert it to a long as:
va = atol(verstr.substr(0,verstr.find(".")).c_str());
so i want the "1" in va , 0 in "vb" and so on
Thanks.

C++11 solution:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main(int, char **) {
string version("1.2.3");
match_results<string::const_iterator> m;
regex re("([0-9]+)\\.([0-9]+)(\\.([0-9]+))?");
if (regex_match(version, m, re)) {
int major = stoi(m[1].str()),
minor = stoi(m[2].str()),
rev = stoi(m[4].str().length() == 0 ? 0 : m[4].str());
cout << "major: " << major << endl;
cout << "minor: " << minor << endl;
cout << "rev: " << rev << endl;
} else {
cout << "no match\n";
}
}
The regular expression used is ([0-9]+)\.([0-9]+)(\.([0-9]+))? and breaks down as follows:
[0-9]+ matches one or more digits
\. matches a literal dot.
? following the last expression indicates that it is optional
Expressions wrapped in ( and ) are capture groups. There are five capture groups in this expression:
0 - always matches the entire string - we don't use this.
1 - matches the major version number.
2 - matches the minor version number.
3 - matches a dot followed by the revision number - we don't use this but it is necessary because we use the parentheses followed by a ? to make this whole group optional.
4 - matches the revision number.

Not sure if I understand what you need, if you want to retrieve the digits as strings, with a minimum of x digits, you can do something like this.
vector<string> GetVersion(const string &strInput, int iMinSize)
{
vector<string> vRetValue;
std::stringstream ss(strInput);
string strItem;
while(std::getline(ss, strItem, '.'))
vRetValue.push_back(strItem);
while(vRetValue.size() < iMinSize)
vRetValue.push_back("0");
return vRetValue;
}
int _tmain(int argc, _TCHAR* argv[])
{
vector<string> vRetValue = GetVersion("1.0", 3);
return 0;
}

A possibility would to use std::sscanf(). It is simple to use and provides a level of error checking with relatively few lines of code:
#include <iostream>
#include <string>
#include <cstdio>
int main()
{
std::string input[] = { "1.0.7", "1.0.", "1.0", "1.", "1" };
for (size_t i = 0; i < sizeof(input)/sizeof(input[0]); i++)
{
std::cout << input[i] << ": ";
// Init to zero.
int parts[3] = { 0 };
// sscanf() returns number of assignments made.
if (std::sscanf(input[i].c_str(),
"%d.%d.%d",
&parts[0],
&parts[1],
&parts[2]) >= 2)
{
// OK, the string contained at least two digits.
std::cout << parts[0]
<< ","
<< parts[1]
<< ","
<< parts[2]
<< "\n";
}
else
{
std::cout << "bad format\n";
}
}
return 0;
}
Output:
1.0.7: 1,0,7
1.0.: 1,0,0
1.0: 1,0,0
1.: bad format
1: bad format
See online demo: http://ideone.com/0Ox9b .

find and substr are two really nice family of function overloads that are pretty well suited to many simple parsing problems, especially when your syntax checking only needs to be loose.
To extract multiple scalars out of your version vector, store the found index somewhere:
const auto a = verstr.find('.');
const std::string major = verstr.substr(0, a);
Then re-use it with one of the overloads of string::find, saying start searching at one after a:
const auto b = verstr.find ('.', a+1);
const std::string minor = verstr.substr(a+1, b);
And so forth.
If you need a syntax check, compare the returned indices against string::npos:
const auto a = verstr.find('.');
if (std::string::npos == a)
.... bad syntax ....
Pastebin style version of this answer:
#include <string>
#include <stdexcept>
#include <iostream>
struct Version
{
std::string Major, Minor, Patch;
Version(std::string const &Major)
: Major(Major), Minor("0"), Patch("0")
{}
Version(std::string const &Major, std::string const &Minor)
: Major(Major), Minor(Minor), Patch("0")
{}
Version(std::string const &Major, std::string const &Minor, std::string const &Patch)
: Major(Major), Minor(Minor), Patch(Patch)
{}
};
std::ostream& operator<< (std::ostream &os, Version const &v)
{
return os << v.Major << '.' << v.Minor << '.' << v.Patch;
}
Version parse (std::string const &verstr) {
if (verstr.empty()) throw std::invalid_argument("bad syntax");
const auto first_dot = verstr.find('.');
if (first_dot == std::string::npos)
return Version(verstr);
const auto second_dot = verstr.find('.', first_dot+1);
if (second_dot == std::string::npos)
return Version(verstr.substr(0, first_dot),
verstr.substr(first_dot+1, second_dot));
return Version(verstr.substr(0, first_dot),
verstr.substr(first_dot+1, second_dot),
verstr.substr(second_dot+1));
}
and then
int main () {
std::cout << parse("1.0") << '\n'
<< parse("1.0.4+Patches(55,322)") << '\n'
<< parse("1") << '\n';
parse(""); // expected to throw
}

try something like this instead of solution below the line
string s = "1.0.0";
string delimiters = ".";
size_t current;
size_t next = -1;
do
{
current = next + 1;
next = s.find_first_of( delimiters, current );
string current_substring = s.substr( current, next - current ); // here you have the substring
}
while (next != string::npos);
Ok, please don't use this solution below, if you really don't know what you're doing, according to discussion below this answer with #DavidSchwartz
Take a look at function strtok http://www.cplusplus.com/reference/clibrary/cstring/strtok/
char str[] = "1.0.0";
char * pch;
pch = strtok (str,".");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, ".");
}

Take a look at Boost libraries, specifically String Algo.
Standard library support for string manipulation is somewhat limited in C++. And reinventing the wheel is just plain bad.
Update:
I was asked in comments why I consider all find/substr based solutions bad style.
I'll try my best.
As questions does not states otherwise, performance is not a question here. Maintainability and readability are much more important. All solutions proposed here tightly tie split algorithm semantics with a specific version parsing algorithm semantics. This hurts both.
This hurts maintainability, because when you will need to change version format, it will involve changing the very same block of code that implements splitting, making it more error-prone. Same applies to unit-tests.
This hurts readability, because due to mixed semantics I can't at once guess an intent behind this block of code. For example, when I am looking up parse algorithm to check how missing 3d version argument is handled, I'd better not waste my time digging through split implementation details.
If parsing pattern would have been slightly more difficult, I'd have advised regular expressions. But in this case splitting string by a delimiter is an action generic and often used enough to justify having it as a separate function.

if it's only simple char comparison in a small string...
char[] should not be so bad... and c functions should work... (EDIT: for some, its a blasphemy... a lot of C++ method use char* whether it's const or not).
why use an object if it's to has the same functionality with more memory to be used, and more time for the process to spend?
EDIT:
I saw that some answer suppose to create a lot of string object... i don't khnow if it's really the best way...
a little 2 line recursive C like function can do that without gasping a lot.
In c++ code I probably would do that with string object, as it's negligible gasp... but just to say it so.
In string object i would use the length property to get the last char first (with [] operator, or appropriate method).
then just need to get the two elements (in a loop, or with 2 back reference in an object accepting regex (which is less efficient))

Related

Questions regarding efficiency

So while working through a course on Udemy over C++ one of the challenges was to check a string to see whether it was a palindrome or not. I completed the task successfully but went about it a different way than the instructor. I understand there are a multitude of ways to complete a task but I am wondering which is more efficient and why? It may seem stupid to be wondering about this while reteaching myself coding but I feel this is something I should be keeping in mind.
//Instructors code//
# include<iostream>
using namespace std;
/*program for reverse a string and check a string is a palidrome
*/
int main()
{
string str="MADAM";
string rev="";
int len=(int)str.length();
rev.resize(len);
for(int i=0, j=len-1; i<len; i++, j--)
{
rev[i]=str[j];
}
rev[len]='\0';
if(str.compare(rev)==0)
cout<<"palindrome"<<endl;
else
cout<<"not a pallindrome"<<endl;
return 0;
}
My Approach
#include <iostream>
using namespace std;
int main(){
string str1="test";
// cout << "Enter a string to check if it is a Palindrome: ";
// getline(cin,str1);
string str2;
string::reverse_iterator it;
for(it=str1.rbegin(); it!= str1.rend(); it++)
{
str2.push_back(*it);
}
if(!str1.compare(str2))
cout << "\nPalindrome";
else
cout << "\nNot a Palindrome";
return 0;
}
Thank you in advance.

In theory the code from your instructor is more efficient, but both examples have issues.
With your instructors code the main issue is the use of
int len=(int)str.length();
In this example, it is okay because we know the size of the string will fit in a int, but if you were getting a string from an outside source, this could be a problem. A std::string using an unsigned integer type to store the size of the string and that means you can have a string who's size is larger then what can fit in an int. If that were to happen, then code is not going to work correctly.
With your code you a avoid all that, which is great, but you also leave some performance on the table. In theory your code of
for(it=str1.rbegin(); it!= str1.rend(); it++)
{
str2.push_back(*it);
}
is going to cause str2 to have multiple buffer allocations and copies from the old buffer to the new buffer as it grows. This is a lot of extra work that you don't need to do since you already know how much space you need to allocate. Having
str2.reserve(str1.size() + 1);
before the loop pre-allocates all the space you need so you don't have those potential performance hits.
Then we come to the fact that both of your examples are using a second string. You don't need another string to check for a palindrome. What you can do is just check and see if the first and last characters are the same, and if they are move on to the first+1 and last-1 character and so on until you reach the middle or they don't match. You can do that using a construct like
bool is_palindrome = true;
for (auto start = str.begin(), end = str.end() - 1;
start < end && is_palindrome;
++start, --end)
{
if (*start != *end)
is_palindrom = false
}
if (is_palindrome)
std::cout << "palindrome\n";
else
std::cout << "not a pallindrome\n";

The simplest and most efficient way (no copying required) would be something like this:
inline bool is_palindrome(const std::string& u) {
return std::equal(u.begin(), std::next(u.begin(), u.length() / 2), u.rbegin());
}

I would say that both are almost the same, but as mentioned in the comments, the line:
str2.push_back(*it);
Is actually very inefficient, since std::string may copy the existing string to a new location in the memory, and then append the next char to the string, which is wasteful.
But I am wondering, why to create the copy in the first place?
It is very simple to run both from start to end, and from end to start to check it out, meaning:
bool is_polindrom(const std::string& str)
{
for (std::size_t idx = 0, len = str.length(); idx < len / 2; ++idx)
{
if (str[idx] != str[len - 1 - idx])
{
return false;
}
}
return true;
}
Running the code with:
int main()
{
const std::string right1 = "MADAM";
const std::string right2 = "MAAM";
const std::string wrong1 = "MADAAM";
const std::string wrong2 = "MEDAM";
std::cout << "MADAM result is: " << is_polindrom(right1) << std::endl;
std::cout << "MAAM result is: " << is_polindrom(right2) << std::endl;
std::cout << "MADAAM result is: " << is_polindrom(wrong1) << std::endl;
std::cout << "MEDAM result is: " << is_polindrom(wrong2) << std::endl;
}
Will yield:
MADAM result is: 1
MAAM result is: 1
MADAAM result is: 0
MEDAM result is: 0
You don't need extra memory in this case, since it is possible to iterate over a string from the end to the beginning, and you need to run on it exactly once (and notice that I stop when idx >= len / 2 since you don't really need to check each letter twice!).

How can I compare the next character which the file contains?

I am getting few problems when reading code. This is the text file.
2X^6+3X^3+4X^0=0
5X^6+X^2+X^1-4X^0=0
I am getting a proper input for the first line but in second line first I need to ignore. I searched here and found how to use it and it's work to get to next line ignoring all the left over characters of first line.
You can see in second line with X there is no integer, now problem is the second while loop is running continuously. If I add 1 in text file with the X the file reads perfectly. Also how can I put a condition to satisfy this that when there is directly X or -X, it should store 1 or -1 and goes to next character? Also you can see ^ I have to store this in a variable whereas I should ignore it but didn't how to ignore it?
Thanks in advance
int main()
{
int in;
int power;
char x;
char f;
fstream fin;
fin.open("input1.txt");
list l1,l2;
while(fin.peek() != 61)
{
fin>>in;
fin>>x;
fin>>f;
fin>>power;
cout<<in<<endl<<x<<endl<<f<<endl<<power<<endl;
l1.addtoend(in,power,x);
cout<<endl;
}
fin.ignore(2,'\n');
while(fin.peek() != 61)
{
fin>>in;
fin>>x;
fin>>f;
fin>>power;
cout<<in<<endl<<x<<endl<<f<<endl<<power<<endl;
l2.addtoend(in,power,x);
cout<<endl;
}
l1.display();
l2.display();
}

Unfortunately, this is not so simple as expected.
We need to split up the task into smaller parts.
What you want to do, is splitting your equation in terms and extract from this the coefficients and exponents.
Splitting up something in similar parts is also called tokenizing.
So, your equation consists of terms, which all follow the same pattern. First an optional sign, followed by the coefficients, then a “X^”, and, at the end the exponent (which may or may not have a sign).
And since all terms have the same pattern, we can find them with a so-called regex. C++ supports this functionality. Also for splitting up a text in smaller tokens/terms/pattern-matches, we have a special iterator std::sregex_token_iterator. Like any other iterator in C++, it iterates over the source string and extracts (and copies) all matched patterns.
OK, then we found already a solution for the first sub task. Extract all terms and put them into a std::vector. We will use the std::vectors range constructor, to do this, while defining the variable.
The next step is to get the coefficient. Here we need some special handling, because the coefficient can be omitted with an assumed 1. Using this assumption, we will read the term and convert the coefficient to an integer. And because we want to do that in one statement, we use std::transform from the STL’s algorithm library.
Getting the exponents is easier. We simply convert anything in a term following the ‘^’-sign to an integer. We again use std::transform to work on all terms in one statement.
Last but not least, we will get the right-hand-side of the equation and convert it also to an integer.
Please note:
All this can be done also with float type values
We could also allow spaces in the equation
For that, we would simple modify the std::regex-string.
Please see the complete example below:
#include <iostream>
#include <string>
#include <vector>
#include <iterator>
#include <regex>
#include <algorithm>
#include <iomanip>
int main() {
std::string equation{ "5X^6+X^2+X^1-4X^0=0" };
const std::regex re(R"(([+-]?\d?X\^[+-]?\d+))");
std::vector<std::string> terms{ std::sregex_token_iterator(equation.begin(), equation.end(), re,1),std::sregex_token_iterator() };
std::vector<int> coefficients(terms.size());
std::vector<int> exponents(terms.size());
int rightHandSite{ 0 };
// Everything in front of X is the coefficient. Handle special case, when no digit is given
std::transform(terms.begin(), terms.end(), coefficients.begin(), [](const std::string& s) {
std::string temp = s.substr(0U, s.find('X'));
if (1 == temp.size() && !std::isdigit(temp[0])) temp += '1';
return std::stoi(temp); });
// Get all exponents
std::transform(terms.begin(), terms.end(), exponents.begin(), [](const std::string & s) {
return std::stoi(s.substr(s.find('^') + 1)); });
// Get right Hand site of equation
rightHandSite = std::stoi(equation.substr(equation.find('=') + 1));
// Show result
std::cout << "\nEquation: " << equation << "\n\nFound '" << terms.size() << "' terms.\n\nCoeffient Exponent\n";
for (size_t i = 0U; i < terms.size(); ++i)
std::cout << std::right << std::setw(9) << coefficients[i] << std::setw(10) << exponents[i] << "\n";
std::cout << "\n --> " << rightHandSite << "\n";
return 0;
}
There are many other possible solutions. But maybe it will give you some idea on what you could do.

Is there a better way to count substring occurrences within a string than char* and a loop?

I have this line:
const char *S1 = "AaA BbB CcC DdD AaA";
I think that this creates a pointer *S1, which is located a constant char type value and has the AaA BbB CcC DdD AaA value in it. Is that right?
If so, how can I read each character of this constant value and recognize how many times AaA occurs?
I was thinking of creating a loop that will copy each letter to a different cell and then 3 enclosed if statements, of which the first could check for A, the second for a and so one. And if those 3 are true I will increment a counter like so i++. Is that correct?
I think it's too complicated and it can be done with less code.

Your fundamental approach is sound. However, it’s complex and doesn’t scale: what if you wanted to search for a word with more than three letters? Four ifs? Five ifs? Six …? Clearly that won’t do.
Instead, use two loops: one to go over the string you search in (the “haystack” or “reference”) and one over the string you search for (“needle” or “pattern”).
But luckily you don’t even have to do that, because C++ gives you the tools to search for the occurrence of one string in another, the find function:
#include <string>
#include <iostream>
int main() {
std::string const reference = "AaA BbB CcC DdD AaA";
std::string const pattern = "AaA";
std::string::size_type previous = 0;
int occurrences = 0;
for (;;) {
auto position = reference.find(pattern, previous);
if (position == std::string::npos)
break;
previous = position + 1;
++occurrences;
}
std::cout << occurrences << " occurrences of " << pattern << '\n';
}
You can look up the individual types and functions in the C++ reference. For instance, you can find the std::string::find function there, which does the actual searching for us.
Note that this will find nested patterns: the reference “AaAaA” will contain two occurrences of “AaA”. If this isn’t what you want, change the line where the previous position is reassigned.

A simple way to achieve what you want is using strstr(str1, str2) function, which returns a pointer to the first occurrence of str2 in str1, or a null pointer if str2 is not part of str1.
int count_sequence(const char *S1, const char *sequence) {
int times, sequence_len;
const char *ptr;
times = 0;
sequence_len = strlen(sequence);
ptr = strstr(S1, sequence); //Search for the first sequence
while(ptr != NULL) {
times++;
ptr = strstr(ptr + sequence_len, sequence); //search from the last position
}
return times;
}

The C++ way:
using std::string for string management, it provide a lot benefit, memory management, iterators, some algorithms like find.
using find method of std::string to search for the index of s1 where s2 begin, if s2 is not present in s1 (a dummy value std::string::npos is returned).
Code:
#include <iostream>
int main() {
std::string s1("AaAaAaA");
//std::string s1("AaA BbB CcC DdD AaA");
std::string s2("AaA");
int times = 0;
size_t index = s1.find(s2, index);
while (index != std::string::npos) {
times++;
index = s1.find(s2, index + 1);
}
std::cout << "Found '" << s2 << "' in '" << s1 << "' "
<< times << " times" << std::endl;
}

How to check if a string contains spaces/tabs/new lines (anything that's blank)?

I know there's an "isspace" function that checks for spaces, but that would require me to iterate through every character in the string, which can be bad on performance since this would be called a lot. Is there a fast way to check if a std::string contains only spaces?
ex:
function(" ") // returns true
function(" 4 ") // returns false
One solution I've thought of is to use regex, then i'll know that it only contains whitespace if it's false... but i'm not sure if this would be more efficient than the isspace function.
regex: [\w\W] //checks for any word character(a,b,c..) and non-word character([,],..)
thanks in advance!

With a regular string, the best you can do will be of the form:
return string::find_first_not_of("\t\n ") == string::npos;
This will be O(n) in the worst case, but without knowing else about the string, this will be the best you can do.

Any method would, of necessity, need to look at each character of the string. A loop that calls isspace() on each character is pretty efficient. If isspace() is inlined by the compiler, then this would be darn near optimal.
The loop should, of course, abort as soon as a non-space character is seen.

You are making the assumption regex doesnt iterate over the string. Regex is probably much heavier than a linear search since it might build a FSM and traverse based on that.
The only way you could speed it up further and make it a near-constant time operation is to amortize the cost by iterating on every update to the string and caching a bool/bit that tracks if there is a space-like character, returning that value if no changes have been made since, and updating that bit whenever you do a write operation to that string. However, this sacrifices/slows that speed of modifying operations in order to increase the speed of your custom has_space().

For what it's worth, a locale has a function (scan_is) to do things like this:
#include <locale>
#include <iostream>
#include <iomanip>
int main() {
std::string inputs[] = {
"all lower",
"including a space"
};
std::locale loc(std::locale::classic());
std::ctype_base::mask m = std::ctype_base::space;
for (int i=0; i<2; i++) {
char const *pos;
char const *b = &*inputs[i].begin();
char const *e = &*inputs[i].end();
std::cout << "Input: " << std::setw(20) << inputs[i] << ":\t";
if ((pos=std::use_facet<std::ctype<char> >(loc).scan_is(m, b, e)) == e)
std::cout << "No space character\n";
else
std::cout << "First space character at position " << pos - b << "\n";
}
return 0;
}
It's probably open to (a lot of) question whether this gives much (if any) real advantage over using isspace in a loop (or using std::find_if).

You can also use find_first_not_of if you all the characters to be in a given list.
Then you can avoid loops.
Here is an example
#include <string>
#include <algorithm>
using namespace std;
int main()
{
string str1=" ";
string str2=" u ";
bool ContainsNotBlank1=(str1.find_first_not_of("\t\n ")==string::npos);
bool ContainsNotBlank2=(str2.find_first_not_of("\t\n ")==string::npos);
bool ContainsNotBlank3=(str2.find_first_not_of("\t\n u")==string::npos);
cout << ContainsNotBlank1 <<endl;
cout << ContainsNotBlank2 <<endl;
cout << ContainsNotBlank3 <<endl;
return 0;
}
Output:
1: because only blanks and a tab
0: because u is not into the list "\t\n "
1: because str2 contains blanks, tabs and a u.
Hope it helps
Tell me if you have any questions

How to use wildcard for strings (matching and replacing)?

I want to search for a number of letters including ? replaced by a letter matched in a string in C++.
Think of a word like abcdefgh. I want to find an algorithm to search for an input ?c for any letter replaced by ?, and finds bc, but also it should also check for ?e? and find def.
Do you have any ideas?

How about using boost::regex? or std::regex if you're using c++11 enabled compilers.

If you just want to support ?, that's pretty easy: when you encounter a ? in the pattern, just skip ahead over one byte of input (or check for isalpha, if you really meant you only want to match letters).
Edit: Assuming the more complex problem (finding a match starting at any position in the input string), you could use code something like this:
#include <string>
size_t match(std::string const &pat, std::string const &target) {
if (pat.size() > target.size())
return std::string::npos;
size_t max = target.size()-pat.size()+1;
for (size_t start =0; start < max; ++start) {
size_t pos;
for (pos=0; pos < pat.size(); ++pos)
if (pat[pos] != '?' && pat[pos] != target[start+pos])
break;
if (pos == pat.size())
return start;
}
return std::string::npos;
}
#ifdef TEST
#include <iostream>
int main() {
std::cout << match("??cd?", "aaaacdxyz") << "\n";
std::cout << match("?bc", "abc") << "\n";
std::cout << match("ab?", "abc") << "\n";
std::cout << match("ab?", "xabc") << "\n";
std::cout << match("?cd?", "cdx") << "\n";
std::cout << match("??cd?", "aaaacd") << "\n";
std::cout << match("??????", "abc") << "\n";
return 0;
}
#endif
If you only want to signal a yes/no based on whether the whole pattern matches the whole input, you do pretty much the same thing, but with the initial test for != instead of >, and then basically remove the outer loop.

Or if you insist on "wildcards" in the form you exhibit the term you want to search for is "glob"s (at least on unix-like systems).
The c-centric API is to be found in glob.h on unix-like systems, and consists of two calls glob and globfree in section 3 of the manual.
Switching to full regular expressions will allow you to use a more c++ approach as shown in the other answers.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

C++ code for taking substrings - c++

Related

Questions regarding efficiency

How can I compare the next character which the file contains?

Is there a better way to count substring occurrences within a string than char* and a loop?

How to check if a string contains spaces/tabs/new lines (anything that's blank)?

How to use wildcard for strings (matching and replacing)?

Categories

Resources