Finding all substrings within a string and storing the results?


class ORF_Finder {
public:
void findORFs(string & strand, int sizeOfStrand);
vector<string> orf1Strands;
vector<string> orf2Strands;
vector<string> orf3Strands;
private:
string newStrand;
string newSub;
};
void ORF_Finder::findORFs(string & strand, int sizeOfStrand) {
int pos, pos1, index = 0;
for (int i = 0; i < strand.size(); i++) {
pos = strand.find("ATG");
pos1 = strand.find("TAA");
newSub = strand.substr(pos, pos1);
newStrand.insert(index, newSub);
strand.erase(pos, pos1);
index = index + 3;
if ((pos1 % 3 == 0) && (pos1 >= pos + 21)) {
orf1Strands.push_back(newStrand);
}
else if ((pos1 % 3 == 1) && (pos1 >= pos + 21)) {
orf2Strands.push_back(newStrand);
}
else if ((pos1 % 3 == 2) && (pos1 >= pos + 21)) {
orf3Strands.push_back(newStrand);
}
}
}
^ assume all strings are declared and I'm "using namespace std".
My goal is to ask the user for an imported DNA strand (e.g. "TCAATGCGCGCTACCATGCGGAGCTCTGGGCCCAAATTTCATCCATAACTGGGGCCCTTTTAAGGGCCCGGGAAATTT") and find all instances where a substring starts with "ATG" and ends with "TAA", "TAG", or "TGA" (I omitted "TAG" and "TGA" for simplicity's sake).
The substring will look like "ATG ... ... ... ... TAA", and it will then be stored in a vector to be used later. However, I would like to find multiple instances in each reading frame (ORF1 should start at the "T" of the imported strand, ORF2 at the "C", and ORF3 at the "A"), and the search should work in triplets, hence the mod 3 in the if statements. The purpose of "pos1 >= pos + 21" is to make every substring at least seven codons long.
The above code is what I've done thus far but obviously, it's incorrect. I'm trying to tell pos to find "ATG" and pos1 to find "TAA". newSub is the substring that will be generated from "ATG" to "TAA" and newStrand will be generated to contain the substring. I would then erase the portion of the strand (to avoid repetition) and increment index.
Sorry for the long description but I've been stressing over this and I've tried everything in my willpower to solve this.

Knuth-Morris-Pratt (KMP) is a fast way to search for a single pattern. The Aho-Corasick algorithm is a generalization of KMP to several patterns at once: basically it's a trie with failure links computed by a breadth-first search. You can try my PHP implementation, phpahocorasick, at codeplex.com. You would then need to add a wildcard to find all substrings.
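For a single pattern such as the start codon, a minimal C++ sketch of Knuth-Morris-Pratt that returns every (possibly overlapping) match position might look like this. The name `kmpFindAll` is illustrative, not from any library; searching for all three stop codons in one pass is what Aho-Corasick would add on top.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the start index of every occurrence of `pattern` in `text`
// (overlapping matches included), using Knuth-Morris-Pratt.
std::vector<std::size_t> kmpFindAll(const std::string& text, const std::string& pattern) {
    std::vector<std::size_t> matches;
    const std::size_t m = pattern.size();
    if (m == 0) return matches;

    // failure[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of it.
    std::vector<std::size_t> failure(m, 0);
    for (std::size_t i = 1, k = 0; i < m; ++i) {
        while (k > 0 && pattern[i] != pattern[k]) k = failure[k - 1];
        if (pattern[i] == pattern[k]) ++k;
        failure[i] = k;
    }

    // Single pass over the text; k = number of pattern characters matched so far.
    for (std::size_t i = 0, k = 0; i < text.size(); ++i) {
        while (k > 0 && text[i] != pattern[k]) k = failure[k - 1];
        if (text[i] == pattern[k]) ++k;
        if (k == m) {
            matches.push_back(i + 1 - m);
            k = failure[k - 1];  // fall back so overlapping matches are found
        }
    }
    return matches;
}
```

On the question's sample strand, `kmpFindAll(strand, "ATG")` yields the positions 3 and 15; the text is never modified.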

Here is a possible implementation.
Characteristics :
can process large strings since it keeps only a single copy of the initial string
accepts one arbitrary start sequence (here "ATG")
accepts many end sequences (here "TAA", "TAG", or "TGA")
accepts only substrings of at least 7 codons
describes substrings only by their index in the initial string and their length (to save memory)
per your requirement, keeps results in 3 different vectors according to the modulo 3 of their index
Code :
#include <iostream>
#include <string>
#include <stdexcept>
#include <vector>
class Strand {
const std::string* data;
size_t begin;
size_t len;
public:
Strand(const std::string& data, size_t begin, size_t end): data(&data),
begin(begin), len(end - begin) {
if (end <= begin) {
throw std::invalid_argument("end < begin");
}
}
std::string getString() const {
const char *beg = data->c_str();
beg += begin;
return std::string(beg, len);
}
};
class Parser {
const std::string& data;
const std::string& first;
const std::vector<std::string>& end;
size_t dataLen;
std::vector<Strand> orf1Strands;
std::vector<Strand> orf2Strands;
std::vector<Strand> orf3Strands;
public:
enum TypStrand {
one = 0, two, three
};
Parser(const std::string& data, const std::string& first,
const std::vector<std::string>& end): data(data),
first(first), end(end) {
dataLen = data.length();
}
void parse();
const std::vector<Strand>& getVector(int typ) const {
switch(typ) {
case 0 : return orf1Strands;
case 1 : return orf2Strands;
default : return orf3Strands;
}
}
const std::vector<Strand>& getVector(TypStrand typ) const {
return getVector((int) typ);
}
};
void Parser::parse() {
size_t pos=0;
size_t endSize = end.size();
std::string firstChars = "";
for(size_t i=0; i<endSize; i++) {
firstChars += end[i].at(0);
}
for(;;) {
pos = data.find(first, pos);
if (pos == std::string::npos) break;
size_t strandEnd = pos + 18;
for(;;) {
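if (strandEnd + 3 > dataLen) break; // no room left for a stop codon
strandEnd = data.find_first_of(firstChars, strandEnd);
if (strandEnd == std::string::npos) break; // no candidate stop codon left
if ((strandEnd - pos) % 3 != 0) { // not in the reading frame of pos
strandEnd += 1;
continue;
}
if (strandEnd + 3 > dataLen) break;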
for (size_t i=0; i<endSize; i++) {
if (data.compare(strandEnd, end[i].length(), end[i]) == 0) {
std::cout << "Found sequence ended with " << end[i] << std::endl;
switch(pos %3) {
case 0 :
orf1Strands.push_back(Strand(data, pos,
strandEnd + 3));
break;
case 1 :
orf2Strands.push_back(Strand(data, pos,
strandEnd + 3));
break;
case 2 :
orf3Strands.push_back(Strand(data, pos,
strandEnd + 3));
break;
}
pos = strandEnd + end[i].length() - 1;
break;
}
}
if (pos > strandEnd) break;
strandEnd += 3;
}
pos = pos + 1;
}
}
using namespace std;
int main() {
std::string first = "ATG";
vector<string> end;
std::string ends[] = { "TAA", "TAG", "TGA"};
for (int i=0; i< sizeof(ends)/sizeof(std::string); i++) {
end.push_back(ends[i]);
}
string data = "TCAATGCGCGCTACCATGCGGAGCTCTGGGCCCAAATTTC"
"ATCCATAACTGGGGCCCTTTTAAGGGCCCGGGAAATTT";
Parser parser(data, first, end);
parser.parse();
for (int i=0; i<3; i++) {
int typ = i;
const vector<Strand>& vect = parser.getVector(typ);
cout << "Order " << i << " : " << vect.size() << endl;
if (vect.size() > 0) {
for(size_t j=0; j<vect.size(); j++) {
cout << vect[j].getString() << endl;
}
}
}
return 0;
}
Todo:
add comments
rework the handling of enum TypStrand: now that the program is written, I think an array of three vectors would be better than three separate ones
make the minimal number of codons configurable
test corner cases more thoroughly
3 is a magic number and really should be expressed as a constant

Simple:
Scan the whole string for occurrences of the start or end sequences.
If you find an end sequence, extract the part from the previous start sequence.
You will have a few corner cases, e.g. the handling of multiple signal sequences that could be paired differently, but that's all just normal programming.
The problem with your approach is that you don't scan the string from start to end; instead, you repeatedly search for the start and end from the beginning. You need to continue after the last position instead. Check out the various find... functions of the string class (they accept a starting position as their second argument) to get an idea of how to do that.
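A minimal sketch of that forward scan, assuming the question's rules (start codon "ATG", stop codons "TAA", "TAG", "TGA", reading frame = start position modulo 3) and counting the minimum length in codons including start and stop. Every `find` resumes after the previous hit and nothing is erased. `Orf`, `isStopCodon` and `findOrfs` are hypothetical names; an "ATG" lying inside an ORF already found in the same frame is skipped, which is one way to "avoid repetition".

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: start index and length (start through stop codon).
struct Orf {
    std::size_t start;
    std::size_t length;
};

bool isStopCodon(const std::string& s, std::size_t i) {
    return s.compare(i, 3, "TAA") == 0 || s.compare(i, 3, "TAG") == 0 ||
           s.compare(i, 3, "TGA") == 0;
}

// Returns the ORFs grouped by reading frame (index 0, 1, 2 = start % 3).
std::array<std::vector<Orf>, 3> findOrfs(const std::string& dna, std::size_t minCodons) {
    std::array<std::vector<Orf>, 3> frames;
    std::array<std::size_t, 3> frameResume = {0, 0, 0};  // end of last ORF per frame

    for (std::size_t start = dna.find("ATG"); start != std::string::npos;
         start = dna.find("ATG", start + 1)) {        // continue after the last hit
        const std::size_t frame = start % 3;
        if (start < frameResume[frame]) continue;      // nested in an ORF already seen

        // Walk codon by codon to the first in-frame stop codon.
        for (std::size_t codon = start + 3; codon + 3 <= dna.size(); codon += 3) {
            if (isStopCodon(dna, codon)) {
                const std::size_t length = codon + 3 - start;
                if (length / 3 >= minCodons) frames[frame].push_back({start, length});
                frameResume[frame] = codon + 3;
                break;
            }
        }
    }
    return frames;
}
```

With the question's sample strand and `minCodons = 7`, frame 0 gets one ORF starting at index 3 (45 characters, ending in "TAA"); the "ATG" at index 15 lies inside it and is skipped.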


How can I speed up parsing of large strings?

So I've made a program that reads in various config files. Some of these config files can be small, some can be semi-large (largest one is 3,844 KB).
The read in file is stored in a string (in the program below it's called sample).
I then have the program extract information from the string based on various formatting rules. This works well; the only issue is that it is very slow when reading larger files.
I was wondering if there was anything I could do to speed up the parsing, or if there was an existing library that does what I need (extract a string up to a delimiter, and extract the string between two delimiters on the same nesting level). Any assistance would be great.
Here's my code & a sample of how it should work...
#include <cstdio> // printf
#include <cstdlib> // system
#include <string>
#include <vector>
std::string ExtractStringUntilDelimiter(
std::string& original_string,
const std::string& delimiter,
const int delimiters_to_skip = 1)
{
std::string needle = "";
if (original_string.find(delimiter) != std::string::npos)
{
int total_found = 0;
auto occurance_index = static_cast<size_t>(-1);
while (total_found != delimiters_to_skip)
{
occurance_index = original_string.find(delimiter);
if (occurance_index != std::string::npos)
{
needle = original_string.substr(0, occurance_index);
total_found++;
}
else
{
break;
}
}
// Remove the found string from the original string...
original_string.erase(0, occurance_index + 1);
}
else
{
needle = original_string;
original_string.clear();
}
if (!needle.empty() && needle[0] == '\"')
{
needle = needle.substr(1);
}
if (!needle.empty() && needle[needle.length() - 1] == '\"')
{
needle.pop_back();
}
return needle;
}
void ExtractInitialDelimiter(
std::string& original_string,
const char delimiter)
{
// Remove extra new line characters
while (!original_string.empty() && original_string[0] == delimiter)
{
original_string.erase(0, 1);
}
}
void ExtractInitialAndFinalDelimiters(
std::string& original_string,
const char delimiter)
{
ExtractInitialDelimiter(original_string, delimiter);
while (!original_string.empty() && original_string[original_string.size() - 1] == delimiter)
{
original_string.erase(original_string.size() - 1, 1);
}
}
std::string ExtractStringBetweenDelimiters(
std::string& original_string,
const std::string& opening_delimiter,
const std::string& closing_delimiter)
{
const size_t first_delimiter = original_string.find(opening_delimiter);
if (first_delimiter != std::string::npos)
{
int total_open = 1;
const size_t opening_index = first_delimiter + opening_delimiter.size();
for (size_t i = opening_index; i < original_string.size(); i++)
{
// Check if we have room for opening_delimiter...
if (i + opening_delimiter.size() <= original_string.size())
{
for (size_t j = 0; j < opening_delimiter.size(); j++)
{
if (original_string[i + j] != opening_delimiter[j])
{
break;
}
else if (j == opening_delimiter.size() - 1)
{
total_open++;
}
}
}
// Check if we have room for closing_delimiter...
if (i + closing_delimiter.size() <= original_string.size())
{
for (size_t j = 0; j < closing_delimiter.size(); j++)
{
if (original_string[i + j] != closing_delimiter[j])
{
break;
}
else if (j == closing_delimiter.size() - 1)
{
total_open--;
}
}
}
if (total_open == 0)
{
// Extract result, and return it...
std::string needle = original_string.substr(opening_index, i - opening_index);
original_string.erase(first_delimiter, i + closing_delimiter.size() - first_delimiter); // erase() takes a count, not an end index
// Remove new line symbols
ExtractInitialAndFinalDelimiters(needle, '\n');
ExtractInitialAndFinalDelimiters(original_string, '\n');
return needle;
}
}
}
return "";
}
int main()
{
std::string sample = "{\n"
"Line1\n"
"Line2\n"
"{\n"
"SubLine1\n"
"SubLine2\n"
"}\n"
"}";
std::string result = ExtractStringBetweenDelimiters(sample, "{", "}");
std::string LineOne = ExtractStringUntilDelimiter(result, "\n");
std::string LineTwo = ExtractStringUntilDelimiter(result, "\n");
std::string SerializedVector = ExtractStringBetweenDelimiters(result, "{", "}");
std::string SubLineOne = ExtractStringUntilDelimiter(SerializedVector, "\n");
std::string SubLineTwo = ExtractStringUntilDelimiter(SerializedVector, "\n");
// Just for testing...
printf("LineOne: %s\n", LineOne.c_str());
printf("LineTwo: %s\n", LineTwo.c_str());
printf("\tSubLineOne: %s\n", SubLineOne.c_str());
printf("\tSubLineTwo: %s\n", SubLineTwo.c_str());
system("pause");
}

Use string_view or a hand-rolled one.
Don't modify the loaded string.
original_string.erase(0, occurance_index + 1);
is a code smell and is going to be expensive with a large original string.
If you are going to modify something, do it in one pass. Don't repeatedly delete from the front of it; that is O(n^2). Instead, proceed along it and push "finished" pieces into an output accumulator.
This will involve changing how your code works.
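The one-pass idea can be sketched as follows, assuming C++17 for std::string_view. The loop keeps a moving position, never erases anything, and pushes each finished line into an output vector. `splitLines` is a hypothetical helper, not part of the question's code.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Splits `text` on '\n' in a single left-to-right pass. The returned views
// point into `text`, so the underlying buffer must outlive them.
std::vector<std::string_view> splitLines(std::string_view text) {
    std::vector<std::string_view> lines;
    std::size_t pos = 0;
    while (pos <= text.size()) {
        std::size_t next = text.find('\n', pos);
        if (next == std::string_view::npos) next = text.size();
        lines.push_back(text.substr(pos, next - pos));
        pos = next + 1;  // continue after the delimiter instead of erasing it
    }
    return lines;
}
```

Each character is examined once, so the cost grows linearly with the file size rather than quadratically.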

You're reading your data into a string. "Length of string" should not be a problem. So far, so good...
You're using string.find(). That's not necessarily a bad choice.
You're using "string.erase()". That's probably the main source of your problem.
SUGGESTIONS:
Treat the original string as "read-only". Don't call erase(), don't modify it.
Personally, I'd consider reading your text into a C string (a text buffer), then parsing the text buffer, using strstr().
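A rough sketch of the strstr() style: keep a read-only pointer into the buffer and advance it past each match instead of erasing. `countOccurrences` is an illustrative name; it only counts non-overlapping matches, but the same loop shape can collect positions or pieces.

```cpp
#include <cstddef>
#include <cstring>

// Counts non-overlapping occurrences of `needle` in the NUL-terminated
// `buffer` without modifying it: each strstr() call resumes where the
// previous match ended.
std::size_t countOccurrences(const char* buffer, const char* needle) {
    const std::size_t needleLen = std::strlen(needle);
    if (needleLen == 0) return 0;
    std::size_t count = 0;
    for (const char* p = std::strstr(buffer, needle); p != nullptr;
         p = std::strstr(p + needleLen, needle)) {
        ++count;
    }
    return count;
}
```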

Here is a more efficient version of ExtractStringBetweenDelimiters. Note that this version does not mutate the original buffer. You would perform subsequent queries on the returned string.
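#include <algorithm> // std::find, std::find_if
#include <iterator> // std::make_reverse_iterator (C++14)
#include <string>
#include <utility> // std::move
std::string trim(std::string buffer, char what)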
{
auto not_what = [&what](char ch)
{
return ch != what;
};
auto first = std::find_if(buffer.begin(), buffer.end(), not_what);
auto last = std::find_if(buffer.rbegin(), std::make_reverse_iterator(first), not_what).base();
return std::string(first, last);
}
std::string ExtractStringBetweenDelimiters(
std::string const& buffer,
const char opening_delimiter,
const char closing_delimiter)
{
std::string result;
auto first = std::find(buffer.begin(), buffer.end(), opening_delimiter);
if (first != buffer.end())
{
auto last = std::find(buffer.rbegin(), std::make_reverse_iterator(first),
closing_delimiter).base();
if(last > first)
{
result.assign(first + 1, last);
result = trim(std::move(result), '\n');
}
}
return result;
}
If you have access to string_view (C++17 for std::string_view, or boost::string_view), you could return one of these from both functions for extra efficiency.
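For instance, a C++17 variant of the two functions that returns std::string_view might look like this (a sketch; `trimView` and `betweenDelimitersView` are hypothetical names). The returned views refer into the caller's buffer, so that buffer must stay alive and unmodified while they are in use.

```cpp
#include <cstddef>
#include <string_view>

// Strips leading and trailing `what` characters without copying.
std::string_view trimView(std::string_view s, char what) {
    const std::size_t first = s.find_first_not_of(what);
    if (first == std::string_view::npos) return {};
    const std::size_t last = s.find_last_not_of(what);
    return s.substr(first, last - first + 1);
}

// Text between the first `open` and the last `close`, trimmed of '\n';
// empty if there is no such pair.
std::string_view betweenDelimitersView(std::string_view buffer, char open, char close) {
    const std::size_t first = buffer.find(open);
    if (first == std::string_view::npos) return {};
    const std::size_t last = buffer.rfind(close);
    if (last == std::string_view::npos || last <= first) return {};
    return trimView(buffer.substr(first + 1, last - first - 1), '\n');
}
```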
It's worth mentioning that this method of parsing a structured file is going to cause you problems down the line if any of the serialised strings contains a delimiter, such as a '{'.
In the end you'll want to write or use someone else's parser.
The boost::spirit library is a little complicated to learn, but creates very efficient parsers for this kind of thing.

How to "Fold a word" from a string. EX. "STACK" becomes "SKTCA". C++

I'm trying to figure out how to fold a word from a string. For example, "code" after folding would become "ceod". Basically, start with the first character, then take the last one, then the second character, and so on. I know the first step is to start a loop, but I have no idea how to get the last character after that. Any help would be great. Here's my code.
#include <iostream>
using namespace std;
int main () {
string fold;
cout << "Enter a word: ";
cin >> fold;
string temp;
string backwards;
string wrap;
for (unsigned int i = 0; i < fold.length(); i++){
temp = temp + fold[i];
}
backwards= string(temp.rbegin(),temp.rend());
for(unsigned int i = 0; i < temp.length(); i++) {
wrap = fold.replace(backwards[i]);
}
cout << wrap;
}
Thanks

@Supreme, there are a number of ways to do your task and I'm going to post one of them. But as @John pointed out, you must try to get it done on your own, because real programming is all about practicing a lot. Use this solution just as a reference for one possibility, and find many others.
#include <iostream>
#include <string>
using namespace std;
int main()
{
string in;
cout <<"enter: "; cin >> in;
string fold;
// take one char from the front and one from the back, up to the middle
for (size_t i = 0; i < in.length()/2; i++)
{
fold += in[i];
fold += in[in.length() - 1 - i];
}
if( in.length()%2 != 0) // if string length is odd, pick the middle
fold += in[in.length()/2];
cout << endl << fold ;
return 0;
}
good luck !

There are two approaches to this form of problem. A mathematically exact method would be to create a generator function that returns the indices in the correct order.
An easier plan would be to modify the string, solving the problem in a practical way.
Mathematical solution
We want a function which returns the index in the string to add. We have 2 sequences - increasing and decreasing and they are interleaved.
Sequence 1:
0, 1, 2, 3, ...
Sequence 2:
len-1, len-2, len-3, len-4, ...
Given they are interleaved, we want even positions of the result to take values from sequence 1 and odd positions to take values from sequence 2.
So, for a given index in the new string, our solution chooses which sequence to use and returns the corresponding value from that sequence.
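#include <cassert>
// Index of the source character that belongs at position idx of the folded string
int generator( int idx, int len )
{
assert( idx < len );
if( idx %2 == 0 ) { // even - first sequence
return idx/2;
} else { // odd - second sequence
return len - (1 + idx/2);
}
}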
This can then be called from a function fold...
std::string fold(const char * src)
{
std::string result;
std::string source(src);
for (size_t i = 0; i < source.length(); i++) {
result += source.at(generator(i, source.length()));
}
return result;
}
Practical solution
Although less efficient, this can be easier to think about. We are taking either the first or the last character of a string. This we will do using string manipulation to get the right result.
std::string fold2(const char * src)
{
std::string source = src;
enum whereToTake { fromStart, fromEnd };
std::string result;
enum whereToTake next = fromStart;
while (source.length() > 0) {
if (next == fromStart) {
result += source.at(0);
source = source.substr(1);
next = fromEnd;
}
else {
result += source.at(source.length() - 1); // last char
source = source.substr(0, source.length() - 1); // eat last char
next = fromStart;
}
}
return result;
}

You can take advantage of reverse iterators to write a generic algorithm based on the solution presented in Usman Riaz's answer.
Compose your string picking chars from both the ends of the original string. When you reach the center, add the char in the middle if the number of chars is odd.
Here is a possible implementation:
#include <iostream>
#include <string>
#include <vector>
#include <utility>
#include <algorithm>
#include <iterator>
// Needs bidirectional iterators, because std::reverse_iterator walks backwards.
template <class BidirIt, class OutputIt>
OutputIt fold(BidirIt source, BidirIt end, OutputIt output)
{
auto reverse_source = std::reverse_iterator<BidirIt>(end);
auto source_end = std::next(source, std::distance(source, end) / 2);
while ( source != source_end )
{
*output++ = *source++;
*output++ = *reverse_source++;
}
if ( source != reverse_source.base() )
{
*output++ = *source;
}
return output;
}
int main() {
std::vector<std::pair<std::string, std::string>> tests {
{"", ""}, {"a", "a"}, {"stack", "sktca"}, {"steack", "sktcea"}
};
for ( auto const &test : tests )
{
std::string result;
fold(
std::begin(test.first), std::end(test.first),
std::back_inserter(result)
);
std::cout << (result == test.second ? " OK " : "FAILED: ")
<< '\"' << test.first << "\" --> \"" << result << "\"\n";
}
}

Sorting string vector using integer values at the end of the string in C++

I have a directory containing files {"good_6", "good_7", "good_8", ..., "good_660"}; after reading it using readdir and storing the names in a vector, I get {"good_10", "good_100", "good_101", "good_102", ...}.
What I want is to keep the file names in the order {"good_6", "good_7", "good_8", ..., "good_660"} in the vector and then replace the first name with 1, the second with 2, and so on, so that good_6 will be 1, good_7 will be 2, etc. But right now good_10 corresponds to 1, good_100 to 2, and so on.
I tried std::sort on vector but the values are already sorted, just not in a way that I desire (based on integer after _). Even if I just get the last integer and sort on that, it will still be sorted as 1, 100, 101...
Any help would be appreciated. Thanks.

You can use a custom function that compares strings with a special case for digits:
#include <ctype.h> /* isdigit */
#include <stdlib.h> /* strtoul, qsort */
#include <string.h> /* strcmp */
int natural_string_cmp(const char *sa, const char *sb) {
for (;;) {
int a = (unsigned char)*sa++;
int b = (unsigned char)*sb++;
/* simplistic version with overflow issues */
if (isdigit(a) && isdigit(b)) {
const char *sa1 = sa - 1;
const char *sb1 = sb - 1;
unsigned long na = strtoul(sa1, (char **)&sa, 10);
unsigned long nb = strtoul(sb1, (char **)&sb, 10);
if (na == nb) {
if ((sa - sa1) == (sb - sb1)) {
/* XXX should check for '.' */
continue;
} else {
/* Perform regular strcmp to handle 0 :: 00 */
return strcmp(sa1, sb1);
}
} else {
return (na < nb) ? -1 : +1;
}
} else {
if (a == b) {
if (a != '\0')
continue;
else
return 0;
} else {
return (a < b) ? -1 : 1;
}
}
}
}
Depending on your sorting algorithm, you may need to wrap it with an extra level of indirection:
int natural_string_cmp_ind(const void *p1, const void *p2) {
return natural_string_cmp(*(const char * const *)p1, *(const char * const *)p2);
}
char *array[size];
... // array is initialized with filenames
qsort(array, size, sizeof(*array), natural_string_cmp_ind);

I think you can play around with your data structure. For example, instead of vector<string>, you can convert your data to vector< pair<int, string> >. Then {"good_6", "good_7", "good_8", ..., "good_660"} becomes {(6, "good"), (7, "good"), (8, "good"), ..., (660, "good")}. In the end, you convert it back and do whatever you want.
Another way is just to define your own comparator to do the exact comparison as what you want.
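A sketch of the pair idea, assuming every name has the form "prefix_number", optionally followed by an extension such as ".png", and that the number fits in an int. `trailingNumber` and `sortByTrailingNumber` are hypothetical helpers. After sorting, `sorted[i]` is the name that should be renumbered as `i + 1`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Integer that follows the last '_', e.g. "good_660" -> 660, "good_6.png" -> 6.
// std::stoi stops at the first non-digit and throws if there are no digits.
int trailingNumber(const std::string& name) {
    const std::size_t underscore = name.find_last_of('_');
    const std::size_t digitsBegin = (underscore == std::string::npos) ? 0 : underscore + 1;
    return std::stoi(name.substr(digitsBegin));
}

// Sorts names by their trailing number via (number, name) pairs.
std::vector<std::string> sortByTrailingNumber(const std::vector<std::string>& names) {
    std::vector<std::pair<int, std::string>> keyed;
    for (const std::string& name : names) keyed.emplace_back(trailingNumber(name), name);
    std::sort(keyed.begin(), keyed.end());  // compares the number first, then the name

    std::vector<std::string> sorted;
    for (const auto& entry : keyed) sorted.push_back(entry.second);
    return sorted;
}
```

The same key could also drive a custom comparator passed directly to std::sort; precomputing the pairs just avoids re-parsing each name on every comparison.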

You can use string::replace to replace the string "good_" with an empty string, and use stoi to convert the remaining integral part of the string. Let's say the value obtained is x.
Create a std::map and populate it this way: myMap[x] = vec_element.
Then you can traverse from myMap.begin() to myMap.end() to get the sorted order.
Code:
typedef std::map<int, std::string> MapType;
MapType myMap;
for (const std::string& name : vec) // replace() runs on a copy, so vec is untouched
myMap[ std::stoi( std::string(name).replace(0, 5, "") ) ] = name;
for( MapType::iterator it = myMap.begin(); it != myMap.end(); ++it )
sortedVec.push_back( it->second );

If I understand your question, you're just having trouble with the sorting and not how you plan to change the names after you sort.
Something like this might work for you:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <random>
#include <tuple>
#include <cstdio>
int main()
{
std::vector<std::string> v;
char buffer[64] = {};
for (int i = 1; i < 10; ++i)
{
std::snprintf(buffer, sizeof buffer, "good_%d", i * 3);
v.push_back(buffer);
std::snprintf(buffer, sizeof buffer, "bad_%d", i * 2);
v.push_back(buffer);
}
// std::random_shuffle was removed in C++17; std::shuffle takes an explicit engine
std::shuffle(v.begin(), v.end(), std::mt19937{std::random_device{}()});
for (const auto& s : v)
{
std::cout << s << "\n";
}
std::sort(v.begin(), v.end(),
[](const std::string& lhs, const std::string& rhs)
{
//This assumes a lot about the contents of the strings
//and has no error checking just to keep things short.
size_t l_pos = lhs.find('_');
size_t r_pos = rhs.find('_');
std::string l_str = lhs.substr(0, l_pos);
std::string r_str = rhs.substr(0, r_pos);
int l_num = std::stoi(lhs.substr(l_pos + 1));
int r_num = std::stoi(rhs.substr(r_pos + 1));
return std::tie(l_str, l_num) < std::tie(r_str, r_num);
});
std::cout << "-----\n";
for (const auto& s : v)
{
std::cout << s << "\n";
}
return 0;
}

Managed to do it with the following compare function:
bool numericStringCompare(const std::string& s1, const std::string& s2)
{
// number between the last '_' and the last '.', e.g. "good_12.png" -> "12"
size_t foundUnderScore = s1.find_last_of("_");
size_t foundDot = s1.find_last_of(".");
std::string s11 = s1.substr(foundUnderScore+1, foundDot - foundUnderScore - 1);
foundUnderScore = s2.find_last_of("_");
foundDot = s2.find_last_of(".");
std::string s22 = s2.substr(foundUnderScore+1, foundDot - foundUnderScore - 1);
int i1 = std::stoi(s11);
int i2 = std::stoi(s22);
return i1 < i2;
}
The full file names looked like good_0.png, hence the find_last_of(".").

Complex algorithm to extract numbers/number range from a string

I am working on an algorithm that should produce the following output:
Given values/Inputs:
char *Var = "1-5,10,12,15-16,25-35,67,69,99-105";
int size = 29;
Here "1-5" depicts a range value, i.e. it will be understood as "1,2,3,4,5" while the values with just "," are individual values.
I was writing an algorithm whose final output should be the complete expanded list:
int list[] = {1,2,3,4,5,10,12,15,16,25,26,27,28,29,30,31,32,33,34,35,67,69,99,100,101,102,103,104,105};
If anyone is familiar with this issue then the help would be really appreciated.
Thanks in advance!
My initial code approach was as:
if(NULL != strchr((char *)grp_range, '-'))
{
int_u8 delims[] = "-";
result = (int_u8 *)strtok((char *)grp_range, (char *)delims);
if(NULL != result)
{
start_index = strtol((char*)result, (char **)&end_ptr, 10);
result = (int_u8 *)strtok(NULL, (char *)delims);
}
while(NULL != result)
{
end_index = strtol((char*)result, (char**)&end_ptr, 10);
result = (int_u8 *)strtok(NULL, (char *)delims);
}
while(start_index <= end_index)
{
grp_list[i++] = start_index;
start_index++;
}
}
else if(NULL != strchr((char *)grp_range, ','))
{
int_u8 delims[] = ",";
result = (int_u8 *)strtok((char *)grp_range, (char *)delims);
while(result != NULL)
{
grp_list[i++] = strtol((char*)result, (char**)&end_ptr, 10);
result = (int_u8 *)strtok(NULL, (char *)delims);
}
}
But it only works if I have either "0-5" or "0,10,15". I would like to make it more versatile.

Here is a C++ solution for you to study.
#include <vector>
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
int ConvertString2Int(const string& str)
{
stringstream ss(str);
int x;
if (! (ss >> x))
{
cerr << "Error converting " << str << " to integer" << endl;
abort();
}
return x;
}
vector<string> SplitStringToArray(const string& str, char splitter)
{
vector<string> tokens;
stringstream ss(str);
string temp;
while (getline(ss, temp, splitter)) // split into new "lines" based on character
{
tokens.push_back(temp);
}
return tokens;
}
vector<int> ParseData(const string& data)
{
vector<string> tokens = SplitStringToArray(data, ',');
vector<int> result;
for (vector<string>::const_iterator it = tokens.begin(), end_it = tokens.end(); it != end_it; ++it)
{
const string& token = *it;
vector<string> range = SplitStringToArray(token, '-');
if (range.size() == 1)
{
result.push_back(ConvertString2Int(range[0]));
}
else if (range.size() == 2)
{
int start = ConvertString2Int(range[0]);
int stop = ConvertString2Int(range[1]);
for (int i = start; i <= stop; i++)
{
result.push_back(i);
}
}
else
{
cerr << "Error parsing token " << token << endl;
abort();
}
}
return result;
}
int main()
{
vector<int> result = ParseData("1-5,10,12,15-16,25-35,67,69,99-105");
for (vector<int>::const_iterator it = result.begin(), end_it = result.end(); it != end_it; ++it)
{
cout << *it << " ";
}
cout << endl;
}
Live example
http://ideone.com/2W99Tt

This is my boost approach :
This won't give you array of ints, instead a vector of ints
Algorithm used: (nothing new)
Split string using ,
Split the individual string using -
Make a range low and high
Push it into vector with help of this range
Code:-
#include<iostream>
#include<vector>
#include <boost/algorithm/string.hpp>
#include <boost/lexical_cast.hpp>
int main(){
std::string line("1-5,10,12,15-16,25-35,67,69,99-105");
std::vector<std::string> strs,r;
std::vector<int> v;
int low,high,i;
boost::split(strs,line,boost::is_any_of(","));
for (auto it:strs)
{
boost::split(r,it,boost::is_any_of("-"));
auto x = r.begin();
low = high =boost::lexical_cast<int>(r[0]);
x++;
if(x!=r.end())
high = boost::lexical_cast<int>(r[1]);
for(i=low;i<=high;++i)
v.push_back(i);
}
for(auto x:v)
std::cout<<x<<" ";
return 0;
}

Your issue seems to be a misunderstanding of how strtok works. Have a look at this.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
int main()
{
int j, rstart, rend;
char delims[] = " ,";
char str[] = "1-5,6,7";
char *tok;
/* strtok() splits "1-5,6,7" into "1-5", "6" and "7" */
tok = strtok(str, delims);
while(tok != NULL) {
/* a '-' after the first character means this token is a range */
char *dash = strchr(tok + 1, '-');
if(dash != NULL) {
rstart = atoi(tok); /* atoi stops at the '-' */
rend = atoi(dash + 1);
for(j = rstart; j <= rend; ++j)
printf("%d\n", j);
}
else
printf("%s\n", tok); /* single value */
tok = strtok(NULL, delims);
}
return 0;
}

Don't search. Just go through the text one character at a time. As long as you're seeing digits, accumulate them into a value. If the digits are followed by a - then you're looking at a range, and need to parse the next set of digits to get the upper bound of the range and put all the values into your list. If the value is not followed by a - then you've got a single value; put it into your list.
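A minimal C++ sketch of that single pass, assuming input made only of non-negative integers, '-' and ','; it simply stops at the first unexpected character rather than reporting an error. `expandRanges` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Expands "1-5,10" into {1,2,3,4,5,10} by walking the text once.
std::vector<int> expandRanges(const std::string& text) {
    std::vector<int> result;
    std::size_t i = 0;

    // Accumulates consecutive digits starting at text[i] into a value.
    auto readNumber = [&]() {
        int value = 0;
        while (i < text.size() && text[i] >= '0' && text[i] <= '9') {
            value = value * 10 + (text[i] - '0');
            ++i;
        }
        return value;
    };

    while (i < text.size()) {
        const std::size_t before = i;
        const int low = readNumber();
        if (i == before) break;                      // no digits: malformed, stop
        int high = low;
        if (i < text.size() && text[i] == '-') {     // a range: read the upper bound
            ++i;
            high = readNumber();
        }
        for (int v = low; v <= high; ++v) result.push_back(v);
        if (i < text.size() && text[i] != ',') break;  // unexpected character: stop
        ++i;                                           // step past the ','
    }
    return result;
}
```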

Stop and think about it: what you actually have is a comma
separated list of ranges, where a range can be either a single
number, or a pair of numbers separated by a '-'. So you
probably want to loop over the ranges, using recursive descent
for the parsing. (This sort of thing is best handled by an
istream, so that's what I'll use.)
std::vector<int> results;
std::istringstream parser{ std::string( Var ) }; // braces avoid the "most vexing parse"
processRange( results, parser );
while ( isSeparator( parser, ',' ) ) {
processRange( results, parser );
}
with:
bool
isSeparator( std::istream& source, char separ )
{
char next;
source >> next;
if ( source && next != separ ) {
source.putback( next );
}
return source && next == separ;
}
and
void
processRange( std::vector<int>& results, std::istream& source )
{
int first = 0;
source >> first;
int last = first;
if ( isSeparator( source, '-' ) ) {
source >> last;
}
if ( last < first ) {
source.setstate( std::ios_base::failbit );
}
if ( source ) {
while ( first != last ) {
results.push_back( first );
++ first;
}
results.push_back( first );
}
}
The isSeparator function will, in fact, probably be useful in
other projects in the future, and should be kept in your
toolbox.

First divide the whole string into numbers and ranges (using strtok() with a "," delimiter) and save the pieces in an array. Then search through the array looking for "-": if it is present, use sscanf() with the "%d-%d" format; otherwise, use sscanf() with the single "%d" format.
The usage of these functions is easy to look up.
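A sketch of that approach in C++, using the C library functions from <cstring> and <cstdio>. The input is copied into a writable buffer because strtok() writes NUL characters into it, and malformed pieces are simply skipped. `expandWithSscanf` is a hypothetical name.

```cpp
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// strtok() splits on ',', then sscanf() reads "%d-%d" (a range) or "%d".
std::vector<int> expandWithSscanf(const std::string& input) {
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');  // strtok needs a writable, NUL-terminated copy

    // Step 1: split into pieces and keep them (they point into buffer).
    std::vector<const char*> pieces;
    for (char* tok = std::strtok(buffer.data(), ","); tok != nullptr;
         tok = std::strtok(nullptr, ",")) {
        pieces.push_back(tok);
    }

    // Step 2: a piece containing '-' is a range, anything else a single value.
    std::vector<int> result;
    for (const char* piece : pieces) {
        int low = 0, high = 0;
        if (std::strchr(piece, '-') != nullptr) {
            if (std::sscanf(piece, "%d-%d", &low, &high) != 2) continue;
        } else {
            if (std::sscanf(piece, "%d", &low) != 1) continue;
            high = low;
        }
        for (int v = low; v <= high; ++v) result.push_back(v);
    }
    return result;
}
```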

One approach:
You need a parser that identifies 3 kinds of tokens: ',', '-', and numbers. That raises the level of abstraction so that you are operating at a level above characters.
Then you can parse your token stream to create a list of ranges and constants.
Then you can parse that list to convert the ranges into constants.
Some code that does part of the job:
#include <stdio.h>
// Prints a comma after the last digit. You will need to fix that up.
void print(int a, int b) {
for (int i = a; i <= b; ++i) {
printf("%d, ", i);
}
}
int main() {
enum { DASH, COMMA, NUMBER };
struct token {
int type;
int value;
};
// Sample input stream. Notice the sentinel comma at the end.
// 1-5,10,
struct token tokStream[] = {
{ NUMBER, 1 },
{ DASH, 0 },
{ NUMBER, 5 },
{ COMMA, 0 },
{ NUMBER, 10 },
{ COMMA, 0 } };
// This parser assumes well formed input. You have to add all the error
// checking yourself.
size_t i = 0;
while (i < sizeof(tokStream)/sizeof(struct token)) {
if (tokStream[i+1].type == COMMA) {
print(tokStream[i].value, tokStream[i].value);
i += 2; // skip to next number
}
else { // DASH
print(tokStream[i].value, tokStream[i+2].value);
i += 4; // skip to next number
}
}
return 0;
}
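The missing first step, turning the text into such a token stream, could be sketched as follows. This is C++ rather than the answer's C, uses its own hypothetical `TokenType`/`Token`/`tokenize`/`expandTokens` names, appends the same sentinel comma, and, like the parser above, assumes well-formed input (other characters are skipped).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class TokenType { Dash, Comma, Number };

struct Token {
    TokenType type;
    int value;  // only meaningful for Number
};

// "1-5,10" -> NUMBER DASH NUMBER COMMA NUMBER, plus a sentinel COMMA.
std::vector<Token> tokenize(const std::string& text) {
    std::vector<Token> tokens;
    for (std::size_t i = 0; i < text.size();) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isdigit(c)) {
            int value = 0;
            while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
                value = value * 10 + (text[i] - '0');
                ++i;
            }
            tokens.push_back({TokenType::Number, value});
            continue;
        }
        if (c == '-') tokens.push_back({TokenType::Dash, 0});
        else if (c == ',') tokens.push_back({TokenType::Comma, 0});
        ++i;
    }
    tokens.push_back({TokenType::Comma, 0});  // sentinel
    return tokens;
}

// Same walk as the answer's main(): NUMBER COMMA is a single value,
// NUMBER DASH NUMBER COMMA is a range.
std::vector<int> expandTokens(const std::vector<Token>& toks) {
    std::vector<int> result;
    std::size_t i = 0;
    while (i + 1 < toks.size()) {
        if (toks[i + 1].type == TokenType::Comma) {
            result.push_back(toks[i].value);
            i += 2;
        } else {  // Dash
            if (i + 2 >= toks.size()) break;
            for (int v = toks[i].value; v <= toks[i + 2].value; ++v) result.push_back(v);
            i += 4;
        }
    }
    return result;
}
```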

How to reverse a string in blocks of 2 in C++?

What I want to do is convert a string such as
"a4b2f0" into "f0b2a4"
or in more simple terms:
turning "12345678" into "78563412"
The string will always have an even number of characters so it will always divide by 2. I'm not really sure where to start.

One simple way to do that is this:
std::string input = "12345678";
std::string output = input;
std::reverse(output.begin(), output.end());
for(size_t i = 1 ; i < output.size(); i+=2)
std::swap(output[i-1], output[i]);
std::cout << output << std::endl;
This one is a bit better in terms of speed: the previous version moves every element twice (once in std::reverse, once in the pair swap), while this one swaps each element only once:
std::string input = "12345678";
std::string output = input;
for(size_t i = 0, middle = output.size()/2, size = output.size(); i < middle ; i+=2 )
{
std::swap(output[i], output[size - i- 2]);
std::swap(output[i+1], output[size -i - 1]);
}
std::cout << output << std::endl;

Let's get esoteric... (not tested! :( And definitely not built to handle odd-length sequences.)
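#include <iterator>
#include <string>
// Walks a reversed range two elements at a time, emitting each pair in its
// original order: reading "12345678" backwards yields 7,8,5,6,3,4,1,2.
// I must be random access (e.g. std::string::reverse_iterator).
template <typename I>
struct backward_pair_iterator {
typedef std::input_iterator_tag iterator_category;
typedef typename std::iterator_traits<I>::value_type value_type;
typedef typename std::iterator_traits<I>::difference_type difference_type;
typedef typename std::iterator_traits<I>::pointer pointer;
typedef typename std::iterator_traits<I>::reference reference;
I base;      // current pair, in reversed order
bool parity; // false: emit base[1] next, true: emit base[0] next
explicit backward_pair_iterator(I base, bool parity = false):
base(base), parity(parity) {}
reference operator*() const { return parity ? base[0] : base[1]; }
backward_pair_iterator& operator++() {
if (parity) base += 2; // pair finished: move on to the next one
parity = !parity;
return *this;
}
backward_pair_iterator operator++(int) {
backward_pair_iterator old = *this;
++*this;
return old;
}
bool operator==(const backward_pair_iterator& o) const {
return base == o.base && parity == o.parity;
}
bool operator!=(const backward_pair_iterator& o) const { return !(*this == o); }
};
template <typename I>
backward_pair_iterator<I> make_bpi(I base) {
return backward_pair_iterator<I>(base);
}
// given a std::string input of even length:
std::string output(make_bpi(input.rbegin()), make_bpi(input.rend()));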

static string reverse(string entry) {
if (entry.size() == 0) {
return "";
} else {
return entry.substr (entry.size() - 2, entry.size()) + reverse(entry.substr (0, entry.size() - 2));
}
}
My method uses the power of recursive programming

A simple solution is this:
string input = "12345678";
string output = "";
for(int i = input.length() - 1; i >= 0; i -= 2)
{
if(i -1 >= 0){
output += input[i -1];
output += input[i];
}
}
Note: if the length of the string is odd (length % 2 != 0), the index would run off the front of the string, so guard against that with a check like the i - 1 >= 0 test above.
