Concise lists/vectors in C++ - c++

I'm currently translating an algorithm in Python to C++.
This line EXCH_SYMBOL_SETS = [["i", "1", "l"], ["s", "5"], ["b", "8"], ["m", "n"]]
is now
vector<vector<char>> exch_symbols;
vector<char> vector_1il;
vector_1il.push_back('1');
vector_1il.push_back('i');
vector_1il.push_back('l');
vector<char> vector_5s;
vector_5s.push_back('5');
vector_5s.push_back('s');
vector<char> vector_8b;
vector_8b.push_back('8');
vector_8b.push_back('b');
vector<char> vector_mn;
vector_mn.push_back('m');
vector_mn.push_back('n');
exch_symbols.push_back(vector_1il);
exch_symbols.push_back(vector_5s);
exch_symbols.push_back(vector_8b);
exch_symbols.push_back(vector_mn);
I hate to have an intermediate named variable for each inner vector of a 2-D vector. I'm not really familiar with multidimensional data structures in C++. Is there a better way?
What's happening afterwards is this:
multimap<char, char> exch_symbol_map;
/*# Insert all possibilities
for symbol_set in EXCH_SYMBOL_SETS:
for symbol in symbol_set:
for symbol2 in symbol_set:
if symbol != symbol2:
exch_symbol_map[symbol].add(symbol2)*/
void insert_all_exch_pairs(const vector<vector<char>>& exch_symbols) {
for (vector<vector<char>>::const_iterator symsets_it = exch_symbols.begin();
symsets_it != exch_symbols.end(); ++symsets_it) {
for (vector<char>::const_iterator sym1_it = symsets_it->begin();
sym1_it != symsets_it->end(); ++sym1_it) {
for (vector<char>::const_iterator sym2_it = symsets_it->begin();
sym2_it != symsets_it->end(); ++sym2_it) {
if (sym1_it != sym2_it) {
exch_symbol_map.insert(pair<char, char>(*sym1_it, *sym2_it));
}
}
}
}
}
So this algorithm should work in one way or another with the representation here. The goal is that EXCH_SYMBOL_SETS can be easily changed later to include new groups of chars or add new letters to existing groups. Thank you!

I would refactor: instead of vector<char>, use std::string for the inner container, i.e.
vector<string> exch_symbols;
exch_symbols.push_back("1il");
exch_symbols.push_back("s5");
exch_symbols.push_back("b8");
exch_symbols.push_back("mn");
then change your insert method:
void insert_all_exch_pairs(const vector<string>& exch_symbols)
{
for (vector<string>::const_iterator symsets_it = exch_symbols.begin(); symsets_it != exch_symbols.end(); ++symsets_it)
{
for (string::const_iterator sym1_it = symsets_it->begin(); sym1_it != symsets_it->end(); ++sym1_it)
{
for (string::const_iterator sym2_it = symsets_it->begin(); sym2_it != symsets_it->end(); ++sym2_it)
{
if (sym1_it != sym2_it)
exch_symbol_map.insert(pair<char, char>(*sym1_it, *sym2_it));
}
}
}
}

You could shorten it by getting rid of the intermediate values
vector<vector<char> > exch_symbols(4, vector<char>()); //>> is not valid in C++98 btw.
//exch_symbols[0].reserve(3)
exch_symbols[0].push_back('i');
etc.
You could also use Boost.Assign or something similar
EXCH_SYMBOL_SETS = [["i", "1", "l"], ["s", "5"], ["b", "8"], ["m", "n"]] then becomes
vector<vector<char>> exch_symbols(list_of(vector<char>(list_of('i')('1')('l')))(vector<char>(list_of('s')('5')))(vector<char>(list_of('b')('8')))(vector<char>(list_of('m')('n')))); (not tested and never used it with nested vectors, but it should be something like this)

For your real question of...
how could I translate L = [A, [B], [[C], D]] to C++ ... at all!
There is no direct translation - you've switched from storing values of the same type to storing values of variable type. Python allows this because it's a dynamically typed language, not because it has a nicer array syntax.
There are ways to replicate the behaviour in C++ (e.g. a vector of boost::any or boost::variant, or a user-defined container class that supports this behaviour), but it's never going to be as easy as it is in Python.
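To make the "vector of variants" idea a bit more concrete, here is a minimal sketch. It uses std::variant from C++17 instead of boost::variant (that swap is my assumption, as is treating A, B, C, D as strings), and the names Node, flatten and make_example are made up for illustration.

```cpp
#include <string>
#include <variant>
#include <vector>

// Hypothetical node type: either a single leaf value or a nested list of nodes.
struct Node {
    std::variant<std::string, std::vector<Node>> value;
};

// Collect the leaf strings of a nested structure, left to right.
void flatten(const Node& node, std::vector<std::string>& out) {
    if (const auto* leaf = std::get_if<std::string>(&node.value)) {
        out.push_back(*leaf);
    } else {
        for (const Node& child : std::get<std::vector<Node>>(node.value)) {
            flatten(child, out);
        }
    }
}

// Builds the equivalent of the Python list L = [A, [B], [[C], D]].
Node make_example() {
    using List = std::vector<Node>;
    return Node{List{
        Node{std::string("A")},
        Node{List{Node{std::string("B")}}},
        Node{List{Node{List{Node{std::string("C")}}}, Node{std::string("D")}}}
    }};
}
```

Even in this small form the construction is far noisier than the Python literal, which is the point made above.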

Your code:
vector<char> vector_1il;
vector_1il.push_back('1');
vector_1il.push_back('i');
vector_1il.push_back('l');
Concise code:
char values[] = "1il";
vector<char> vector_1il(&values[0], &values[3]);
Is it fine with you?
If you want to use std::string as suggested by Nim, then you can use even this:
//Concise form of what Nim suggested!
std::string s[] = {"1il", "5s", "8b", "mn"};
vector<std::string> exch_symbols(&s[0], &s[4]);
For the rest, you can follow Nim's post. :-)

In c++0x the instruction
vector<string> EXCH_SYMBOL_SETS={"i1l", "s5", "b8", "mn"} ;
compiles and works fine. Sadly enough the apparently similar statement
vector<vector<char>> EXCH_SYMBOL_SETS={{'i','1','l'},{'s','5'}, {'b','8'}, {'m','n'}};
doesn't work :-(.
This is implemented in g++ 4.5.0 or later; you need to add the -std=c++0x option. I think this feature is not yet available in Microsoft's compiler (VC10), and I don't know the status of other compilers.

I know that this is an old post, but in case anyone stumbles across it, C++ has gotten MUCH better at dealing with this stuff:
In C++11 the first code block can simply be rewritten as:
std::vector<std::string> exch_symbols {"1il", "5s", "8b", "mn"};
This isn't special to string either, we can nest vector like so:
std::vector<std::vector<int>> vov {{1,2,3}, {2,3,5,7,11}};
And here's the entire code in c++14-style, with an added cout at the end:
#include <iostream>
#include <map>
#include <string>
#include <vector>
void add_all_char_pairs (std::multimap<char, char> & mmap, const std::string & str)
{
// we choose not to add {str[i], str[i]} pairs for some reason...
const int s = str.size();
for (int i1 = 0; i1 < s; ++i1)
{
char c1 = str[i1];
for (int i2 = i1 + 1; i2 < s; ++i2)
{
char c2 = str[i2];
mmap.insert({c1, c2});
mmap.insert({c2, c1});
}
}
}
auto all_char_pairs_of_each_str (const std::vector<std::string> & strs)
{
std::multimap<char, char> mmap;
for (auto & str : strs)
{
add_all_char_pairs(mmap, str);
}
return mmap;
}
int main ()
{
std::vector<std::string> exch_symbols {"1il", "5s", "8b", "mn"};
auto mmap = all_char_pairs_of_each_str(exch_symbols);
for (auto e : mmap)
{
std::cout << e.first << e.second << std::endl;
}
}
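Once the multimap exists, a typical next step is asking which symbols a given character can be exchanged with. The following is only a sketch of how that could look with equal_range; the helper names build_exch_map and exchangeable_with are hypothetical and not part of the code above.

```cpp
#include <map>
#include <string>
#include <vector>

// Build a char -> char multimap holding every ordered pair within each set.
std::multimap<char, char> build_exch_map(const std::vector<std::string>& sets) {
    std::multimap<char, char> result;
    for (const std::string& set : sets) {
        for (char a : set) {
            for (char b : set) {
                if (a != b) result.insert({a, b});
            }
        }
    }
    return result;
}

// Collect every symbol stored under the key `symbol`.
std::string exchangeable_with(const std::multimap<char, char>& mmap, char symbol) {
    std::string out;
    const auto range = mmap.equal_range(symbol);
    for (auto it = range.first; it != range.second; ++it) {
        out += it->second;
    }
    return out;
}
```

Adding a new group or a new letter then only means editing the list of strings passed to build_exch_map.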

Related

Efficient way to compare more 50 strings

I have a method which takes two parameters, one a string and the other an int.
The string has to be compared against more than 50 strings, and once a match is found the int value needs to be mapped to a hard-coded string, as in the example below.
EX:
string Compare_Method(std::string str, int val) {
if(str == "FIRST")
{
std::array<std::string, 3> real_value = {"Hello1","hai1","bye1"};
return real_value[val];
}
else if(str == "SECOND")
{
std::array<std::string, 4> real_value = {"Hello2","hai2","bye2"};
return real_value[val];
}
else if(str == "THIRD")
{
std::array<std::string, 5> real_value = {"Hello3","hai3","bye3"};
return real_value[val];
}
//----- 50+ else if
}
My approach is as above. What would be an efficient way
1. to compare against more than 50 strings?
2. to create a std::array for each if case?
EDITED: the std::array size is not fixed; it can be 3, 4 or 5, as edited above.
This would be my way of doing that. The data structure is created only once and the access times should be fast enough
#include <iostream>
#include <string>
#include <array>
#include <unordered_map>
std::string Compare_Method(const std::string& str, int val)
{
// or std::vector<std::string>
static std::unordered_map<std::string, std::array<std::string, 3>> map
{
{ "FIRST", { "Hello1", "hail1", "bye1" }},
{ "SECOND", { "Hello2", "hail2", "bye2" }},
{ "THIRD", { "Hello3", "hail3", "bye3" }},
// 50+ more
};
// maybe check if str is present in the map
return map[str][val];
}
int main()
{
std::cout << Compare_Method("SECOND", 1) << std::endl;
}
If std::unordered_map isn't (fast) enough for you, you can come up with some sort of static optimal hash structure, since keys are known at compile time.
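The "maybe check if str is present" comment matters because operator[] inserts an empty entry for an unknown key, and indexing past the end of the inner container is undefined behaviour. A possible checked variant is sketched below; the function name lookup_checked, the use of std::vector for the variable sizes, and returning an empty string on failure are all my assumptions.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Returns the mapped string, or an empty string when the key or index is invalid.
std::string lookup_checked(const std::string& key, std::size_t index) {
    // Built once on first call; inner vectors may have different lengths.
    static const std::unordered_map<std::string, std::vector<std::string>> table{
        {"FIRST",  {"Hello1", "hai1", "bye1"}},
        {"SECOND", {"Hello2", "hai2", "bye2", "aloha2"}},
    };
    const auto it = table.find(key);  // find() never inserts, unlike operator[]
    if (it == table.end() || index >= it->second.size()) {
        return "";
    }
    return it->second[index];
}
```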
If those 50 strings are something you will be widely using throughout your program, string comparisons will take a toll on performance. I'd suggest you to adapt them to an enum.
enum Strings
{
FIRST,
SECOND,
THIRD,
…
…
};
You'll obviously need a method to convert string to int whenever you get one from the source (user input, file read, etc.). This should be as infrequent as possible since your system now works on enum values (which can be used as indices on STL containers as we see in the next step)
int GetEnumIndex(const std::string& str)
{
// here you can map all variants of the same string to the same number
if ("FIRST" == str || "first" == str) return FIRST; // enum value 0, the index of the first row in the lookup table
…
}
Then, the comparison method can be based on the enum instead of the string:
std::string Compare_Method(const int& strIndex, int val)
{
static std::vector<std::vector<std::string>> stringArray
{
{ "Hello1", "hail1", "bye1" },
{ "Hello2", "hail2", "bye2", "aloha2" },
{ "Hello3", "hail3", "bye3", "aloha3", "tata3" },
…
};
return stringArray[strIndex][val];
}
Using the information you provided, I tried several variations to find the best way to achieve the objective. I am listing the best one here. You can see other methods here.
You can compile it and run run.sh to compare performance of all cases.
std::string Method6(const std::string &str, int val) {
static std::array<std::string, 5> NUMBERS{"FIRST", "SECOND", "THIRD",
"FOURTH", "FIFTH"};
static std::array<std::vector<std::string>, 5> VALUES{
std::vector<std::string>{"FIRST", "hai1", "bye1"},
std::vector<std::string>{"Hello1", "SECOND", "bye1"},
std::vector<std::string>{"Hello1", "hai1", "THIRD"},
std::vector<std::string>{"FOURTH", "hai1", "bye1"},
std::vector<std::string>{"Hello1", "FIFTH", "bye1"}};
for (int i = 0; i < NUMBERS.size(); ++i) {
if (NUMBERS[i] == str) {
return VALUES[i][val];
}
}
return "";
}
For simplicity I have used NUMBERS with a length of 5, but you can use whatever length you want.
VALUES is a std::array of std::vector, so you can add any number of elements to each std::vector.
Output from the GitHub code:
Method1 880
Method2 851
Method3 7292
Method4 989
Method5 598
Method6 440
Your output may differ depending on your system and the system load at the time of execution.

Static vector pair of strings class member function

Write a function partlist that gives all the ways to divide a list (an array) of at least two elements into two non-empty parts.
From what I understand, the function should produce linear partitions (using term as mathematical) of the original array.
I think I understand each of the types for the function individually, but I am struggling to bring them all together.
(I have 6 months of C++ experience and no other languages. This is an exercise from codewars that I'm using to try to improve my coding skills)
I've written the function code up to the point where I want to start testing, but with the way the problem is worded, I do not understand how to instantiate the class type. I've reviewed statics, vectors, pairs, and constants in individual terms from class notes and cplusplus.com.
I've gotten to the point that the program will compile, but will not complete main(). I feel like I'm missing a vital bit of information, and I appreciate any help to understand the goal of the program.
#include <iostream>
#include <vector>
class PartList
{
public:
static std::vector<std::pair <std::string, std::string>> partlist(std::vector<std::string> &arr);
};
///The above is what I have to work with///
int main(){
std::vector<std::string> tester = {"I", "Love", "To", "Discrete"};
PartList::partlist(tester);
}
std::vector<std::pair<std::string,std::string>> PartList::partlist(std::vector<std::string> &arr){
std::vector<std::pair<std::string,std::string>> output;
std::vector<std::pair<std::string,std::string>>::iterator bigIt = output.begin();
std::vector<std::string>::iterator myIt;
for(std::vector<std::string>::iterator secIt = arr.begin();
secIt != arr.end(); secIt++){
myIt = arr.begin();
while(myIt <= secIt){
bigIt->first += *myIt;
myIt++;
}
while((myIt > secIt) && myIt != arr.end()){
bigIt->second += *myIt;
myIt++;
}
}
return output;
}
Expected:
For set {std::string a, std::string b, std::string c, std::string d}
Should result in {a, bcd}, {ab,cd}, {abc,d}
Result:
nothing
As john said in the comments, you're not actually doing anything with output. The vector is empty, so bigIt (output.begin()) equals output.end(), and dereferencing it is undefined behaviour. At the beginning of your for loop, you need to append a new item to output with output.push_back(). Then, instead of using an iterator, just reference that item using output.back().
code:
using std::vector;
using std::string;
using std::pair;
vector<pair<string, string>> PartList::partlist(vector<string> &arr)
{
vector<pair<string, string>> output;
vector<string>::const_iterator arr_iterator;
for (vector<string>::const_iterator secIt = arr.begin(); secIt != std::prev(arr.end()); secIt++)
{
arr_iterator = arr.begin();
output.push_back(pair<string, string>());
while (arr_iterator <= secIt)
{
output.back().first += *(arr_iterator++);
}
while (arr_iterator != arr.end())
{
output.back().second += *(arr_iterator++);
}
}
return output;
}
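For comparison, the same idea can be written in terms of a "cut position" instead of two running iterators. This is just a sketch under the assumption that the parts are concatenated without a separator, as in the expected output above; split_points is a made-up free function, not the PartList::partlist signature the exercise requires.

```cpp
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// For every cut position 1..size-1, concatenate the elements before and after the cut.
std::vector<std::pair<std::string, std::string>>
split_points(const std::vector<std::string>& arr) {
    std::vector<std::pair<std::string, std::string>> out;
    for (std::size_t cut = 1; cut < arr.size(); ++cut) {
        const auto mid = arr.begin() + static_cast<std::ptrdiff_t>(cut);
        std::string left = std::accumulate(arr.begin(), mid, std::string());
        std::string right = std::accumulate(mid, arr.end(), std::string());
        out.emplace_back(std::move(left), std::move(right));
    }
    return out;
}
```

Because the loop starts at 1 and stops before arr.size(), both parts are always non-empty, and an input with fewer than two elements simply yields no pairs.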

To find duplicate entry in c++ using 2D Vector (std::vector)

I wrote a program to find duplicate entries in a table. I am a beginner in C++, so I don't know whether this program is efficient. Is there a better way to write it? I have three tables (2D vectors): 1) aRecord_arr, 2) mainTable and 3) idxTable. idxTable identifies the key columns used to check for duplicate entries. The records in aRecord_arr are to be added to mainTable. If a record already exists in mainTable, the program reports the error "Duplicate Entry". Please check this program and give your suggestions.
typedef vector<string> rec_t;
typedef vector<rec_t> tab_t;
typedef vector<int> cn_t;
int main()
{
tab_t aRecord_arr= { {"a","apple","fruit"},
{"b","banana","fruit"} };
tab_t mainTable = { {"o","orange","fruit"},
{"p","pineapple","fruit"},
{"b","banana","fruit"},
{"m","melon","fruit"},
{"a","apple","fruit"},
{"g","guava","fruit"} };
tab_t idxTable = { {"code","k"},
{"name","k"},
{"category","n"}};
size_t Num_aRecords = aRecord_arr.size();
int idxSize = idxTable.size();
int mainSize = mainTable.size();
rec_t r1;
rec_t r2;
tab_t t1,t2;
cn_t idx;
for(int i=0;i<idxSize;i++)
{
if(idxTable[i][1]=="k")
{
idx.push_back(i);
}
}
for(size_t j=0;j<Num_aRecords;j++)
{
for(unsigned int id=0;id<idx.size();id++)
{
r1.push_back(aRecord_arr[j][idx[id]]);
}
t1.push_back(std::move(r1));
}
for(int j=0;j<mainSize;j++)
{
for(unsigned int id=0;id<idx.size();id++)
{
r2.push_back(mainTable[j][idx[id]]);
}
t2.push_back(std::move(r2));
}
for(size_t i=0;i<t1.size();i++)
{
for(size_t j=0;j<t2.size();j++)
{
if(t1[i]==t2[j])
{
cout<<"Duplicate Entry"<<endl;
exit(0);
}
}
}
}
If you want to avoid duplicate entries in an array, you should consider using a std::set instead.
What you want is probably a std::map or a std::set
Don't reinvent the wheel, the STL is full of goodies.
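As a rough illustration of the std::set suggestion: project each record onto its key columns, put the keys of the main table into a set, and then look up each new record's key. The names Record, key_of and has_duplicate are invented for this sketch, and it assumes every record has at least as many columns as the largest key index.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Record = std::vector<std::string>;

// Keep only the key columns of a record (at() throws if a column is missing).
Record key_of(const Record& rec, const std::vector<std::size_t>& key_cols) {
    Record key;
    for (std::size_t col : key_cols) {
        key.push_back(rec.at(col));
    }
    return key;
}

// True if any candidate has the same key as a record already in main_table.
bool has_duplicate(const std::vector<Record>& main_table,
                   const std::vector<Record>& candidates,
                   const std::vector<std::size_t>& key_cols) {
    std::set<Record> seen;
    for (const Record& rec : main_table) {
        seen.insert(key_of(rec, key_cols));
    }
    for (const Record& rec : candidates) {
        if (seen.count(key_of(rec, key_cols)) != 0) {
            return true;
        }
    }
    return false;
}
```

This replaces the nested comparison loop with one set lookup per new record.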
You seem to be rooted in a weakly typed language - but C++ is strongly typed.
You will 'pay' the disadvantage of strong typing almost no matter what you do, but you almost painstakingly avoid the advantage.
Let me start with the field that always says 'fruit' - my suggestion is to make this an enum, like:
enum PlantType { fruit, veggie };
Second, you have a vector that always contains 3 strings, with the same meaning in every record. This seems to be a job for a struct, like:
struct Post {
PlantType kind;
char firstchar;
string name;
// possibly other characteristics
};
The 'firstchar' is probably premature optimization, but let's keep that for now.
Now you want to add a new Post, to an existing vector of Posts, like:
vector<Post> mainDB;
bool AddOne( const Post& p )
{
for( auto& pp : mainDB )
if( pp.name == p.name )
return false;
mainDB.push_back(p);
return true;
}
Now you can use it like:
if( ! AddOne( Post{ fruit, 'b', "banana" } ) )
cerr << "duplicate entry";
If you need speed (at the cost of memory), switch your mainDB to map, like:
map<string,Post> mainDB;
bool AddOne( const Post& p )
{
if( mainDB.find(p.name) != mainDB.end() )
return false;
mainDB[p.name]=p;
return true;
}
This also makes it easier (and faster) to find and use a specific post, like
cout << "the fruit is called " << mainDB["banana"].name ;
Beware that operator[] does not fail if the post doesn't exist: it silently inserts a default-constructed Post under that key, so use find() or at() when the key may be missing.
As you can see, firstchar was never used, and could be omitted. std::map
is an ordered tree keyed on the string, so a lookup takes O(log n) comparisons instead of a linear scan, and it will probably be
faster than anything you or I could whip up by hand. If you want hashing, std::unordered_map is the hash-based counterpart.
All of the above assumed inclusion of the correct headers, and
using namespace std;
If you don't like using namespace, prepend std:: in all the right places.
hope it helps :)
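A small sketch of the map-based version that avoids operator[] altogether: emplace reports whether the key was new, and find looks a post up without inserting anything. The free functions add_one and find_post and the trimmed-down Post struct are assumptions made for this example.

```cpp
#include <map>
#include <string>

enum PlantType { fruit, veggie };

struct Post {
    PlantType kind;
    std::string name;
};

// Inserts p unless a post with the same name exists; returns true if it was added.
bool add_one(std::map<std::string, Post>& db, const Post& p) {
    return db.emplace(p.name, p).second;
}

// Returns a pointer to the stored post, or nullptr if the name is unknown.
const Post* find_post(const std::map<std::string, Post>& db, const std::string& name) {
    const auto it = db.find(name);
    return it == db.end() ? nullptr : &it->second;
}
```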

Is a compile-time checked string-to-int map possible?

I'm probably trying to achieve the impossible, but StackExchange always surprises me, so please have a go at this:
I need to map a name to an integer. The names (about 2k) are unique. There will be no additions nor deletions to that list and the values won't change during runtime.
Implementing them as const int variables gives me compile-time checks for existence and type.
Also this is very clear and verbose in code. Errors are easily spotted.
Implementing them as std::map<std::string, int> gives me a lot of flexibility for building the names to look up with string manipulation. I may use this to give strings as parameters to functions which can then query the list for multiple values by appending pre-/suffixes to that string. I can also loop over several values by creating a numeral part of the key name from the loop variable.
Now my question is: is there a method to combine both advantages? The missing compile-time check (especially for key-existence) almost kills the second method for me. (Especially as std::map silently returns 0 if the key doesn't exist which creates hard to find bugs.) But the looping and pre-/suffix adding capabilities are so damn useful.
I would prefer a solution that doesn't use any additional libraries like boost, but please suggest them nevertheless as I might be able to re-implement them anyway.
An example on what I do with the map:
void init(std::map<std::string, int> &labels)
{
labels.insert(std::make_pair("Bob1" , 45 ));
labels.insert(std::make_pair("Bob2" , 8758 ));
labels.insert(std::make_pair("Bob3" , 436 ));
labels.insert(std::make_pair("Alice_first" , 9224 ));
labels.insert(std::make_pair("Alice_last" , 3510 ));
}
int main()
{
std::map<std::string, int> labels;
init(labels);
for (int i=1; i<=3; i++)
{
std::stringstream key;
key << "Bob" << i;
doSomething(labels[key.str()]);
}
checkName("Alice");
}
void checkName(std::string name)
{
std::stringstream key1,key2;
key1 << name << "_first";
key2 << name << "_last";
doFirstToLast(labels[key1.str()], labels[key2.str()]);
}
Another goal is that the code shown in the main() routine stays as easy and verbose as possible. (Needs to be understood by non-programmers.) The init() function will be code-generated by some tools. The doSomething(int) functions are fixed, but I can write wrapper functions around them. Helpers like checkName() can be more complicated, but need to be easily debuggable.
One way to implement your example is using an enum and token pasting, like this
enum {
Bob1 = 45,
Bob2 = 8758,
Bob3 = 436,
Alice_first = 9224,
Alice_last = 3510
};
#define LABEL( a, b ) ( a ## b )
int main()
{
doSomething( LABEL(Bob,1) );
doSomething( LABEL(Bob,2) );
doSomething( LABEL(Bob,3) );
}
void checkName()
{
doFirstToLast( LABEL(Alice,_first), LABEL(Alice,_last) );
}
Whether or not this is best depends on where the names come from.
If you need to support the for loop use-case, then consider
int bob[] = { 0, Bob1, Bob2, Bob3 }; // Values from the enum
int main()
{
for( int i = 1; i <= 3; i++ ) {
doSomething( bob[i] );
}
}
I'm not sure I understand all your requirements, but how about something like this, without using std::map.
I am assuming that you have three strings, "FIRST", "SECOND" and "THIRD" that you
want to map to 42, 17 and 37, respectively.
#include <stdio.h>
const int m_FIRST = 0;
const int m_SECOND = 1;
const int m_THIRD = 2;
const int map[] = {42, 17, 37};
#define LOOKUP(s) (map[m_ ## s])
int main ()
{
printf("%d\n", LOOKUP(FIRST));
printf("%d\n", LOOKUP(SECOND));
return 0;
}
The disadvantage is that you cannot use variable strings with LOOKUP. But now you can iterate over the values.
Maybe something like this (untested)?
struct Bob {
static constexpr int values[3] = { 45, 8758, 436 };
};
struct Alice {
struct first {
static const int value = 9224;
};
struct last {
static const int value = 3510;
};
};
template <typename NAME>
void checkName()
{
doFirstToLast(NAME::first::value, NAME::last::value);
}
...
constexpr int Bob::values[3]; // need a definition in exactly one TU
int main()
{
for (int i=1; i<=3; i++)
{
doSomething(Bob::values[i - 1]); // values is 0-based, while i runs from 1 to 3
}
checkName<Alice>();
}
Using enum you have both compile-time check and you can loop over it:
How can I iterate over an enum?
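One more possibility, sketched here under the assumption that C++17 is available: keep the table in a constexpr array and look names up with a constexpr function. A literal name used in a constant expression (e.g. a static_assert or a constexpr variable) is then checked at compile time, while names built at runtime go through the same function and throw instead of silently yielding 0. The names Label, kLabels and label are hypothetical.

```cpp
#include <stdexcept>
#include <string>
#include <string_view>

struct Label {
    std::string_view name;
    int value;
};

// The table that a code generator could emit instead of init().
constexpr Label kLabels[] = {
    {"Bob1", 45}, {"Bob2", 8758}, {"Bob3", 436},
    {"Alice_first", 9224}, {"Alice_last", 3510},
};

// Linear search; usable both at compile time and at runtime.
constexpr int label(std::string_view name) {
    for (const Label& entry : kLabels) {
        if (entry.name == name) return entry.value;
    }
    // Reaching this in a constant expression is a compile error;
    // at runtime it is an exception rather than a silent 0.
    throw std::out_of_range("unknown label");
}

// Compile-time use: a misspelled name here would fail to compile.
static_assert(label("Bob2") == 8758, "label table lookup");
```

At runtime, something like label("Bob" + std::to_string(i)) keeps the loop and prefix/suffix tricks from the question. A linear search over about 2k entries may be too slow for hot paths, in which case the table could be sorted and searched differently.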

Sorting a file with 55K rows and varying Columns

I want to find a programmatic solution using C++.
I have 900 files, each 27 MB in size (just to give an idea of the scale).
Each file has 55K rows and a varying number of columns, but the header names the columns.
I want to sort the rows by the value of one column.
I wrote a sorting algorithm for this (definitely a newbie attempt, you may say).
The algorithm works for small inputs but fails for larger ones.
Here is the code.
Basic functions I defined for use in the main code:
int getNumberOfColumns(const string& aline)
{
int ncols=0;
istringstream ss(aline);
string s1;
while(ss>>s1) ncols++;
return ncols;
}
vector<string> getWordsFromSentence(const string& aline)
{
vector<string>words;
istringstream ss(aline);
string tstr;
while(ss>>tstr) words.push_back(tstr);
return words;
}
bool findColumnName(vector<string> vs, const string& colName)
{
vector<string>::iterator it = find(vs.begin(), vs.end(), colName);
if ( it != vs.end())
return true;
else return false;
}
int getIndexForColumnName(vector<string> vs, const string& colName)
{
if ( !findColumnName(vs,colName) ) return -1;
else {
vector<string>::iterator it = find(vs.begin(), vs.end(), colName);
return it - vs.begin();
}
}
////////// I like recursive functions, so I tried to write one
/// here. This worked for small inputs, say 20 rows, but for 55K rows it dumps core
void sort2D(vector<string>vn, vector<string> &srt, int columnIndex)
{
vector<double> pVals;
for ( int i = 0; i < vn.size(); i++) {
vector<string>meancols = getWordsFromSentence(vn[i]);
pVals.push_back(stringToDouble(meancols[columnIndex]));
}
srt.push_back(vn[max_element(pVals.begin(), pVals.end())-pVals.begin()]);
if (vn.size() > 1 ) {
vn.erase(vn.begin()+(max_element(pVals.begin(), pVals.end())-pVals.begin()) );
vector<string> vn2 = vn;
//cout<<srt[srt.size() -1 ]<<endl;
sort2D(vn2 , srt, columnIndex);
}
}
Now the main code:
for ( int i = 0; i < TissueNames.size() -1; i++)
{
for ( int j = i+1; j < TissueNames.size(); j++)
{
//string fname = path+"/gse7307_Female_rma"+TissueNames[i]+"_"+TissueNames[j]+".txt";
//string fname2 = sortpath2+"/gse7307_Female_rma"+TissueNames[i]+"_"+TissueNames[j]+"Sorted.txt";
string fname = path+"/gse7307_Male_rma"+TissueNames[i]+"_"+TissueNames[j]+".txt";
string fname2 = sortpath2+"/gse7307_Male_rma"+TissueNames[i]+"_"+TissueNames[j]+"4Columns.txt";
vector<string>AllLinesInFile;
BioInputStream fin(fname);
string aline;
getline(fin,aline);
replace (aline.begin(), aline.end(), '"',' ');
string headerline = aline;
vector<string> header = getWordsFromSentence(aline);
int pindex = getIndexForColumnName(header,"p-raw");
int xcindex = getIndexForColumnName(header,"xC");
int xeindex = getIndexForColumnName(header,"xE");
int prbindex = getIndexForColumnName(header,"X");
string newheaderline = "X\txC\txE\tp-raw";
BioOutputStream fsrt(fname2);
fsrt<<newheaderline<<endl;
int newpindex=3;
while ( getline(fin, aline) ){
replace (aline.begin(), aline.end(), '"',' ');
istringstream ss2(aline);
string tstr;
ss2>>tstr;
tstr = ss2.str().substr(tstr.length()+1);
vector<string> words = getWordsFromSentence(tstr);
string values = words[prbindex]+"\t"+words[xcindex]+"\t"+words[xeindex]+"\t"+words[pindex];
AllLinesInFile.push_back(values);
}
vector<string>SortedLines;
sort2D(AllLinesInFile, SortedLines,newpindex);
for ( int si = 0; si < SortedLines.size(); si++)
fsrt<<SortedLines[si]<<endl;
cout<<"["<<i<<","<<j<<"] = "<<SortedLines.size()<<endl;
}
}
Can someone suggest a better way of doing this?
Why is it failing for larger values?
The primary function of interest for this query is the sort2D function.
Thanks for your time and patience.
prasad.
I'm not sure why your code is crashing, but recursion in that case is only going to make the code less readable. I doubt it's a stack overflow, however, because you're not using much stack space in each call.
C++ already has std::sort, why not use that instead? You could do it like this:
// functor to compare 2 strings
class CompareStringByValue : public std::binary_function<string, string, bool>
{
public:
CompareStringByValue(int columnIndex) : idx_(columnIndex) {}
bool operator()(const string& s1, const string& s2) const
{
double val1 = stringToDouble(getWordsFromSentence(s1)[idx_]);
double val2 = stringToDouble(getWordsFromSentence(s2)[idx_]);
return val1 < val2;
}
private:
int idx_;
};
To then sort your lines you would call
std::sort(vn.begin(), vn.end(), CompareStringByValue(columnIndex));
Now, there is one problem. This will be slow because stringToDouble and getWordsFromSentence are called multiple times on the same string. You would probably want to generate a separate vector which has precalculated the values of each string, and then have CompareStringByValue just use that vector as a lookup table.
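The precalculation idea could look roughly like the sketch below: parse the sort column once per line, sort (value, line) pairs, then strip the values again. The helper names column_value and sort_by_column are made up, parsing via istringstream stands in for the stringToDouble helper from the question (whose definition is not shown), and missing or non-numeric fields are treated as 0.0 purely as an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parse whitespace-separated column `col` of `line` as a double (0.0 on failure).
double column_value(const std::string& line, std::size_t col) {
    std::istringstream fields(line);
    std::string word;
    for (std::size_t i = 0; i <= col; ++i) {
        if (!(fields >> word)) return 0.0;
    }
    std::istringstream number(word);
    double value = 0.0;
    number >> value;
    return value;
}

// Sort lines in ascending order of the given column, parsing each line only once.
std::vector<std::string> sort_by_column(const std::vector<std::string>& lines,
                                        std::size_t col) {
    std::vector<std::pair<double, std::string>> keyed;
    keyed.reserve(lines.size());
    for (const std::string& line : lines) {
        keyed.emplace_back(column_value(line, col), line);
    }
    std::stable_sort(keyed.begin(), keyed.end(),
                     [](const auto& a, const auto& b) { return a.first < b.first; });
    std::vector<std::string> sorted;
    sorted.reserve(keyed.size());
    for (auto& entry : keyed) {
        sorted.push_back(std::move(entry.second));
    }
    return sorted;
}
```

Note that this sorts ascending, whereas the sort2D in the question emits the largest value first; flipping the comparison in the lambda would reverse the order.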
Another way you can do this is insert the strings into a std::multimap<double, std::string>. Just insert the entries as (value, str) and then read them out line-by-line. This is simpler but slower (though has the same big-O complexity).
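A minimal sketch of the multimap variant, assuming the numeric value of each line has already been extracted (the function name read_out_sorted and the input shape are my own choices for illustration):

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Insert (value, line) pairs into a multimap and read the lines back in key order.
std::vector<std::string>
read_out_sorted(const std::vector<std::pair<double, std::string>>& keyed_lines) {
    std::multimap<double, std::string> by_value;
    for (const auto& entry : keyed_lines) {
        by_value.insert(std::make_pair(entry.first, entry.second));
    }
    std::vector<std::string> sorted;
    sorted.reserve(by_value.size());
    for (const auto& entry : by_value) {
        sorted.push_back(entry.second);
    }
    return sorted;
}
```

Lines with equal values come out in the order they were inserted, since insert places equivalent keys at the end of their range (guaranteed since C++11).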
EDIT: Cleaned up some incorrect code and derived from binary_function.
You could try a method that doesn't involve recursion. If your program crashes using the sort2D function with large values, then you're probably overflowing the stack (a danger of using recursion with a large number of function calls). Try another sorting method, maybe using a loop.
sort2D crashes because you keep allocating an array of strings to sort and then you pass it by value, in effect using O(2*N^2) memory. If you really want to keep your recursive function, simply pass vn by reference and don't bother with vn2. And if you don't want to modify the original vn, move the body of sort2D into another function (say, sort2Drecursive) and call that from sort2D.
You might want to take another look at sort2D in general, since you are doing O(N^2) work for something that should take O(N+N*log(N)).
The problem is less your code than the tool you chose for the job. This is purely a text processing problem, so choose a tool good at that. In this case on Unix the best tool for the job is Bash and the GNU coreutils. On Windows you can use PowerShell, Python or Ruby. Python and Ruby will work on any Unix-flavoured machine too, but roughly all Unix machines have Bash and the coreutils installed.
Let $FILES hold the list of files to process, delimited by whitespace. Here's the code for Bash:
for FILE in $FILES; do
echo "Processing file $FILE ..."
tail --lines=+1 $FILE |sort >$FILE.tmp
mv $FILE.tmp $FILE
done