sorting strings in a file - c++

I need to sort a Unix pwd file by last name using C++. Each line has the fields username, password, uid, gid, name, homedir, shell, all separated by colons. The name field contains the first name followed by the last name, separated by a space. I am able to sort the values using a map, and I am posting my code below. Can someone suggest improvements to it? Also, I am unable to see the sorted lines in my file.
string line,item;
fstream myfile("pwd.txt");
vector<string> lines;
map<string,int> lastNames;
map<string,int>::iterator it;
if(myfile.is_open())
{
    char delim =':';
    int count =0;
    while(!myfile.eof())
    {
        count++;
        vector<string> tokens;
        getline(myfile,line);
        istringstream iss(line);
        lines.push_back(line);
        while(getline(iss,item,delim))
        {
            tokens.push_back(item);
        }
        cout<<tokens.size()<<endl;
        size_t i =tokens[4].find(" ");
        string temp = tokens[4].substr(i,(tokens[4].size()-i));
        cout<<temp<<endl;
        lastNames.insert(pair<string,int>(temp,count));
        tokens.clear();
    }
    myfile.seekg(0,ios::beg);
    for(it=lastNames.begin();it!=lastNames.end();it++)
    {
        cout << (*it).first << " => " << (*it).second << endl;
        int value=lastNames[(*it).first ];
        myfile<<lines[value-1]<<endl;
        cout<<lines[value-1]<<endl;
        cout<<value<<endl;
    }
}
My main problem is writing to the file: can someone please explain why I am unable to see the written results in it?
Thanks & Regards,
Mousey.

The format of the file is fixed:
username:password:uid:gid:first name(space)last name:homedir:shell
Maintain a std::map whose key is a string (the last name) and whose value is the line number.
Read the file line by line and extract the last name: split the line on ":" and then split the fifth field on the space.
Store the last name together with the line number in the map.
When the whole file has been read, output the lines in the order in which the map lists their line numbers (a map keeps its keys, here the last names, in sorted order).
For splitting a string, refer to the question "Split a string in C++?".
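The steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: the fields are separated by ':' as in the question, the helper names lastNameOf and sortByLastName are invented here, and a std::multimap is used in place of std::map so that two users who share a last name are both kept.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the last name from one passwd-style line
// (the text after the last space of the fifth ':'-separated field),
// or an empty string if the line has fewer than five fields.
std::string lastNameOf(const std::string& line)
{
    std::istringstream iss(line);
    std::string field;
    for (int i = 0; i < 5; ++i) {
        if (!std::getline(iss, field, ':'))
            return "";
    }
    const std::string::size_type space = field.rfind(' ');
    return space == std::string::npos ? field : field.substr(space + 1);
}

// Hypothetical helper: returns a copy of the lines ordered by last name.
std::vector<std::string> sortByLastName(const std::vector<std::string>& lines)
{
    // A multimap keeps its keys sorted and allows duplicate last names.
    std::multimap<std::string, std::size_t> index;
    for (std::size_t i = 0; i < lines.size(); ++i)
        index.insert(std::make_pair(lastNameOf(lines[i]), i));

    std::vector<std::string> sorted;
    sorted.reserve(lines.size());
    for (std::multimap<std::string, std::size_t>::const_iterator it = index.begin();
         it != index.end(); ++it)
        sorted.push_back(lines[it->second]);
    return sorted;
}
```

As for the results not showing up in the file, a likely reason is that reading to the end of the file leaves eofbit (and usually failbit) set on the stream. While failbit is set, further operations on myfile fail, so calling myfile.clear() before seeking and writing, or writing the sorted lines through a separate std::ofstream, should help.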

If it's only a few megabytes, you can basically slurp it into memory, sort it with the O(n log n) sorting algorithm of your choice, and then write it out.
Basically, write a code snippet to compare two lines the way you want, and use that with your standard library sort routine to sort the data. Or write your own sort routine, whatever.
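A minimal sketch of that idea, assuming the lines are already in a std::vector<std::string> and use the colon-separated layout from the question. The names sortKey, byLastName and sortLines are made up for this example, and std::stable_sort is used rather than std::sort so that lines with the same last name keep their original order.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: the sort key of a line is the last word of its
// fifth ':'-separated field (the last name in the passwd layout).
std::string sortKey(const std::string& line)
{
    std::string::size_type begin = 0;
    for (int i = 0; i < 4; ++i) {               // skip the first four fields
        begin = line.find(':', begin);
        if (begin == std::string::npos)
            return "";                           // malformed lines sort first
        ++begin;
    }
    const std::string::size_type end = line.find(':', begin);
    const std::string name =
        line.substr(begin, end == std::string::npos ? end : end - begin);
    const std::string::size_type space = name.rfind(' ');
    return space == std::string::npos ? name : name.substr(space + 1);
}

// Comparison function: true when a's last name orders before b's.
bool byLastName(const std::string& a, const std::string& b)
{
    return sortKey(a) < sortKey(b);
}

// Sorts the lines in place by last name.
void sortLines(std::vector<std::string>& lines)
{
    std::stable_sort(lines.begin(), lines.end(), byLastName);
}
```

The key is recomputed on every comparison, which is simple but does repeated work; for larger inputs the keys could be extracted once and sorted alongside the lines.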
If you're interested in how you'd go about dealing with gigabytes of data, take a look at Wikipedia's article on External Sorting for a good jumping-off point.

Related

C++ retrieve numerical values in a line of string

Here is the content of the txt file that I've managed to read.
X-axis=0-9
y-axis=0-9
location.txt
temp.txt
I'm not sure whether it's possible, but after reading the contents of this txt file I'm trying to store just the x- and y-axis ranges in two variables so that I can use them in later functions. Any suggestions? And do I need to use vectors? Here is the code for reading the file.
string configName;
ifstream inFile;
do {
    cout << "Please enter config filename: ";
    cin >> configName;
    inFile.open(configName);
    if (inFile.fail()) {
        cerr << "Error finding file, please re-enter again." << endl;
    }
} while (inFile.fail());
string content;
string tempStr;
while (getline(inFile, content)) {
    // skip lines that start with "//"
    if (content.size() >= 2 && content[0] == '/' && content[1] == '/') continue;
    cout << endl << content << endl;
}
It depends on the format of your file. If you are sure the format will never change, you can read the file character by character and do simple pattern matching such as
if (tempstr == "y-axis=")
and then convert the relevant substring to an integer with a function like
std::stoi
and store it.
I'm going to assume you already have the whole contents of the .txt file in a single string somewhere. In that case, your next task should be to split the string. Personally, yes, I would recommend using vectors. Say you wanted to split that string by newlines. A function like this:
#include <string>
#include <vector>

// Split str into lines at each '\n'.
std::vector<std::string> split(const std::string& str)
{
    std::vector<std::string> ret;
    std::string::size_type cur_pos = 0;
    std::string::size_type next_delim = str.find('\n');
    while (next_delim != std::string::npos) {
        ret.push_back(str.substr(cur_pos, next_delim - cur_pos));
        cur_pos = next_delim + 1;
        next_delim = str.find('\n', cur_pos);
    }
    // Keep the final line when the input does not end with a newline.
    if (cur_pos < str.size())
        ret.push_back(str.substr(cur_pos));
    return ret;
}
will split an input string by newlines. From there, you can begin parsing the strings in that vector. The key functions you'll want to look at are std::string's substr() and find() methods. A quick Google search should get you to the relevant documentation, but here you are, just in case:
http://www.cplusplus.com/reference/string/string/substr/
http://www.cplusplus.com/reference/string/string/find/
Now, say you have the string "X-axis=0-9" in vec[0]. Then, what you can do is do a find for = and then get the substrings before and after that index. The stuff before will be "X-axis" and the stuff after will be "0-9". This will allow you to figure that the "0-9" should be ascribed to whatever "X-axis" is. From there, I think you can figure it out, but I hope this gives you a good idea as to where to start!
std::string::find() can be used to search for a character in a string;
std::string::substr() can be used to extract part of a string into another new sub-string;
std::atoi() can be used to convert a C string (for example the result of c_str()) into an integer; std::stoi() does the same for a std::string.
So then, these three functions will allow you to do some processing on content, specifically: (1) search content for the start/stop delimiters of the first value (= and -) and the second value (- and string::npos), (2) extract them into temporary sub-strings, and then (3) convert the sub-strings to ints. Which is what you want.
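Put together, the three steps might look like the sketch below. The struct AxisRange and the function parseAxisLine are names invented for this example, std::stoi is used in place of std::atoi so that bad input can be detected, and the values are assumed to be non-negative integers.

```cpp
#include <exception>
#include <string>

// Hypothetical holder for one parsed range such as "0-9".
struct AxisRange {
    int low;
    int high;
};

// Parses a line of the form "name=low-high" (e.g. "X-axis=0-9").
// Returns false if a delimiter is missing or a part is not an integer.
bool parseAxisLine(const std::string& content, AxisRange& out)
{
    const std::string::size_type eq = content.find('=');
    if (eq == std::string::npos)
        return false;
    // Search for '-' only after '=' so the '-' inside "X-axis" is skipped.
    const std::string::size_type dash = content.find('-', eq + 1);
    if (dash == std::string::npos)
        return false;
    try {
        const int low = std::stoi(content.substr(eq + 1, dash - eq - 1));
        const int high = std::stoi(content.substr(dash + 1));
        out.low = low;
        out.high = high;
    } catch (const std::exception&) {
        // std::stoi throws on non-numeric or out-of-range text.
        return false;
    }
    return true;
}
```

Inside the reading loop this could be called as parseAxisLine(content, xRange) for the first line and parseAxisLine(content, yRange) for the second, so two plain variables are enough and no vector is required.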

How to create an inverted index when I've already tokenized my file?

I'm trying to create an inverted index. I'm reading the lines of a text file, the text file has in the first position of each line the id of a document docId and the rest of the line has keywords about this document.
In order to create an inverted index, I first have to tokenize this text file. I did it with a function I wrote, and I store every word in a vector. My only gripe is that I also store the docId as a string in the vector. Here is the header of the tokenize function if you need it:
void tokenize(string& s, char c, vector<string>& v)
Now, after tokenizing the file, I have to write a function that puts every word in a map (I'm thinking of using an unordered map) in which every word appears only once. I also have to store the frequency of each word somewhere. I thought that using the docId as the key would be a good idea, but then I realized that a key can map to only one word, while in my text file a docId has more than one word.
So, how am I going to solve this problem? Where should I begin?
What a mess of a question. Breaking it down, if I understand correctly you have:
doc1 word1a word1b word1c word1d
doc2 word2a word2b word2c
...
You want mappings from words to documents and vice versa. It's hard to tell from your question whether your talk of word "frequency" reflects the same word being a keyword for multiple documents, or whether the description you have of your file format failed to incorporate a needed count for repetitions within each file. Assuming the former:
std::ifstream f(filename);
if (f)
{
    // docid -> the words listed for it, and word -> the docids that list it
    std::map<std::string, std::vector<std::string>> words_in_doc;
    std::map<std::string, std::vector<std::string>> docs_containing_word;
    std::string line;
    while (std::getline(f, line))
    {
        std::istringstream iss(line);
        std::string docid, word;
        if (iss >> docid)            // first token on the line is the docid
            while (iss >> word)      // the remaining tokens are keywords
            {
                words_in_doc[docid].push_back(word);
                docs_containing_word[word].push_back(docid);
            }
    }
    // do whatever with your data/indices...
}
else
    std::cerr << "unable to open input file\n";
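If "frequency" instead means how often a word occurs within each document, one possible variation is to store a count per (word, docId) pair. The sketch below is an assumption about what is wanted, not something stated in the question; the type name InvertedIndex and the function buildIndex are invented for illustration.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// word -> (docId -> number of occurrences of the word in that document)
typedef std::map<std::string, std::map<std::string, int> > InvertedIndex;

// Builds the index from a stream whose lines look like "docId word word ...".
InvertedIndex buildIndex(std::istream& in)
{
    InvertedIndex index;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        std::string docid, word;
        if (iss >> docid)
            while (iss >> word)
                ++index[word][docid];   // operator[] creates a zero count on first use
    }
    return index;
}
```

With this layout, index[word].size() gives the number of documents that contain the word, and index[word][docid] gives how many times it appears in one document. The same structure works with std::unordered_map if sorted iteration is not needed.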

Efficiently read CSV file with optional columns

I'm trying to write a program that reads in a CSV file (no need to worry about escaping anything, it's strictly formatted with no quotes) but any numeric item with a value of 0 is instead just left blank. So a normal line would look like:
12,string1,string2,3,,,string3,4.5
instead of
12,string1,string2,3,0,0,string3,4.5
I have some working code using vectors but it's way too slow.
int main(int argc, char** argv)
{
    string filename("path\\to\\file.csv");
    string outname("path\\to\\outfile.csv");
    ifstream infile(filename.c_str());
    if(!infile)
    {
        cerr << "Couldn't open file " << filename.c_str();
        return 1;
    }
    vector<vector<string>> records;
    string line;
    while( getline(infile, line) )
    {
        vector<string> row;
        string item;
        istringstream ss(line);
        while(getline(ss, item, ','))
        {
            row.push_back(item);
        }
        records.push_back(row);
    }
    return 0;
}
Is it possible to overload operator<< of ostream, similar to the question "How to use C++ to read in a .csv file and output in another form?", when fields can be blank?
Would that improve the performance?
Or is there anything else I can do to get this to run faster?
Thanks
The time spent reading the string data from the file is greater than the time spent parsing it. You won't make significant time savings in the parsing of the string.
To make your program run faster, read bigger chunks into memory so that each read fetches more data. Look into memory-mapped files.
An alternative that can give better performance is to read the whole file into a buffer, then walk through the buffer and record pointers to where the values start; whenever you find a , or an end of line, overwrite it with a \0.
e.g. https://code.google.com/p/csv-routine/
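A rough sketch of the buffer approach follows. It assumes the whole file has already been read into a std::string, that there is no quoting (as the question states), and that lines end with '\n'. The function name splitInPlace is hypothetical, and whether this is actually faster for a given file should be measured.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a whole CSV file held in `buffer` in place: each ',' and '\n' is
// overwritten with '\0', and rows[r][c] points at the start of field c of row r.
// The pointers are only valid while `buffer` is alive and unmodified.
std::vector<std::vector<const char*> > splitInPlace(std::string& buffer)
{
    std::vector<std::vector<const char*> > rows;
    if (buffer.empty())
        return rows;
    if (buffer[buffer.size() - 1] != '\n')
        buffer.push_back('\n');              // make sure the last row is terminated

    std::vector<const char*> row;
    const char* fieldStart = &buffer[0];
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        const char c = buffer[i];
        if (c != ',' && c != '\n')
            continue;
        buffer[i] = '\0';                    // terminate the current field
        row.push_back(fieldStart);
        fieldStart = &buffer[0] + i + 1;     // next field starts after the delimiter
        if (c == '\n') {                     // end of row
            rows.push_back(row);
            row.clear();
        }
    }
    return rows;
}
```

A blank field comes out as an empty C string, so the caller can treat "" as 0 when converting numeric columns. No per-field std::string is allocated, which is where the vector-of-strings version spends part of its time.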

search for specific row c++ tab delimited

AccountNumber Type Amount
15 checking 52.42
23 savings 51.51
11 checking 12.21
is my tab-delimited file.
I would like to be able to search for rows by account number. Say I put in 23: I want to get that specific row. How would I do that?
A more advanced case: if I wanted to change a specific value, say the amount 51.51 in account 23, how do I fetch that value and replace it with a new one?
So far I'm just reading in row by row:
string line;
ifstream is("account.txt");
if (is.is_open())
{
    while (std::getline(is, line)) // read one line at a time
    {
        string value;
        std::istringstream iss(line);
        cout << line << endl; // do something with the whole row
        while (iss >> value) // read one value at a time from the line
        {
            //cout << value << " "; // do something with the value
        }
    }
    is.close();
}
else
    cout << "File cant be opened" << endl;
return 0;
Given that each line is of variable length, there is no way to jump directly to a particular row without first parsing the file.
But I suspect your program will want to manipulate random rows and columns. So I'd start by parsing out the entire file. Put each row into its own data structure in an array, then index that row in the array.
You can use "strtok" to split the input up into rows, and then strtok again to split each row into fields.
If I were to do this, I would first write a few functions that parse the entire file and store the data in an appropriate data structure (such as an array or std::map). Then I would use the data structure for the required operations (such as searching or editing). Finally, I would write the data structure back to a file if there are any modifications.
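The parse, use, write-back approach could be sketched like this. The struct Account and the functions loadAccounts and saveAccounts are hypothetical names, and the code assumes the three-column layout shown in the question with a single header line.

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// One row of the file; the fields follow the header line in the question.
struct Account {
    std::string type;
    double amount;
};

// Reads the header line, then one account per line, keyed by account number.
std::map<int, Account> loadAccounts(std::istream& in)
{
    std::map<int, Account> accounts;
    std::string line;
    std::getline(in, line);                       // skip the header row
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        int number;
        Account a;
        if (iss >> number >> a.type >> a.amount)  // operator>> splits on tabs and spaces
            accounts[number] = a;
    }
    return accounts;
}

// Writes the accounts back in the same tab-delimited layout.
void saveAccounts(std::ostream& out, const std::map<int, Account>& accounts)
{
    out << "AccountNumber\tType\tAmount\n";
    for (std::map<int, Account>::const_iterator it = accounts.begin();
         it != accounts.end(); ++it)
        out << it->first << '\t' << it->second.type << '\t'
            << it->second.amount << '\n';
}
```

Searching is then accounts.find(23), which returns accounts.end() when the number is absent, and editing is a matter of assigning to it->second.amount before calling saveAccounts. Note that a std::map writes the rows back ordered by account number rather than in their original order.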

Translating Program

I am beginning to write a translator program which will translate a string of text found on a file using parallel arrays. The language to translate is pig Latin. I created a text file to use as a pig latin to English dictionary. I didn't want to use any two dimension arrays; I want to keep the arrays in one dimension.
Basically I want to read a text file written in PigLatin and using the dictionary I created I want to output the translation to English on the command line.
My pseudo-code idea is:
Open the dictionary text file.
Ask the user for the name of the text file written in PigLatin that he/she wants to translate to English
Search for each word of the user's text file in the dictionary and translate it accordingly. Keep going until there are no more words to translate.
Show the translated words on the command line.
I was thinking of using parallel arrays, one containing the translated English words and the other containing the pig Latin words.
I would like to know how I can manipulate the strings using arrays in C++.
Thank you.
If files will always be translated in one direction (e.g. PigLatin -> English), then it would be easier and more efficient to use std::map to map one string to another:
std::map<std::string, std::string> dictionary;
dictionary["ashtray"] = "trash";
dictionary["underplay"] = "plunder";
To get the translated word, just use dictionary[] for the lookup (e.g. std::cout << dictionary["upidstay"] << std::endl;)
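For translating a whole line of text with such a map, a small sketch is shown here. The function name translate is made up, and copying unknown words through unchanged is an assumption, since the question does not say what should happen to them.

```cpp
#include <map>
#include <sstream>
#include <string>

// Translates each whitespace-separated word of `text` with `dictionary`.
// Words without an entry are copied unchanged (an assumption).
std::string translate(const std::string& text,
                      const std::map<std::string, std::string>& dictionary)
{
    std::istringstream words(text);
    std::string word, result;
    while (words >> word) {
        // find() does not insert a new empty entry the way operator[] would.
        const std::map<std::string, std::string>::const_iterator it =
            dictionary.find(word);
        if (!result.empty())
            result += ' ';
        result += (it != dictionary.end()) ? it->second : word;
    }
    return result;
}
```

Using find() rather than operator[] also lets the dictionary be passed by const reference, because operator[] may modify the map.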
Pig latin can be translated on the fly.
Just split the words before the first vowel of each word and you won't need a dictionary file. Then concatenate the second part with the first part, delimited with a '-', and add "ay" at the end.
Unless you want to use a dictionary file?
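The rule described above can be sketched as a pair of functions. Both names, toPigLatin and fromPigLatin, are invented here, the words are assumed to be lowercase, and the handling of words without a vowel is an assumption. The reverse direction only works because the '-' marks where the moved consonants begin; without the hyphen (as in "ashtray") the split point is ambiguous and a dictionary is needed.

```cpp
#include <string>

// English -> hyphenated pig latin: move the letters before the first vowel
// to the end, after a '-', and append "ay" ("trash" becomes "ash-tray").
std::string toPigLatin(const std::string& word)
{
    const std::string::size_type vowel = word.find_first_of("aeiou");
    if (vowel == std::string::npos)
        return word + "-ay";                 // assumption: no vowel, nothing moves
    return word.substr(vowel) + "-" + word.substr(0, vowel) + "ay";
}

// Hyphenated pig latin -> English: undo the move described above.
// Words that are not in the expected "rest-prefixay" form are returned unchanged.
std::string fromPigLatin(const std::string& word)
{
    const std::string::size_type dash = word.rfind('-');
    if (dash == std::string::npos || word.size() < dash + 3
        || word.compare(word.size() - 2, 2, "ay") != 0)
        return word;
    // The moved consonants sit between the '-' and the trailing "ay".
    const std::string moved = word.substr(dash + 1, word.size() - dash - 3);
    return moved + word.substr(0, dash);
}
```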
Declaring an array of strings is easy, the same as declaring an array of anything else.
const int MaxWords = 100;
std::string piglatin[MaxWords];
That's an array of 100 string objects, and the array is named piglatin. The strings start out empty. You can fill the array like this:
int numWords = 0;
std::ifstream input("piglatin.txt");
std::string line;
while (std::getline(input, line) && numWords < MaxWords) {
    piglatin[numWords] = line;
    ++numWords;
}
if (numWords == MaxWords) {
    std::cerr << "Too many words" << std::endl;
}
I strongly recommend you not use an array. Use a container object, such as std::vector or std::deque, instead. That way, you can load the contents of the files without knowing in advance how big the files are. With the example declaration above, you need to make sure you don't have more than 100 entries in your file, and if there are fewer than 100, then you need to keep track of how many entries in your array are valid.
std::vector<std::string> piglatin;
std::ifstream input("piglatin.txt");
std::string line;
while (std::getline(input, line)) {
    piglatin.push_back(line);
}
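Once both word lists are loaded this way, a lookup across the two parallel containers might look like the sketch below. The function name lookup is hypothetical, and it assumes that english[i] is the translation of piglatin[i] and that unknown words are returned unchanged.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Finds `word` in `piglatin` and returns the entry at the same position in
// `english`; returns `word` itself when there is no matching entry.
std::string lookup(const std::vector<std::string>& piglatin,
                   const std::vector<std::string>& english,
                   const std::string& word)
{
    const std::vector<std::string>::const_iterator it =
        std::find(piglatin.begin(), piglatin.end(), word);
    if (it == piglatin.end())
        return word;
    const std::vector<std::string>::size_type pos = it - piglatin.begin();
    return pos < english.size() ? english[pos] : word;
}
```

This is a linear search, so each lookup takes time proportional to the dictionary size; that is usually fine for a small word list.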