Splitting of string by spaces and outputting the columns into different arrays - c++

So I have a text file that has two columns, letters and numbers, separated by spaces. I want to split these two columns and place them in separate arrays.
I tried using getline with a space as the delimiter, but I am only able to place everything in the same array. I can do this with the eof method, but that causes too many problems in my program.
while(getline(openFile, letters, ' ')){
    index++;
    lettersArray[index] = letters;
}
I expect the output of lettersArray[index] to be a column of letters only.

I think you are using the getline function in the wrong way. Take a look at how it works here: http://www.cplusplus.com/reference/string/string/getline/
You are telling getline to use the space character as the delimiter instead of the newline. The first call therefore reads the letters of the first line, but each later call reads up to the next space, which returns the number, the line break, and the letters of the following line together in one string. Letters and numbers both end up in the same array.
If you want to stick to using the getline function, here is a possible modification to make it work.
while(getline(openFile, letters, ' ')){
    index++;
    lettersArray[index] = letters;
    getline(openFile, letters);
}
The second getline call, on the last line of the loop body, discards the rest of the current line (the number), so the next iteration starts at the letters of the following line.

Related

How to read CSV file with newline and comma characters inside cells in C++

I've got a CSV file containing cells with line breaks ("\n") and/or commas, which are enclosed in double quotes.
When I use the getline() function to get each row, it considers each line inside a cell to be a new row of the CSV file. In addition, when I use splitIntoVec to get the vector for each row, it considers a comma inside a cell to start a new vector element.
I want to store the content of the CSV file in a vector of vectors, in which each row is a vector of strings holding its cells.
For instance, for the following CSV file content:
"Row 1 cell 1
With break line","Row1 cell2, with comma"
"Row 2 cell 1
With break line","Row2 cell2, with comma"
Row 3 cell 1,Row3 cell 2
I get a result vector of 4 string vectors, of which the first has only one element and the second has 3 elements.
Here is my code :
vector<vector<string>> readFromCsv(string &fileName, char rowDelimiter = '\n', char colDelimiter = ',') {
    ifstream file(fileName); // declare file stream
    string value;
    vector<vector<string>> contentVec;
    vector<string> rowVec;
    string rowStr;
    while (getline(file, rowStr, rowDelimiter)) {
        rowVec = splitIntoVec(rowStr, colDelimiter);
        contentVec.push_back(rowVec);
    }
    return contentVec;
}
Is there any other function (in libraries like boost) available to resolve these issues? Any help would be appreciated.
In PHP, I get the content of the CSV file correctly with fgetcsv(). Is there an alternative function in C++?
@Simone already said in his comment that it is not a CSV file. Looking at your problem, though, you will need to get your hands dirty and do some text processing to separate the cells. You can read the complete file into a string and then break it up using loops, or whichever way you see fit. To do this, keep track of each " you encounter while traversing, and break only when you are not inside double quotes.
For example,
(opening quote)"Row 1 cell 1
With break line"(closing quote),"(opening quote)Row1 cell2, with comma"(closing quote)
You will have to keep track of opening and closing double quotes using an index or a counter, and break for rows only if '\n' is found outside a pair of opening and closing quotes.
You can use regex also if you are sure there are no " in the cells.
Thanks @Alex. Useful link if someone else faces the same issue: http://mybyteofcode.blogspot.nl/2010/11/parse-csv-file-with-embedded-new-lines.html
You have to split on " completely, keeping track of 2 states: inside "" and outside. The , and EOL characters have different meanings depending on the state.
You can use getline(file, rowStr, '"') to read in everything up to the ", but your logic to separate in records will be a bit more complex. If numbers are allowed without quotation marks, then it becomes even more complex.

Reading from a file, multiple delimiters

So I have some code that reads from a file and splits on the commas in the file. However, some fields in the file have spaces before or after the commas, which causes a problem when executing the code.
This is the code that reads in the data from the file. Keeping the same kind of format, I was wondering if there is a way to handle these spaces.
while(getline(inFile, line)){
    stringstream linestream(line);
    // each field of the inventory file
    string type;
    string code;
    string countRaw;
    int count;
    string priceRaw;
    int price;
    string other;
    //
    if(getline(linestream,type,',') && getline(linestream,code,',')
       && getline(linestream,countRaw,',')&& getline(linestream,priceRaw,',')){
        // optional other
        getline(linestream,other,',');
        count = atoi(countRaw.c_str());
        price = atoi(priceRaw.c_str());
        StockItem *t = factoryFunction(code, count, price, other, type);
        list.tailAppend(t);
    }
}
The better approach for this kind of problem is a state machine, in which each character you read is handled in a simple way. You don't say whether you need to keep spaces between words that are not separated by commas, so I assume you do. I also don't know what you want to do with double spaces, so I assume they should be kept as they are.
Read one character at a time and keep two variables: the start position and the limit position. You begin in state 1, looking for the start position. When you find any character other than a space, set the start position to that character, set the limit position just after it, and switch to state 2. In state 2, whenever you find a non-space character, set the limit position to the position just after that character. When you find a comma, take the string that runs from start to limit and switch back to state 1.
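A rough sketch of that state machine, applied to one line that is already in a string, might look like this. The name splitAndTrim is made up, only the space character is treated as padding, and the end of the line is handled as if it were a final comma; all of these are assumptions for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on commas and drops the spaces
// around each field while keeping the spaces inside it.
std::vector<std::string> splitAndTrim(const std::string &line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0, limit = 0;
    bool seekingStart = true;  // state 1: no non-space character seen yet
    for (std::string::size_type i = 0; i <= line.size(); ++i) {
        // Treat the end of the line like one last comma.
        const char c = (i == line.size()) ? ',' : line[i];
        if (c == ',') {
            // A field that never left state 1 is empty.
            fields.push_back(seekingStart ? std::string()
                                          : line.substr(start, limit - start));
            seekingStart = true;          // back to state 1
        } else if (c != ' ') {
            if (seekingStart) {           // first non-space: state 1 -> state 2
                start = i;
                seekingStart = false;
            }
            limit = i + 1;                // one past the last non-space seen
        }
    }
    return fields;
}
```

In the loop from the question, the fields returned by splitAndTrim(line) would take the place of the chained getline(linestream, ..., ',') calls.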

(C++) Seekg in fstream cutting off characters

So I'm not entirely sure why this is happening. I've tried adding spaces before the words in the txt file that I'm reading from, and that fixes it for some names, but not all. Basically I'm just trying to return a name, and each name in the file is on a different line. But when I print the names, some of them are cut off: "Dillon" becomes "llon", "Stephanie" becomes "phanie", and so on. Here's the use of seekg:
string Employee::randomFirstName()
{
    int i;
    string fName;
    i = rand() % 100;
    ifstream firstName;
    firstName.open("First Names.txt", ios::in); // open for reading
    firstName.seekg(i);
    firstName >> fName;
    return fName;
}
I would post the txt file, but it's just a list of names, one per line, 100 of them. I've tried looking up examples of the use of seekg, but I can't figure out why it cuts off some of them. Also, it only happens sometimes: one run will print "Dillon" correctly, and the next will print "llon".
Any help would be appreciated.
istream::seekg() will move to a character position. Therefore, seeking to a random character position between 0 and 99 (rand() % 100) may end up in the middle of a line. There is no way for seekg to know you wanted to seek to a line number: it has no concept of lines.
Instead, you can call std::getline repeatedly, discarding lines, until you reach line i.

Input filtering using scanf

I want to filter input, and I don't know the best way to do it. I want only words that start with an alphabetic character to be read. For example, if the input is:
This is 1 EXAMPLE1 input.
The string should be like this:
This is EXAMPLE1 input
What is the easiest way to filter input like this?
I tried using "%[a-zA-Z]s", but it is not working.
Your scan string "%[a-zA-Z]s" probably isn't what you think it is. Drop that trailing s.
"%[a-zA-Z]" will scan a string consisting entirely of lowercase and uppercase letters, so digits will not be matched. However, you want to scan alphanumeric strings that begin with a lowercase or uppercase letter. scanf doesn't provide a facility to look for a string in that way. You can, instead, scan for an alphanumeric string with "%[a-zA-Z0-9]", and then drop the scanned input if the first character of the string is numeric.
Using scanf is tricky for various reasons. The string may be longer than you expect, and cause buffer overflow. If the input isn't in the format you expect, then scanf may fail to advance past the unexpected input. It is usually more reliable to read the input into a buffer unconditionally, and parse the buffer. For example:
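const char *wants
    = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
std::string word;
while (std::cin >> word) {
    // Skip any word that does not begin with a letter (isalpha is in <cctype>).
    if (!isalpha(static_cast<unsigned char>(word[0]))) continue;
    // Cut the word at the first non-alphanumeric character, e.g. "input." -> "input".
    std::string::size_type p = word.find_first_not_of(wants);
    word = word.substr(0, p);
    //... do something with word
}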
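If you would rather stay with the scanf family, the "%[a-zA-Z0-9]" idea described above could be sketched like this on a line that has already been read into a buffer. The name filterWords, the 63-character token limit, and joining the kept words with single spaces are all assumptions made for the example.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: keeps only the alphanumeric tokens of `input`
// that begin with a letter, joined by single spaces.
std::string filterWords(const char *input) {
    std::string result;
    char token[64];
    const char *p = input;
    while (*p != '\0') {
        int consumed = 0;
        // The width 63 keeps sscanf from overflowing `token`;
        // %n records how many characters the token used.
        if (std::sscanf(p, "%63[a-zA-Z0-9]%n", token, &consumed) == 1) {
            if (std::isalpha(static_cast<unsigned char>(token[0]))) {
                if (!result.empty()) result += ' ';
                result += token;
            }
            p += consumed;
        } else {
            ++p;  // not alphanumeric: step over one character
        }
    }
    return result;
}
```

For the input line from the question, filterWords("This is 1 EXAMPLE1 input.") gives "This is EXAMPLE1 input". A token longer than 63 characters would be split, so raise the limit if your words can be longer.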

input, output and \n's

So I'm trying to solve a problem that asks me to look for palindromes in strings. It seems like I've got everything right; however, the problem is with the output.
Here's the original and my output:
http://pastebin.com/c6Gh8kB9
Here's what's been said about the input and output of the problem:
Input format:
A file with no more than 20,000 characters. The file has one or more lines. No line is longer than 80 characters (not counting the newline at the end).
Output format:
The first line of the output should be the length of the longest palindrome found. The next line or lines should be the actual text of the palindrome (without any surrounding white space or punctuation but with all other characters) printed on a line (or more than one line if newlines are included in the palindromic text). If there are multiple palindromes of longest length, output the one that appears first.
Here's how I read the input :
string test;
string original;
while (getline(fin,test))
    original += test;
And here's how I output it:
int len = answer.length();
answer = cleanUp(answer);
while (len > 0){
    string s3 = answer.substr(0,80);
    answer.erase(0,80);
    fout << s3 << endl;
    len -= 80;
}
cleanUp() is a function to remove the illegal characters from the beginning and the end. I'm guessing that the problem is with \n's and the way I read the input. How can I fix this ?
No line is longer than 80 characters (not counting the newline at the end)
does not imply that every line is 80 characters except for the last, while your output code does assume this by taking 80 characters off answer in every iteration.
You may want to keep the newlines in the string until the output phase. Alternatively, you might store newline positions in a separate std::vector. The first option complicates your palindrome search routine; the second your output code.
(If I were you, I'd also index into answer instead of taking chunks off with substr/erase; your output code is now O(n^2) while it could be O(n).)
After rereading, it appears that I misunderstood the question. I was thinking in terms of each line representing a single word, with the intent being to test whether that "word" is palindromic.
I now think the question is really more like: "Given a sequence of up to 20,000 characters, find the longest palindromic sub-sequence. Oh, incidentally, the input is broken up into lines of no more than 80 characters."
If that's correct, I'd ignore the line-length completely. I'd read the entire file into a single buffer, then search for palindromes in that buffer.
To find the palindromes, I'd simply walk through each position in the array, and find the longest possible palindrome with that as its center point:
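for (int i=0; i<total_chars; i++) {
    int n = 1;
    // Expand outwards while both ends are in range and the characters match.
    while (n <= i && i+n < total_chars && array[i+n] == array[i-n])
        n++;
    // Candidate palindrome is from array[i-n+1] to array[i+n-1].
    // This only covers odd lengths; even lengths need a center between i and i+1.
}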
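Filled out into a complete function, the center-expansion idea could look roughly like this. It is only a sketch: longestPalindrome is a made-up name, it compares characters exactly as they are (no skipping of punctuation or case folding, which the actual problem may require), and it adds the even-length case by also trying a center between two characters.

```cpp
#include <string>

// Hypothetical helper: returns the longest palindromic run of characters
// in `text`; on ties, the one that appears first.
std::string longestPalindrome(const std::string &text) {
    const int total = static_cast<int>(text.size());
    int bestStart = 0, bestLen = 0;
    for (int center = 0; center < total; ++center) {
        // width 0: odd length, centered on text[center]
        // width 1: even length, centered between center and center + 1
        for (int width = 0; width <= 1; ++width) {
            int lo = center, hi = center + width;
            while (lo >= 0 && hi < total && text[lo] == text[hi]) {
                --lo;
                ++hi;
            }
            // The loop overshoots by one on each side.
            const int len = hi - lo - 1;
            if (len > bestLen) {  // strict '>' keeps the earliest on ties
                bestLen = len;
                bestStart = lo + 1;
            }
        }
    }
    return text.substr(bestStart, bestLen);
}
```

This is O(n^2) in the worst case, which should be fine for 20,000 characters. The required first output line is then the size() of the returned string.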