How to ignore \n for istream strings - c++

Does anyone have any suggestions on how I can ignore the "\n" coming in from istream? I'm trying to extract data from txt file where in some "cells", text has been written in such that there are "\n" coming in from when the user pressed Enter.
My goal is to take in some parts of this text and output it with " ; " separating the parts. However, in doing so, sometimes a "\n" gets sucked in and the output cell starts continuing downwards instead of just line going from right to left (which is preferred).
Any advice would be much appreciated!
string vectorToString(vector<size_t> positionVector, string mainString,
string outputString, int numLetters)
{
//Check how many different positions were found and make that the length
//of the vector that will store all the strings for each finding
int positionVectorLength = positionVector.size();
//foundPos is the position of the cursor along the foundPositions vector
int vectorPosition=0;
while (vectorPosition < positionVectorLength)
{
//define local variable that will store the value along the vector
size_t vectorPositionValue = positionVector.at(vectorPosition);
//append the numLetters after the positionValue in string frameNum
outputString.append(mainString, vectorPositionValue, numLetters);
outputString.append(" ; ");
//reiterate until all the positions have been recorded
vectorPosition+=1;
}
//return the string of frame numbers
return(outputString);
}

You can use this to remove all the \n from your string (std::remove is declared in <algorithm>):
mainString.erase(std::remove(mainString.begin(),mainString.end(),'\n'),mainString.end());
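A minimal sketch of the same erase/remove idiom wrapped in a helper. The name stripLineBreaks is hypothetical, and removing '\r' as well is an assumption (it only matters if the text came from a file with Windows line endings):

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns a copy of `text` with every '\n' and '\r' removed.
std::string stripLineBreaks(std::string text)
{
    // remove_if shifts the kept characters to the front and returns the new
    // logical end; erase then drops the leftover tail.
    text.erase(std::remove_if(text.begin(), text.end(),
                              [](char c) { return c == '\n' || c == '\r'; }),
               text.end());
    return text;
}
```

Note that removing characters shifts every later position, so any positions stored in positionVector would have to be computed on the stripped string, not on the original one.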

Related

C++: How to read multiple lines from file until a certain character, store them in array and move on to the next part of the file

I'm doing a hangman project for school, and one of the requirements is it needs to read the pictures of the hanging man from a text file. I have set up a text file with the '-' char which means the end of one picture and start of the next one.
I have this for loop set up to read the file until the delimiting character and store it in an array, but when testing I am getting incomplete pictures, cut off in certain places.
This is the code:
string s;
ifstream scenarijos("scenariji.txt");
for(int i = 0; i < 10; i++ ) {
getline(scenarijos, s, '-');
scenariji[i] = s;
}
For the record, scenariji is an array with type of string
And here is an example of the text file:
example
From your example input, it looks like '-' can be part of the input image (look at the "arms" of the hanged man). Unless you use some other, unique character to delimit the images, you won't be able to separate them.
If you know the dimensions of the images, you could read them without searching for the delimiter by reading a certain amount of bytes from the input file. Alternatively, you could define some more complex rules for image termination, e.g. when the '-' character is the only character in the line. For example:
ifstream scenarijos("scenariji.txt");
string scenariji[10];
for (int i = 0; i < 10; ++i) {
    string& scenarij = scenariji[i];
    string s;
    // read line by line; stop at end of file or at a line that contains only "-"
    while (getline(scenarijos, s) && s != "-") {
        scenarij += s;
        scenarij.push_back('\n'); // the trailing newline was removed by getline
    }
}

Extracting certain integers from string C++

Good day to all,
I am having a hard time trying to extract desired integers from a string. I am given the following to read in from a file:
itemnameitemnumber price percentmarkup
examples
Gowns-u2285 24.22 37%
TwoB1Ask1-m1275 90.4 1%
What I have been trying to do is get the item number separated from the item name so that I can store the item number as a reference for sorting. As you can see the first example itemnameitemnumber is a clear cut character to digit separation, whereas the next example has numbers within its item name.
I have tried several different approaches, however with certain item names having integers apart of their name is proving to be beyond my experience.
If anyone can help me with this I would be greatly appreciative for their time and knowledge.
Good day,
I don't know if you have a fixed number of digits for the item number, but I am going to assume that you don't.
This is a simple approach: first you have to separate the words of your line, for example with std::istringstream.
Once the line is split into words, for example by reading it into a vector or by reading it with operator>>, check the first word from the back until you find a character that is not one of "0123456789 " (note the whitespace at the end).
That gives you the position where the trailing digits begin, so you can cut the word there. Voilà! You have your item name and item number.
For the record, I am going to do the whole thing, using the same technique for the percent markup too, with the excluded characters being "% ".
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
#define VALID_DIGITS "0123456789 "
#define VALID_PERCENTAGE "% "
struct ItemData {
    std::string Name;
    int Count; // the item number
    double Price;
    double PercentMarkup;
};
int ExtractItemData(const std::string& Line, ItemData& Output) {
    std::istringstream Stream( Line );
    // Split the line into whitespace-separated words
    std::vector<std::string> Words{ std::istream_iterator<std::string>( Stream ),
                                    std::istream_iterator<std::string>() };
    if (Words.size() < 3) {
        /* somebody gave us a malformed line with fewer words than needed */
        return -1;
    }
    // Search backwards for the last character that is not a digit (0-9) or a whitespace
    std::size_t LastNonDigit = Words[0].find_last_not_of( VALID_DIGITS );
    if (LastNonDigit == std::string::npos || LastNonDigit + 1 == Words[0].length()) {
        /* error; the word is all digits (no name) or has no trailing digits (no number) */
        return -2;
    }
    // Separate the string into 2 parts; the digits start one past the last non-digit
    Output.Name = Words[0].substr( 0, LastNonDigit + 1 );
    Output.Count = std::stoi( Words[0].substr( LastNonDigit + 1 ) );
    Output.Price = std::stod( Words[1] );
    // Search backwards for the last character that is not '%' or ' '
    std::size_t LastNonPercent = Words[2].find_last_not_of( VALID_PERCENTAGE );
    Output.PercentMarkup = std::stod( Words[2].substr( 0, LastNonPercent + 1 ) );
    return 0;
}
The code requires the headers sstream, vector, string and iterator; std::size_t is declared in cstddef. Note that std::stoi and std::stod throw an exception if a field does not start with a number.
Hope the answer was useful.
Best of luck, COlda.
PS.: My first answer on stack overflow ^^;
You can iterate over the string, pushing the digits into a vector, and then use a stringstream to convert them to integers.

seekg() not working as expected

I have a small program, that is meant to copy a small phrase from a file, but it appears that I am either misinformed as to how seekg() works, or there is a problem in my code preventing the function from working as expected.
The text file contains:
//Intro
previouslyNoted=false
The code is meant to copy the word "false" into a string
std::fstream stats("text.txt", std::ios::out | std::ios::in);
//String that will hold the contents of the file
std::string statsStr = "";
//Integer to hold the index of the phrase we want to extract
int index = 0;
//COPY CONTENTS OF FILE TO STRING
while (!stats.eof())
{
static std::string tempString;
stats >> tempString;
statsStr += tempString + " ";
}
//FIND AND COPY PHRASE
index = statsStr.find("previouslyNoted="); //index is equal to 8
//Place the get pointer where "false" is expected to be
stats.seekg(index + strlen("previouslyNoted=")); //get pointer is placed at 24th index
//Copy phrase
stats >> previouslyNotedStr;
//Output phrase
std::cout << previouslyNotedStr << std::endl;
But for whatever reason, the program outputs:
=false
What I expected to happen:
I believe that I placed the get pointer at the 24th index of the file, which is where the phrase "false" begins. Then the program would've inputted from that index onward until a space character would have been met, or the end of the file would have been met.
What actually happened:
For whatever reason, the get pointer started an index before expected. And I'm not sure as to why. An explanation as to what is going wrong/what I'm doing wrong would be much appreciated.
Also, I do understand that I could simply make previouslyNotedStr a substring of statsStr, starting from where I wish, and I've already tried that with success. I'm really just experimenting here.
The VisualC++ tag means you are on Windows. On Windows the end of line takes two characters (\r\n). When you read the file a string at a time, this end-of-line sequence is treated as a delimiter and you replace it with a single space character.
Therefore, after you read the file, your statsStr does not match the contents of the file. Everywhere there is a new line in the file, you have replaced two characters with one. Hence when you use seekg to position yourself in the file based on numbers you got from the statsStr string, you end up in the wrong place.
Even if you get the new line handling correct, you will still encounter problems if the file contains two or more consecutive white space characters, because these will be collapsed into a single space character by your read loop.
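The mismatch can be made visible with a small sketch. It works on an in-memory copy of the raw file bytes rather than on a real file, which is an assumption made to keep it self-contained; joinWords and offsetDrift are hypothetical names:

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Rebuilds the text in the same spirit as the question's loop:
// whitespace-separated words joined by single spaces.
std::string joinWords(const std::string& raw)
{
    std::istringstream in(raw);
    std::string word, joined;
    while (in >> word)
        joined += word + " ";
    return joined;
}

// Offset of `key` in the raw bytes minus its offset in the rebuilt string.
// Assumes `key` occurs in both.
std::ptrdiff_t offsetDrift(const std::string& raw, const std::string& key)
{
    return static_cast<std::ptrdiff_t>(raw.find(key)) -
           static_cast<std::ptrdiff_t>(joinWords(raw).find(key));
}
```

For the raw bytes "//Intro\r\npreviouslyNoted=false" the key sits at offset 9 in the file but at offset 8 in the rebuilt string, a drift of one character, which matches the extra "=" in the output.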
You are reading the file word by word. There are better methods:
while (getline(stats, statsStr))
{
    // An entire line is read into statsStr.
    std::string::size_type posn = statsStr.find("previouslyNoted=");
    // ...
}
By reading entire text lines into a string, there is no need to reposition the file.
Also, there is a white-space issue when reading by word. This will affect where you think the text is in the file. For example, white space is skipped, and there is no telling how many spaces, newlines or tabs were skipped.
By the way, don't even think about replacing the text in the same file. Replacement of text only works if the replacement text has the same length as the original text in the file. Write to a new file instead.
Edit 1:
A better method is to declare your key strings as arrays. This helps with positioning within a string:
static const char key_text[] = "previouslyNoted=";
while (getline(stats, statsStr))
{
std::string::size_type key_position = statsStr.find(key_text);
std::string::size_type value_position = key_position + sizeof(key_text) - 1; // the -1 excludes the nul terminator counted by sizeof
// value_position points to the character after the '='.
// ...
}
You may want to save programming time by making your data file conform to an existing format, such as INI or XML, and using appropriate libraries to parse it.

Count the number of unique words and occurrence of each word

CSCI-15 Assignment #2, String processing. (60 points) Due 9/23/13
You MAY NOT use C++ string objects for anything in this program.
Write a C++ program that reads lines of text from a file using the ifstream getline() method, tokenizes the lines into words ("tokens") using strtok(), and keeps statistics on the data in the file. Your input and output file names will be supplied to your program on the command line, which you will access using argc and argv[].
You need to count the total number of words, the number of unique words, the count of each individual word, and the number of lines. Also, remember and print the longest and shortest words in the file. If there is a tie for longest or shortest word, you may resolve the tie in any consistent manner (e.g., use either the first one or the last one found, but use the same method for both longest and shortest). You may assume the lines comprise words (contiguous lower-case letters [a-z]) separated by spaces, terminated with a period. You may ignore the possibility of other punctuation marks, including possessives or contractions, like in "Jim's house". Lines before the last one in the file will have a newline ('\n') after the period. In your data files, omit the '\n' on the last line. You may assume that the lines will be no longer than 100 characters, the individual words will be no longer than 15 letters and there will be no more than 100 unique words in the file.
Read the lines from the input file, and echo-print them to the output file. After reaching end-of-file on the input file (or reading a line of length zero, which you should treat as the end of the input data), print the words with their occurrence counts, one word/count pair per line, and the collected statistics to the output file. You will also need to create other test files of your own. Also, your program must work correctly with an EMPTY input file – which has NO statistics.
Test file looks like this (exactly 4 lines, with NO NEWLINE on the last line):
the quick brown fox jumps over the lazy dog.
now is the time for all good men to come to the aid of their party.
all i want for christmas is my two front teeth.
the quick brown fox jumps over a lazy dog.
Copy and paste this into a small file for one of your tests.
Hints:
Use a 2-dimensional array of char, 100 rows by 16 columns (why not 15?), to hold the unique words, and a 1-dimensional array of ints with 100 elements to hold the associated counts. For each word, scan through the occupied lines in the array for a match (use strcmp()), and if you find a match, increment the associated count, otherwise (you got past the last word), add the word to the table and set its count to 1.
The separate longest word and the shortest word need to be saved off in their own C-strings. (Why can't you just keep a pointer to them in the tokenized data?)
Remember – put NO NEWLINE at the end of the last line, or your test for end-of-file might not work correctly. (This may cause the program to read a zero-length line before seeing end-of-file.)
This is not a long program – no more than about 2 pages of code
Here is what I have so far:
#include<iostream>
#include<iomanip>
#include<fstream>
#include<string>
#include<cstring>
using namespace std;
void totalwordCount(ifstream &inputFile)
{
char words[100][16]; // Holds the unique words.
char *token;
int totalCount = 0; // Counts the total number of words.
// Read every word in the file.
while(inputFile >> words[99])
{
totalCount++; // Increment the total number of words.
// Tokenize each word and remove spaces, periods, and newlines.
token = strtok(words[99], " .\n");
while(token != NULL)
{
token = strtok(NULL, " .\n");
}
}
cout << "Total number of words in file: " << totalCount << endl;
}
void uniquewordCount(ifstream &inputFile)
{
char words[100][16]; // Holds the unique words
int counter[100];
char *tok = "0";
int uniqueCount = 0; // Counts the total number of unique words
while(!inputFile.eof())
{
uniqueCount++;
tok = strtok(words[99], " .\n");
while(tok != NULL)
{
tok = strtok(NULL, " .\n");
inputFile >> words[99];
if(strcmp(tok, words[99]) == 0)
{
counter[99]++;
}
else
{
words[99][15] += 1;
}
uniqueCount++;
}
}
cout << counter[99] << endl;
}
int main(int argc, char *argv[])
{
ifstream inputFile;
char inFile[12] = "string1.txt";
char outFile[16] = "word result.txt";
// Get the name of the file from the user.
cout << "Enter the name of the file: ";
cin >> inFile;
// Open the input file.
inputFile.open(inFile);
// If successfully opened, process the data.
if(inputFile)
{
while(!inputFile.eof())
{
totalwordCount(inputFile);
uniquewordCount(inputFile);
}
}
return 0;
}
I already took care of how to count the total number of words in the file in the totalwordCount() function, but in the uniquewordCount() function, I am having trouble counting the total number of unique words and counting the number of occurrences of each word. Is there anything that I need to change in the uniquewordCount() function?
This program contains several issues which are to be considered harmful! To prevent bad software being created based on entirely nonsensical assignments like the above, here are a number of hints:
Always test the stream for success after reading from it. Using in.eof() to determine if the stream is in a good state does not work! One of the problems is that you will get an infinite loop if the stream goes bad for a different reason than end of file, e.g., failure to correctly parse a value (this will set std::ios_base::failbit but not std::ios_base::eofbit).
Reading into a fixed-size char array a using in >> a without limiting the number of characters to be read is the C++ way to spell gets()! If you really think that using in >> a is the right way to go (see the next item), you absolutely need to set the stream's width to the size of the array, e.g., using in >> std::setw(sizeof(a)) >> a. You still need to check that this extraction was successful, of course.
From the looks of it, your teacher wants you to actually use std::istream::getline() to read the array, e.g., using in.getline(a, sizeof(a)) (which, of course, needs to be checked for success).
Note that the formatted input, i.e., in >> a already tokenizes the stream being received by spaces! There is no need to faff about with strtok() after that.
Once you have consumed a stream, it is consumed. Assuming the characters don't come from a file but rather from something like standard input, you also can't rewind the stream to read it again. I'd think you want to tokenize the values once and use them for both purposes.
This is more of a sidenote: after you created a stream, its nature should be entirely immaterial for the processing of the stream's content (although, e.g., for string streams you might want to eventually collect the result using the str() member): implement your stream processing functions in terms of std::istream rather than std::ifstream!
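The first two hints (test the extraction itself, and bound it with std::setw) can be sketched together. countTokens is a hypothetical name, and the 16-byte buffer follows the 15-letter word limit from the assignment:

```cpp
#include <iomanip>
#include <istream>

// Hypothetical helper: counts whitespace-separated tokens without ever writing
// more than 15 characters plus the terminating nul into the buffer.
int countTokens(std::istream& in)
{
    char word[16];
    int count = 0;
    // The extraction is the loop condition, so the loop ends on any failure,
    // not only on end of file. std::setw has to be reapplied every time
    // because the width resets to 0 after each extraction.
    while (in >> std::setw(static_cast<int>(sizeof(word))) >> word)
        ++count;
    return count;
}
```

A token longer than 15 characters is not an error here; it is simply delivered in several pieces, which is why the bounded version cannot overflow the array.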
Since you have a concrete question ("Is there anything that I need to change in the uniquewordCount() function?"): yes, everything! Throw away this function entirely and rethink what you need to do. Basically, the structure of the functionality should be along the lines of
char buffer[101]; // lines are at most 100 characters, plus the terminating nul
while (in.getline(buffer, sizeof(buffer))) {
// tokenize buffer into words
// for each word check if it already exists
// if the word does not exist, append it to the array of known words and set count to 1
// if the word exists, increment the count
// determine if the word is shorter or longer than the shortest or longest word so far
// if it is the case, remember the word's index or a pointer to it
}

Why is this char stopping my program?

Does the newline character have some kind of special significance in c++? Is it a non-ASCII character?
I'm trying to build a Markov chain for each unique n-character substring within a larger piece of text. Every time I come across a new unique substring I enter it into a map whose value is a 256-element vector (one element for each character in the extended ASCII table).
There's no problem when I print out the entire contents of the file ("lines" is a vector of lines of text built using ifstream and getline):
for(int i=0; i<lines.size(); i++) cout << lines[i] << endl;
The whole text file shows up in the console. The problem happens when I try to return the newline character to a function that's expecting a char. "moveSpaces" is an integer constant that determines how many characters further ahead to move in the vector of strings on each iteration.
char GetNextChar(int row, int col){
for (int i=0; i<MOVESPACES; i++) {
if (col+1<lines[row].size()) {
col+=1;
} else { // If you're not at the end of the line keep going
row+=1; // Otherwise, move to the beginning of the next row
col=0;
}
}
return lines[row].at(col);
}
I've walked through with the debugger, and when it gets to the 1st column of the 2nd line it craps out on me – no error or anything. It fails within this function, not the calling function.
The file I'm using is A Christmas Carol (first thing that came up on Project Gutenberg). For reference here are the first few lines:
STAVE I: MARLEY'S GHOST
MARLEY was dead: to begin with. There is no doubt
whatever about that. The register of his burial was
The function breaks when it should return the first character on the second line. This doesn't happen if I get rid of the newline, or if I build the "lines" vector myself line by line in the program. Any idea what's wrong?
Your GetNextChar function is assuming that if you are at the last character in some line, there will be a character in the next line. What happens if there is no character in that next line? This can happen in two places: When you have hit end of file, or when the next line is the empty string.
The second line is the empty string.
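A possible guard against both cases, as a sketch. It differs from the original in ways that are my assumptions: the lines vector and the step count are passed in as parameters instead of being globals, and failure is reported through a bool return value; getNextChar with this signature is hypothetical:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant of GetNextChar: advances `steps` characters from
// (row, col), skipping empty lines, and returns false instead of reading
// past the end of the text.
bool getNextChar(const std::vector<std::string>& lines,
                 std::size_t row, std::size_t col, int steps, char& out)
{
    for (int i = 0; i < steps; ++i) {
        if (row < lines.size() && col + 1 < lines[row].size()) {
            ++col;                               // still inside the current line
        } else {
            ++row;                               // move to the next line...
            col = 0;
            while (row < lines.size() && lines[row].empty())
                ++row;                           // ...skipping lines with no characters
        }
        if (row >= lines.size())
            return false;                        // ran off the end of the text
    }
    if (row >= lines.size() || col >= lines[row].size())
        return false;                            // starting position was invalid
    out = lines[row][col];
    return true;
}
```

Since getline strips the newline, an empty string in the vector is the only trace of a blank line; if the Markov chain should see line breaks as characters, the caller could return '\n' at that point instead of skipping.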