Loading a Linux-formatted text file using a C++ Windows application

I am trying to load a Linux-formatted text file and read it the same way I would read a Windows-formatted text file in a C++ application. It works perfectly when the file is formatted exactly how I want it in Windows, and the data from the file is loaded into a list of lists.
To describe the problem a little better: if I have a tab-delimited file, I can store the contents of each row in a list of strings, where each string is one tab-separated field. I then have a list of all of the rows.
For example, the text file I'm reading may look something like this:
156 Hit 83.2 23:34
23 Miss 21.4 23:38
and so on....
The code snippet below is what I have been using; I found help elsewhere and altered it to work how I needed. It creates a list with two items, where each item is a list of 4 strings, each string holding the contents of one "column" of the current row. I hope this is explained thoroughly enough.
ifstream infile(file);
list <list <string> > data;
while (infile){
string s;
if (!getline( infile, s )) break;
std::istringstream ss ( s );
list <string> record;
while (ss){
string s;
if (!getline( ss, s, '\t' )) break;
record.push_back( s );
}
data.push_back( record );
}
That is exactly what I would like to do. However, instead of a Windows text file, the file I read will be a Linux text file, and it will not have a tab between the items in each row; instead it will contain a varying number of spaces. My thought was to open the file in binary mode and, instead of using a tab as the delimiter, accept any amount of whitespace. However, I am not sure how to do that, as I am still relatively new to C++ and have not worked with reading Linux-formatted text files from a Windows C++ application. Any help would be greatly appreciated. Thanks in advance!

This has nothing to do with Linux versus Windows. You can use >> to perform formatted input of whitespace-separated fields:
string s;
while (ss >> s)
record.push_back(s);
To skip whitespace explicitly, use std::ws; to disable whitespace skipping, use std::noskipws.
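Put together with the line-by-line loop from the question, this could look roughly like the sketch below. The function name readRecords is made up for illustration, and skipping blank lines is an assumption about what the asker wants rather than something stated in the question.

```cpp
#include <istream>
#include <list>
#include <sstream>
#include <string>

// Hypothetical helper: splits every line of `in` into whitespace-separated
// fields and returns one list of fields per non-blank line.
std::list<std::list<std::string>> readRecords(std::istream& in)
{
    std::list<std::list<std::string>> data;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::list<std::string> record;
        std::string field;
        // operator>> skips any run of spaces or tabs, and also a stray
        // trailing '\r', because all of these count as whitespace.
        while (fields >> field)
            record.push_back(field);
        if (!record.empty())   // assumption: blank lines are not records
            data.push_back(record);
    }
    return data;
}
```

It would be called with the already opened stream, for example readRecords(infile), and works the same for LF and CRLF line endings.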

Related

Ignoring remaining newlines and white space when reading input file (C++)

I have a function that reads a text file as input and stores the data in a vector.
It works, as long as the text file doesn't contain any extra new lines or white space.
Here is the code I currently have:
std::ifstream dataStream;
dataStream.open(inputFileName, std::ios_base::in);
std::string pushThis;
while(dataStream >> pushThis){
dataVector.push_back(pushThis);
}
For example:
safe mace
bait mate
The above works as an input text file.
This does not work:
safe mace
bait mate
Is there any way to stop the stream once you reach the final character in the file, while still maintaining separation via white space between words in order to add them to something like a vector, stack, whatever?
i.e. a vector would contain ['safe', 'mace', 'bait', 'mate']
Answer:
The problem came from having two streams, one using !dataStream.eof() and the other using dataStream >> pushThis.
Fixed so that both use dataStream >> pushThis.
For future reference for myself and others who may find this:
Don't use eof() unless you want to grab the ending bit(s) of a file (whitespace inclusive).

Formatting issue with getline()

I'm trying to read some data out of a text file. One of the data names is Chamber temperature [°C]. I read the file with the command getline(myfile, tab, '\t'); and print the result.
The problem is that the degree sign comes out as "Chamber temperature [�C]".
How can I prevent C++ from mangling the degree sign?
P.S. : In the text file the sign is formatted correctly
Code:
//just create a txt file on your desktop which only stores "Chamber Temperature [°C]"
myfile.open("C:\\Users\\user\\Desktop\\test.txt");
string tab = "";
getline(myfile, tab, '\t');
cout << tab << endl;
With the same settings as I describe below, you should see the same problem. Well, it is not really a problem, just an encoding difference: the ANSI bytes cannot be interpreted as UTF-8.
There are solutions in which I look for the substring and then replace it as I wish, but I would like a foolproof and safe way to use this code in any case. So I'm looking for a conversion between these two encodings.
Additional Information about my environment:
I use Eclipse with the MinGW compiler and the C++11 dialect. I use the default text file encoding UTF-8 and the new text file line delimiter UNIX.
I opened the file in Notepad++ and it reports the file encoding as "ANSI".
I use a simple ifstream to read the data into a 3D vector (first dimension: file; second dimension: row data; third dimension: columns). I use getline to read each tab-delimited sequence into a variable and, in the end, into my matrix.
After I have stored the data in my matrix I do some data searching, and here comes my problem. Because the file is encoded in ANSI, I can't compare the string Chamber Temperature [°C] with the stored data, since it will never find it.
I need to convert the text file into UTF-8 and then store it in my 3D matrix. Is this possible? I am new to coding, so could you please provide example code or pseudocode?
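One way such a conversion could be sketched is shown below. It assumes the "ANSI" file is effectively ISO-8859-1 (Latin-1); on Western Windows systems "ANSI" usually means Windows-1252, which agrees with Latin-1 for the degree sign and the other bytes from 0xA0 to 0xFF but differs in the range 0x80 to 0x9F. The function name latin1ToUtf8 is made up for illustration.

```cpp
#include <string>

// Assumption: `input` holds ISO-8859-1 (Latin-1) bytes. Every byte below
// 0x80 is copied unchanged; every other byte becomes the two-byte UTF-8
// sequence for the code point with the same value.
std::string latin1ToUtf8(const std::string& input)
{
    std::string out;
    out.reserve(input.size());
    for (unsigned char c : input) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation byte
        }
    }
    return out;
}
```

Each string returned by getline could be passed through this function before it is stored, so that later comparisons against UTF-8 literals in the source file can match.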

Is it possible to skip a line in a data file?

I have a data file that I am trying to input and the data is split into sections via a blank line. The data will be read in from a text file.
How do I make my code skip a blank line to read in the next piece of data? I am currently just in the planning stages of my application.
I'm a beginner so I'm not really sure how to go about this.
Can anyone advise a method on how to approach this?
I have just written it out and my code looks like this:
string ship2_id;
char ship2_journey_id[20];
float ship2_l;
int ship2_s;
getline(itinerary_file, ship2_id);
if (ship2_id.empty())
{
// the line was blank, so read the next one instead
getline(itinerary_file, ship2_id);
}
cout << ship2_id << endl;
Yes,
stream.ignore(max_number_of_chars_to_be_skipped, '\n');
I usually just use 1ul<<30 or similar for the first parameter, but
this could be a DoS vector if the input is untrusted and slow to skip those chars
the "pedant" value would be std::numeric_limits<std::streamsize>::max() or similar
I don't know what you are using to read the file, but to find a blank line, look for two line breaks in a row. Take into account that the line break sequence differs between operating systems. On Windows, by default, two characters (carriage return and line feed) are used together for a line break.

getline() text with UNIX formatting characters

I am writing a C++ program which reads lines of text from a .txt file. Unfortunately the text file is generated by a twenty-something year old UNIX program and it contains a lot of bizarre formatting characters.
The first few lines of the file are plain, English text and these are read with no problems. However, whenever a line contains one or more of these strange characters mixed in with the text, that entire line is read as characters and the data is lost.
The really confusing part is that if I manually delete the first couple of lines so that the very first character in the file is one of these unusual characters, then everything in the file is read perfectly. The unusual characters obviously just display as little ascii squiggles -arrows, smiley faces etc, which is fine. It seems as though a decision is being made automatically, without my knowledge or consent, based on the first line read.
Based on some googling, I suspected that the issue might be with the locale, but according to the Visual Studio debugger, the locale property of the ifstream object is "C" in both scenarios.
The code which reads the data is as follows:
//Function to open file at location specified by inFilePath, load and process data
int OpenFile(const char* inFilePath)
{
string line;
ifstream codeFile;
//open text file
codeFile.open(inFilePath,ios::in);
//read file line by line
while ( codeFile.good() )
{
getline(codeFile,line);
//check non-zero length
if (line != "")
ProcessLine(&line[0]);
}
//close file
codeFile.close();
return 1;
}
If anyone has any suggestions as to what might be going on or how to fix it, they would be very welcome.
From reading about your issues it sounds like you are reading in binary data, which will cause getline() to throw out content or simply skip over the line.
You have a couple of choices:
If you simply need lines from the data file, you can first sanitise them by removing all non-printable characters (that is the "official" name for those weird ASCII characters). On UNIX a tool such as strings would help you with that process.
You can of course also do this programmatically in your code by reading in X amount of data, storing it in a string, and then removing the characters that fall outside the standard ASCII character range. This will most likely cause you to lose any Unicode that may be stored in the file.
You change your program to understand the format and write a parser that lets you parse the document in a more sane way.
If you can, I would suggest trying solution number 1, simply to see if the results are sane and can still be used. You mention that this is medical data; do you perchance know what file format this is? If you are trying to find out and have access to a UNIX/Linux machine, you can use the utility file and maybe it can give you a clue (worst case it will tell you it is simply data).
If possible, try getting a "clean" file that you can post the hex dump of, so that we can provide better help than what we are currently providing. By clean I mean that there is no personally identifying information in the file.
For number 2, open the file in binary mode. You mentioned using Windows; there, std::fstream objects handle binary and non-binary files differently, whereas on UNIX systems this is not the case (on most systems, I'm sure I'll get a comment regarding the one system that doesn't match this description).
codeFile.open(inFilePath,ios::in);
would become
codeFile.open(inFilePath, ios::in | ios::binary);
Instead of getline() you will want to become intimately familiar with .read() which will allow unformatted operations on the ifstream.
Reading will be like this:
// This code has not been tested!
char input[1024];
codeFile.read(input, 1024);
std::streamsize actual_read = codeFile.gcount();
// Here you can process input, up to a maximum of actual_read characters.
//ProcessLine() // We didn't necessarily read a line!
ProcessData(input, actual_read);
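ProcessData is left undefined above. One possible shape for the clean-up step, assuming the goal is the programmatic removal of non-printable bytes described earlier, is sketched below. The name stripNonPrintable and the decision to keep newlines and tabs are assumptions, not part of the original answer.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Copies `length` bytes from `data`, dropping every byte that is neither
// printable (in the current C locale) nor a newline or tab.
std::string stripNonPrintable(const char* data, std::size_t length)
{
    std::string cleaned;
    cleaned.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        // Cast first: passing a negative char to std::isprint is undefined.
        const unsigned char c = static_cast<unsigned char>(data[i]);
        if (std::isprint(c) || c == '\n' || c == '\t')
            cleaned.push_back(static_cast<char>(c));
    }
    return cleaned;
}
```

The cleaned chunks could then be concatenated and split on '\n' to recover the lines. Note that this also drops carriage returns and any non-ASCII bytes.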
The other option, as mentioned, is to change the locale of the stream and change the separator it considers a new line; maybe this will fix your issue without requiring the unformatted operations:
imbue the stream with a new locale that only knows about the newline. This method may or may not let your getline() work without issues.

How do you read a word in from a file in C++?

So I was feeling bored and decided I wanted to make a hangman game. I did an assignment like this back in high school when I first took C++. But this was before I even took geometry, so unfortunately I didn't do well in it in any way, shape, or form, and after the semester I trashed everything in a fit of rage.
I'm looking to make a txt document and just throw in a whole bunch of words
(ie:
test
love
hungery
flummuxed
discombobulated
pie
awkward
you
get
the
idea
)
So here's my question:
How do I get C++ to read a random word from the document?
I have a feeling #include<ctime> will be needed, as well as srand(time(0)); to get some kind of pseudorandom choice...but I haven't the foggiest on how to have a random word taken from a file...any suggestions?
Thanks ahead of time!
Here's a rough sketch, assuming that the words are separated by whitespace (space, tab, newline, etc.):
vector<string> words;
ifstream in("words.txt");
string word;
// test the extraction itself so a failed read at end of file is not stored
while (in >> word) {
words.push_back(word);
}
string r=words[rand()%words.size()];
The operator >> used on a string reads one whitespace-separated word from a stream.
So the question is do you want to read the file each time you pick a word or do you want to load the file into memory and then pick up the word from a memory structure. Without more information I can only guess.
Pick a Word from a file:
// Note: an ifstream is also an istream.
std::string pickWordFromAStream(std::istream& s,std::size_t pos)
{
std::istream_iterator<std::string> iter(s);
for(;pos;--pos)
{ ++iter;
}
// This code assumes that pos is smaller than
// the number of words in the file (pos counts from 0)
return *iter;
}
Load a file into memory:
void loadStreamIntoVector(std::istream& s,std::vector<std::string>& words)
{
std::copy(std::istream_iterator<std::string>(s),
std::istream_iterator<std::string>(),
std::back_inserter(words)
);
}
Generating a random number should be easy enough, assuming you only want pseudo-random.
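For the random pick itself, a minimal sketch using the standard <random> header instead of rand() could look like this. The name pickRandomWord is made up, and the function assumes the vector is not empty.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Returns a uniformly chosen element of `words`.
// Precondition (assumed, not checked): `words` is not empty.
const std::string& pickRandomWord(const std::vector<std::string>& words,
                                  std::mt19937& rng)
{
    std::uniform_int_distribution<std::size_t> index(0, words.size() - 1);
    return words[index(rng)];
}
```

The generator would typically be created once, for example std::mt19937 rng(std::random_device{}());, and reused for every pick.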
I would recommend creating a plain text file (.txt) in Notepad and using the standard C file APIs (fopen(), and fread()) to read from it. You can use fgets() to read each line one at a time.
Once you have your plain text file, just read each line into an array and then randomly choose an entry in the array using the method you've suggested above.