I'm trying to read some data out of a text file. One of the data names is Chamber temperature [°C]. I read the file with getline(myfile, tab, '\t'); and print the result with cout.
The problem is that the degree sign is formatted into "Chamber temperature [�C]".
How can I prevent C++ from garbling the degree sign?
P.S. : In the text file the sign is formatted correctly
Code:
//just create a txt file on your desktop which only stores "Chamber Temperature [°C]"
myfile.open("C:\\Users\\user\\Desktop\\test.txt");
string tab = "";
getline(myfile, tab, '\t');
cout << tab << endl;
If you use the same settings as I describe below, you should see the same problem. Well, it is not really a problem, just an encoding difference: the file is stored as ANSI, and those bytes cannot be interpreted as UTF-8.
There are solutions in which I look for the substring and then replace the character as I wish, but I would like a foolproof and safe way to use this code in any case. So I'm looking for a conversion between these two encodings.
Additional Information about my environment:
I use Eclipse with a MinGW compiler and the C++11 dialect. I use the default text file encoding UTF-8 and the new text file line delimiter set to Unix.
I opened the file in notepad++ and it gives me the estimation of the file format "ANSI".
I use a simple ifstream to read the data into a 3D vector (first dimension: file; second dimension: row data; third dimension:columns). I use the getline to read each sequence delimited by a tab into a variable ... and in the end into my matrix.
After I have stored the data in my matrix I do some data searching, and here comes my problem. Because the file is encoded in ANSI, I can't compare the string Chamber Temperature [°C] with the stored data, since it will never find it.
I need to convert the text of the file to UTF-8 and then store it in my 3D matrix. Is this possible? I am new to coding, so could you please provide me with example code or pseudocode?
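A minimal sketch of such a conversion, under the assumption that the "ANSI" file only contains characters whose byte values are identical in ISO-8859-1 (Latin-1) and Windows-1252. That holds for the degree sign (byte 0xB0) but not for the bytes 0x80 to 0x9F, which would need a lookup table instead. The function name latin1_to_utf8 is made up for this illustration:

```cpp
#include <string>

// Convert a string of ISO-8859-1 (Latin-1) bytes to UTF-8.
// Assumption: the input does not use the Windows-1252 specific
// bytes 0x80..0x9F (the degree sign, 0xB0, is safe).
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c < 0x80) {
            // Plain ASCII is identical in both encodings.
            out += static_cast<char>(c);
        } else {
            // Code points U+0080..U+00FF become two UTF-8 bytes.
            out += static_cast<char>(0xC0 | (c >> 6));   // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
        }
    }
    return out;
}
```

Calling tab = latin1_to_utf8(tab); right after each getline would store UTF-8 text in the matrix, so a comparison against a "Chamber temperature [°C]" literal in a UTF-8 encoded source file should then match. Note that a Windows console may still display the UTF-8 bytes incorrectly; that is a separate display issue.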
Related
I have a function that reads a text file as input and stores the data in a vector.
It works, as long as the text file doesn't contain any extra new lines or white space.
Here is the code I currently have:
std::ifstream dataStream;
dataStream.open(inputFileName, std::ios_base::in);
std::string pushThis;
while(dataStream >> pushThis){
    dataVector.push_back(pushThis);
}
For example:
safe mace
bait mate
The above works as an input text file.
This does not work:
safe mace
bait mate
Is there any way to stop the stream once you reach the final character in the file, while still maintaining separation via white space between words in order to add them to something like a vector, stack, whatever?
i.e. a vector would contain ['safe', 'mace', 'bait', 'mate']
Answer:
The problem came from having two streams, one using !dataStream.eof() and the other using dataStream >> pushThis.
Fixed so that both use dataStream >> pushThis.
For future reference for myself and others who may find this:
Don't use eof() unless you want to grab the ending bit(s) of a file (whitespace inclusive).
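To illustrate why the extraction itself makes a good loop condition, here is a small sketch. The helper name read_words is invented for this example, and it takes any std::istream so it can be tried with a string stream as well as a file:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collect whitespace-separated words from a stream.
// operator>> skips leading whitespace (including blank lines) and the
// loop ends as soon as an extraction fails, so trailing newlines or
// spaces never produce an extra empty entry.
std::vector<std::string> read_words(std::istream& in)
{
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```

With an std::istringstream holding "safe mace\n\nbait mate  \n\n", this returns the four words safe, mace, bait and mate, regardless of the extra blank lines.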
I have a data file that I am trying to input and the data is split into sections via a blank line. The data will be read in from a text file.
How do I make my code skip a blank line to read in the next piece of data? I am currently just in the planning stages of my application.
I'm a beginner so I'm not really sure how to go about this.
Can anyone advise a method on how to approach this?
I have just written it out and my code looks like this:
string ship2_id;
char ship2_journey_id[20];
float ship2_l;
int ship2_s;
getline(itinerary_file, ship2_id);
// if the line just read is blank, read the following line instead
if (ship2_id.empty())
{
    getline(itinerary_file, ship2_id);
}
cout << ship2_id << endl;
Yes,
stream.ignore(max_number_of_chars_to_be_skipped, '\n');
I usually just use 1ul<<30 or similar for the first parameter, but
this could be a DoS vector if the input is untrusted and slow to skip those chars
the "pedant" value would be std::numeric_limits<std::streamsize>::max()
I don't know what you are using to read the file, but to search for a blank line, look for two line breaks in a row. Take into account that the line break character differs between operating systems. On Windows, by default, two characters (carriage return and line feed) are used together for a line break.
I have this program that is supposed to load everything from a .txt file into a string and then display it. The problem I'm getting is that when I import the contents of the file they look different than if you view it in a simple text editor. This is what it looks like in a text editor:
bvwÅ.wÅ.Å}.ÅsqÄsÇ.sÑs|.]po{o.r}sÅ|Ç.y|}Ö.op}ÉÇ.wÇ
And this is what it looks like when it's imported and printed in my program:
bvw\201.w\201.\201}.\201sq\200s\202.s\204s|.]po{o.r}s\201|\202.y|}\205.op}\203\202.w\202
It seems like some characters are being encoded in a strange way, e.g. Swedish "å" is stored as "\201". I want all of the text that my program handles to be Unicode, so that I can convert characters back and forth between chars and ints. This is how I import the text file:
//Imports the entire file as a string
string toBeDecrypted;
string line;
while(getline(inputFile, line)){
    //append each line, restoring the newline that getline removed
    toBeDecrypted.append(line);
    toBeDecrypted.append("\n");
}
inputFile.close();
My program also writes to files, so I want it to write in Unicode too.
EDIT
I solved the problem by changing the way the input file is created; it no longer contains any extended-ASCII characters.
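For the conversion between chars and ints mentioned above, a small sketch may help. The escape \201 is octal for the byte value 129, and a plain char is often signed, so converting it directly to int can yield a negative number. The helper names below are invented for this example, and they work on raw bytes, not on Unicode code points:

```cpp
#include <string>
#include <vector>

// Convert every byte of a string to an int in the range 0..255.
// Going through unsigned char avoids negative values for bytes >= 128.
std::vector<int> byte_values(const std::string& text)
{
    std::vector<int> values;
    for (char c : text)
        values.push_back(static_cast<unsigned char>(c));
    return values;
}

// Inverse operation: rebuild the string from byte values 0..255.
std::string from_byte_values(const std::vector<int>& values)
{
    std::string text;
    for (int v : values)
        text += static_cast<char>(v);
    return text;
}
```

This keeps the bytes intact in both directions but does not decode them; which character a byte such as 129 stands for still depends on the encoding the file was written in.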
My problem is that I am trying to load a Linux-formatted text file and read it in a C++ application just as if I were opening a Windows-formatted text file. I have gotten it to work perfectly when the file is formatted exactly how I want it to be in Windows, and have the data from the file loaded into a list of lists.
To try to describe my problem a little better what I am currently able to do right now is if I have a file which is tab delimited I am able to store all of the contents from each row into a list of strings where each string is whatever each tab is separating. I then have a list of all of the rows.
For example, the text file I'm reading may look something like this:
156 Hit 83.2 23:34
23 Miss 21.4 23:38
and so on....
The code snippet below is what I have been using; I found it elsewhere and altered it to work how I needed it to. It creates a list with two items, where each item is a list of 4 strings, each string representing the contents of one "column" of the current row. I hope this is explained thoroughly enough.
ifstream infile(file);
list <list <string> > data;
while (infile){
    string s;
    if (!getline( infile, s )) break;
    std::istringstream ss ( s );
    list <string> record;
    while (ss){
        string s;
        if (!getline( ss, s, '\t' )) break;
        record.push_back( s );
    }
    data.push_back( record );
}
That is exactly what I would like to do however instead of the text file I would be reading from being formatted as a Windows text file it will be a Linux text file and will not have a tab in-between each "item" in each row; but instead will contain a random number of spaces. My thought was I could open the file up in binary mode and read it that way and instead of having a tab be my delimiter I could choose any amount of white space. However I am not exactly sure how to do that as I am still relatively new to C++ and have not specifically worked with reading Linux formatted text files from a Windows C++ application. Any help would be greatly appreciated. Thanks in advance!
This has nothing to do with Linux versus Windows. You can use >> to perform formatted input of whitespace-separated fields:
string s;
while (ss >> s)
record.push_back(s);
To skip whitespace explicitly, use std::ws; to disable whitespace skipping, use std::noskipws.
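Put together with the line-by-line loop from the question, this could look like the following sketch. The function name read_records is hypothetical, and taking an std::istream is a choice made here so that it works with files and string streams alike:

```cpp
#include <istream>
#include <list>
#include <sstream>
#include <string>

// Read every line as a record and split it on any run of whitespace.
std::list<std::list<std::string>> read_records(std::istream& in)
{
    std::list<std::list<std::string>> data;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::list<std::string> record;
        std::string field;
        // operator>> skips spaces, tabs and a stray '\r' alike.
        while (fields >> field)
            record.push_back(field);
        // Skip lines that contained no fields at all.
        if (!record.empty())
            data.push_back(record);
    }
    return data;
}
```

Because a carriage return counts as whitespace for operator>>, the same code handles files with Unix and with Windows line endings. The trade-off is that a field can no longer contain spaces itself.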
I am writing a C++ program which reads lines of text from a .txt file. Unfortunately the text file is generated by a twenty-something year old UNIX program and it contains a lot of bizarre formatting characters.
The first few lines of the file are plain, English text and these are read with no problems. However, whenever a line contains one or more of these strange characters mixed in with the text, that entire line is read as characters and the data is lost.
The really confusing part is that if I manually delete the first couple of lines so that the very first character in the file is one of these unusual characters, then everything in the file is read perfectly. The unusual characters obviously just display as little ascii squiggles -arrows, smiley faces etc, which is fine. It seems as though a decision is being made automatically, without my knowledge or consent, based on the first line read.
Based on some googling, I suspected that the issue might be with the locale, but according to the visual studio debugger, the locale property of the ifstream object is "C" in both scenarios.
The code which reads the data is as follows:
//Function to open file at location specified by inFilePath, load and process data
int OpenFile(const char* inFilePath)
{
    string line;
    ifstream codeFile;
    //open text file
    codeFile.open(inFilePath,ios::in);
    //read file line by line; the loop stops as soon as a read fails
    while ( getline(codeFile,line) )
    {
        //check non-zero length
        if (!line.empty())
            ProcessLine(&line[0]);
    }
    //close file
    codeFile.close();
    return 1;
}
If anyone has any suggestions as to what might be going on or how to fix it, they would be very welcome.
From reading about your issues it sounds like you are reading in binary data, which will cause getline() to throw out content or simply skip over the line.
You have a couple of choices:
1. If you simply need lines from the data file, you can first sanitise them by removing all non-printable characters (that is the "official" name for those weird ASCII characters). On UNIX a tool such as strings would help you with that process.
You can of course also do this programmatically in your code by simply reading in X amount of data, storing it in a string, and then removing those characters that fall outside of the standard ASCII character range. This will most likely cause you to lose any Unicode that may be stored in the file.
2. Change your program to understand the format, and basically write a parser that allows you to parse the document in a more sane way.
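The programmatic sanitising described in the first choice could be sketched as below. The name strip_non_printable is made up, and keeping tabs is an assumption of this example; std::isprint in the default "C" locale accepts only the printable ASCII range.

```cpp
#include <cctype>
#include <string>

// Return a copy of the input with all non-printable bytes removed.
// Tabs are kept as well, on the assumption that they separate fields.
std::string strip_non_printable(const std::string& in)
{
    std::string out;
    for (unsigned char c : in) {
        // The value passed to isprint must be representable as
        // unsigned char, hence the loop variable type.
        if (std::isprint(c) || c == '\t')
            out += static_cast<char>(c);
    }
    return out;
}
```

As noted above, this also discards every byte above the ASCII range, so any non-ASCII text in the file is lost.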
If you can, I would suggest trying solution number 1, simply to see if the results are sane and can still be used. You mention that this is medical data; do you by any chance know what file format this is? If you are trying to find out and have access to a Unix/Linux machine, you can use the utility file and maybe it can give you a clue (worst case it will tell you it is simply data).
If possible, try getting a "clean" file that you can post the hex dump of, so that we can provide better help than we are currently providing. By clean I mean that there is no personally identifying information in the file.
For number 2, open the file in binary mode. You mentioned using Windows, where std::fstream objects handle binary and non-binary files differently; on UNIX systems this is not the case (on most systems, I'm sure I'll get a comment regarding the one system that doesn't match this description).
codeFile.open(inFilePath,ios::in);
would become
codeFile.open(inFilePath, ios::in | ios::binary);
Instead of getline() you will want to become intimately familiar with .read() which will allow unformatted operations on the ifstream.
Reading will be like this:
// This code has not been tested!
char input[1024];
codeFile.read(input, 1024);
int actual_read = codeFile.gcount();
// Here you can process input, up to a maximum of actual_read characters.
//ProcessLine() // We didn't necessarily read a line!
ProcessData(input, actual_read);
The other thing, as mentioned, is that you can change the locale for the current stream and change the separator it considers a new line; maybe this will fix your issue without requiring the unformatted operations:
Imbue the stream with a new locale that only knows about the newline. This method may or may not let your getline() work without issues.