I am keeping a large repository of strings in a character-delimited file. Currently, I am reading the strings into string variables, and then later printing them.
The problem I'm facing is how to store and print new line characters. In the file, if the string, for example, is:
"Hello this is \n\n a new line"
then the literal '\n' is printed in my program's terminal when I print the string, but I would like it to print new lines.
Is this a matter of processing the strings character by character, or is there a proper way to read the strings into the string variables that will allow this to work?
Related
If I have a text file where lines contain some non-blank characters followed by spaces, how do I read those lines into a character variable without excess spaces?
character (len=1000) :: text
open (unit=20,file="foo.txt",action="read")
read (20,"(a)") text
will read the first 1000 characters of a line into variable text, which will be padded with spaces at the end if there are fewer than 1000 characters in the line. But if the line length is 100 you have 900 extraneous spaces, and the program does not "know" how long the line read actually was.
Fortran strings are blank-padded, so with fixed-length Fortran strings there is no way to distinguish significant trailing blanks from padding.
If every whitespace character is important, I suggest treating the file as a stream-access file instead (formatted or unformatted as needed), reading individual characters into an array buffer, and allocating a deferred-length string only after you know the length you actually need.
character (len=1000) :: text
integer :: s, ios
open (unit=20,file="foo.txt",action="read")
read (20,"(a)", size=s, advance='no', iostat=ios) text
After that last line, s contains the number of characters read, including trailing spaces, which I think is what you wanted.
Notes:
With a size= specifier, you must also specify advance='no'; otherwise you get a compilation error. If the line fits in text, the read reaches the end of the record, so the next read statement starts on the next line despite the 'no'. That's fine.
ios stores a negative integer when the read reaches the end of the line. This always happens if the line is shorter than the length of text. That's fine.
When attempting to read past the end of the file, ios stores a different negative integer. The actual values are processor-dependent, but since Fortran 2003 the intrinsic module iso_fortran_env provides the named constants iostat_eor and iostat_end, so you can compare against those instead of experimenting. In my case, with the gfortran compiler, ios was -1 at the end of the file and -2 at the end of a line.
Why do these two print different things? The first prints abcd but the second prints \x61\x62\x63\x64. What do I need to do to make the line from the file be read as abcd?
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::string line("\x61\x62\x63\x64");
    std::ifstream myfile("myfile.txt"); //<-- the file contains \x61\x62\x63\x64
    std::string line_file;
    std::getline(myfile, line_file);
    std::cout << line << std::endl;
    std::cout << line_file << std::endl;
}
In C++, the backslash is an escape character in string and character literals. It can be used to represent special characters such as newlines \n and tabs \t, or, in your case, hexadecimal representations of ASCII characters. If you actually want a backslash in a C++ literal, you have to escape it: char c='\\'. When you read a backslash from a file, it is not treated as an escape character but as an actual backslash.
It has to do with the input file stream character interpretation:
File streams opened in binary mode perform input and output operations independently of any format considerations. Non-binary files are known as text files, and some translations may occur due to formatting of some special characters (like newline and carriage return characters).
Text file streams are those where the ios::binary flag is not included in their opening mode. These files are designed to store text and thus all values that are input or output from/to them can suffer some formatting transformations, which do not necessarily correspond to their literal binary value.
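So the backslashes are the reason for the difference: your ifstream reads each \x61 sequence from the file as four separate, ordinary characters, whereas in the string literal the compiler has already decoded each escape into the character it represents, so there is no ambiguity. Opening the file with ios::binary would not change this, because text-mode translation only affects characters such as newline and carriage return.
For further reading, see how fstreams work and read about backslash escapes in character literals.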
I am getting some large logs from several sources and appending them to a list. All of the list values have several \n characters.
Once the list has been populated with the logs I require, I want to output them to a file in $HOME like so:
def logfile_creation(self):
    with open(os.path.join(self.homedir, self.logfile), 'w') as logoutput:
        for output in self.logs:
            logoutput.writelines(str(output))
When I read the logoutput file, the \n characters are printed instead of newlines. I assume this happens because each list value is converted into a string with str(); however, this seems to be required for writelines to write to the file.
What's the best way to process the newline as it's outputted into the file, rather than printing \n?
In case anyone was wondering, the way I resolved this was to use:
logoutput.writelines(str(output).replace('\\n', '\n'))
Easy as that :)
I'm programming an application that converts .txt files to bags of words for text mining. However, I keep getting non-alphabetic characters (like ¾ and =) even though my application filters non-alphabetic characters.
My vector passes through a loop that erases strings beginning with a char whose ASCII value is outside [65,90] (A to Z). These characters also pass the isalpha test. It seems like these characters can't be distinguished from alphabetic characters.
I don't see how I can remove these weird strings dynamically from my vector of strings. I need help.
(My full code is not included here because it is quite long for a forum post.)
This part of my code fails to get rid of the strings beginning with non-alphabetic characters:
for (unsigned int i=0; i<token24.size();i++){
    string temp = token24[i];
    char c = temp[0];
    if(c>90||c<65){
        token24.erase(token24.begin()+i);
        i--;
    }
}
I also tried the condition
(c>'Z'||c<'A')
You could always replace those characters with whitespace, but that only handles specific cases of specific characters, not the larger problem.
I don't think we can do anything for you until we see the code.
The most important part of programs like yours is handling the content of the .txt file. Such a file can contain Unicode text, which in turn can be encoded, for example, with UTF-8. In that case a single byte may be only part of a character, not the character itself. Are you sure you load (and, if necessary, decode) the file properly?
Also, don't you think that lowercase letters are valid alphabetic characters too?
First off, I'm a complete beginner at C++.
I'm coding something using an API, and would like to pass text containing new lines to it, and have it print out the new lines at the other end.
If I hardcode whatever I want it to print out, like so:
printInApp("Hello\nWorld");
it does come out as separate lines at the other end. But if I retrieve the text from the app using a method that returns a const char* and then pass it straight to printInApp (which takes a const char* as its argument), it comes out as a single line.
Why is this, and how would I go about fixing it?
It is the compiler that processes escape codes in string literals, not the runtime functions. This is why you can, for example, write "char c = '\n';": the compiler simply compiles it as "char c = 10" (on an ASCII-based system).
If you want to process escape codes in strings where '\' and 'n' are separate characters (e.g. read as such from a file), you will need to write a string function (or use an existing one) that finds the escape codes and converts them to the corresponding values, e.g. converting a '\' followed by an 'n' into a newline (ASCII value 10).