I'm struggling with the following: I'm reading the following text from an XML file into a std::stringstream:
"sigma=0\nreset"
After some copying and processing, it is written to a text file. I was hoping for the following:
sigma=0
reset
But sadly I only get
sigma=0\nreset
but when I directly stream
out << "sigma=0\nreset"
I get:
sigma=0
reset
I currently suspect that some qualifier of the "\n" is lost during the "copy&processing"... is this possible? How to track down a "\n" in the stream which isn't a linefeed anymore?
Thank you!
It's because the output functions don't handle escape sequences like '\n'; it's the compiler that does, and then only for literals in the source code. The compiler knows nothing about the contents of strings at run time, and so cannot translate "\n" to a newline when those two characters arrive inside a string read from a file.
You have to parse the string itself, and write out newlines when appropriate.
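A quick way to check which of the two cases you are in is to look for a backslash followed by the letter n as two separate characters. The sketch below assumes the stream contents have already been copied into a std::string; the helper name hasLiteralBackslashN is made up for illustration.

```cpp
#include <string>

// Returns true if the text contains a backslash followed by 'n' as two
// separate characters, i.e. an escape sequence that nobody translated.
// (Hypothetical helper, not part of the original code.)
bool hasLiteralBackslashN(const std::string& text) {
    // In source code, "\\n" is the two characters '\' and 'n'.
    return text.find("\\n") != std::string::npos;
}
```

For the literal "sigma=0\nreset" the compiler has already produced a single linefeed character, so the helper returns false and the string has 13 characters. For text read at run time that contains a backslash and an n, it returns true and the length is 14.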
Assuming that the std::stringstream actually contains what is equivalent to the literal "sigma=0\\nreset" (length = 14 characters) and not "sigma=0\nreset" (length = 13 characters), you'll have to replace it yourself. Doing so is not very difficult, either use boost's replace_all (http://www.boost.org/doc/libs/1_53_0/doc/html/boost/algorithm/replace_all.html), or std::string::find and std::string::replace:
std::stringstream inStream;
inStream.str ("sigma=0\\nreset");
std::string content = inStream.str();
size_t index = content.find("\\n",0);
while(index != std::string::npos)
{
content.replace(index, 2, "\n");
index = content.find("\\n",index);
}
std::cout << content << '\n';
Note: you may want to consider cases when the system end-of-line is something other than "\n"
If the std::stringstream actually contains "sigma=0\nreset", then please post the code that does the copying/processing and the writing to the text file.
I'm trying to write data from a string into a .csv file, but I would like to know how I can change the delimiter from a comma to any other char, or even have no delimiters.
There will be commas in the data, so, other than stripping the commas, is it possible to remove the delimiters or change the delimiter char when writing to data.csv?
snippet of code:
string buffer, data;
ofstream oFile;
oFile.open("data.csv");
//some code to generate buffer
while (buffer.length() != 0){
size_t pos = buffer.find("end");
data = buffer.substr(0, pos);
buffer.erase(0, pos);
oFile << data;
oFile << "\n";
}
oFile.close();
I don't see where you output any comma in your code. Of course, when writing the file you can output any character as field separator.
The problem in your case may be with the code that reads the file. This is the place where you have to tell that the separator is something different than comma.
Maybe it helps to output your fields using double quotes? Then the reader may ignore the commas inside the quoted string.
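A minimal sketch of the quoting idea, assuming the reader follows the common CSV convention that a field in double quotes may contain the separator and that an embedded quote is written twice. The helper names quoteField and joinRow are hypothetical; the separator is a parameter, so a semicolon or tab works as well as a comma.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Wrap a field in double quotes and double any embedded quote characters.
std::string quoteField(const std::string& field) {
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') out += '"';  // an embedded quote is written twice
        out += c;
    }
    out += '"';
    return out;
}

// Join quoted fields into one line using the chosen separator.
std::string joinRow(const std::vector<std::string>& fields, char delimiter) {
    std::string row;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i != 0) row += delimiter;
        row += quoteField(fields[i]);
    }
    return row;
}
```

With this, something like oFile << joinRow(fields, ';') << "\n"; would write one record per line, where fields is whatever vector of strings the program has split the buffer into.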
...as someone may remember, I'm still stuck on C++ strings. Ok, I can write a string to a file using a fstream as follows
outStream.write((char *) s.c_str(), s.size());
When I want to read that string, I can do
inStream.read((char *) s.c_str(), s.size());
Everything works as expected. The problem is: if I change the length of my string after writing it to a file and before reading it again, printing that string won't bring me back my original string but a shorter/longer one. So: if I have to store many strings on a file, how can I know their size when reading it back?
Thanks a lot!
You shouldn’t be using the unformatted I/O functions (read() and write()) if you just want to write ordinary human-readable string data. Generally you only use those functions when you need to read and write compact binary data, which for a beginner is probably unnecessary. You can write ordinary lines of text instead:
std::string text = "This is some test data.";
{
std::ofstream file("data.txt");
file << text << '\n';
}
Then read them back with getline():
{
std::ifstream file("data.txt");
std::string line;
std::getline(file, line);
// line == text
}
You can also use the regular formatting operator >> to read, but when applied to string, it reads tokens (nonwhitespace characters separated by whitespace), not whole lines:
{
std::ifstream file("data.txt");
std::vector<std::string> words;
std::string word;
while (file >> word) {
words.push_back(word);
}
// words == {"This", "is", "some", "test", "data."}
}
All of the formatted I/O functions automatically handle memory management for you, so there is no need to worry about the length of your strings.
Although your writing solution is more or less acceptable, your reading solution is fundamentally flawed: it uses the internal storage of your old string as a character buffer for your new string, which is very, very bad (to put it mildly).
You should switch to a formatted way of reading and writing the streams, like this:
Writing:
outStream << s;
Reading:
inStream >> s;
This way you would not need to bother determining the lengths of your strings at all.
This code is different in that it stops at whitespace characters; you can use getline if you want to stop only at \n characters.
You can write the strings and write an additional 0 (null terminator) to the file. Then it will be easy to separate strings later. Also, you might want to read and write lines
outfile << string1 << endl;
getline(infile, string2, '\n');
If you want to use unformatted I/O, your only real options are to either use a fixed size or to prepend the size somehow so you know how many characters to read. Otherwise, when using formatted I/O, it somewhat depends on what your strings contain: if they can contain all viable characters, you would need to implement some sort of quoting mechanism. In simple cases, where strings consist e.g. of space-free sequences, you can just use formatted I/O and be sure to write a space after each string. If there is some character your strings never contain, you can use it as a quote, and processing quotes is then relatively easy:
// manipulator: skips whitespace, then expects an opening quote
std::istream& quote(std::istream& in) {
char c;
if (in >> c && c != '"') {
in.setstate(std::ios_base::failbit);
}
return in;
}
// writing: surround the string with quotes
out << '"' << string << '"';
// reading: consume the opening quote, then read up to the closing quote
std::getline(in >> std::ws >> quote, string, '"');
Obviously, you might want to bundle this functionality into a class.
Suppose that I get a stringbuf with some content that includes certain character sequences which must be removed:
std::stringbuf string_buff;
std::iostream io_stream (&string_buff);
io_stream << "part-one\r\npart-two\r\npart-three\r\nEND";
There, the CRLF pairs must be removed, so I tried something like:
int pos = 0;
while (true) {
pos = string_buff.str().rfind("\r\n");
if (pos == string_buff.str().npos) {
break;
} else {
std::string preamble = string_buff.str().substr(0, pos);
std::string postamble = string_buff.str().substr(pos +2);
io_stream.seekp(0);
io_stream << preamble << postamble;
}
}
But the sequence keeps the same length, so I get the following result:
part-onepart-twopart-threeENDNDNDND
I suppose there is some way to do this, and a more elegant one, but I'm unable to find it.
By the way, it seems that direct manipulation of the inner string does not work. I mean things like:
string_buff.str().clear();
Nor does
io_stream.clear();
or
io_stream.flush();
Unfortunately, I was mistaken in my initial approach.
As I mentioned earlier, the real problem is related to a boost::asio::streambuf, and my mistake was trying to mimic that with a std::istream in a separate console application for test purposes.
Of course, with an asio::streambuf I can't do something like
streambuf.str("");
So the real situation is this:
boost::asio::streambuf stream_buff;
std::iostream response_stream(&stream_buff);
response_stream << "part-one\r\npart-two\r\npart-three\r\nEND";
My apologies for the confusion.
The question remains the same: how can I remove the CRLF (or any other) character sequence from the input?
You are close! The way to make the streambuf empty is
string_buff.str("");
That will assign it the empty string.
(string_buff.str().clear() just empties a copy of the contents :-)
Use the Boost String Algorithms library (header boost/algorithm/string.hpp).
string s(string_buff.str());
boost::erase_all(s, "\r\n");
string_buff.str(s);
or, if you need the line-ends replaced with other whitespace:
string s(string_buff.str());
boost::replace_all(s, "\r\n", " ");
string_buff.str(s);
And yes, stringbuf.str() returns a copy of, not a reference to the string.
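If Boost is not available, the same effect can be had with std::string alone. This is only a sketch for the std::stringbuf case from the first version of the question (it does not address boost::asio::streambuf), and the helper names eraseAll and stripCrlf are invented here. It also shows why the original loop left "NDNDND" behind: seekp(0) followed by a shorter write overwrites the beginning of the buffer but never shrinks it, whereas str(...) replaces the whole contents.

```cpp
#include <sstream>
#include <string>

// Remove every occurrence of pattern from text in a single left-to-right pass.
std::string eraseAll(std::string text, const std::string& pattern) {
    if (pattern.empty()) return text;  // nothing sensible to erase
    std::string::size_type pos = 0;
    while ((pos = text.find(pattern, pos)) != std::string::npos) {
        text.erase(pos, pattern.size());
    }
    return text;
}

// Take a copy of the buffer contents, clean it, and assign it back.
void stripCrlf(std::stringbuf& buf) {
    buf.str(eraseAll(buf.str(), "\r\n"));
}
```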
I am curious as to how I would go about reading the input from a text file with no set structure (Such as notes or a small report) word by word.
The text for example might be structured like this:
"06/05/1992
Today is a good day;
The worm has turned and the battle was won."
I was thinking maybe getting the line using getline, and then seeing if I can split it into words via whitespace from there. Then I thought using strtok might work! However I don't think that will work with the punctuation.
Another method I was thinking of was getting everything char by char and omitting the characters that were undesired. Yet that one seems unlikely.
So, to cut a long story short:
Is there an easy way to read an input from a file and split it into words?
Since it's easier to write than to find the duplicate question,
#include <iostream>
#include <iterator>
#include <string>
std::istream_iterator<std::string> word_iter( my_file_stream ), word_iter_end;
size_t wordcnt = 0; // must be initialized, and incremented once per word
for ( ; word_iter != word_iter_end; ++ word_iter, ++ wordcnt ) {
std::cout << "word " << wordcnt << ": " << * word_iter << '\n';
}
The std::string argument to istream_iterator tells it to return a string when you do *word_iter. Every time the iterator is incremented, it grabs another word from its stream.
If you have multiple iterators on the same stream at the same time, you can choose between data types to extract. However, in that case it may be easier just to use >> directly. The advantage of an iterator is that it can plug into the generic functions in <algorithm>.
Yes. You're looking for std::istream::operator>> :) Note that it will remove consecutive whitespace but I doubt that's a problem here.
i.e.
std::ifstream file("filename");
std::vector<std::string> words;
std::string currentWord;
while(file >> currentWord)
words.push_back(currentWord);
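You can use getline with a space character as the delimiter, for example the stream member function getline(buffer,1000,' '); with a char array buffer that holds at least 1000 characters.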
Or perhaps you can use this function to split a string into several parts, with a certain delimiter:
string StrPart(string s, char sep, int i) {
string out="";
int n=0, c=0;
for (c=0;c<(int)s.length();c++) {
if (s[c]==sep) {
n+=1;
} else {
if (n==i) out+=s[c];
}
}
return out;
}
Notes: This function assumes that you have declared using namespace std;.
s is the string to be split.
sep is the delimiter
i is the part to get (0 based).
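If all parts are needed rather than a single one, calling StrPart once per index rescans the string every time. A possible alternative, sketched here with a hypothetical split function, is to let std::getline do the splitting on a std::istringstream and return every part at once.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split s at every occurrence of sep and return all parts in order.
// Empty parts between two adjacent separators are kept; a trailing
// separator does not produce an extra empty part.
std::vector<std::string> split(const std::string& s, char sep) {
    std::vector<std::string> parts;
    std::istringstream stream(s);
    std::string part;
    while (std::getline(stream, part, sep)) {
        parts.push_back(part);
    }
    return parts;
}
```

split(s, sep)[i] then corresponds to StrPart(s, sep, i) for any index that exists.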
You can use the scanner technique to grab words, numbers, dates, etc. It is very simple and flexible. The scanner normally returns tokens (word, number, real, keyword, etc.) to a parser.
If you later intend to interpret the words, I would recommend this approach.
I can warmly recommend the book "Writing Compilers and Interpreters" by Ronald Mak (Wiley Computer Publishing)
I'm trying to find out if two strings I have are the same, for the purpose of unit testing. The first is a predefined string, hard-coded into the program. The second is read in from a text file with an ifstream using std::getline(), and then taken as a substring. Both values are stored as C++ strings.
When I output both of the strings to the console using cout for testing, they both appear to be identical:
ThisIsATestStringOutputtedToAFile
ThisIsATestStringOutputtedToAFile
However, the string.compare returns stating they are not equal. When outputting to a text file, the two strings appear as follows:
ThisIsATestStringOutputtedToAFile
T^#h^#i^#s^#I^#s^#A^#T^#e^#s^#t^#S^#t^#r^#i^#n^#g^#O^#u^#t^#p^#u^#t^#
t^#e^#d^#T^#o^#A^#F^#i^#l^#e
I'm guessing this is some kind of encoding problem, and if I was in my native language (good old C#), I wouldn't have too many problems. As it is I'm with C/C++ and Vi, and frankly don't really know where to go from here! I've tried looking at maybe converting to/from ansi/unicode, and also removing the odd characters, but I'm not even sure if they really exist or not..
Thanks in advance for any suggestions.
EDIT
Apologies, this is my first time posting here. The code below is how I'm going through the process:
ifstream myInput;
ofstream myOutput;
myInput.open(fileLocation.c_str());
myOutput.open("test.txt");
TEST_ASSERT(myInput.is_open() == 1);
string compare1 = "ThisIsATestStringOutputtedToAFile";
string fileBuffer;
std::getline(myInput, fileBuffer);
string compare2 = fileBuffer.substr(400,100);
cout << compare1 + "\n";
cout << compare2 + "\n";
myOutput << compare1 + "\n";
myOutput << compare2 + "\n";
cin.get();
myInput.close();
myOutput.close();
TEST_ASSERT(compare1.compare(compare2) == 0);
How did you create the content of myInput? I would guess that this file is created in two-byte encoding. You can use hex-dump to verify this theory, or use a different editor to create this file.
The simplest way would be to launch cmd.exe and type
echo "ThisIsATestStringOutputtedToAFile" > test.txt
UPDATE:
If you cannot change the encoding of the myInput file, you can try to use wide-chars in your program. I.e. use wstring instead of string, wifstream instead of ifstream, wofstream, wcout, etc.
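If no hex-dump tool is at hand, the two-byte-encoding theory can also be checked from inside the program. The sketch below is one possible way to do that, with made-up helper names: it prints the bytes of a string in hex and reports whether any of them is a NUL, which is what the ^# markers in the file output suggest.

```cpp
#include <cstddef>
#include <string>

// Render each byte of the string as two hex digits, separated by spaces.
std::string hexDump(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}

// Plain ASCII or UTF-8 text should not contain NUL bytes;
// UTF-16 encoded ASCII text has one next to every character.
bool containsNul(const std::string& bytes) {
    return bytes.find('\0') != std::string::npos;
}
```

Applied to compare2 from the question, a dump of the form 54 00 68 00 69 00 ... would confirm little-endian two-byte characters.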
The following works for me and writes the text pasted below into the file. Note the '\0' character embedded into the string.
#include <iostream>
#include <fstream>
#include <sstream>
int main()
{
std::istringstream myInput("0123456789ThisIsATestStringOutputtedToAFile\x0 12ou 9 21 3r8f8 reohb jfbhv jshdbv coerbgf vibdfjchbv jdfhbv jdfhbvg jhbdfejh vbfjdsb vjdfvb jfvfdhjs jfhbsd jkefhsv gjhvbdfsjh jdsfhb vjhdfbs vjhdsfg kbhjsadlj bckslASB VBAK VKLFB VLHBFDSL VHBDFSLHVGFDJSHBVG LFS1BDV LH1BJDFLV HBDSH VBLDFSHB VGLDFKHB KAPBLKFBSV LFHBV YBlkjb dflkvb sfvbsljbv sldb fvlfs1hbd vljkh1ykcvb skdfbv nkldsbf vsgdb lkjhbsgd lkdcfb vlkbsdc xlkvbxkclbklxcbv");
std::ofstream myOutput("test.txt");
//std::ostringstream myOutput;
std::string str1 = "ThisIsATestStringOutputtedToAFile";
std::string fileBuffer;
std::getline(myInput, fileBuffer);
std::string str2 = fileBuffer.substr(10,100);
std::cout << str1 + "\n";
std::cout << str2 + "\n";
myOutput << str1 + "\n";
myOutput << str2 + "\n";
std::cout << str1.compare(str2) << '\n';
//std::cout << myOutput.str() << '\n';
return 0;
}
Output:
ThisIsATestStringOutputtedToAFile
ThisIsATestStringOutputtedToAFile
It turns out that the problem was that the file encoding of myInput was UTF-16, whereas the comparison string was UTF-8. The way to convert them with the OS limitations I had for this project (Linux, C/C++ code), was to use the iconv() functions. To keep the compatibility of the C++ strings I'd been using, I ended up saving the string to a new text file, then running iconv through the system() command.
system("iconv -f UTF-16 -t UTF-8 subStr.txt -o convertedSubStr.txt");
Reading the outputted string back in then gave me the string in the format I needed for the comparison to work properly.
NOTE
I'm aware that this is not the most efficient way to do this. If I'd had the luxury of a Windows environment and the windows.h libraries, things would have been a lot easier. In this case though, the code was in some rarely used unit tests, and as such didn't need to be highly optimized, hence the creation, destruction and I/O operations of some text files weren't an issue.
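For completeness, the temporary files and the system() call can be avoided by converting in memory. The sketch below is a hand-written substitute for the iconv call, not the iconv() API itself. It assumes the bytes read from the file are little-endian UTF-16, optionally starting with the FF FE byte order mark, and it does not reject malformed input such as unpaired surrogates. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to out, encoded as UTF-8.
void appendUtf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert little-endian UTF-16 bytes to a UTF-8 string.
std::string utf16leToUtf8(const std::string& bytes) {
    // Read the 16-bit unit that starts at byte offset k (low byte first).
    auto unit = [&bytes](std::size_t k) -> std::uint32_t {
        return static_cast<unsigned char>(bytes[k]) |
               (static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[k + 1])) << 8);
    };
    std::string out;
    std::size_t i = 0;
    if (bytes.size() >= 2 && unit(0) == 0xFEFF) i = 2;  // skip the byte order mark
    while (i + 1 < bytes.size()) {
        std::uint32_t cp = unit(i);
        i += 2;
        // Combine a high surrogate with the low surrogate that follows it.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < bytes.size()) {
            const std::uint32_t low = unit(i);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                i += 2;
            }
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

The result of utf16leToUtf8(compare2) could then be passed to compare() directly. Note that the file should be read as raw bytes for this to work, since std::getline splits on the single byte '\n' and can cut a two-byte character in half.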