Stuck with removing "\r" from text files! C++


OK so I've almost completed a program. However whilst it works on Windows I would prefer to run it on my Mac to test differences in performance (my Mac has much faster hardware).
I have an unordered map that is storing in values from a text file and I am also copying this map to reverse the key/value pairs.
The text files keep adding a new line, and from research I've found it to be because Windows adds its own carriage return (why?!) and it's at the end of every second element in my map.
The file is "stringx,stringy" and so am using stringstream to split the string x and y into the key/value pair.
EDIT: thanks for the answers guys, worked a treat!

That isn't how std::string::replace works, you should read up on how it works here.
In order to do a basic replace, you could write your own function to do it, however in your case it seems to be a trimming issue since the carriage return is usually on the right side of the string.
You can remove the carriage return and new line by doing something like this:
std::string& rtrim(std::string& str) {
    // find_last_not_of returns npos when the string holds only '\r' and '\n';
    // npos + 1 wraps around to 0, so erase(0) clears such a string as well.
    str.erase(str.find_last_not_of("\r\n") + 1);
    return str;
}

On some implementations, like Windows, using a read mode of "r" or a write mode of "w" will cause "\r\n" to be read/written when you meant to pass "\n" through. Use "wb" or "rb". For iostream functions, I believe you need to pass in the ios::binary flag.

Windows uses "\r\n" to end lines. Usually programs that are supposed to run on various platforms use some #ifdef to handle similar differences.

I think I understand what the question is now. It's not about dealing with the differences in code - you are actually trying to use a "DOS/Windows" file on a non-Dos/Windows machine - you need to use dos2unix to fix up the end of lines on your file!

Related

C++ delete everything in text file that is located before/after a specific word

So lets say text file has the following contents:
kasjdfhjkasdhfjkasdhfjasfjs
asdjkfhasj
start
sdfjkhasdkjfhasjkdfhajksdfhjkasdfh
asdjfhajs
end
sdjfhsjkdf
How to delete everything before the word "start" and everything after "end"?

Filesystems in general do support "truncate" meaning to chop off the end, but they do not support removing the front of a file. So you're left with only one option: you need to move the contents between "start" and "end" to the beginning of the file, then "truncate" the rest. This isn't very efficient if the part you're moving is very large, but there's no way around it on typical filesystems.

Barring very specific cases, it is not a good idea to edit files in place. If your computer crashes at the wrong point in time, for instance, you'd end up with a corrupted file and without the ability to restore its state before the attempted transformation.
So, better to read from one file and write to another, which is very simple:
std::ifstream in ("input.txt");
std::ofstream out("output.txt");
std::string line;
// read and discard lines before "start"
while(std::getline(in, line) && line != "start");
// read and echo lines until "end"
while(std::getline(in, line) && line != "end") {
out << line << '\n';
}
and then move it to where the original file is, overwriting it. On Windows:
MoveFileExA("output.txt", "input.txt", MOVEFILE_REPLACE_EXISTING);
On POSIX-conforming systems (such as Linux, BSD, MacOS X):
rename("output.txt", "input.txt");
...or take a look at Boost.Filesystem for a portable solution.
Renaming will typically be an atomic operation for the file system, so you'll have the state before or after the transformation at all times, and if fecal matter hits the fan, you'll be able to repair it without too much trouble.
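If C++17 is available, std::filesystem::rename can stand in for the platform-specific calls above. The following is a minimal sketch under that assumption; the function name keep_between_markers and the ".tmp" suffix are invented for illustration.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: keep only the lines strictly between "start" and "end".
// The result is written to a temporary file that then replaces the original.
inline void keep_between_markers(const std::filesystem::path& file) {
    std::filesystem::path tmp = file;
    tmp += ".tmp";  // assumed temporary name, placed next to the original
    {
        std::ifstream in(file);
        std::ofstream out(tmp);
        std::string line;
        // read and discard lines up to and including "start"
        while (std::getline(in, line) && line != "start") {}
        // echo lines until "end" (or end of file)
        while (std::getline(in, line) && line != "end") {
            out << line << '\n';
        }
    }  // both streams are closed here, before the rename
    std::filesystem::rename(tmp, file);
}
```

Keeping the temporary file in the same directory as the original makes it likely that both live on the same file system, which is what lets the rename replace the original in one step.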

Brought a Linux C++ Console Application to a Win32 C++ App using VS2010 and the search function from <algorithm> is no longer working

Just like the title says, I've been working on a fairly large program and have come upon this bug. I'm also open to alternatives for searching a file for a string instead of using <algorithm>. Here is my code narrowed down:
istreambuf_iterator<char> eof;
ifstream fin;
fin.clear();
fin.open(filename.c_str());
if(fin.good()){
    //I outputted text to a file to make sure opening the file worked, which it does
}
//term was not found.
if(eof == search(istreambuf_iterator<char>(fin), eof, term.begin(), term.end())){
    //PROBLEM: this code always executes even when the string term is in the file.
}
So just to clarify, my program worked correctly in Linux, but now that I have it in a Win32 app project in VS2010, the application builds just fine but the search function isn't working like it normally did. (What I mean by normal is that the code in the if statement didn't execute when the term was in the file, whereas now it always executes.)
NOTE: The file is a .xml file and the string term is simply "administration."
One thing that might or might not be important is that filename (from the code above) is an XML file I have created in the program myself using the code below. Pretty much, I create an identical XML file from the pre-existing one, except that it is all lower case and in a new location.
void toLowerFile(string filename, string newloc, string& newfilename){
    //variables
    ifstream fin;
    ofstream fout;
    string temp = "/";
    newfilename = newloc + temp + newfilename;
    //open file to read
    fin.open(filename.c_str());
    //open file to write
    fout.open(newfilename.c_str());
    //loop through and read line, lower case, and write
    while (fin.good()){
        getline (fin,temp);
        //write lower case version
        toLowerString(temp);
        fout << temp << endl;
    }
    //close files
    fout.close();
    fin.close();
}
void toLowerString(string& data){
    std::transform(data.begin(), data.end(), data.begin(), ::tolower);
}

I'm afraid your code is invalid - the search algorithm requires forward iterators, but istreambuf_iterator is only an input iterator.
Conceptually that makes sense - the algorithm needs to backtrack on a partial match, but the stream may not support backtracking.
The actual behaviour is undefined - so the implementation is allowed to be helpful and make it seem to work, but doesn't have to.
I think you either need to copy the input, or use a smarter search algorithm (single-pass is possible) or a smarter iterator.
(In an ideal world at least one of the compilers would have warned you about this.)

Generally, with Microsoft's compiler, if your program compiles and links a main() function rather than a wmain() function, everything defaults to char. It would be wchar_t or WCHAR if you have a wmain(). If you have _tmain() instead, then you are at the mercy of your compiler/make settings and it's the UNICODE macro that determines which flavor your program uses. But I doubt that a char/wchar_t mismatch is actually the issue here, because I think you would have got a warning or error if all four of the search parameters didn't use the same character width.
This is a bit of a guess, but try this:
if(eof == search(istreambuf_iterator<char>(fin.rdbuf()), eof, term.begin(), term.end()))

C++ Carriage return and line feed in a string

I am working with the communication for some TCP/IP connected equipment in C++.
The equipment requires that the commands sent are ended with \r\n.
I am using a configuration file from which I am reading the commands used in the communication.
The problem I have is that the commands \r\n are interpreted as the 4 characters they are and not as carriage return and line feed.
I have tried to use the string.data() but I get the same result as string.c_str().
Is there any nice function to get it correct from the beginning or do I have to solve this with a normal replace function? Or some other way that I have not thought about?
I guess, if I don't find a really neat way to do this I will just omit the \r\n in the configuration file and add it afterwards, but it would be nice to have it all in the configuration file without any hard coding. I think I would need to do some hard coding as well if I tried to replace the four characters \r\n with their correct characters.
Thanks for any help
Edit:
The config file contains lines like this one.
MONITOR_REQUEST = "TVT?\r\n"

If the data in the configuration file requires translation, you
have to translate it. Short of regular expressions (which are
clearly overkill for this), I don't know of any standard
function which would do this. We use something like:
std::string
globalReplace(
    std::string::const_iterator begin,
    std::string::const_iterator end,
    std::string const& before,
    std::string const& after )
{
    std::string retval;
    std::back_insert_iterator<std::string> dest( retval );
    std::string::const_iterator current = begin;
    std::string::const_iterator next
        = std::search( current, end, before.begin(), before.end() );
    while ( next != end ) {
        std::copy( current, next, dest );
        std::copy( after.begin(), after.end(), dest );
        current = next + before.size();
        next = std::search( current, end, before.begin(), before.end() );
    }
    std::copy( current, next, dest );
    return retval;
}
for this. You could call it with "\\r\\n", "\r\n" for the
last two arguments, for example.

Sounds like your configuration file actually contains 4 distinct characters '\', 'r', '\', and 'n'... you must either change the file (e.g. by using actual line endings in the file to encode line endings in the strings, and even then some systems won't happen to use \r\n as their text file line delimiters), or do some replace/translation as you suggest. The translation is the more portable approach. It can also be pretty fast... just maintain two pointers/iterators/indices into the string - one for the read position and one for the write, and copy across while compacting the escape notation.
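A minimal sketch of the read-index/write-index idea described above is shown here. The function name unescape_in_place is hypothetical, and the set of recognised escapes (backslash followed by r, n, or another backslash) is an assumption; anything else is copied through unchanged.

```cpp
#include <string>

// Hypothetical helper: compacts two-character escapes in place.
// Recognised escapes: backslash + 'r', backslash + 'n', backslash + backslash.
// The write index never overtakes the read index, so no extra buffer is needed.
inline void unescape_in_place(std::string& s) {
    std::string::size_type write = 0;
    for (std::string::size_type read = 0; read < s.size(); ++read) {
        char c = s[read];
        if (c == '\\' && read + 1 < s.size()) {
            const char next = s[read + 1];
            if (next == 'r')       { c = '\r'; ++read; }
            else if (next == 'n')  { c = '\n'; ++read; }
            else if (next == '\\') { c = '\\'; ++read; }
            // unknown escape: keep the backslash as an ordinary character
        }
        s[write++] = c;
    }
    s.resize(write);  // drop the tail left over after compaction
}
```

For the MONITOR_REQUEST value from the question, the four characters after "TVT?" would become an actual carriage return followed by a line feed.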

You need not hard-code anything in code.
You can very well read the characters to append to outgoing string from the same configuration file.
Assuming that you have name value pair in your conf file :
APPEND_EOL="\r\n"
str_to_send1="this is a test%%APPEND_EOL%%"
while parsing the line before sending you can generate the actual line.
In case you can't store the APPEND_EOL in the same line, then you can pick it up from ENV with a default in case it is not defined, or, maybe in another config file of yours.

Read text file step-by-step

I have a file which has text like this:
#1#14#ADEADE#CAH0F#0#0.....
I need to create a code that will find text that follows # symbol, store it to variable and then writes it to file WITHOUT # symbol, but with a space before. So from previous code I will get:
1 14 ADEADE CAH0F 0 0......
I first tried to do it in Python, but the files are really big and it takes a huge amount of time to process a file, so I decided to write this part in C++. However, I know nothing about C++ regex, and I'm looking for help. Could you please recommend an easy regex library (I don't know C++ very well) or a well-documented one? It would be even better if you provided a small example (I know how to write to a file using fstream, but I need help with how to read the file as I said before).

This looks like a job for std::locale and his trusty sidekick imbue:
#include <locale>
#include <iostream>
#include <string>
struct hash_is_space : std::ctype<char> {
    hash_is_space() : std::ctype<char>(get_table()) {}
    static mask const* get_table()
    {
        // zero-initialized table: '#' is the only character classed as whitespace
        static mask rc[table_size];
        rc['#'] = std::ctype_base::space;
        return &rc[0];
    }
};
int main() {
    using std::string;
    using std::cin;
    using std::locale;
    cin.imbue(locale(cin.getloc(), new hash_is_space));
    string word;
    while(cin >> word) {
        std::cout << word << " ";
    }
    std::cout << "\n";
}

IMO, C++ is not the best choice for your task. But if you have to do it in C++ I would suggest you have a look at Boost.Regex, part of the Boost library.

If you are on Unix, a simple sed 's/#/ /g' <infile >outfile would suffice (the g flag replaces every # on a line, not just the first).
Sed stands for 'stream editor' (and supports regexes! whoo!), so it would be well-suited for the performance that you are looking for.

Alright, I'm just going to make this an answer instead of a comment. Don't use regex. It's almost certainly overkill for this task. I'm a little rusty with C++, so I'll not post any ugly code, but essentially what you could do is parse the file one character at a time, putting anything that wasn't a # into a buffer, then writing it out to the output file along with a space when you do hit a #. In C# at least two really easy methods for solving this come to mind:
StreamReader fileReader = new StreamReader(new FileStream("myFile.txt",
    FileMode.Open));
string fileContents = fileReader.ReadToEnd();
string outFileContents = fileContents.Replace("#", " ");
StreamWriter outFileWriter = new StreamWriter(new FileStream("outFile.txt",
    FileMode.Create), Encoding.UTF8);
outFileWriter.Write(outFileContents);
outFileWriter.Flush();
Alternatively, you could replace
string outFileContents = fileContents.Replace("#", " ");
With
StringBuilder outFileContents = new StringBuilder();
string[] parts = fileContents.Split('#');
foreach (string part in parts)
{
    outFileContents.Append(part);
    outFileContents.Append(" ");
}
I'm not saying you should do it either of these ways or my suggested method for C++, nor that any of these methods are ideal - I'm just pointing out here that there are many many ways to parse strings. Regex is awesome and powerful and may even save the day in extreme circumstances, but it's not the only way to parse text, and may even destroy the world if used for the wrong thing. Really.
If you insist on using regex (or are forced to, as in for a homework assignment), then I suggest you listen to Chris and use Boost.Regex. Alternatively, I understand Boost has a good string library as well if you'd like to try something else. Just look out for Cthulhu if you do use regex.

You've left out one crucial point: if you have two (or more) consecutive #s in the input, should they turn into one space, or the same number of spaces as there are #s?
If you want to turn the entire run of #s into a single space, then @Rob's solution should work quite nicely.
If you want each # turned into a space, then it's probably easiest to just write C-style code:
#include <stdio.h>
int main() {
    int ch;
    while (EOF!=(ch=getchar()))
        if (ch == '#')
            putchar(' ');
        else
            putchar(ch);
    return 0;
}

So, you want to replace each single character '#' with a single character ' ', right?
Then it's easy to do, since you can replace any portion of a file with a string of exactly the same length without disturbing the layout of the file.
Repeating such a replacement lets you transform the file chunk by chunk, so you avoid reading the whole file into memory, which is problematic when the file is very big.
Here's the code in Python 2.7.
Maybe the chunk-by-chunk replacement won't be sufficient to make it faster, and you'll have a hard time writing the same thing in C++. But in general, when I have proposed this kind of code, it has improved the execution time satisfactorily.
def treat_file(file_path, chunk_size):
    from os import fsync
    from os.path import getsize
    file_size = getsize(file_path)
    with open(file_path,'rb+') as g:
        fd = g.fileno() # file descriptor, it's an integer
        while True:
            x = g.read(chunk_size)
            g.seek(- len(x),1)
            g.write(x.replace('#',' '))
            g.flush()
            fsync(fd)
            if g.tell() >= file_size:
                break
Comments:
open(file_path,'rb+')
it's absolutely obligatory to open the file in binary mode 'b' to control precisely the positions and movements of the file's pointer;
mode '+' is to be able to read AND write in the file
fd = g.fileno()
file descriptor, it's an integer
x = g.read(chunk_size)
reads a chunk of size chunk_size . It would be tricky to give it the size of the reading buffer, but I don't know how to find this buffer's size. Hence a good idea is to give it a power of 2 value.
g.seek(- len(x),1)
the file's pointer is moved back to the position from which the chunk has just been read. It must be len(x), not chunk_size, because the last chunk read is in general shorter than chunk_size
g.write(x.replace('#',' '))
writes on the same length with the modified chunk
g.flush()
fsync(fd)
these two instructions force the writing, otherwise the modified chunk could remain in the writing buffer and written at uncontrolled moment
if g.tell() >= file_size: break
after the reading of the last portion of file , whatever is its length (less or equal to chunk_size), the file's pointer is at the maximum position of the file, that is to say file_size and the program must stop
.
In case you would like to replace several consecutive '###...' with only one space, the code is easily modified to meet this requirement, since writing a shortened chunk doesn't erase the still-unread characters further on in the file. It only needs two file pointers.

Parse config file in C/C++

I'm a newbie looking for a fast and easy way to parse a text file in C or C++ (wxWidgets)
The file will look something like this (A main category with "sub-objects") which will appear in a list box
[CategoryA]
[SubCat]
Str1 = Test
Str2 = Description
[SubCat] [End]
[SubCat]
Str1 = Othertest
...
[CategoryA] [End]
Any suggestions?

Sounds like you want to parse a file that's pretty close to an ini file.
There's at least a few INI parser libraries out there: minIni, iniParser, libini, for instance.

It should be fairly easy to write your own parser for this if you use streams. You can read a file using an std::ifstream:
std::ifstream ifs("filename.ext");
if(!ifs.good()) throw my_exceptions("cannot open file");
read_file(ifs);
Since it seems line-oriented, you would then first read lines, and then process these:
void read_file(std::istream& is)
{
    for(;;) {
        std::string line;
        std::getline(is, line);
        if(!is) break;
        std::istringstream iss(line);
        // read from iss
    }
    if(!is.eof()) throw my_exceptions("error reading file");
}
For the actual parsing, you could first peek at the first character. If that's a [, pop it from the stream, and use std::getline(is,identifier,']') to read whatever is within '[' and ']'. If it isn't a [, use std::getline(is, key, '=') to read the left side of a key-value pair, and then std::getline(is, value) to read the right side.
Note: Stream input, unfortunately, is usually not exactly lightning fast. (This doesn't have to be that way, but in practice this often is.) However, it is really easy to do and it is fairly easy to do it right, once you know a very few patterns to work with its peculiarities (like if(strm.good()) not being the same as if(strm) and not being the opposite of if(strm.bad()) and a few other things you'll have to get used to). For something as performance-critical (har har!) as reading an ini file from disk, it should be fast enough in 999,999 out of 1,000,000 cases.
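The peek/getline steps described above could be sketched for a single line like this. ParsedLine, trim, and parse_line are hypothetical names, and the sketch makes two assumptions of its own: whitespace around keys and values is insignificant, and anything after the first ']' on a line (such as the trailing "[End]" marker) is ignored and left for the caller to handle.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type for one parsed line.
struct ParsedLine {
    enum Kind { Blank, Section, KeyValue };
    Kind kind = Blank;
    std::string name;   // section name or key
    std::string value;  // only meaningful for KeyValue
};

// Hypothetical helper: trim spaces, tabs and a stray '\r' from both ends.
inline std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return std::string();
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

inline ParsedLine parse_line(const std::string& line) {
    ParsedLine result;
    std::istringstream iss(line);
    iss >> std::ws;                    // skip leading whitespace
    if (iss.peek() == '[') {
        iss.get();                     // pop the '['
        if (std::getline(iss, result.name, ']')) {
            result.kind = ParsedLine::Section;
            result.name = trim(result.name);
        }
        return result;                 // rest of the line is ignored
    }
    std::string key;
    // eof() after this getline means no '=' was found on the line
    if (std::getline(iss, key, '=') && !iss.eof()) {
        std::string value;
        std::getline(iss, value);      // may legitimately be empty
        result.kind = ParsedLine::KeyValue;
        result.name = trim(key);
        result.value = trim(value);
    }
    return result;
}
```

Inside read_file, each line would be passed to parse_line, and the caller would keep track of which category or sub-category is currently open.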

You may want to try Boost.Program_Options. However, it uses a slightly different format, closer to INI files. Subcategories are done like this:
[CategoryA]
Option = Data
[CategoryB.Subcategory1]
Option = Data
[CategoryB.Subcategory2]
Option = Data
Also it has some other features so it is actually very useful IMO.

Try Configurator. It's an easy-to-use and flexible C++ library for configuration file parsing (from the simplest INI to complex files with arbitrary nesting and semantic checking). Header-only and cross-platform. Uses the Boost C++ libraries.
See: http://opensource.dshevchenko.biz/configurator

It looks more straightforward to implement your own parser than to try to adapt an existing one you are unfamiliar with.
Your structure seems - from your example - to be line-based. This makes parsing it easy.
It generally makes sense to load your file into a tree, and then walk around it as necessary.

On Windows only, GetPrivateProfileSection does this. It's deprecated in favor of the registry but it's still here and it still works.

How about trying to make a simple XML file? There are plenty of libraries that can help you read it, and the added bonus is that a lot of other programs/languages can read it too.

If you're using wxWidgets I would consider wxFileConfig. I'm not using wxWidgets, but the class seems to support categories with sub-categories.

If you are using GTK, you are in luck.
You can use the GLib KeyFile functions save_to_file and load_from_file.
https://docs.gtk.org/glib/struct.KeyFile.html
The same is available when using Gtkmm (C++).
See: https://developer-old.gnome.org/glibmm/stable/classGlib_1_1KeyFile.html
Example in C++ with load_from_file:
#include <glibmm.h>
#include <string>
Glib::KeyFile keyfile;
keyfile.load_from_file(file_path);
std::string path = keyfile.get_string("General", "Path");
bool is_enabled = keyfile.get_boolean("General", "IsEnabled");
Saving is as easy as calling save_to_file:
Glib::KeyFile keyfile;
keyfile.set_string("General", "Path", path);
keyfile.set_boolean("General", "IsEnabled", is_enabled);
keyfile.save_to_file(file_path);
