C++ how to remove all chars and special characters from a file - c++

I have seen how to remove specific chars from a string but I am not sure how to do it with a file open or if you can even do that. Basically a file will be open with anything in it, my goal is to remove all the letters a-z, special characters, and whitespace that may appear so that all that is left is my numbers. Can you easily remove all chars rather than specifying a,b,c etc when the file is open or would I have to convert it to a string? Also would it be better to do this in memory?
My code so far is as follows:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main() {
string filename;
cout << "Enter the name of the data file to open" << endl;
cin >> filename;
ofstream myfile;
myfile.open(filename);
if (myfile.is_open()) { //if file is open then
while(!myfile.eof()){ //while not end of file
//remove all chars, special and whitespace
}
}
else{
cout << "Error in opening file" << endl;
}
return 0;
}

Preliminary remarks
If I understand correctly, you want to keep only the digits. It is easier to keep the chars that are ASCII digits and drop everything else than to remove many other character classes and hope that only numbers remain.
Also, never loop on eof to read a file. Loop on the stream instead, i.e. on the result of the read operation.
Finally, you should read from an ifstream and write to an ofstream.
First approach: reading strings
You can read/write the file line by line. You need enough memory to store the longest line, but you benefit from stream buffering.
if (myfile.is_open()) { //if file is open then
string line;
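while(getline(myfile, line)){ //while the read succeeds
// take the char as unsigned char: isdigit() is undefined for negative values (needs <algorithm> and <cctype>)
line.erase(remove_if(line.begin(), line.end(), [](unsigned char c) { return !isdigit(c); } ), line.end());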
... // then write the line in the output file
}
}
else ...
Second approach: reading chars
You can read/write char by char, which gives very flexible options for handling individual characters (toggle string flags, etc.). You also benefit from buffering, but you pay function call overhead for every single char.
if (myfile) { //if file is open then
int c;
while((c = myfile.get())!=EOF){ //while the read succeeds
//keep digits and line feeds, drop everything else
if (isdigit(c) || c=='\n')
... .put(c); // then write the char in the output file
}
}
else ...
Other approaches
You could also read a large fixed size buffer, and operate similarly as with the strings (but don't eliminate LF then). The advantage is that the memory need is not impacted by some very large lines in the file.
You could also determine the file size, and try to read the full file at once (or in very large chunks). You'd then maximize performance at the cost of memory consumption.
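The fixed-size buffer idea could look roughly like the sketch below. The name keep_digits and the 4096-byte default chunk size are my own choices for illustration, not something from the question. It works on generic streams, so the same function can be fed an ifstream/ofstream pair or string streams.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream / std::ostringstream, for trying it in memory
#include <string>

// Copy only ASCII digits and line feeds from `in` to `out`, reading in
// fixed-size chunks so that memory use does not depend on line length.
// keep_digits and chunk_size are illustrative names (assumptions).
void keep_digits(std::istream& in, std::ostream& out, std::size_t chunk_size = 4096) {
    if (chunk_size == 0) chunk_size = 1;  // a zero-sized read would never make progress
    std::string buffer(chunk_size, '\0');
    // read() reports failure on the final short chunk, so gcount() is
    // checked as well to process the bytes that were still delivered.
    while (in.read(&buffer[0], static_cast<std::streamsize>(buffer.size())) || in.gcount() > 0) {
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        for (std::size_t i = 0; i < got; ++i) {
            const unsigned char ch = static_cast<unsigned char>(buffer[i]);
            if (std::isdigit(ch) || ch == '\n') out.put(buffer[i]);
        }
    }
}
```

With files you would open an ifstream for the input and a separate ofstream for the output and pass both to the function.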

This is just an example of how to extract the chars you want from a file with a dedicated filter:
std::string get_purged_file(const std::string& filename) {
std::string strbuffer;
std::ifstream infile;
infile.open(filename, std::ios_base::in);
if (infile.fail()) {
// throw an error
}
char c;
while (infile >> c) { // stops at end of file or on any read error; >> skips whitespace
if (std::isdigit(static_cast<unsigned char>(c)) || c == '.') {
strbuffer.push_back(c);
}
}
infile.close();
return strbuffer;
}
Note: this is just an example and it leaves room for optimization. To give you an idea:
Read more than one char at a time (with a proper buffer).
Reserve memory in the string.
Once you have the buffer "purged" you can overwrite your file or save the content into another file.

Related

How can I read from a file and sort them by category

I'm trying to read a bunch of words from a file and sort them into what kind of words they are (Nouns, Adjective, Verbs ..etc). For example :
-Nouns;
zyrian
zymurgy
zymosis
zymometer
zymolysis
-Verbs_participle;
zoom in
zoom along
zoom
zonk out
zone
I'm using getline to read until the delimiter ';' but how can I tell whether it has read a type or a word?
The function below stops right after "-Nouns;"
int main()
{
map<string,string> data_base;
ifstream source ;
source.open("partitioned_data.txt");
char type [MAX];
char word [MAX];
if(source) //check to make sure we have opened the file
{
source.getline(type,MAX,';');
while( source && !source.eof())//make sure we're not at the end of file
{
source.getline(word,MAX);
cout<<type<<endl;
cout<<word<<endl;
source.getline(type,MAX,';');//read the next line
}
}
source.close();
source.clear();
return 0;
}
I am not fully sure about the format of your input file. But you seem to have a file with lines, and in that, items separated by a semicolon.
Reading this should be done differently.
Please see the following example:
#include <iostream>
#include <string>
#include <sstream>
#include <fstream>
std::istringstream source{R"(noun;tree
noun;house
verb;build
verb;plant
)"};
int main()
{
std::string type{};
std::string word{};
//ifstream source{"partitioned_data.txt"};
if(source) //check to make sure we have opened the file
{
std::string line{};
while(getline(source,line))//loop as long as a line was read successfully
{
size_t pos = line.find(';');
if (pos != std::string::npos) {
type = line.substr(0,pos);
word = line.substr(pos+1);
}
std::cout << type << " --> " << word << '\n';
}
}
return 0;
}
There is no need for open and close statements. The constructor and destructor of the std::ifstream will do that for us.
Do not check eof in the while statement.
Do not use C-style arrays like char type [MAX]; use std::string instead.
Read a line in the while statement and check the validity of the operation in the while condition. Then work on the line that was read.
Search for the ';' in the string and, if found, take out the substrings.
If I knew the format of the input file, I could write an even better example for you.
Since I do not have files on SO, I used a std::istringstream instead. But there is NO difference compared to a file. Simply delete the std::istringstream and uncomment the ifstream definition in the source code.
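If the file really looks like the sample in the question (a header line such as "-Nouns;" followed by one word per line until the next header), a sketch along these lines might do. The format is an assumption taken from that sample, and read_categories is a name I made up for illustration.

```cpp
#include <istream>
#include <map>
#include <sstream>  // std::istringstream, as a stand-in for the file
#include <string>
#include <vector>

// Group words by category. Assumed format (from the sample in the question):
// a header line looks like "-Nouns;" and every following line is a word of
// that category until the next header appears.
std::map<std::string, std::vector<std::string>> read_categories(std::istream& source) {
    std::map<std::string, std::vector<std::string>> by_category;
    std::string current;  // stays empty until the first header is seen
    std::string line;
    while (std::getline(source, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
        if (line.empty()) continue;
        if (line.size() >= 2 && line.front() == '-' && line.back() == ';') {
            current = line.substr(1, line.size() - 2);  // strip the '-' and the ';'
            by_category[current];                       // create the entry even if no words follow
        } else if (!current.empty()) {
            by_category[current].push_back(line);       // may contain spaces, e.g. "zoom in"
        }
        // words that appear before any header are ignored in this sketch
    }
    return by_category;
}
```

Because a whole line is read at a time, multi-word entries like "zoom in" stay intact, and a type is told apart from a word purely by its leading '-' and trailing ';'.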

Find and Replace a string in a text file and output to another file

I'm trying to write a program that can open a text file, find a certain string and substitute it with another string and then write the altered text to an output file.
This is what I've coded so far. It works fine, except that the output file is missing spaces and new line characters.
I need to preserve all spaces and new line characters. How do I do it?
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
string search = "HELLO"; //String to find
string replace = "GOODBYE"; //String that will replace the string we find
string filename = ""; //User-provided filename of the input file
string temp; //temp variable for our loop to hold the characters from the file stream
char c;
cout << "Input filename? ";
cin >> filename;
ifstream filein(filename); //File to read from
ofstream fileout("temp.txt"); //Temporary file
if (!fileout || !filein) //if either file is not available
{
cout << "Error opening " << filename << endl;
return 1;
}
while (filein >> temp) //While the stream continues
{
if (temp == search) //Check if the temp variable has captured the string we are looking for
{
temp = replace; //When we found the string, we substitute it with the replacement string
}
fileout << temp; //Dump everything to fileout (our temp.txt file)
}
//Close our file streams
filein.close();
fileout.close();
return 0;
}
UPDATE:
I followed your advice and did the following, but now it doesn't work at all (the previous code worked fine, except for white spaces). Could you kindly tell me what I'm doing wrong here?
Thank you.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
string search = "or"; //String to find
string replace = "OROROR"; //String that will replace the string we find
string filename = ""; //User-provided filename of the input file
string temp = ""; //temp variable for our loop to hold the characters from the file stream
char buffer;
cout << "Input filename? ";
cin >> filename;
ifstream filein(filename); //File to read from
ofstream fileout("temp.txt"); //Temporary file
if (!fileout || !filein) //if either file is not available
{
cout << "Error opening " << filename << endl;
return 1;
}
while (filein.get(buffer)) //While the stream continues
{
if (buffer == ' ') //check if space
{
if (temp == search) //if matches pattern,
{
temp = replace; //replace with replace string
}
}
temp = string() + buffer;
for (int i = 0; temp.c_str()[i] != '\0'; i++)
{
fileout.put(temp.c_str()[i]);
}
return 0;
}
}
while (filein >> temp)
This temp variable is a std::string. The formatted extraction operator, >>, overload for a std::string skips all whitespace characters (spaces, tabs, newlines) in the input and completely discards them. This formatted extraction operator discards all whitespace until the first non-whitespace character, then extracts it and all following non-whitespace characters and places them into your std::string, which is this temp variable. This is how it works.
Subsequently:
fileout << temp;
This then writes out this string to the output. There's nothing in the shown code that tells your computer to copy all whitespace from the input to the output, as is. The only thing that the shown code does is extract every sequence of non-space characters from the input file, immediately throwing on the floor all spaces and newlines, never to be seen again; and then write what's left (with the appropriate changes) to the output file. And a computer will always do exactly what you tell it to do, and not what you want it to do.
while (filein >> temp)
This is where all the spaces in the input file get thrown in the trash and discarded. Therefore, if you wish to preserve them and copy them to the output file, as is, you will have to replace this.
There are several approaches that can be used here. The simplest solution is to simply read the input file one character at a time. If it's not a whitespace character, add it to the temp buffer. If it's a whitespace character, and temp is not empty, then you've just read a complete word; check if it needs replacing; write it out to the output file; clear the temp buffer (in preparation for reading the next word); and then manually write the just-read whitespace character to the output file. In this manner you will copy the input to the output, one character at a time, including spaces, but buffering non-space characters into the temp buffer, until each complete word gets read, before copying it to the output file. And you will also need to handle the edge case of the very last word in the file, which may not be followed by any trailing whitespace.
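A minimal sketch of that character-by-character approach follows. The function name replace_words and its parameters are my own choices; it only replaces whole whitespace-delimited words, which is what the original loop compared against.

```cpp
#include <cctype>
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream / std::ostringstream, for trying it in memory
#include <string>

// Copy `in` to `out`, replacing every whitespace-delimited word that equals
// `search` with `replace`. All whitespace is copied through unchanged.
void replace_words(std::istream& in, std::ostream& out,
                   const std::string& search, const std::string& replace) {
    std::string word;  // non-whitespace characters collected so far
    auto flush_word = [&]() {
        if (word.empty()) return;                  // e.g. two spaces in a row
        out << (word == search ? replace : word);  // substitute on an exact match
        word.clear();
    };
    char ch;
    while (in.get(ch)) {  // get() does not skip whitespace, unlike >>
        if (std::isspace(static_cast<unsigned char>(ch))) {
            flush_word();  // a complete word has just ended
            out.put(ch);   // keep the whitespace exactly as read
        } else {
            word += ch;
        }
    }
    flush_word();  // last word when the input does not end with whitespace
}
```

In the program from the question this would be called as replace_words(filein, fileout, search, replace) in place of the while loop.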

How to read substitution char with ifstream in C++ ? (SUB in ASCII)

I am having a hard time finding out why I can't read all characters with fstream get function.
My code is the following :
ifstream input_stream(input_filename.c_str(), ios::in);
string input;
if(input_stream)
{
char character;
while(input_stream.get(character))
{
input += character;
}
input_stream.close();
}
else
cerr << "Error" << endl;
By testing a little, I found out that the problem occurs when the character is 26 (SUB in ASCII): input_stream.get(character) returns false and I leave my while loop.
I would like to put in my string input all characters from the file including SUB.
I tried with the getline function at first and I got a similar problem.
Could you help me please ?
You need to read a binary stream, not a textual one, since SUB (0x1a, that is 26) is a control character in ASCII or UTF-8, not a printable one. Use ios::binary at opening time:
ifstream input_stream(input_filename.c_str(), ios::in | ios::binary);
Maybe you would then code
do {
int c= input_stream.get();
if (c==std::char_traits<char>::eof()) break;
input += (char)c;
} while (!input_stream.fail());
Did you consider using std::getline to read an entire line, assuming the input file is still organized in ('\n' terminated) lines?
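As an alternative to the loop, the whole file can be pulled into a string in one expression once it is opened in binary mode. This is only a sketch; read_all_bytes is a made-up helper name. The reason binary mode matters is that some platforms (Windows C runtimes are the usual example) treat byte 0x1A as end-of-file in text mode.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Read every byte of a file, control characters such as SUB (0x1A) included.
// read_all_bytes is an illustrative name, not an existing library function.
std::string read_all_bytes(const std::string& path) {
    // ios::binary turns off any text-mode translation of the data
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    // istreambuf_iterator reads raw characters and never skips whitespace
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With the code from the question, input = read_all_bytes(input_filename); would replace the whole if/else block, with the error reported through the exception.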

Read from a file with blank spaces in C++

I am reading from a file and passing the front of the array(pointer) back into my main function. The problem I am having is that it is not copying the blank spaces in between the words. For example Hello Hello comes out as HelloHello.
I started by using getLine instead and ran into the problems of size of the file. I set it to 500 because no files will be larger than 500, however most files will be below 500 and I am trying to get the exact size of the file.
Here is my code:
char infile()
{
const int SIZE=500;
char input[SIZE];
char fromFile;
int i=0;
ifstream readFile;
readFile .open("text.txt");
while(readFile>>fromFile)
{
input[i]=fromFile;
i++;
}
cout<<endl;
returnArray=new char[i];//memory leak need to solve later
for(int j=0;j<i;j++)
{
returnArray[j]=input[j];
cout<<returnArray[j];
}
cout<<endl;
}
return returnArray[0];
}
Depending on what your file format is, you may want to use ifstream::read() or ifstream::getline() instead.
operator >> will attempt to 'tokenize' or 'parse' the data stream as it is being read, using whitespace as separators between tokens. You're interested in getting the raw data from the file with whitespace intact, therefore you should avoid using it. If you want to read data in one line at a time, using linefeeds as separators, you should use getline(). Otherwise use read().
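For the read() route, one way to get a buffer of exactly the file's size is to ask the stream for its length first. A rough sketch, where read_whole_file is a name I picked and the file is opened in binary mode so that the reported position matches the byte count:

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Read a whole file, whitespace included, into a string sized to the file.
// read_whole_file is an illustrative name (an assumption).
std::string read_whole_file(const std::string& path) {
    // ios::ate opens the file positioned at its end
    std::ifstream in(path, std::ios::in | std::ios::binary | std::ios::ate);
    if (!in) throw std::runtime_error("cannot open " + path);
    const std::streamoff size = in.tellg();  // position at the end == size in bytes
    if (size < 0) throw std::runtime_error("cannot determine size of " + path);
    std::string content(static_cast<std::size_t>(size), '\0');
    in.seekg(0, std::ios::beg);  // back to the start before reading
    if (size > 0 && !in.read(&content[0], static_cast<std::streamsize>(size)))
        throw std::runtime_error("short read on " + path);
    return content;
}
```

Returning a std::string also removes the need for the fixed 500-char array and the manual new[].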
Use std::string, std::vector and std::getline and you can still return a char. That will solve your memory leak and skipping whitespace problem.
Example:
char infile()
{
std::ifstream readFile("text.txt");
std::vector<std::string> v;
std::string line;
while(std::getline(readFile, line))
{
v.push_back(line);
}
for(auto& s : v)
{
std::cout << s << std::endl;
}
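if (v.empty() || v[0].empty()) return '\0'; // empty file or empty first line: there is no char to return
return v[0][0];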
}
You are asking it to read while delimiting where there is whitespace.
You can use getline() to preserve the whitespace.

Using C++ ifstream extraction operator>> to read formatted data from a file

As a learning exercise, I am trying to use the C++ ifstream and its operator>> to read data from a text file using the code below. The text file outdummy.txt has the following contents:
just dummy
Hello ofstream
555
My question is how to read the char data present in the file into a char array or string. How can this be done using ifstream::operator>> in the code below?
#include <iostream>
#include <fstream>
int main()
{
int a;
string s;
char buf[100];
ifstream in("outdummy.txt",ios_base::in);
in.operator>>(a); //How to read integer? How to read the string data.??
cout << a;
in.close();
getchar();
return 0;
}
If you want to use formatted input, you have to know in advance what data to expect and read it into variables of the appropriate data type. For example, if you know that the number is always the fifth token, as in your example, you could do this:
std::string s1, s2, s3, s4;
int n;
std::ifstream in("outdummy.txt");
if (in >> s1 >> s2 >> s3 >> s4 >> n)
{
std::cout << "We read the number " << n << std::endl;
}
On the other hand, if you know that the number is always on the third line, by itself:
std::string line;
std::getline(in, line); // have line 1
std::getline(in, line); // have line 2
std::getline(in, line); // have line 3
std::istringstream iss(line);
if (iss >> n)
{
std::cout << "We read the number " << n << std::endl;
}
As you can see, to read a token as a string, you just stream it into a std::string. It's important to remember that the formatted input operator works token by token, and tokens are separated by whitespace (spaces, tabs, newlines). The usual fundamental choice to make is whether you process a file entirely in tokens (first version), or line by line (second version). For line-by-line processing, you use getline first to read one line into a string, and then use a string stream to tokenize the string.
A word about validation: You cannot know whether a formatted extraction will actually succeed, because that depends on the input data. Therefore, you should always check whether an input operation succeeded, and abort parsing if it doesn't, because in case of a failure your variables won't contain the correct data, but you have no way of knowing that later. So always say it like this:
if (in >> v) { /* ... */ } // v is some suitable variable
else { /* could not read into v */ }
if (std::getline(in, line)) { /* process line */ }
else { /* error, no line! */ }
The latter construction is usually used in a while loop, to read an entire file line by line:
while (std::getline(in, line)) { /* process line */ }
ifstream has ios_base::in by default. You don't need to specify it.
operator>> can be invoked directly as an operator: in >> a.
Reading strings is the same: in >> s, but the caveat is that it is whitespace-delimited, so it will read "just" by itself, without "dummy".
If you want to read complete lines, use std::getline(in, s).
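To make the difference between in >> s and std::getline concrete, here is a small sketch that reads the three lines of outdummy.txt by mixing both. The struct DummyFile and the function read_dummy are hypothetical names used only for this illustration.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, for trying it without a file
#include <string>

// Holds what was read from a file laid out like outdummy.txt.
// DummyFile and read_dummy are illustrative names (assumptions).
struct DummyFile {
    std::string first_token;   // what in >> s yields: a single word
    std::string rest_of_line;  // what getline yields afterwards: the remainder of line 1
    std::string second_line;   // a complete line, spaces included
    int number = 0;
    bool ok = false;           // true only if every read succeeded
};

DummyFile read_dummy(std::istream& in) {
    DummyFile d;
    if (in >> d.first_token                     // stops at the first whitespace
        && std::getline(in, d.rest_of_line)     // continues from there to the end of the line
        && std::getline(in, d.second_line)      // whole second line
        && in >> d.number) {                    // formatted read of the integer
        d.ok = true;
    }
    return d;
}
```

With the file from the question you would pass an std::ifstream opened on "outdummy.txt"; note that rest_of_line keeps the space that separated the two words.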
Since you have elected to use C-strings, you can use the getline method of your ifstream object (not std::getline() which works with std::strings), which will allow you to specify the C-string and a maximum size for the buffer.
Based on what you had, and adding an additional buffer for the second line:
char buf[100];
char buf2[100];
in.getline(buf,sizeof(buf));
in.getline(buf2,sizeof(buf2));
in >> a;
However, as the other poster has proposed, try using the std::string and its methods, it will make your life easier.
You can read file contents and use a Finite State Machine for parsing.
Example:
void Parse(const char* buffer, size_t length);
size_t GetBufferSize();
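size_t bufferSize = GetBufferSize();
std::vector<char> buffer(bufferSize); // owns the memory, so nothing leaks (needs <vector>)
std::ifstream in("input.txt");
while(in.getline(buffer.data(), bufferSize)) {
// gcount() also counts the extracted '\n', so pass the C-string length instead (needs <cstring>)
Parse(buffer.data(), std::strlen(buffer.data()));
}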
Alternatively, you can use a tool like Flex to write your parser.
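To give an idea of what such a state machine might look like, here is a deliberately tiny sketch with two states that pulls runs of digits out of a buffer. The function parse_numbers and its states are hypothetical and stand in for the Parse declared above; a real parser for your format would need more states.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Two-state machine: Outside (between numbers) and InNumber (inside a run of
// digits). Returns every run of ASCII digits found in the buffer.
// parse_numbers is an illustrative name (an assumption).
std::vector<std::string> parse_numbers(const char* buffer, std::size_t length) {
    enum class State { Outside, InNumber };
    State state = State::Outside;
    std::vector<std::string> numbers;
    std::string current;
    for (std::size_t i = 0; i < length; ++i) {
        const bool digit = std::isdigit(static_cast<unsigned char>(buffer[i])) != 0;
        switch (state) {
        case State::Outside:
            if (digit) {  // a number starts here
                current.assign(1, buffer[i]);
                state = State::InNumber;
            }
            break;
        case State::InNumber:
            if (digit) {
                current += buffer[i];
            } else {      // the number just ended
                numbers.push_back(current);
                state = State::Outside;
            }
            break;
        }
    }
    if (state == State::InNumber) numbers.push_back(current);  // number at the very end
    return numbers;
}
```

The same shape extends to more token kinds by adding states and transitions, which is essentially what a generated lexer does for you.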