Encode a string of characters given a custom code table - c++

I want to programmatically convert a string of characters stored in a file to a string of character codes (encode) by following a code table. The string of binary codes should then go to a file, from which I can revert it back to the string of characters later (decode). The codes in the code table were generated using Huffman algorithm and the code table is stored in a file.
For example, given a code table in which each character and its corresponding code are separated by a single space, like this:
E 110
H 001
L 11
O 111
encoding "HELLO" should output "0011101111111".
My C++ code does not produce the complete encoded string. Here is my code:
int main()
{
string English;
ifstream infile("English.txt");
if (!infile.is_open())
{
cout << "Cannot open file.\n";
exit(1);
}
while (!infile.eof())
{
getline (infile,English);
}
infile.close();
cout<<endl;
cout<<"This is the text in the file:"<<endl<<endl;
cout<<English<<endl<<endl;
ofstream codefile("codefile.txt");
ofstream outfile ("compressed.txt");
ifstream codefile_input("codefile.txt");
char ch;
string st;
for (int i=0; i<English.length();)
{
while(!codefile_input.eof())
{
codefile_input >> ch >> st;
if (English[i] == ch)
{
outfile<<st;
cout<<st;
i++;
}
}
}
return 0;
}
For an input string of "The_Quick_brown_fox_jumps_over_the_lazy_dog", the output string is 011100110, but it should be longer than that!
Please help! Is there anything I have missed?
(n.b. my C++ code has no syntax errors)

Let's take a look at the main loop, where you do the actual work:
for (int i=0; i<English.length();)
{
while(!codefile_input.eof())
{
codefile_input >> ch >> st;
if (English[i] == ch)
{
outfile<<st;
cout<<st;
i++;
}
}
}
Your code reads through codefile_input once and then gets stuck with codefile_input.eof() == true. From that point on the inner while loop never runs again, so there is no code path in which i is incremented, and for (int i=0; i<English.length();) becomes an infinite loop because i never reaches English.length().
As a side note, have a read of Why is iostream::eof inside a loop condition considered wrong?
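To see why the inner loop never runs a second time, here is a minimal sketch that uses a std::istringstream in place of the code file. The function name entries_on_second_pass is made up for illustration. It assumes only standard stream behaviour: once an extraction has failed at the end of the input, the stream stays in that state until clear() is called.

```cpp
#include <sstream>
#include <string>

// Counts how many entries can be read on a second pass over the same input.
// If rewind is false, the stream is left in its failed end-of-input state.
int entries_on_second_pass(const std::string& table, bool rewind)
{
    std::istringstream in(table);
    char ch;
    std::string code;

    // First pass: read every "character code" pair until extraction fails.
    while (in >> ch >> code) {}

    if (rewind) {
        in.clear();   // reset eofbit/failbit
        in.seekg(0);  // go back to the start of the input
    }

    // Second pass: succeeds only if the stream was reset.
    int count = 0;
    while (in >> ch >> code) {
        ++count;
    }
    return count;
}
```

Without the reset, the second pass reads nothing, which is the situation the for loop in the question ends up in.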
To avoid the issue explained above, consider reading the dictionary file into a data container (e.g. std::map), and then using that container while iterating through the string that you want to encode.
For example:
std::ifstream codefile_input("codefile.txt");
char ch;
std::string str;
std::map<char, std::string> codes;
while (codefile_input >> ch >> str)
{
codes[ch] = str;
}
codefile_input.close ();
for (int i=0; i<English.length(); ++i)
{
auto it = codes.find (English[i]);
if (codes.end () != it)
{
outfile << it->second; // use the iterator returned by find(), not the map itself
cout << it->second;
}
}
Note, you will need to #include <map> to use std::map.
In addition to the issue your question was actually about, your loop:
while (!infile.eof())
{
getline (infile,English);
}
only keeps the result of the last read, discarding all the lines that came before it. If you want to process every line in the file, consider changing that loop to:
while (std::getline (infile, English))
{
/* Line processing goes here */
}
And since your dictionary is unlikely to differ between lines, you can move the code that loads it in front of this loop:
std::ifstream codefile_input("codefile.txt");
char ch;
std::string str;
std::map<char, std::string> codes;
while (codefile_input >> ch >> str)
{
codes[ch] = str;
}
codefile_input.close ();
ifstream infile("English.txt");
if (!infile.is_open())
{
cout << "Cannot open file.\n";
exit(1);
}
ofstream outfile ("compressed.txt");
string English;
while (std::getline (infile, English))
{
for (int i=0; i<English.length(); ++i)
{
auto it = codes.find (English[i]);
if (codes.end () != it)
{
outfile << it->second; // use the iterator returned by find(), not the map itself
cout << it->second;
}
}
}
In addition, consider adding error checking for all of the files that you open. You check if you can open file English.txt, and exit if you can't, but you don't check if you could open any other file.
On an unrelated note, consider reading Why is “using namespace std” considered bad practice? (that is why I use std:: explicitly in the code that I added).

Related

read a series of lines from an input stream object istream into a list container. The istream object can be the standard input. using c++ [duplicate]

The contents of file.txt are:
5 3
6 4
7 1
10 5
11 6
12 3
12 4
Where 5 3 is a coordinate pair.
How do I process this data line by line in C++?
I am able to get the first line, but how do I get the next line of the file?
ifstream myfile;
myfile.open ("file.txt");
First, make an ifstream:
#include <fstream>
std::ifstream infile("thefile.txt");
The two standard methods are:
1. Assume that every line consists of two numbers and read token by token:
int a, b;
while (infile >> a >> b)
{
// process pair (a,b)
}
2. Line-based parsing, using string streams:
#include <sstream>
#include <string>
std::string line;
while (std::getline(infile, line))
{
std::istringstream iss(line);
int a, b;
if (!(iss >> a >> b)) { break; } // error
// process pair (a,b)
}
You shouldn't mix (1) and (2), since the token-based parsing doesn't gobble up newlines, so you may end up with spurious empty lines if you use getline() after token-based extraction got you to the end of a line already.
Use ifstream to read data from a file:
std::ifstream input( "filename.ext" );
If you really need to read line by line, then do this:
for( std::string line; getline( input, line ); )
{
// ...for each line in input...
}
But you probably just need to extract coordinate pairs:
int x, y;
input >> x >> y;
Update:
In your code you use ofstream myfile;, however the o in ofstream stands for output. If you want to read from the file (input) use ifstream. If you want to both read and write use fstream.
Reading a file line by line in C++ can be done in several ways.
[Fast] Loop with std::getline()
The simplest approach is to open an std::ifstream and loop using std::getline() calls. The code is clean and easy to understand.
#include <cstdio>
#include <fstream>
#include <string>
std::ifstream file(FILENAME);
if (file.is_open()) {
std::string line;
while (std::getline(file, line)) {
// using printf() in all tests for consistency
printf("%s", line.c_str());
}
file.close();
}
[Fast] Use Boost's file_descriptor_source
Another possibility is to use the Boost library, but the code gets a bit more verbose. The performance is quite similar to the code above (Loop with std::getline()).
#include <boost/iostreams/device/file_descriptor.hpp>
#include <boost/iostreams/stream.hpp>
#include <fcntl.h>
namespace io = boost::iostreams;
void readLineByLineBoost() {
int fdr = open(FILENAME, O_RDONLY);
if (fdr >= 0) {
io::file_descriptor_source fdDevice(fdr, io::file_descriptor_flags::close_handle);
io::stream <io::file_descriptor_source> in(fdDevice);
if (fdDevice.is_open()) {
std::string line;
while (std::getline(in, line)) {
// using printf() in all tests for consistency
printf("%s", line.c_str());
}
fdDevice.close();
}
}
}
[Fastest] Use C code
If performance is critical for your software, you may consider using the C language. This code can be 4-5 times faster than the C++ versions above; see the benchmark below.
FILE* fp = fopen(FILENAME, "r");
if (fp == NULL)
exit(EXIT_FAILURE);
char* line = NULL;
size_t len = 0;
while ((getline(&line, &len, fp)) != -1) {
// using printf() in all tests for consistency
printf("%s", line);
}
fclose(fp);
if (line)
free(line);
Benchmark -- Which one is faster?
I have done some performance benchmarks with the code above and the results are interesting. I have tested the code with ASCII files that contain 100,000 lines, 1,000,000 lines and 10,000,000 lines of text. Each line of text contains 10 words on average. The program is compiled with -O3 optimization and its output is redirected to /dev/null in order to remove the logging time from the measurement. Last, but not least, each piece of code logs each line with the printf() function for consistency.
The results show the time (in ms) that each piece of code took to read the files.
The performance difference between the two C++ approaches is minimal and shouldn't make any difference in practice. The performance of the C code is what makes the benchmark impressive and can be a game changer in terms of speed.
                           10K lines   100K lines   1000K lines
Loop with std::getline()   105ms       894ms        9773ms
Boost code                 106ms       968ms        9561ms
C code                     23ms        243ms        2397ms
Since your coordinates belong together as pairs, why not write a struct for them?
struct CoordinatePair
{
int x;
int y;
};
Then you can write an overloaded extraction operator for istreams:
std::istream& operator>>(std::istream& is, CoordinatePair& coordinates)
{
is >> coordinates.x >> coordinates.y;
return is;
}
And then you can read a file of coordinates straight into a vector like this:
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>
int main()
{
char filename[] = "coordinates.txt";
std::vector<CoordinatePair> v;
std::ifstream ifs(filename);
if (ifs) {
std::copy(std::istream_iterator<CoordinatePair>(ifs),
std::istream_iterator<CoordinatePair>(),
std::back_inserter(v));
}
else {
std::cerr << "Couldn't open " << filename << " for reading\n";
}
// Now you can work with the contents of v
}
Expanding on the accepted answer, if the input is:
1,NYC
2,ABQ
...
you will still be able to apply the same logic, like this:
#include <fstream>
std::ifstream infile("thefile.txt");
if (infile.is_open()) {
int number;
std::string str;
char c;
while (infile >> number >> c >> str && c == ',')
std::cout << number << " " << str << "\n";
}
infile.close();
Although there is no need to close the file manually, it is a good idea to do so if the scope of the file variable is larger:
ifstream infile(szFilePath);
for (string line = ""; getline(infile, line); )
{
//do something with the line
}
if(infile.is_open())
infile.close();
This answer is for Visual Studio 2017, for when you want to read from a text file whose location is relative to your compiled console application.
First, put your text file (test.txt in this case) into your solution folder. After compiling, keep the text file in the same folder as applicationName.exe:
C:\Users\"username"\source\repos\"solutionName"\"solutionName"
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
ifstream inFile;
// open the file stream
inFile.open(".\\test.txt");
// check if opening a file failed
if (inFile.fail()) {
cerr << "Error opening a file" << endl;
inFile.close();
exit(1);
}
string line;
while (getline(inFile, line))
{
cout << line << endl;
}
// close the file stream
inFile.close();
}
This is a general solution for loading data into a C++ program, and uses the getline function. It could be modified for CSV files, but the delimiter here is a space.
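const int n = 5, p = 2; // constant dimensions (a variable-length array is not standard C++)
double X[n][p]; // double, because the values are converted with stod() below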
ifstream myfile;
myfile.open("data.txt");
string line;
string temp = "";
int a = 0; // row index
while (getline(myfile, line)) { //while there is a line
int b = 0; // column index
for (int i = 0; i < line.size(); i++) { // for each character in rowstring
if (!isblank(line[i])) { // if it is not blank, do this
string d(1, line[i]); // convert character to string
temp.append(d); // append the two strings
} else {
X[a][b] = stod(temp); // convert string to double
temp = ""; // reset the capture
b++; // increment b cause we have a new number
}
}
X[a][b] = stod(temp);
temp = "";
a++; // onto next row
}

Fixing syntax of number of line reading function

I tried making a program earlier that tells the user the number of chars, words, and lines in a text file. I made functions to determine each count, yet I was passing them by value. This resulted in an error: after counting the chars, the stream would be at the end of the file, and the other two functions would output zero. Now I can't seem to rewrite my functions so that the file is opened and closed each time it is checked for chars, words, and lines. Does anyone see where my errors are? Thanks! (I have just copied and pasted one of my functions for now.)
int num_of_lines(ifstream file)
{
string myfile;
myfile = argv[1];
ifstream l;
l.open(myfile);
int cnt3 = 0;
string str;
while(getline(file, str))cnt3++;
l.close();
return(cnt3);
}
int main(int argc, char **argv)
{
int num_of_char(ifstream file);
string file;
file = argv[1];
if(argc == 1)die("usage: mywc your_file");
ifstream ifs;
ifs.open(file);
if(ifs.is_open())
{
int a, b, c;
a = num_of_lines(ifs);
cout <<"Lines: " << a << endl;
}
else
{
cerr <<"Could not open: " << file << endl;
exit(1);
}
ifs.close();
return(0);
}
There is no way to "reopen" a file other than knowing the name and creating a new ifstream, but you can use the seekg member function to set your read position in the file, and setting it to 0 will have the next read operation start from the beginning of the file.
A stream cannot be copied, so you can't pass it "by value"; you must pass it by reference.
int num_of_lines(ifstream &file)
{
int count = 0;
string str;
while (getline(file, str)) {
count++;
}
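file.clear(); // reset eofbit/failbit set by the final, failed getline
file.seekg(0); // rewind so that the next read starts at the beginning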
return count;
}
For the full problem, I agree with Mats Petersson, though: counting characters, lines and words in one pass will be much more efficient than reading through the file three times.

C++ read from file into a vector

I am working on a program that should read from a file and store the contents of that file in a vector. I must read the contents of the .txt file and push the strings back into a vector before it reaches a ' '. If it is a space you will skip that part of the file and continue pushing back the contents after the space. Does anybody know what function to use to read from a file and put the contents into a vector or array? Thanks for your time.
int main()
{
Code mess;
ifstream inFile;
inFile.open("message1.txt");
if (inFile.fail()) {
cerr << "Could not find file" << endl;
}
vector<string> code;
string S;
while (inFile.good()) {
code.push_back(S);
}
cout << mess.decode(code) << endl;
return 0;
}
Basically, you can also do it like this:
std::ifstream fh("text.txt");
std::vector<std::string> vs;
std::string s;
while(fh>>s){
vs.push_back(s);
}
for(int i=0; i<vs.size(); i++){
std::cout<<vs[i]<<std::endl;
}
You should change your reading code to
while (inFile >> S) {
code.push_back(S);
}
Your current code doesn't read anything into your S variable.
Regarding loop conditions while (inFile.good()) see this Q&A please:
Why is iostream::eof inside a loop condition considered wrong?
Using std::iostream::good() has more or less the same issues.

C++ file reading

I have a file that starts with a number, which is the number of names that follow. For example:
4
bob
jim
bar
ted
I'm trying to write a program to read these names.
void process_file(ifstream& in, ofstream& out)
{
string i,o;
int tmp1,sp;
char tmp2;
prompt_user(i,o);
in.open (i.c_str());
if (in.fail())
{
cout << "Error opening " << i << endl;
exit(1);
}
out.open(o.c_str());
in >> tmp1;
sp=tmp1;
do
{
in.get(tmp2);
} while (tmp2 != '\n');
in.close();
out.close();
cout<< sp;
}
So far I am able to read the first line and assign int to sp
I need sp to be a counter for how many names. How do I get this to read the names.
The only problem I have left is how to get the names while ignoring the first number.
Until then i cannot implement my loop.
while (in >> tmp1)
sp=tmp1;
This successfully reads the first int from the file and then tries to continue. Since the second line is not an int, extraction fails, so it stops looping. So far so good.
However, the stream is now in fail state, and all subsequent extractions will fail unless you clear the error flags.
Say in.clear() right after the first while loop.
I don't really see why you wrote a loop to extract a single integer, though. You could just write
if (!(in >> sp)) { /* error, no int */ }
To read the names, read in strings. A loop is fine this time:
std::vector<std::string> names;
std::string temp;
while (in >> temp) names.push_back(temp);
You might want to add a counter somewhere to make sure that the number of names matches the number you've read from the file.
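int lines = 0;
std::string line;
std::ifstream inputfile("names.txt");
inputfile >> lines; // read the count from the first line
inputfile.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); // skip the rest of that line (needs <limits>)
for (int i = 0; i < lines; ++i) {
if (std::getline(inputfile, line)) {
std::cout << line << std::endl;
}
}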
First of all, assuming that the first loop:
while (in >> tmp1)
sp=tmp1;
Is meant to read the number in the beginning, this code should do:
in >> tmp1;
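According to the manual for operator>>, the return value is:
The istream object (*this).
The extracted value or sequence is not returned, but directly stored
in the variable passed as argument.
So the loop condition tests the state of the stream, not the number that was read. To validate the number itself, read it once and check it: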
in >> tmp1;
if( tmp1 < 1){
exit(5);
}
Second, NEVER rely on assumption that the file is correctly formatted:
do {
in.get(tmp2);
cout << tmp2 << endl;
} while ( (tmp2 != '\n') && !in.eof());
Although the whole algorithm seems a bit clumsy to me, this should prevent an infinite loop.
Here's a simple example of how to read a specified number of words from a text file in the way you want.
#include <string>
#include <iostream>
#include <fstream>
void process_file() {
// Get file name.
std::string fileName;
std::cin >> fileName;
// Open file for read access.
std::ifstream input(fileName);
// Check if file exists.
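if (!input) {
std::cerr << "Could not open " << fileName << '\n';
return; // the function returns void, so there is no status code to return
}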
// Get number of names.
int count = 0;
input >> count;
// Get names and print to cout.
std::string token;
for (int i = 0; i < count; ++i) {
input >> token;
std::cout << token;
}
}

Incorrect char from file

I have the following .txt file:
test.txt
1,2,5,6
I pass it to a small C++ program I made, through the command line, as follows:
./test test.txt
Source is as follows:
#include <iostream>
#include <fstream>
using namespace std;
int main(int argc, char **argv)
{
int temp =0;
ifstream file;
file.open(argv[1]);
while(!file.eof())
{
temp=file.get();
file.ignore(1,',');
cout<<temp<<' ';
}
return 0;
}
For some reason my output is not 1 2 5 6 but 49 50 53 54. What gives?
UPDATE:
Also, I noticed there is another implementation of get(). If I define char temp then I can do file.get(temp) and that will also save me converting ASCII representation. However I like using while (file >> temp) so I will be going with that. Thanks.
temp is an int, so you see the ASCII value of each character, converted to an int.
49 is the ASCII code for the digit 1 (49 - 48 = 1).
get() gives you a character (character code).
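If the numeric value of a digit character is needed, subtracting '0' is the usual approach, since the C++ standard guarantees that the digit characters '0' to '9' have consecutive codes. A small sketch follows. The helper name digit_value and the -1 convention for non-digits are made up for illustration.

```cpp
// Converts a decimal digit character to its numeric value, or returns -1
// if the character is not a digit.
int digit_value(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';  // '0'..'9' are guaranteed to be consecutive
    }
    return -1;
}
```

This works for single digits only. Multi-digit numbers are easier to read with formatted extraction into an int.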
By the way, eof() only becomes true after a failed read attempt, so the code you show,
while(!file.eof())
{
temp=file.get();
file.ignore(1,',');
cout<<temp<<' ';
}
will possibly display one extraneous character at the end.
The conventional loop is
while( file >> temp )
{
cout << temp << ' ';
}
where the expression file >> temp reads in one number and produces a reference to file, and where that file object is converted to bool as if you had written
while( !(file >> temp).fail() )
This does not do what you think it does:
while(!file.eof())
This is covered in Why is iostream::eof inside a loop condition considered wrong?, so I won't cover it in this answer.
Try:
char c;
while (file >> c)
{
// [...]
}
...instead. Reading into a char rather than an int also saves you from having to convert the ASCII representation (ASCII value 49 is the character '1', and so on).
For the record, and despite this being the nth duplicate, here's how this code might look in idiomatic C++:
for (std::string line; std::getline(file, line); )
{
std::istringstream iss(line);
std::cout << "We read:";
for (std::string n; std::getline(iss, n, ','); ) // read each comma-separated field into n
{
std::cout << " " << n;
// now use e.g. std::stoi(n)
}
std::cout << "\n";
}
If you don't care about lines or just have one line, you can skip the outer loop.
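For a single line such as the "1,2,5,6" from the question, the inner loop can be wrapped in a small function that returns the numbers. This is a sketch only. The name parse_numbers is hypothetical, and it assumes every field is a valid integer (std::stoi throws std::invalid_argument otherwise).

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one line on commas and converts each field with std::stoi.
std::vector<int> parse_numbers(const std::string& line)
{
    std::vector<int> numbers;
    std::istringstream iss(line);
    for (std::string field; std::getline(iss, field, ','); ) {
        numbers.push_back(std::stoi(field));
    }
    return numbers;
}
```

Calling parse_numbers("1,2,5,6") yields the four integers 1, 2, 5 and 6, with no character-code conversion needed.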