Frequency table in C++ - c++

This is what I have so far; I am trying to have an array with probability of all chars and space in a text file, but I have a problem with the data type.
int main()
{
float x[27];
unsigned sum = 0;
struct Count {
unsigned n;
void print(unsigned index, unsigned total) {
char c = (char)index;
if (isprint(c)) cout << "'" << c << "'";
else cout << "'\\" << index << "'";
cout << " occurred " << n << "/" << total << " times";
cout << ", probability is " << (double)n / total << "\n";
}
Count() : n() {}
} count[256];
ifstream myfile("C:\\text.txt"); // one \ masks the other
while (!myfile.eof()) {
char c;
myfile.get(c);
if (!myfile) break;
sum++;
count[(unsigned char)c].n++;
}
for (unsigned i = 0; i<256; i++)
{
count[i].print(i, sum);
}
x[0] = count[33];
int j=68;
for(int i=1;i<27;i++)
{
x[i]=count[j];
j++;
}
return 0;
}

#include <iostream>
#include <fstream>
#include <cctype>
using namespace std;
double probabilities[256]; // now it can be accessed by Count
int main()
{
unsigned sum = 0;
struct Count {
unsigned n;
double prob;
void print ( unsigned index, unsigned total ) {
// if ( ! n ) return;
probabilities[index] = prob = (double)n/total;
char c = (char) index;
if ( isprint(c) ) cout << "'" << c << "'";
else cout << "'\\" << index << "'";
cout<<" seen "<<n<<"/"<<total<<" times, probability is "<<prob<<endl;
}
Count(): n(), prob() {}
operator double() const { return prob; }
operator float() const { return (float)prob; }
} count[256];
ifstream myfile("C:\\text.txt"); // one \ masks the other
while(!myfile.eof()) {
char c;
myfile.get(c);
if ( !myfile ) break;
sum++;
count[(unsigned char)c].n++;
}
for ( unsigned i=0; i<256; i++ ) count[i].print(i,sum);
return 0;
}
I incorporated various changes suggested - Thanks!
Now, who finds the 4 ways to access the actual probabilities?

You are allocating a buffer of 1,000,000 (one million) characters:
char file[1000000] = "C:\text.txt";
This is not good: a local array of that size can exhaust the stack (the default stack size on Windows is 1 MB), and it holds only the file name, not the file contents. Note also that "\t" in a string literal is a tab character, so the path has to be written as "C:\\text.txt".
To read a file on Windows you need something like the following. I will not give you the full solution; you need to learn to use MSDN and the documentation to understand this fully.
First include the <windows.h> header from the SDK.
Look at this example: http://msdn.microsoft.com/en-us/library/windows/desktop/aa363778(v=vs.85).aspx
That example appends one file to another. Your solution will be similar: instead of writing the buffer to the other file, process it to increment your local counters and update the state of the table.
Do not pick an arbitrary large number for the buffer size and try to hold the whole file, because a larger file will not fit and you risk an overflow. Do what the example does:
read some bytes into the buffer
process that buffer and increment the table
repeat until you reach the end of the file
while (ReadFile(hFile, buff, sizeof(buff), &dwBytesRead, NULL)
&& dwBytesRead > 0)
{
// write your logic here
}

Related

How to return a string line by line in a function?

I am reading a text file which contains integers separated by a new line. Like this.
5006179359870233335
13649319959095080120
17557656355642819359
15239379993672357891
3900144417965865322
12715826487550005702
From this file, I want to access each integer in a loop and compare it with another integer to see whether the two match. Inside the function File_read() I can print the integers, but what I want is to get them one by one outside the function. For example, if there is an integer called x in main, I want to check whether x equals one of the integers in my text file.
string File_read() {
std::ifstream my_file("H:\\Sanduni_projects\\testing\\test.txt",
std::ifstream::binary);
if (my_file) {
string line;
for (int i = 0; i < 25; i++){
getline(my_file, line);
//cout << line << endl;
return line;
}
if (my_file)
std::cout << "all characters read successfully."<<endl;
my_file.close();
}
return 0;
}
Never return unconditionally inside a loop.
You are returning unconditionally from inside the loop. This makes the function return to its caller during the first iteration, so the remaining iterations never run.
for (int i = 0; i < 25; i++){
getline(my_file, line);
return line; // <-- Return from function (rest of iterations unreachable). Bad.
}
No need to reinvent stuff
Use the standard library to read the numbers, e.g., into a container std::vector.
std::vector<unsigned long long> v{std::istream_iterator<unsigned long long>{my_file},
std::istream_iterator<unsigned long long>{}};
Notice the value type of unsigned long long that is needed to fit the large numbers (you're pushing ~64 bits here).
Find a match
Use, e.g., std::find to find a possible match among the parsed numbers.
auto key = 15239379993672357891ull;
if (auto it = std::find(std::begin(v), std::end(v), key); it != std::end(v)) {
std::cout << "Key found at line " << std::distance(std::begin(v), it) + 1 << std::endl;
}
Here, I'm using a C++17 if (init; condition) statement to limit the scope of the iterator it to the if statement. It's optional, of course.
You are, currently, just returning the first number (as a std::string and not a number). If you remove the return statement in your loop you can, of course, print each of them. Here is a slightly modified version of your File_read function that will return a std::vector<unsigned long long> that contains all the numbers. Then you can use this vector in, e.g., your main function to do your processing.
std::vector<unsigned long long> File_read()
{
std::vector<unsigned long long> numbers;
std::ifstream my_file("H:\\Sanduni_projects\\testing\\test.txt"); // Text files are not 'binary', so std::ifstream::binary was removed
if (my_file)
{
std::string line;
// Stop after 25 lines, or earlier if a read fails (e.g. at end of file)
for (int i = 0; i < 25 && std::getline(my_file, line); i++)
{
numbers.push_back(std::stoull(line));
}
if (my_file)
{
std::cout << "all characters read successfully." << std::endl;
}
// my_file.close(); // Do not do this manually
}
return numbers;
}
Usage example:
int main()
{
unsigned long long x = /* some number */;
// Read all the numbers
std::vector<unsigned long long> vl = File_read();
// Run through all the numbers
for (unsigned long long y : vl)
{
// Check if any of the numbers are equal to x
if (x == y)
{
// There is a match...
// Do stuff
}
}
}
Update
The numbers cannot be held in a long; however, unsigned long long is sufficient.
std::vector<long> File_read(){
vector<long> numbers;
ifstream my_file("H:\\Sanduni_projects\\testing\\test.txt",
std::ifstream::binary);
if (my_file) {
string line;
for (int i = 0; i < frames_sec; i++){
getline(my_file, line);
numbers.push_back(std::stol(line));
}
if (my_file)
std::cout << "all characters read successfully." << endl;
else
std::cout << "error: only " << my_file.gcount() << " could be read" << endl;
my_file.close();
}
else{
cout << "File can not be opened" << endl;
}
return numbers;
}
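The range problem with long can be checked directly: std::stol throws std::out_of_range when the text does not fit, while std::stoull accepts values up to 64 bits. The helper name fits_in_long below is hypothetical and the snippet is just a small illustration.

```cpp
#include <stdexcept>
#include <string>

// Returns true when the decimal text can be converted to a long on this platform.
// Text that is not a number at all makes std::stol throw std::invalid_argument,
// which is deliberately not handled here.
bool fits_in_long(const std::string& text)
{
    try {
        (void)std::stol(text);
        return true;
    } catch (const std::out_of_range&) {
        return false;  // value is outside the range of long
    }
}
```

For the sample file, a value such as 13649319959095080120 is larger than the maximum of a 64-bit signed long, so it is rejected even where long is 64 bits wide, while std::stoull parses it without trouble.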
Although someone has already given an answer that works correctly, I want to share my code.
#include <cstring> // for memset
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
#define MAX_SIZE 4096
class FileRead
{
public:
FileRead(string path) :_file(path)
{
Reset();
}
void Reset()
{
memset(_buff, 0, MAX_SIZE);
}
string ReadLine()
{
if (!_file.is_open())
{
cout << "error open file" << endl;
return "";
}
if (!_file.eof())
{
Reset();
_file.getline(_buff,MAX_SIZE);
return string(_buff);
}
else
{
cout << "read file finished." << endl;
return "";
}
}
private:
ifstream _file;
string _line;
char _buff[MAX_SIZE];
};
int _tmain(int argc, _TCHAR* argv[])
{
FileRead fr("H:\\Sanduni_projects\\testing\\test.txt");
string line;
while (!(line = fr.ReadLine()).empty())
{
//do some compare..
}
return 0;
}
The other answers are correct about how return works, but there is something that behaves the way you expected return to behave: a coroutine.
using string_coro = boost::coroutines::asymmetric_coroutine<std::string>;
void File_read(string_coro::push_type & yield) {
std::ifstream my_file("H:\\Sanduni_projects\\testing\\test.txt", std::ifstream::binary);
if (my_file) {
string line;
for (int i = 0; i < 25; i++){
getline(my_file, line);
yield (line);
}
if (my_file)
std::cout << "all characters read successfully." << std::endl;
my_file.close();
}
}
Which is used like this
string_coro::pull_type strings(File_read);
for (const std::string & s : strings)
std::cout << s << endl;

Finding int value in large binary file c++

I tried to write a program that loads a large file (a few MB) in chunks, searches for a value, and prints its address and value. However, every so often the program hits the !myfile branch, it prints a weird symbol instead of the value (although I've used hex in cout), the addresses seem to loop, and it doesn't seem to find all the values. I've been trying for a long time and have given up, so I'm asking experienced coders out there to find the issue.
I should note that I'm trying to find a 32-bit value in this file, but all I could make was a program that checks single bytes; I'd need assistance with that too.
#include <iostream>
#include <fstream>
#include <climits>
#include <sstream>
#include <windows.h>
#include <math.h>
using namespace std;
int get_file_size(std::string filename) // path to file
{
FILE *p_file = NULL;
p_file = fopen(filename.c_str(),"rb");
fseek(p_file,0,SEEK_END);
int size = ftell(p_file);
fclose(p_file);
return size;
}
int main( void )
{
ifstream myfile;
myfile.open( "file.bin", ios::binary | ios::in );
char addr_start = 0,
addr_end = 0,
temp2 = 0x40000;
bool found = false;
cout << "\nEnter address start (Little endian, hex): ";
cin >> hex >> addr_start;
cout << "\nEnter address end (Little endian, hex): ";
cin >> hex >> addr_end;
unsigned long int fsize = get_file_size("file.bin");
char buffer[100];
for(int counter = fsize; counter != 0; counter--)
{
myfile.read(buffer,100);
if(!myfile)
{
cout << "\nAn error has occurred. Bytes read: " << myfile.gcount();
myfile.clear();
}
for(int x = 0; x < 100 - 1 ; x++)
{
if(buffer[x] >= addr_start && buffer[x] <= addr_end)
cout << "Addr: " << (fsize - counter * x) << " Value: " << hex << buffer[x] << endl;
}
}
myfile.close();
system("PAUSE"); //Don't worry about its inefficiency
}
A simple program to search for a 32-bit integer in a binary file:
int main(void)
{
ifstream data_file("my_file.bin", ios::binary);
if (!data_file)
{
cerr << "Error opening my_file.bin.\n";
return EXIT_FAILURE;
}
const uint32_t search_key = 0x12345678U;
uint32_t value;
while (data_file.read((char *) &value, sizeof(value)))
{
if (value == search_key)
{
cout << "Found value.\n";
break;
}
}
return EXIT_SUCCESS;
}
You could improve performance by reading into a buffer and searching the buffer.
//...
const unsigned int BUFFER_SIZE = 1024;
static uint32_t buffer[BUFFER_SIZE];
while (data_file.read((char *)&(buffer[0]), sizeof(buffer))) // read() takes a byte count
{
int bytes_read = data_file.gcount();
if (bytes_read > 0)
{
const unsigned int values_read = ((unsigned int) bytes_read) / sizeof(uint32_t);
for (unsigned int index = 0U; index < values_read; ++index)
{
if (buffer[index] == search_key)
{
cout << "Value found.\n";
break;
}
}
}
}
With the above code, when the read fails, the number of bytes should be checked, and if any bytes were read, the buffer then searched.

C++ accessing vector of vector got segmentation fault

I created a vector of vector (10*10000) and try to access this vector through member function. but I got a segmentation fault. I don't know what's wrong here...
Here is Simple.h
class Simple
{
private:
std::vector<double> data_row;
std::vector<std::vector<double> > data;
public:
Simple():data_row(10000), data(10, data_row){};
/*initialize data vector*/
int getSampleCounts(std::istream &File);
/*return number of packet samples in this file*/
Result getModel(std::istream &File);
/*return average and variance of simple delta time*/
void splitData (std::istream &File, const int & sample_in_fold);
};
#endif /* SIMPLE_H */
here is Simple.cpp
void Simple::splitData(std::istream& File, const int & sample_in_fold) {
double value = 0.0;
bool isFailed = true;
int label = 0;
while (File >> value) {
// for each value, generate a label
srand(time(NULL));
label = rand() % 10; // generate label between 0 to 9
while (isFailed) {
// segmentation fault in the next line!
std::cout << "current data size is: " << this->data.size() <<endl;
std::vector<double>::size_type sz = this->data[label].size();
if (sz <= sample_in_fold) {
std::cout << "current size is " << sz << "< samples in fold: " << sample_in_fold << endl;
this->data[label].push_back(value);
std::cout << "push_back succeed!" << endl;
isFailed = false;
} else {
std::cout << "label " << label << "if full. Next label. \n";
srand(time(NULL));
label = rand() % 10;
sz = this->data[label].size();
}
}
}
}
and I'm attaching the main file here.
#include <iostream>
#include <fstream>
#include <string>
#include <cstdlib> // for system())
#include <sys/types.h>
#include <dirent.h>
#include <vector>
#include <limits.h> // for PATH_MAX
#include "Complex.h"
#include "Result.h"
#include "Simple.h"
#include <math.h>
using namespace std;
int main(int argc, char ** argv) {
struct dirent *pDirent;
DIR *pDir;
std::string line;
// check for args
if (argc == 1) {
printf("Usage: ./main + folder name. \n");
return 1;
}
pDir = opendir(argv[1]);
if (pDir == NULL) {
printf("Cannot open directory '%s' \n", argv[1]);
return 1;
}
// readdir returns a pointer to the next direcctory entry dirent structure
while ((pDirent = readdir(pDir)) != NULL) {
// get file name and absolute path
char *name = pDirent->d_name;
char buf[PATH_MAX + 1];
realpath(name, buf);
// std::cout << "Current file is: " << (pDirent->d_name) << endl;
if (has_suffix(pDirent->d_name, ".txt")) {
printf("[%s]\n", pDirent->d_name);
//printf("absolute path is %s. \n", buf);
ifstream infile;
// open file with absolute path
infile.open(buf, ios::in);
if (!infile) {
cerr << "Can't open input file " << buf << endl;
exit(1);
}
//processing for simple pattern
if (has_suffix(name, "testfile.txt")) {
Simple* simple_obj;
int number = simple_obj->getSampleCounts(infile);
Result simplerst = simple_obj->getModel(infile);
std::cout << "Number of delta time is " << number << endl;
infile.clear();
infile.seekg(0);
write_to_file(pDirent->d_name, simplerst);
// divide data into k = 10 folds, get number of data in each fold
int sample_in_fold = floor(number / 10);
std::cout << sample_in_fold << std::endl;
simple_obj->splitData(infile, sample_in_fold);
}
} else {
// printf("This is not a txt file. Continue\n");
}
}
closedir(pDir);
return 0;
}
And here is a sample testfile.txt. I only copied part of the original file, for illustration.
10.145906000
10.151063000
10.131083000
10.143461000
10.131745000
10.151285000
10.147493000
10.123198000
10.144975000
10.144484000
10.138129000
10.131634000
10.144311000
10.157710000
10.138047000
10.122754000
10.137675000
10.204973000
10.140399000
10.142194000
10.138388000
10.141669000
10.138056000
10.138679000
10.141415000
10.154170000
10.139574000
10.140207000
10.149151000
10.164629000
10.106818000
10.142431000
10.137675000
10.204973000
10.140399000
10.142194000
10.138388000
10.141669000
10.138056000
10.138679000
10.141415000
Here is Result.h
#ifndef RESULT_H
#define RESULT_H
typedef struct Result {
double average;
double sigma;
}Result;
and getModel function in Simple.cpp:
Result Simple::getModel(std::istream &File) {
double value = 0.0;
double average = 0.0;
double sum = 0.0;
double counter = 0.0;
double sumsqr = 0.0;
double var = 0.0;
double sigma = 0.0;
while (File >> value) {
++counter;
sum += value;
sumsqr += value * value;
}
average = sum / counter;
var = sumsqr / counter - average * average; //E(x^2) - (E(x))^2
sigma = sqrt(var);
std::cout << "average is " << average << std::endl;
std::cout << "std deviation is " << sigma << std::endl;
File.clear();
File.seekg(0);
Result result = {average, sigma};
return result;
}
One issue right away:
Simple* simple_obj;
int number = simple_obj->getSampleCounts(infile);
simple_obj is an uninitialized pointer, thus your program exhibits undefined behavior at this point.
Why use a pointer anyway? You could have simply done this to avoid the issue:
Simple simple_obj;
simple_obj.getSampleCounts(infile);
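To make the difference concrete, here is a small sketch with a stand-in class. Folds, its members, and demo are hypothetical names that only mimic the shape of Simple; the point is that an object with automatic storage, or a pointer obtained from std::make_unique, refers to a constructed object, whereas a bare Simple* declared without an initializer does not.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the Simple class from the question
class Folds {
public:
    explicit Folds(std::size_t fold_count) : data_(fold_count) {}

    // Appends value to the given fold unless it already holds `capacity` items
    bool add(std::size_t label, double value, std::size_t capacity) {
        if (label >= data_.size() || data_[label].size() >= capacity) return false;
        data_[label].push_back(value);
        return true;
    }

    std::size_t size(std::size_t label) const { return data_.at(label).size(); }

private:
    std::vector<std::vector<double>> data_;  // one inner vector per fold
};

std::size_t demo()
{
    Folds on_stack(10);                          // constructor has run, safe to use
    on_stack.add(3, 10.145906, 4);
    auto on_heap = std::make_unique<Folds>(10);  // pointer that owns a real object
    on_heap->add(3, 10.151063, 4);
    return on_stack.size(3) + on_heap->size(3);
}
```

Note that the sketch starts with empty folds. The constructor in the question builds ten rows that already contain 10000 elements each, so data[label].size() starts at 10000 there rather than at 0.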
Also, this line may not be an issue, but I'll mention it anyway:
Result simplerst = simple_obj->getModel(infile);
We already know that in your original code, simple_obj is bogus, but that's not the issue here. If Result is an object, and that object does not have correct copy semantics, then that assignment will also cause undefined behavior.
You've got a couple of uses of endl without the std:: prefix. Unqualified endl only refers to std::endl when a using-declaration or using namespace std; is in effect in that file. If it is not, is endl silently referring to another variable somewhere else?

Using a tokenizer in C++ from a file?

I am working on an assignment that requires me to read in several lines of text from a file, and at the end use qsort to sort the words used alphabetically and display a count of how many times each word was used. I realized I'm going to have to tokenize the strings as they are read in from the file. The only problem is that the individual tokens kind of disappear after you do it so I have to add them to a list. I'm bad at explaining, so here's my code:
#include<iostream>
#include<string>
#include<algorithm>
#include<stdlib.h>
#include<fstream>
using namespace std;
int compare(const void* , const void*);
const int SIZE = 1000;
const int WORD_SIZE = 256;
void main()
{
cout << "This program is designed to alphabetize words entered from a file." << endl;
cout << "It will then display this list with the number of times " << endl;
cout << "that each word was entered." << endl;
cout << endl;
char *words[SIZE];//[WORD_SIZE];
char temp[100];
char *tokenPtr, *nullPtr= NULL;
char *list[SIZE];
string word;
int i = 0, b = 0;
ifstream from_file;
from_file.open("prob1.txt.txt");
if (!from_file)
{
cout << "Cannot open file - prob1.txt";
exit(1); //exits program
}
while (!from_file.eof())
{
from_file.getline(temp, 99);
tokenPtr = strtok(temp, " ");
while (tokenPtr != NULL)
{
cout << tokenPtr << '\n';
list[b] = tokenPtr;
b++;
tokenPtr = strtok(nullPtr, " ");
}
word = temp;
transform(word.begin(), word.end(), word.begin(), ::tolower);
words[i] = list[i];
i++;
}
from_file.close();
qsort(words, i, WORD_SIZE, compare);
int currentcount = 1 ;
int k;
for( int s = 0; s < i; s++ )
{
for( k = 1; k <= s; k++)
{
if( words[s] == words[k] )
{
currentcount++;
}
currentcount = 1;
words[k] = "";
}
cout << words[s] << " is listed: " << currentcount << " times." << endl;
words[s] = "";
}
}
int compare(const void* p1, const void *p2)
{
char char1, char2;
char1 = *(char *)p1; // cast from pointer to void
char2 = *(char *)p2; // to pointer to int
if(char1 < char2)
return -1;
else
if (char1 == char2)
return 0;
else
return 1;
}
The only thing missing is the compare function, but the program works fine, up until the qsort, wherein it crashes, but it doesn't tell me why. Can anybody shed some insight/help me fix this up?
Again, this IS an assignment. (I was told I need to specify this?)
The array words is an array of pointers to char:
char* words[SIZE]; // SIZE elements of type `char*`
So the third parameter WIDTH should be the width of a pointer to char.
qsort(words, i, sizeof(char*), compare);
Also your implementation of compare is not working as you expect.
You are passing pointers to the compare. But they are pointers at the elements. You need to de-reference the pointers to get the values:
int compare(const void* p1, const void *p2)
{
char const* x = *(char**)p1;
char const* y = *(char**)p2;
return strcmp(x, y); // negative, zero, or positive, as qsort expects
}
This does not compare strings:
if( words[s] == words[k] )
This just compares two pointers. To compare the strings they point at use strcmp()
if( strcmp(words[s], words[k]) == 0)
This should stop the crashes, but there are many more improvements we could make to this code.
Once you get it working you should post it here https://codereview.stackexchange.com/ for a review.
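Putting the points above together, a minimal sketch of sorting an array of C strings with qsort and then counting duplicates could look like this. The names compare_words and count_words are hypothetical, and the result is returned as a vector of pairs purely for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// qsort hands the comparator pointers to the array elements. The elements are
// themselves `const char*`, so one dereference is needed to reach each string.
int compare_words(const void* p1, const void* p2)
{
    const char* x = *static_cast<const char* const*>(p1);
    const char* y = *static_cast<const char* const*>(p2);
    return std::strcmp(x, y);
}

// Sorts the words and returns each distinct word with its number of occurrences
std::vector<std::pair<std::string, int>> count_words(std::vector<const char*> words)
{
    if (!words.empty())
        std::qsort(words.data(), words.size(), sizeof(const char*), compare_words);

    std::vector<std::pair<std::string, int>> counts;
    for (const char* w : words) {
        // After sorting, equal words are adjacent, so only the last entry matters
        if (!counts.empty() && counts.back().first == w)
            ++counts.back().second;
        else
            counts.emplace_back(w, 1);
    }
    return counts;
}
```

One caveat that applies to the question's code as well: the pointers stored in the array must stay valid. Tokens returned by strtok point into the line buffer, so they need to be copied before the buffer is reused for the next line.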

Access violating writing location (visual studio 2008) Code based on pointers

The main problem: once sem->i = a; has been executed, the next time yylex is called and c is alphabetic,
sem->s[i] = c; fails, because sem->s no longer points to a valid address.
More details:
What I want to do is open a text file and read its contents until the end of the file.
If a token is alphanumeric (for example: hello, hello45a), yylex puts each of its characters into an array (sem->s[i]) until it reaches the end of the file or a character that is not alphanumeric.
If it is a number (for example: 5234254, or 5), yylex puts each of its characters into the array arithmoi[], and afterwards uses atoi to store the number in sem->i.
If I delete the else if(isdigit(c)) part of yylex, it works (as long as no word in the file starts with a digit).
In other words, it works fine as long as it only finds words that start with a letter. If it then finds a number (taking the else if(isdigit(c)) branch) it still works, until it finds another word that starts with a letter. When that happens there is an "access violation writing location" error, and the problem seems to be at the line marked with an arrow. If you can help me I would be really thankful.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <iostream>
using namespace std;
union SEMANTIC_INFO
{
int i;
char *s;
};
int yylex(FILE *fpointer, SEMANTIC_INFO *sem)
{
char c;
int i=0;
int j=0;
c = fgetc (fpointer);
while(c != EOF)
{
if(isalpha(c))
{
do
{
sem->s[i] = c;//the problem is here... <-------------------
c = fgetc(fpointer);
i++;
}while(isalnum(c));
return 1;
}
else if(isdigit(c))
{
char arithmoi[20];
do
{
arithmoi[j] = c;
j++;
c = fgetc(fpointer);
}while(isdigit(c));
sem->i = atoi(arithmoi); //when this is used the sem->s[i] in if(isalpha) doesn't work
return 2;
}
}
cout << "end of file" << endl;
return 0;
}
int main()
{
int i,k;
char c[20];
int counter1 = 0;
int counter2 = 0;
for(i=0; i < 20; i++)
{
c[i] = ' ';
}
SEMANTIC_INFO sematic;
SEMANTIC_INFO *sema = &sematic;
sematic.s = c;
FILE *pFile;
pFile = fopen ("piri.txt", "r");
do
{
k = yylex( pFile, sema);
if(k == 1)
{
counter1++;
cout << "it's type is alfanumeric and it's: ";
for(i=0; i<20; i++)
{
cout << sematic.s[i] << " " ;
}
cout <<endl;
for(i=0; i < 20; i++)
{
c[i] = ' ';
}
}
else if(k==2)
{
counter2++;
cout << "it's type is digit and it's: "<< sematic.i << endl;
}
}while(k != 0);
cout<<"the alfanumeric are : " << counter1 << endl;
cout<<"the digits are: " << counter2 << endl;
fclose (pFile);
system("pause");
return 0;
}
This line in main is creating an uninitialized SEMANTIC_INFO
SEMANTIC_INFO sematic;
The value of integer sematic.i is unknown.
The value of pointer sematic.s is unknown.
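You then try to write to sematic.s[0]. You're hoping that sematic.s points to writable memory large enough to hold the token. Even once it has been pointed at a buffer (as with sematic.s = c), SEMANTIC_INFO is a union, so i and s share the same storage: assigning sem->i overwrites the pointer, and the next write through sem->s goes to an invalid address.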