#include <iostream>
#include <iomanip>
#include <string>
#include <algorithm>
using namespace std;
void getinput (string &first,string &second);
void lengthcheck (string first, string second);
//int anagramcheck (string word);
int* lettercounter (string input);
int main()
{
std::string a;
std::string b;
getinput(a,b);
lengthcheck (a,b);
lettercounter(a);
lettercounter(b);
int* one = lettercounter(a);
int* two = lettercounter(b);
if (one == two)
cout << "You Have Entered An Anagram" << endl;
else
cout << "You Have Not Entered An Anagram" << endl;
}
void getinput (string &first, string &second) {
cout << "Enter First Input: ";
getline(cin, first, '\n');
cout << "Enter Second Input: ";
getline(cin, second, '\n');
cout << "You Entered " << first << " and " << second <<endl;
}
void lengthcheck(string first, string second){
int lengtha = first.length();
int lengthb = second.length();
if ((lengthb > 60) || (lengtha > 60)) {
cout << "Input Is Invalid" << endl;
} else if (lengtha !=lengthb) {
cout << "Input is not an anagram" << endl;
} else {
cout << "Input is Valid" << endl;
}
}
int* lettercounter(string input)
{
static int freq[26] = {0};
int length = input.length();
for (int i=0; i<26; i++) {
freq[i]=0;
}
for (int i=0; i <length; i++) {
if(input[i]>='a' && input[i]<='z')
{
freq[input[i] - 97]++;
}
else if(input[i]>='A' && input[i]<='Z')
{
freq[input[i] - 65]++;
}
}
for(int i=0; i<26; i++) {
/* If current character exists in given string */
if(freq[i] != 0)
{
printf("'%c' = %d\n", (i + 97), freq[i]);
}
return freq;
}
}
I am having trouble returning the array named freq from the user-defined function lettercounter. Can someone give me a hint? I need lettercounter to return an array, and I need to call it twice so I can compare the two resulting arrays to determine whether the two inputs are anagrams. I am not sure if the function is actually returning a value to main.
First of all, freq shouldn't be static. A static array is a single object shared by every call, so both calls return a pointer to the same memory and the second call overwrites the counts from the first. For what you want to do, each call needs its own array.
Second, you cannot simply return a pointer to a local array that is neither static nor dynamically allocated. When the function returns (i.e. control goes from lettercounter back to main), the array's lifetime ends. You would be returning a pointer to memory that is no longer valid, and using it is undefined behavior.
If you really need to work with raw pointers, allocate the array dynamically each time you enter lettercounter: int* freq = new int[26];. This reserves memory for 26 ints that stays allocated after you return freq. Note that new int[26] leaves the elements uninitialized, so keep the loop that sets them to zero. Memory allocated with new is not released automatically; you have to clean up after yourself. In this case, call delete[] one; and delete[] two; at the end of main.
int* lettercounter(string input)
{
int * freq = new int[26];
.
.
.
return freq;
}
int main()
{
.
.
int* one = lettercounter(a);
int* two = lettercounter(b);
.
.
delete[] one;
delete[] two;
}
In any case, I'd recommend learning about smart pointers and standard containers (such as std::vector). They make operations like this much simpler.
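As an illustration of the container route, here is a minimal sketch that returns the counts by value in a std::array instead of through a pointer. The names letterCounts and isAnagram are hypothetical and not from the question. Note that comparing two int* values with == only compares addresses, whereas == on two std::array objects compares all 26 elements.

```cpp
#include <array>
#include <string>

// Count how often each letter a-z occurs, ignoring case and non-letters.
// (letterCounts is a hypothetical name, not the question's lettercounter.)
std::array<int, 26> letterCounts(const std::string& input)
{
    std::array<int, 26> freq{};  // value-initialized: every count starts at 0
    for (char ch : input) {
        if (ch >= 'a' && ch <= 'z') {
            ++freq[ch - 'a'];
        } else if (ch >= 'A' && ch <= 'Z') {
            ++freq[ch - 'A'];
        }
    }
    return freq;  // returned by value: nothing dangles, nothing is shared
}

// Two inputs are anagrams when their letter counts match.
bool isAnagram(const std::string& a, const std::string& b)
{
    return letterCounts(a) == letterCounts(b);  // element-wise comparison
}
```

With this shape there is no new or delete to manage, and main can simply test isAnagram(a, b).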
This is my code.
#include <iostream>
using namespace std;
typedef struct
{
int polski;
int wf;
int matma;
}oceny;
int funkcja_liczaca(int suma, int ile_liczb, int ktory_przedmiot, oceny &temporary);
int main()
{
int suma = 0;
int temp[3];
int ile_liczb_zostalo_wprowadzonych = 0;
oceny database;
string teksty[3] = {"polski: ", "wf: ", "matma: "};
for (int i=0; i!=3; i++)
{
cout << teksty[i] << endl;
while(temp[i]!=0)
{
cin >> temp[i];
if(cin.good()) //floating point exception here. the code don't even step into this one.
{
{
suma = temp[i] + suma;
ile_liczb_zostalo_wprowadzonych++;
if(temp[i]==0){ile_liczb_zostalo_wprowadzonych--;}
}
}else cout << "error";
};
funkcja_liczaca(suma, ile_liczb_zostalo_wprowadzonych, i, database);
suma = 0;
ile_liczb_zostalo_wprowadzonych = 0;
}
cout << "output of struct members in main() \n";
cout << database.polski << endl;
cout << database.wf << endl;
cout << database.matma << endl;
return 0;
}
int funkcja_liczaca(int suma, int ile_liczb, int ktory_przedmiot, oceny &temporary)
{
if(ktory_przedmiot==0){temporary.polski=suma/ile_liczb;cout << temporary.polski << endl;}
if(ktory_przedmiot==1){temporary.wf=suma/ile_liczb;cout << temporary.wf << endl;}
if(ktory_przedmiot==2){temporary.matma=suma/ile_liczb;cout << temporary.matma << endl;}
}
It computes the arithmetic mean of the numbers entered until the user inputs 0, which ends the loop. The mean is then calculated in funkcja_liczaca() and saved into the members of struct oceny.
Everything works fine, but I want to add a stream check on keyboard input to prevent bad values from being read into an integer variable.
Entering 'g' into temp[i] causes a floating point exception. The question is why? cin.good() and cin.fail() are not working.
When you want to deal with errors in the input stream, it's better to read the input line by line as a string and then attempt to extract your data from the string. If extraction of the data from the string is successful, proceed to process the data. Otherwise, attempt to read the next line of text. Here's the core logic for that.
while ( true )
{
cout << teksty[i] << endl;
std::string line;
if ( !getline(cin, line) )
{
// Problem reading a line of text.
// Exit.
exit(EXIT_FAILURE);
}
// Construct an istringstream object to extract the data.
std::istringstream istr(line);
if ( istr >> temp[i] )
{
// Extracting the number was successful.
// Add any additional checks as necessary.
// Break out of the while loop.
break;
}
// Bad input. Continue to the next iteration of the loop
// and read the next line of text.
}
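A self-contained sketch of the same line-by-line idea, applied to the averaging loop from the question, might look like the following. The function name averageUntilZero is hypothetical. The guard on the final division is an assumption about the cause of the crash: on many platforms an integer division by zero is reported as a "floating point exception", and that is probably what happens when no valid number was counted before funkcja_liczaca divides by ile_liczb.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Reads one integer per line from `in` until a 0 is read or input ends.
// Lines that do not start with an integer are skipped.
// Returns the integer average of the values, or 0 if none were entered.
// (averageUntilZero is a hypothetical helper, not part of the question.)
int averageUntilZero(std::istream& in)
{
    int sum = 0;
    int count = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int value = 0;
        if (!(fields >> value)) {
            continue;  // bad input such as "g": ignore the line, read the next
        }
        if (value == 0) {
            break;     // 0 is the sentinel that ends the sequence
        }
        sum += value;
        ++count;
    }
    // Guard the division: with no valid numbers, sum / count divides by zero.
    return count == 0 ? 0 : sum / count;
}
```

Calling averageUntilZero(std::cin) once per subject would replace the inner while loop, and the same function can be exercised with a std::istringstream in place of the keyboard.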
Here is the assignment:
Write a program that reads in a text file one word at a time. Store a word into a dynamically created array when it is first encountered. Create a parallel integer array to hold a count of the number of times that each particular word appears in the text file. If the word appears in the text file multiple times, do not add it into your dynamic array, but make sure to increment the corresponding word frequency counter in the parallel integer array. Remove any trailing punctuation from all words before doing any comparisons.
Create and use the following text file containing a quote from Bill Cosby to test your program.
I don't know the key to success, but the key to failure is trying to please everybody.
At the end of your program, generate a report that prints the contents of your two arrays in a format similar to the following:
Word Frequency Analysis
I 1
don't 1
know 1
the 2
key 2
...
Here is my code:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int readInFile (string tempArray [], string file, int arraySize);
int main()
{
ifstream inputFile;
string *readInArray = 0,
*compareArray = 0,
filename,
word;
int wordCount = 0;
int encountered = 0;
int j = 0,
*wordFrequency = 0;
cout << "Enter the filename you wish to read in: ";
getline(cin, filename);
inputFile.open(filename.c_str());
if (inputFile)
{
while (inputFile >> word)
{
wordCount++;
}
inputFile.close();
readInArray = new string[wordCount];
readInFile(readInArray, filename, wordCount);
}
else
{
cout << "Could not open file, ending program";
return 0;
}
compareArray = new string[wordCount];
wordFrequency = new int[wordCount];
for (int count = 0; count < wordCount; count++)
wordFrequency[count] = 0;
for(int i = 0; i < wordCount; ++i)
{
j = 0;
encountered = 0;
do
{
if (readInArray[i] == compareArray[j])
encountered = 1;
++j;
} while (j < wordCount);
if (encountered == 0)
{
compareArray[i]=readInArray[i];
wordFrequency[i] += 1;
}
}
for(int k=0; k < wordCount; ++k)
{
cout << "\n" << compareArray[k] << " ";
}
for(int l=0; l < wordCount; ++l)
{
cout << "\n" << wordFrequency[l] << " ";
}
return 0;
}
int readInFile (string tempArray [], string file, int arraySize)
{
ifstream inputFile;
inputFile.open(file.c_str());
if (inputFile)
{
cout << "\nHere is the text file:\n\n";
for(int i=0; i < arraySize; ++i)
{
inputFile >> tempArray[i];
cout << tempArray[i] << " ";
}
inputFile.close();
}
}
Here is my question:
How do you store a word into a dynamically created array when it is first encountered? As you can see, my code produces a string array with some of the elements empty. I believe it is supposed to be done using pointers.
Also how do I get rid of the punctuation in the string array? Should it be converted to a c-string first? But then how would I compare the words without converting back to a string array?
Here is a link to a java program that does something similar:
http://math.hws.edu/eck/cs124/javanotes3/c10/ex-10-1-answer.html
Thank you for any help you can offer!!
As to the first part of your question, your arrays are allocated with new, but their size is fixed at wordCount once they are created, so they cannot grow as new words are encountered. C++ provides growable dynamic arrays, such as the vector class: http://www.cplusplus.com/reference/vector/vector/
As to the second part of your question, I see no reason to convert it to a c string. The string class in c++ provides functionality for removing and searching for characters. http://www.cplusplus.com/reference/string/string/
The string::erase function can be used to erase punctuation characters found with string::find.
Note: There are other ways of doing this assignment that may be easier (like having an array of structs containing a string and an int, or using a map) but that may defeat the purpose of the assignment.
I have about 25 million integers, one per line, in a text file. My first task is to read those integers and sort them. I have managed to read the integers into an array (since my sorting function takes an unsorted array as an argument). However, reading the integers from the file is a long and expensive process. I have searched for cheaper, more efficient ways of doing this, but I could not find one that handles data of this size. What would you suggest for reading the integers from such a large (about 260MB) text file? Also, how can I get the number of lines efficiently?
ifstream myFile("input.txt");
int currentNumber;
int nItems = 25000000;
int *arr = (int*) malloc(nItems*sizeof(*arr));
int i = 0;
while (myFile >> currentNumber)
{
arr[i++] = currentNumber;
}
This is just how I get the integers from the text file. It is not that complicated. I assumed the number of lines is fixed (it actually is).
By the way, it is not too slow, of course. Reading completes in approximately 9 seconds on OS X with a 2.2GHz i7 processor. But I feel it could be much better.
Most likely, any optimisation of this will have rather little effect. On my machine, the limiting factor for reading large files is the disk transfer speed. Yes, speeding up the reading code can help a little, but you probably won't gain very much from it.
I found in a previous test [I'll see if I can find the answer with that in it - I couldn't find the source in my "experiment code for SO" directory] that the fastest way is to load the file using mmap. But it's only marginally faster than using ifstream.
Edit: my home-made benchmark for reading a file in a few different ways.
getline while reading a file vs reading whole file and then splitting based on newline character
As per usual, benchmarks measure what the benchmark measures, and small changes to either the environment or the way the code is written can sometimes make a big difference.
Edit:
Here are a few implementations of "read a number from a file and store it in a vector":
#include <iostream>
#include <fstream>
#include <vector>
#include <sys/time.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <sys/mman.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
using namespace std;
const char *file_name = "lots_of_numbers.txt";
void func1()
{
vector<int> v;
int num;
ifstream fin(file_name);
while( fin >> num )
{
v.push_back(num);
}
cout << "Number of values read " << v.size() << endl;
}
void func2()
{
vector<int> v;
v.reserve(42336000);
int num;
ifstream fin(file_name);
while( fin >> num )
{
v.push_back(num);
}
cout << "Number of values read " << v.size() << endl;
}
void func3()
{
int *v = new int[42336000];
int num;
ifstream fin(file_name);
int i = 0;
while( fin >> num )
{
v[i++] = num;
}
cout << "Number of values read " << i << endl;
delete [] v;
}
void func4()
{
int *v = new int[42336000];
FILE *f = fopen(file_name, "r");
int num;
int i = 0;
while(fscanf(f, "%d", &num) == 1)
{
v[i++] = num;
}
cout << "Number of values read " << i << endl;
fclose(f);
delete [] v;
}
void func5()
{
int *v = new int[42336000];
int num = 0;
ifstream fin(file_name);
char buffer[8192];
int i = 0;
int bytes = 0;
char *p;
int hasnum = 0;
int eof = 0;
while(!eof)
{
fin.read(buffer, sizeof(buffer));
p = buffer;
bytes = 8192;
while(bytes > 0)
{
if (*p == 26) // End of file marker...
{
eof = 1;
break;
}
if (*p == '\n' || *p == ' ')
{
if (hasnum)
v[i++] = num;
num = 0;
p++;
bytes--;
hasnum = 0;
}
else if (*p >= '0' && *p <= '9')
{
hasnum = 1;
num *= 10;
num += *p-'0';
p++;
bytes--;
}
else
{
cout << "Error..." << endl;
exit(1);
}
}
memset(buffer, 26, sizeof(buffer)); // To detect end of files.
}
cout << "Number of values read " << i << endl;
delete [] v;
}
void func6()
{
int *v = new int[42336000];
int num = 0;
FILE *f = fopen(file_name, "r");
char buffer[8192];
int i = 0;
int bytes = 0;
char *p;
int hasnum = 0;
int eof = 0;
while(!eof)
{
fread(buffer, 1, sizeof(buffer), f);
p = buffer;
bytes = 8192;
while(bytes > 0)
{
if (*p == 26) // End of file marker...
{
eof = 1;
break;
}
if (*p == '\n' || *p == ' ')
{
if (hasnum)
v[i++] = num;
num = 0;
p++;
bytes--;
hasnum = 0;
}
else if (*p >= '0' && *p <= '9')
{
hasnum = 1;
num *= 10;
num += *p-'0';
p++;
bytes--;
}
else
{
cout << "Error..." << endl;
exit(1);
}
}
memset(buffer, 26, sizeof(buffer)); // To detect end of files.
}
fclose(f);
cout << "Number of values read " << i << endl;
delete [] v;
}
void func7()
{
int *v = new int[42336000];
int num = 0;
FILE *f = fopen(file_name, "r");
int ch;
int i = 0;
int hasnum = 0;
while((ch = fgetc(f)) != EOF)
{
if (ch == '\n' || ch == ' ')
{
if (hasnum)
v[i++] = num;
num = 0;
hasnum = 0;
}
else if (ch >= '0' && ch <= '9')
{
hasnum = 1;
num *= 10;
num += ch-'0';
}
else
{
cout << "Error..." << endl;
exit(1);
}
}
fclose(f);
cout << "Number of values read " << i << endl;
delete [] v;
}
void func8()
{
int *v = new int[42336000];
int num = 0;
int f = open(file_name, O_RDONLY);
off_t size = lseek(f, 0, SEEK_END);
char *buffer = (char *)mmap(NULL, size, PROT_READ, MAP_PRIVATE, f, 0);
int i = 0;
int hasnum = 0;
int bytes = size;
char *p = buffer;
while(bytes > 0)
{
if (*p == '\n' || *p == ' ')
{
if (hasnum)
v[i++] = num;
num = 0;
p++;
bytes--;
hasnum = 0;
}
else if (*p >= '0' && *p <= '9')
{
hasnum = 1;
num *= 10;
num += *p-'0';
p++;
bytes--;
}
else
{
cout << "Error..." << endl;
exit(1);
}
}
close(f);
munmap(buffer, size);
cout << "Number of values read " << i << endl;
delete [] v;
}
struct bm
{
void (*f)();
const char *name;
};
#define BM(f) { f, #f }
bm b[] =
{
BM(func1),
BM(func2),
BM(func3),
BM(func4),
BM(func5),
BM(func6),
BM(func7),
BM(func8),
};
double time_to_double(timeval *t)
{
return (t->tv_sec + (t->tv_usec/1000000.0)) * 1000.0;
}
double time_diff(timeval *t1, timeval *t2)
{
return time_to_double(t2) - time_to_double(t1);
}
int main()
{
for(int i = 0; i < sizeof(b) / sizeof(b[0]); i++)
{
timeval t1, t2;
gettimeofday(&t1, NULL);
b[i].f();
gettimeofday(&t2, NULL);
cout << b[i].name << ": " << time_diff(&t1, &t2) << "ms" << endl;
}
for(int i = sizeof(b) / sizeof(b[0])-1; i >= 0; i--)
{
timeval t1, t2;
gettimeofday(&t1, NULL);
b[i].f();
gettimeofday(&t2, NULL);
cout << b[i].name << ": " << time_diff(&t1, &t2) << "ms" << endl;
}
}
Results (two consecutive runs, forwards and backwards to avoid file-caching benefits):
Number of values read 42336000
func1: 6068.53ms
Number of values read 42336000
func2: 6421.47ms
Number of values read 42336000
func3: 5756.63ms
Number of values read 42336000
func4: 6947.56ms
Number of values read 42336000
func5: 941.081ms
Number of values read 42336000
func6: 962.831ms
Number of values read 42336000
func7: 2572.4ms
Number of values read 42336000
func8: 816.59ms
Number of values read 42336000
func8: 815.528ms
Number of values read 42336000
func7: 2578.6ms
Number of values read 42336000
func6: 948.185ms
Number of values read 42336000
func5: 932.139ms
Number of values read 42336000
func4: 6988.8ms
Number of values read 42336000
func3: 5750.03ms
Number of values read 42336000
func2: 6380.36ms
Number of values read 42336000
func1: 6050.45ms
In summary, as someone pointed out in the comments, the actual parsing of integers is quite a substantial part of the whole time, so reading the file isn't quite as critical as I first made out. Even a very naive way of reading the file (using fgetc()) beats the ifstream operator>> for integers.
As can be seen, using mmap to load the file is slightly faster than reading the file via fstream, but only marginally so.
You can use external sorting to sort values in your file without loading them all into memory. Sorting speed will be limited by your hard drive capabilities, but you will be able to mess with really huge files. Here is the implementation.
Try reading blocks of integers and parsing those blocks instead of reading line by line.
260MB is not that big. You should be able to load the whole thing into memory and then parse through it. Once it is in memory, you can use a nested loop to read the integers between line endings and convert them using the usual functions. I'd try to preallocate sufficient memory for your array of integers before you start.
Oh, and you may find the crude old C-style file access functions are the faster options for things like this.
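A rough sketch of the "load it all, then parse" idea follows. The helper names readWholeFile, parseInts and countLines are hypothetical, and the code assumes that every value fits in an int and that the file holds only integers separated by whitespace. No timing claim is made for it; it would need to be measured against the other approaches.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Read a whole file into memory with a single read() call.
// Returns an empty string if the file cannot be opened.
std::string readWholeFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return std::string();
    }
    const std::streamsize size = in.tellg();  // opened at the end: this is the size
    std::string data(static_cast<std::size_t>(size), '\0');
    in.seekg(0);
    in.read(&data[0], size);
    return data;
}

// Parse every whitespace-separated integer in `text` with strtol.
std::vector<int> parseInts(const std::string& text)
{
    std::vector<int> values;
    const char* p = text.c_str();  // null-terminated, so strtol stops safely
    char* end = nullptr;
    for (;;) {
        const long v = std::strtol(p, &end, 10);
        if (end == p) {
            break;                 // nothing consumed: end of data (or junk)
        }
        values.push_back(static_cast<int>(v));  // assumes v fits in an int
        p = end;
    }
    return values;
}

// The number of lines is the number of newline characters in the buffer.
std::size_t countLines(const std::string& text)
{
    return static_cast<std::size_t>(std::count(text.begin(), text.end(), '\n'));
}
```

Calling values.reserve(countLines(text)) before the parsing loop would be one way to do the preallocation mentioned above.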
I would do it this way :
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main() {
fstream file;
string line;
int intValue;
int lineCount = 0;
try {
file.open("myFile.txt", ios_base::in); // Open to read
while(getline(file, line)) {
lineCount++;
try {
intValue = stoi(line);
// Do something with your value
cout << "Value for line " << lineCount << " : " << intValue << endl;
} catch (const exception& e) {
cerr << "Failed to convert line " << lineCount << " to an int : " << e.what() << endl;
}
}
} catch (const exception& e) {
cerr << e.what() << endl;
if (file.is_open()) {
file.close();
}
}
cout << "Line count : " << lineCount << endl;
system("PAUSE");
}
It will be pretty straightforward with Qt:
QFile file("h:/1.txt");
file.open(QIODevice::ReadOnly);
QDataStream in(&file);
QVector<int> ints;
ints.reserve(25000000);
while (!in.atEnd()) {
int integer;
qint8 line;
in >> integer >> line; // read an int into integer, a char into line
ints.append(integer); // append the integer to the vector
}
At the end, you have the ints QVector you can easily sort. The number of lines is the same as the size of the vector, provided the file was properly formatted.
On my machine, an i7 3770k @ 4.2 GHz, it takes about 490 milliseconds to read 25 million ints and put them into a vector. Reading from a regular mechanical HDD, not SSD.
Buffering the entire file into memory didn't help all that much, time dropped to 420 msec.
You don't say how you are reading the values, so it's hard to
say. Still, there are really only two solutions: `someIStream >> anInt`
and `fscanf( someFd, "%d", &anInt )`. Logically, these
should have similar performance, but implementations vary; it
might be worth trying and measuring both.
Another thing to check is how you're storing them. If you know
you have about 25 million, doing a reserve of 30 million on
the std::vector before reading them would probably help. It
might also be cheaper to construct the vector with 30 million
elements, then trim it when you've seen the end, rather than
using push_back.
Finally, you might consider writing an immapstreambuf, and
using that to mmap the input, and read it directly from the
mapped memory. Or even iterating over it manually, calling
strtol (but that's a lot more work); all of the streaming
solutions probably end up calling strtol, or something
similar, but doing significant work around the call first.
EDIT:
FWIW, I did some very quick tests on my home machine (a fairly
recent Lenovo, running Linux), and the results surprised me:
As a reference, I did the trivial, naïve implementation, using
std::cin >> tmp and v.push_back( tmp );, with no attempts to
optimize. On my system, this ran in just under 10 seconds.
Simple optimizations, such as using reserve on the vector,
or initially creating the vector with a size of 25000000, didn't
change much—the time was still over 9 seconds.
Using a very simple mmapstreambuf, the time dropped to
around 3 seconds—with the simplest loop, no reserve,
etc.
Using fscanf, the time dropped to just under 3 seconds. I
suspect that the Linux implementation of FILE* also uses
mmap (and std::filebuf doesn't).
Finally, using a mmapbuffer, iterating with two char*, and
using strtol to convert, the time dropped to under a second.
These tests were done very quickly (less than an hour to write
and run all of them), and are far from rigorous (and of course,
don't tell you anything about other environments), but the
differences surprised me. I didn't expect as much difference.
One possible solution would be dividing the large file into smaller chunks. Sort each chunk separately and then merge all the sorted chunks one by one.
EDIT:
Apparently this is a well-established method. See 'External merge sort' at http://en.wikipedia.org/wiki/External_sorting
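The chunk-and-merge idea can be sketched as follows. To keep the example short and self-contained, in-memory vectors stand in for the sorted chunk files, which is a simplification: a real external sort would write each sorted run to a temporary file and stream the runs back during the merge. The names makeSortedRuns and mergeRuns are hypothetical, and chunkSize is assumed to be greater than zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Split `input` into runs of at most `chunkSize` values and sort each run.
// Assumes chunkSize > 0. In a real external sort each run would go to a file.
std::vector<std::vector<int>> makeSortedRuns(const std::vector<int>& input,
                                             std::size_t chunkSize)
{
    std::vector<std::vector<int>> runs;
    for (std::size_t start = 0; start < input.size(); start += chunkSize) {
        const std::size_t stop = std::min(input.size(), start + chunkSize);
        std::vector<int> run(input.begin() + start, input.begin() + stop);
        std::sort(run.begin(), run.end());
        runs.push_back(std::move(run));
    }
    return runs;
}

// K-way merge: repeatedly take the smallest head element among all runs.
std::vector<int> mergeRuns(const std::vector<std::vector<int>>& runs)
{
    using Entry = std::pair<int, std::size_t>;  // (value, index of its run)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heads;
    std::vector<std::size_t> next(runs.size(), 0);  // read position per run

    for (std::size_t r = 0; r < runs.size(); ++r) {
        if (!runs[r].empty()) {
            heads.push({runs[r][0], r});
        }
    }
    std::vector<int> out;
    while (!heads.empty()) {
        const Entry smallest = heads.top();
        heads.pop();
        out.push_back(smallest.first);
        const std::size_t r = smallest.second;
        if (++next[r] < runs[r].size()) {
            heads.push({runs[r][next[r]], r});  // refill from the same run
        }
    }
    return out;
}
```

The min-heap holds one element per run at any time, so the merge itself needs memory proportional to the number of chunks rather than to the total number of values.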
Something is wrong with getline(): it reads the words in correctly, but the size of the map still remains 26.
I tried printing each string as it is read, and all of them print, so it is taking in the strings correctly but not storing them?
I have attached the code below for reference.
Ask me for the whole project if you need to check whether something is going wrong someplace else.
void TldPart::PreloadTLDs()
{
ifstream in(TLD_TEST_FILE);
if(in)
{
string tld;
for(int i =0; !in.eof(); i++)
{
getline(in,tld);
String myString = tld.c_str();
//cout << myString.GetLength() << endl;
for(int j=0; j<myString.GetLength();j++)
{
myString[j]=tolower(myString[j]);
}
//cout << myString << endl;
ValidTLDs.insert(pair<String,int>(myString,i));
//ValidTLDs[myString] = true; //if the map was bool
}
in.close();
cout << ValidTLDs.size(); //Printing the size //prints 26
}
}