Using a tokenizer in C++ from a file? - c++

I am working on an assignment that requires me to read in several lines of text from a file, and at the end use qsort to sort the words used alphabetically and display a count of how many times each word was used. I realized I'm going to have to tokenize the strings as they are read in from the file. The only problem is that the individual tokens kind of disappear after you do it so I have to add them to a list. I'm bad at explaining, so here's my code:
#include<iostream>
#include<string>
#include<algorithm>
#include<stdlib.h>
#include<fstream>
using namespace std;
int compare(const void* , const void*);
const int SIZE = 1000;
const int WORD_SIZE = 256;
void main()
{
cout << "This program is designed to alphabetize words entered from a file." << endl;
cout << "It will then display this list with the number of times " << endl;
cout << "that each word was entered." << endl;
cout << endl;
char *words[SIZE];//[WORD_SIZE];
char temp[100];
char *tokenPtr, *nullPtr= NULL;
char *list[SIZE];
string word;
int i = 0, b = 0;
ifstream from_file;
from_file.open("prob1.txt.txt");
if (!from_file)
{
cout << "Cannot open file - prob1.txt";
exit(1); //exits program
}
while (!from_file.eof())
{
from_file.getline(temp, 99);
tokenPtr = strtok(temp, " ");
while (tokenPtr != NULL)
{
cout << tokenPtr << '\n';
list[b] = tokenPtr;
b++;
tokenPtr = strtok(nullPtr, " ");
}
word = temp;
transform(word.begin(), word.end(), word.begin(), ::tolower);
words[i] = list[i];
i++;
}
from_file.close();
qsort(words, i, WORD_SIZE, compare);
int currentcount = 1 ;
int k;
for( int s = 0; s < i; s++ )
{
for( k = 1; k <= s; k++)
{
if( words[s] == words[k] )
{
currentcount++;
}
currentcount = 1;
words[k] = "";
}
cout << words[s] << " is listed: " << currentcount << " times." << endl;
words[s] = "";
}
}
int compare(const void* p1, const void *p2)
{
char char1, char2;
char1 = *(char *)p1; // cast from pointer to void
char2 = *(char *)p2; // to pointer to int
if(char1 < char2)
return -1;
else
if (char1 == char2)
return 0;
else
return 1;
}
The only thing missing is the compare function, but the program works fine, up until the qsort, wherein it crashes, but it doesn't tell me why. Can anybody shed some insight/help me fix this up?
Again, this IS an assignment. (I was told I need to specify this?)

The array words is an array of pointers to char:
char* words[SIZE]; // SIZE elements of type `char*`
So the third parameter WIDTH should be the width of a pointer to char.
qsort(words, i, sizeof(char*), compare);
Also your implementation of compare is not working as you expect.
You are passing pointers to the compare. But they are pointers at the elements. You need to de-reference the pointers to get the values:
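int compare(const void* p1, const void *p2)
{
// qsort passes pointers to the array elements, so each argument is really a char**.
char const* x = *(char**)p1;
char const* y = *(char**)p2;
return strcmp(x, y); // declared in <cstring>
}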
This does not compare strings:
if( words[s] == words[k] )
This just compares two pointers. To compare the strings they point at use strcmp()
if( strcmp(words[s], words[k]) == 0)
This should stop the crashes, but there are many more improvements we could make to this code.
Once you get it working, you should post it at https://codereview.stackexchange.com/ for a review.
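Putting the points above together, a minimal sketch could look like the following. The names compare_words and count_sorted_words are made up for illustration, and returning the counts in a vector (rather than printing them) is an assumption, not part of the assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>   // std::qsort
#include <cstring>   // std::strcmp
#include <string>
#include <utility>
#include <vector>

// qsort hands the comparator pointers to the array elements. The elements
// are char*, so each argument is really a pointer to a char*.
int compare_words(const void* p1, const void* p2)
{
    const char* x = *static_cast<const char* const*>(p1);
    const char* y = *static_cast<const char* const*>(p2);
    return std::strcmp(x, y);
}

// Hypothetical helper: sorts the pointer array in place, then counts runs
// of equal words (after sorting, equal words sit next to each other).
std::vector<std::pair<std::string, int>>
count_sorted_words(char* words[], std::size_t n)
{
    assert(words != nullptr || n == 0);

    // The element width is the size of one pointer, not the length of a word.
    std::qsort(words, n, sizeof(char*), compare_words);

    std::vector<std::pair<std::string, int>> counts;
    for (std::size_t i = 0; i < n; ++i) {
        if (!counts.empty() && std::strcmp(words[i], words[i - 1]) == 0)
            ++counts.back().second;          // same word as the previous one
        else
            counts.emplace_back(words[i], 1); // first occurrence of a new word
    }
    return counts;
}
```

Each pointer passed in must refer to a NUL-terminated string that stays alive for the duration of the call.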

Related

Reversing the Character Case of a string

I'm stuck on a particular problem. I'm trying to take a string, and reverse the character cases in the string.
For Example: "HaVinG FuN" should flip to "hAvINg fUn."
I think it has something to do with my loop or my If/Else statements. What am I missing? All capitalized characters come out capitalized still. All lower case characters also come out capitalized as well... My other two functions are behaving correctly, but not my reverseFunct function... FYI I've omitted the other functions to try to cut-down on clutter and focus on my problem.
#include "stdafx.h"
#include <string>
#include <iostream>
#include <cctype>
#include <cstring>
using namespace std;
// Function Prototypes
void upperFunct(char *);
void lowerFunct(char *);
void reverseFunct(char *);
int main()
{
cout << "Enter a string: " << endl;
char ltrs [300];
cin.getline(ltrs, 300);
char *ptr = nullptr;
ptr = ltrs;
upperFunct(ptr);
lowerFunct(ptr);
reverseFunct(ptr);
return 0;
}
//----------------------------------//
void upperFunct(char *ltrptr)
{
int count = 0;
while (ltrptr[count] != '\0')
{
ltrptr[count] = toupper(ltrptr[count]);
count++;
}
{
cout << "---> toupper function: " << ltrptr << endl;
}
}
//------------------------------------//
void lowerFunct(char *ltrptr)
{
int count = 0;
while (ltrptr[count] != '\0')
{
ltrptr[count] = tolower(ltrptr[count]);
count++;
}
cout << "---> tolower function: " << ltrptr << endl;
}
//------------------------------------//
void reverseFunct(char *ltrptr) // <-----NOT REVERSING CHARACTERS
{
int count = 0;
while (ltrptr[count] != '\0')
{
if (isupper(ltrptr[count]))
{
ltrptr[count] = tolower(ltrptr[count]);
}
else
{
ltrptr[count] = toupper(ltrptr[count]);
}
count++;
}
cout << "---> reverse function: " << ltrptr << endl;
}
Your check for lowercase letters reads as
else if (islower(ltrptr[count]));
Notice the extra semicolon.
This semicolon terminates the if statement, and thus the succeeding conversion to uppercase is not a then-clause to this if statement but rather is executed unconditionally on every character.
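To see the effect in isolation, the sketch below puts the stray semicolon in one function and removes it in another. Both function names are hypothetical and do not appear in the original program; the casts to unsigned char are an added precaution for the <cctype> functions.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Buggy variant: the ';' after the else-if condition is an empty statement,
// so the braces that follow form an ordinary block that runs every time.
std::string flip_with_stray_semicolon(std::string s)
{
    for (char& ch : s) {
        const unsigned char uc = static_cast<unsigned char>(ch);
        if (std::isupper(uc))
            ch = static_cast<char>(std::tolower(uc));
        else if (std::islower(uc));   // <-- stray semicolon ends the if here
        {
            // Executed unconditionally, undoing the tolower above.
            ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
        }
    }
    return s;
}

// Fixed variant: the conversion to uppercase belongs to the else-if.
std::string flip_case(std::string s)
{
    for (char& ch : s) {
        const unsigned char uc = static_cast<unsigned char>(ch);
        if (std::isupper(uc))
            ch = static_cast<char>(std::tolower(uc));
        else if (std::islower(uc))
            ch = static_cast<char>(std::toupper(uc));
    }
    return s;
}
```

With "HaVinG FuN" as input, the buggy variant produces all capitals, which matches the symptom described in the question, while the fixed variant swaps the case of each letter. Many compilers warn about the empty body if warnings are enabled.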
Change it like this:
// Function Prototypes "HaVinG FuN" should flip to "hAvINg fUn."
void reverseFunct(char *);
int main()
{
//cout << "Enter a string: " << endl;
char ltrs[300] = "HaVinG FuN";
//cin.getline(ltrs, 300);
char *ptr = nullptr;
ptr = ltrs;
reverseFunct(ptr);
ptr = nullptr;
return 0;
}
void reverseFunct(char *ltrptr) // <-----NOT REVERSING CHARACTERS
{
int count = 0;
while (ltrptr[count] != '\0')
{
if (isupper(ltrptr[count]))
{
ltrptr[count] = tolower(ltrptr[count]);
}
else
{
ltrptr[count] = toupper(ltrptr[count]);
}
count++;
}
cout << "---> reverse function: " << ltrptr << endl;
}
You're writing C code. Here's a C++ way to do it:
#include <string>
#include <algorithm>
char reverse_case_char(char c) {
const auto uc = static_cast<unsigned char>(c); // Sic.
return ::isupper(uc)? ::tolower(uc): ::toupper(uc);
}
void reverse_case(std::string& str) {
std::transform(str.begin(), str.end(), str.begin(), reverse_case_char);
}
#include <cassert>
int main()
{
std::string fun = "HaVinG FuN";
reverse_case(fun);
assert(fun == "hAvINg fUn");
return 0;
}
Others have already pointed out the mistake in your code so no need to repeat that. Instead this answer will give some alternative ways of implementing the task.
Your code is more C-style than C++ style. C++ has a number of functions/features that will allow you to write this in much shorter forms.
char ltrs[300] = "HaVinG FuN";
for (auto& ch : ltrs) ch = islower(ch) ? toupper(ch) : tolower(ch);
std::cout << ltrs << std::endl;
or
char ltrs[300] = "HaVinG FuN";
std::for_each(ltrs, ltrs + strlen(ltrs), [](char& ch)
{ ch = islower(ch) ? toupper(ch) : tolower(ch); });
std::cout << ltrs << std::endl;
or using the std::string
std::string str("HaVinG FuN");
for (auto& ch : str) ch = islower(ch) ? toupper(ch) : tolower(ch);
std::cout << str << std::endl;
Using these C++ functions/features makes the program shorter, easier to understand and the risk of bugs is lower.
Thanks for the help!!! I ended up figuring out my answer, while being able to maintain my less-than elegant code that is fitting with my class. Bipll ended up giving me what I was after, something to think about in terms that my original array was being modified each time.
I realize that my solution is sloppy and not appropriate for a work environment, but it is in line with my homework assignment, as our teacher is encouraging us to learn C++ from the ground up without getting too many direct answers from places like SO. So I'm glad I learned a bit from here, and got an indirect hint that helped me see my issue.
I ended up making a copy of my original array and passing that copy to my last function, the reversing one. I could use the original array for the first two functions because the first capitalizes every character and the second makes them all lowercase, so neither depends on the original case. The third function, the reverse, needs the original text, but by the time it runs the array has already been modified. The easiest way for a noob like me, given where I am in the class, was to make a copy of the first array and use that for the third function.
//Snippet of code I needed
int main()
{
int index = 0;
cout << "Enter a string: " << endl;
const int Size = 300;
char ltrs[Size];
cin.getline(ltrs, Size);
char arrayCopy[Size];
char *ptr = nullptr;
char *ptr2 = nullptr;
ptr = ltrs;
//Copy of ltrs Array
//----------------------------------//
while (ptr[index] != '\0') //
{ //
arrayCopy[index] = ptr[index]; //
index++; //
} //
arrayCopy[index] = '\0'; //
//
ptr2 = arrayCopy; //
//----------------------------------//
return 0;
}
// Function to Reverse
void reverseFunct(char *ltrptr)
{
int count = 0;
while (ltrptr[count] != '\0')
{
if (isupper(ltrptr[count]))
{
ltrptr[count] = tolower(ltrptr[count]);
}
else
{
ltrptr[count] = toupper(ltrptr[count]);
}
count++;
}
cout << "---> reverse function: " << ltrptr << endl;
}
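The copy-then-flip idea described above can also be sketched with the C string functions instead of a hand-written copy loop. The helper names and the explicit size check are assumptions added for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// Flips the case of every letter in a NUL-terminated string, in place.
void flip_case_in_place(char* s)
{
    assert(s != nullptr);
    for (; *s != '\0'; ++s) {
        const unsigned char uc = static_cast<unsigned char>(*s);
        *s = static_cast<char>(std::isupper(uc) ? std::tolower(uc)
                                                : std::toupper(uc));
    }
}

// Hypothetical helper: copies src (terminator included) into dst, which has
// room for dst_size chars, and flips the copy. The original is left alone.
// Returns false without touching dst if src does not fit.
bool copy_and_flip(const char* src, char* dst, std::size_t dst_size)
{
    assert(src != nullptr && dst != nullptr);
    const std::size_t len = std::strlen(src);
    if (len + 1 > dst_size)
        return false;
    std::memcpy(dst, src, len + 1);   // len + 1 also copies the '\0'
    flip_case_in_place(dst);
    return true;
}
```

Because the flip works on the copy, it does not matter what the upper-case and lower-case functions did to the original array beforehand, as long as the copy was taken first.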

Frequency table in C++

This is what I have so far; I am trying to have an array with probability of all chars and space in a text file, but I have a problem with the data type.
int main()
{
float x[27];
unsigned sum = 0;
struct Count {
unsigned n;
void print(unsigned index, unsigned total) {
char c = (char)index;
if (isprint(c)) cout << "'" << c << "'";
else cout << "'\\" << index << "'";
cout << " occured " << n << "/" << total << " times";
cout << ", propability is " << (double)n / total << "\n";
}
Count() : n() {}
} count[256];
ifstream myfile("C:\\text.txt"); // one \ masks the other
while (!myfile.eof()) {
char c;
myfile.get(c);
if (!myfile) break;
sum++;
count[(unsigned char)c].n++;
}
for (unsigned i = 0; i<256; i++)
{
count[i].print(i, sum);
}
x[0] = count[33];
int j=68;
for(int i=1;i<27;i++)
{
x[i]=count[j];
j++;
}
return 0;
}
#include <iostream>
#include <fstream>
#include <cctype>
using namespace std;
double probabilities[256]; // now it can be accessed by Count
int main()
{
unsigned sum = 0;
struct Count {
unsigned n;
double prob;
void print ( unsigned index, unsigned total ) {
// if ( ! n ) return;
probabilities[index] = prob = (double)n/total;
char c = (char) index;
if ( isprint(c) ) cout << "'" << c << "'";
else cout << "'\\" << index << "'";
cout<<" seen "<<n<<"/"<<total<<" times, probability is "<<prob<<endl;
}
Count(): n(), prob() {}
operator double() const { return prob; }
operator float() const { return (float)prob; }
} count[256];
ifstream myfile("C:\\text.txt"); // one \ masks the other
while(!myfile.eof()) {
char c;
myfile.get(c);
if ( !myfile ) break;
sum++;
count[(unsigned char)c].n++;
}
for ( unsigned i=0; i<256; i++ ) count[i].print(i,sum);
return 0;
}
I incorporated various changes suggested - Thanks!
Now, who finds the 4 ways to access the actual probabilities?
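You are allocating a buffer of 1,000,000 characters just to hold a file name:
char file[1000000] = "C:\text.txt";
This is not good: the array only needs to be as long as the path. The elements after the string are zero-initialized, so they are harmless, just wasted. Note also that "\t" inside the literal is a tab escape; write "C:\\text.txt" instead.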
To read a file on Windows you need something like the following. I will not give you the full solution; you need to work through MSDN and the documentation to understand it fully.
First include the <windows.h> header from the SDK.
Look at this example: http://msdn.microsoft.com/en-us/library/windows/desktop/aa363778(v=vs.85).aspx
That example appends one file to another. Your solution will be similar: instead of writing the buffer to another file, process it to increment your local counters and update the state of the table.
Do not pick an arbitrarily large buffer size and try to read everything at once: the file may still be larger than the buffer, which risks an overflow. Instead, do what the example does:
read some bytes in buffer
process that buffer and increment the table
repeat until you reach end of file
while (ReadFile(hFile, buff, sizeof(buff), &dwBytesRead, NULL)
&& dwBytesRead > 0)
{
// write your logic here
}
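The same read-a-chunk, process, repeat loop can be sketched portably with std::istream::read in place of the Win32 ReadFile call. This is a swapped-in technique, not the API the answer refers to; the names ByteCounts, count_bytes and probability, and the 4096-byte chunk size, are assumptions.

```cpp
#include <cassert>
#include <array>
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file

using ByteCounts = std::array<unsigned long, 256>;

// Reads the stream in fixed-size chunks and tallies every byte value.
// Returns the total number of bytes read.
std::size_t count_bytes(std::istream& in, ByteCounts& counts)
{
    counts.fill(0);
    char buffer[4096];                      // arbitrary chunk size
    std::size_t total = 0;
    for (;;) {
        in.read(buffer, sizeof buffer);
        const std::streamsize got = in.gcount();  // short on the last chunk
        if (got <= 0)
            break;                          // end of file or read error
        for (std::streamsize i = 0; i < got; ++i)
            ++counts[static_cast<unsigned char>(buffer[i])];
        total += static_cast<std::size_t>(got);
    }
    return total;
}

// Relative frequency of one byte value; 0 for an empty input.
double probability(const ByteCounts& counts, unsigned char c, std::size_t total)
{
    return total == 0 ? 0.0 : static_cast<double>(counts[c]) / total;
}
```

For a real file, the stream would typically be a std::ifstream opened in binary mode. The buffer size only affects how many reads are needed, not the result.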

Access violating writing location (visual studio 2008) Code based on pointers

The main problem: after sem->i = a; has been executed, when yylex is called and c is alphabetic,
sem->s[i] = c; fails because sem->s has a problem with the address it points to.
More details:
What I want to do is open a text file and read its contents until the end of the file.
If a token is alphanumeric (examples: hello, hello45a), yylex puts each of its characters into an array (sem->s[i]) until it finds the end of the file or a non-alphanumeric character.
If it is a number (examples: 5234254, 5), yylex puts each of its characters into the array arithmoi[] and then uses atoi to store the number in sem->i.
If I delete the else if(isdigit(c)) part of yylex, it works (as long as no word in the file starts with a digit).
It works fine as long as it only finds words that start with a letter. If it then finds a number (taking the else if(isdigit(c)) branch), it still works... until it finds another word starting with a letter. At that point there is an "access violation writing location" error, and the problem seems to be at the line marked with an arrow. I would be really thankful for any help.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <iostream>
using namespace std;
union SEMANTIC_INFO
{
int i;
char *s;
};
int yylex(FILE *fpointer, SEMANTIC_INFO *sem)
{
char c;
int i=0;
int j=0;
c = fgetc (fpointer);
while(c != EOF)
{
if(isalpha(c))
{
do
{
sem->s[i] = c;//the problem is here... <-------------------
c = fgetc(fpointer);
i++;
}while(isalnum(c));
return 1;
}
else if(isdigit(c))
{
char arithmoi[20];
do
{
arithmoi[j] = c;
j++;
c = fgetc(fpointer);
}while(isdigit(c));
sem->i = atoi(arithmoi); //when this is used the sem->s[i] in if(isalpha) doesn't work
return 2;
}
}
cout << "end of file" << endl;
return 0;
}
int main()
{
int i,k;
char c[20];
int counter1 = 0;
int counter2 = 0;
for(i=0; i < 20; i++)
{
c[i] = ' ';
}
SEMANTIC_INFO sematic;
SEMANTIC_INFO *sema = &sematic;
sematic.s = c;
FILE *pFile;
pFile = fopen ("piri.txt", "r");
do
{
k = yylex( pFile, sema);
if(k == 1)
{
counter1++;
cout << "it's type is alfanumeric and it's: ";
for(i=0; i<20; i++)
{
cout << sematic.s[i] << " " ;
}
cout <<endl;
for(i=0; i < 20; i++)
{
c[i] = ' ';
}
}
else if(k==2)
{
counter2++;
cout << "it's type is digit and it's: "<< sematic.i << endl;
}
}while(k != 0);
cout<<"the alfanumeric are : " << counter1 << endl;
cout<<"the digits are: " << counter2 << endl;
fclose (pFile);
system("pause");
return 0;
}
This line in main creates an uninitialized SEMANTIC_INFO:
SEMANTIC_INFO sematic;
The value of integer sematic.i is unknown.
The value of pointer sematic.s is unknown.
You then try to write to sematic.s[0]. You're hoping that sematic.s points to writable memory, large enough to hold the contents of that file, but you haven't made it point to anything.
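One more detail is worth checking in the code as posted: SEMANTIC_INFO is a union, so i and s occupy the same storage. main does assign sematic.s = c, but a later sem->i = atoi(arithmoi) overwrites those bytes, after which sem->s no longer points at c. A minimal sketch of one possible fix, assuming both values are allowed to live side by side, replaces the union with a struct. The names Token, store_word and store_number are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>   // std::atoi
#include <cstring>   // std::strncpy

// Unlike a union, a struct gives each member its own storage, so storing
// the number cannot clobber the pointer.
struct Token {
    int   i;
    char* s;   // must point at a writable buffer owned by the caller
};

// Hypothetical helper: copies at most cap - 1 characters into the buffer
// behind t.s and always terminates the result.
void store_word(Token& t, const char* text, std::size_t cap)
{
    assert(t.s != nullptr && text != nullptr && cap > 0);
    std::strncpy(t.s, text, cap - 1);
    t.s[cap - 1] = '\0';
}

// Hypothetical helper: digits must be a NUL-terminated string for atoi.
void store_number(Token& t, const char* digits)
{
    assert(digits != nullptr);
    t.i = std::atoi(digits);
}
```

The NUL-termination requirement matters for the posted yylex as well: arithmoi would need a '\0' after the last digit before it is passed to atoi.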

How can I modify my C++ program to show a word input by a user backwards, using a stack?

I want to assign a pointer to every character the user inputs. Then I can probably use one loop to store the characters and a second loop to rearrange the stack order using the pointers. But I don't know how to write that as a program, and I'm not sure if it can work. Here is what I have so far:
#include<iostream>
using namespace std;
class Stack{
public:
enum {MaxStack = 50};
void init() {top = -1;}
void push( char n ){
if ( isFull() ) {
cerr << "Full Stack. DON'T PUSH\n";
return;
}
else {
arr[ ++top ] = n;
cout << "Just pushed " << n << endl;
return;}
}
int pop() {
if (isEmpty() ) {
cerr << "\tEmpty Stack. Don't Pop\n\n";
return 1;
}
else
return arr[top--];
}
bool isEmpty() {return top < 0 ? 1 : 0;}
bool isFull() {return top >= MaxStack -1 ? top : 0;}
void dump_stack() {
cout << "The Stack contents, from top to bottom, from a stack dump are: " << endl;
for (int i = top; i >= 0; i--)
cout << "\t\t" << arr[i] << endl;
}
private:
int top;
int arr[MaxStack];
};
int main()
{
Stack a_stack;
int x = 0;
char inputchar;
cout<<"Please enter a word"<<endl;
a_stack.init();
while (inputchar != '.') //terminating char
{
cin >> inputchar;
array[x] = inputchar;
x++;
}
int j = x;
for (int i = 0; i < j; i++)
{
cout << array[x];
x--;
}
a_stack.push();
a_stack.dump_stack();
return 0;
}
A stack, by its very LIFO nature (Last In, First Out), will reverse the order of anything you put in it. Example for string "Hello":
(The top of the stack is to the left)
H push "H"
eH push "e"
leH push "l"
lleH push "l"
olleH push "o"
Now when you pop from the stack, you'll first get "o", then "l", etc. It's whatever you put in but in reverse order. You don't need to do anything special to achieve that. Just push to the stack in normal order, and when you pop you'll get it reversed:
// while loop
{
cin >> inputchar;
a_stack.push(inputchar);
}
// Display in reverse
while (not a_stack.isEmpty()) {
cout << (char)a_stack.pop();
}
Here's a small example program using std::stack:
(No input error checking is done here.)
#include <iostream>
#include <stack>
int main()
{
std::stack<char> st;
char c = '\0';
while (c != '.') {
c = std::cin.get();
st.push(c);
}
while (not st.empty()) {
std::cout << st.top();
st.pop();
}
std::cout << '\n';
}
Example input and output:
Hello world.
.dlrow olleH
Unless using a stack is a must (e.g. this is homework), you might be better off with getline() and its delim parameter, followed by a reverse loop over the resulting characters. It would be faster, cleaner, less prone to errors, and basically a two-liner.
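A minimal sketch of that suggestion follows, using std::reverse in place of a hand-written reverse loop. The function name read_reversed and the default '.' delimiter (taken from the terminating character in the question) are assumptions.

```cpp
#include <cassert>
#include <algorithm>   // std::reverse
#include <istream>
#include <sstream>     // std::istringstream, handy for trying this out
#include <string>      // std::getline

// Reads characters up to, but not including, delim and returns them reversed.
std::string read_reversed(std::istream& in, char delim = '.')
{
    std::string text;
    std::getline(in, text, delim);   // the delimiter is consumed, not stored
    std::reverse(text.begin(), text.end());
    return text;
}
```

Called as read_reversed(std::cin), this reads from the console. Unlike the std::stack example above, the terminating '.' is not part of the result, because getline discards the delimiter.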

Array of Pointer and call-by-reference

I have a little problem with a few simple lines of code.
I use the following lines to call my method:
char** paras = new char*;
inputLength = charUtils::readParameterFromConsole(paras, paraCount, stringBeginningIndex);
The method looks like this:
int charUtils::readParameterFromConsole(char** &inputs, int &paraCount, int &stringBeginningIndex) {
char input[BUFFER_STRING_LENGTH];
cin.getline(input, BUFFER_STRING_LENGTH);
if(strlen(input) > 0)
{
bool stringBeginning = false;
char* part = "";
string partString = "";
for(int i = 0; i < paraCount; i++)
{
if (i == 0)
part = strtok(input, " ");
else
part = strtok(NULL, " ");
inputs[i] = part;
}
} else
{
cout << "Error! No Input!" << endl;
}
cout << &inputs[0] << endl;
cout << inputs[0] << endl;
return strlen(input);
}
Inside readParameterFromConsole the values are correct, but in the calling method they are no longer correct.
I have been facing this problem since I refactored the code and made a new class.
Can anyone give me some advice, please?
You are passing back pointers into a stack-allocated variable, input, when you say inputs[i] = part, because part is a pointer into input handed back by strtok.
http://www.cplusplus.com/reference/clibrary/cstring/strtok/
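One way around the dangling pointers, sketched under the assumption that returning the words by value is acceptable, is to copy each token into a std::string before the local buffer goes away. The function name split_on_spaces is hypothetical.

```cpp
#include <cassert>
#include <cstring>   // std::strtok
#include <string>
#include <vector>

// Splits a line on spaces. strtok writes into the text it scans, so it is
// given a private, NUL-terminated copy of the input.
std::vector<std::string> split_on_spaces(const std::string& line)
{
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');

    std::vector<std::string> parts;
    for (char* part = std::strtok(buffer.data(), " ");
         part != nullptr;
         part = std::strtok(nullptr, " "))
    {
        // emplace_back copies the characters, so the result does not
        // depend on buffer once this function returns.
        parts.emplace_back(part);
    }
    return parts;
}
```

The caller receives owned strings, so nothing points into memory that disappears when the function returns. Note that strtok keeps hidden state between calls and is therefore not safe to use from several threads at once.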
Your code as I'm writing this:
int charUtils::readParameterFromConsole(char** &inputs, int &paraCount, int &stringBeginningIndex) {
char input[BUFFER_STRING_LENGTH];
cin.getline(input, BUFFER_STRING_LENGTH);
if(strlen(input) > 0)
{
bool stringBeginning = false;
char* part = "";
string partString = "";
for(int i = 0; i < paraCount; i++)
{
if (i == 0)
part = strtok(input, " ");
else
part = strtok(NULL, " ");
inputs[i] = part;
}
} else
{
cout << "Error! No Input!" << endl;
}
cout << &inputs[0] << endl;
cout << inputs[0] << endl;
return strlen(input);
}
A main problem is that you're setting inputs[i] = pointer into local array. That array doesn't exist anymore when the function returns. Undefined behavior if you use any of those pointers.
As I understand it you want an array of "words" as a result.
That's easy to arrange (note: code untouched by compiler's hands):
#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <stdexcept>
bool throwX( char const s[] ) { throw std::runtime_error( s ); }
typedef std::vector<std::string> StringVector;
std::string lineFromUser()
{
std::string line;
// std::getline returns the stream, which tests false on failure.
std::getline( std::cin, line )
|| throwX( "lineFromUser failed: std::getline failed" );
return line;
}
void getWordsOf( std::string const& s, StringVector& result )
{
std::istringstream stream( s );
std::string word;
StringVector v;
while( stream >> word )
{
v.push_back( word );
}
result.swap( v );
}
StringVector wordsOf( std::string const& s )
{
StringVector result;
getWordsOf( s, result );
return result;
}
// Some call, like
StringVector const words = wordsOf( lineFromUser() );
Again, this is off-the-cuff code, so please just correct any syntax errors.
Cheers & hth.,