C++ saving a text file in an array - c++

I came up with the following code to open a text file, read it into an array, and print out all the text. My question is: how can I access a specific word in the file? If I am not mistaken, a for loop should be involved, but I am not quite sure how to go about it.
int main() {
ifstream dictionaryFile;
dictionaryFile.open("dictionary.txt");
char output[100];
//char wordsFromDictionary[40437][22];
int i=0;
if(dictionaryFile.is_open()){
while(!dictionaryFile.eof()){
dictionaryFile >> output;
cout<<output<<endl;
}
}
return 0;
}

If you want to read in some strings, the obvious choice would be to use std::strings to do the job. If you want an array of them, store them in an std::vector:
ifstream d("dictionary.txt");
std::vector<std::string> words{std::istream_iterator<std::string>(d), {}};
This reads all the words into the vector. If they're already sorted, you can then (for example) use std::binary_search to find whether a word is in the vector or not.
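A minimal sketch of the lookup step, assuming the words are separated by whitespace. The helper names loadWords and containsWord are made up for illustration, and the vector is sorted explicitly because std::binary_search only works on a sorted range.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Read every whitespace-separated word from the stream and sort the result.
// (loadWords is a hypothetical helper name, not part of any library.)
std::vector<std::string> loadWords(std::istream& in) {
    std::istream_iterator<std::string> first(in), last;
    std::vector<std::string> words(first, last);
    std::sort(words.begin(), words.end());
    return words;
}

// Report whether `word` occurs in a vector that is already sorted.
bool containsWord(const std::vector<std::string>& sortedWords,
                  const std::string& word) {
    return std::binary_search(sortedWords.begin(), sortedWords.end(), word);
}
```

Any input stream can be passed in, for example a std::ifstream opened on dictionary.txt or a std::istringstream for a quick test. After loading, words[i] gives direct access to the i-th word in sorted order.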

As you have a char array
char output[100];
there is no distinction between words in this data structure. Each dictionaryFile >> output reads one whitespace-separated word into the array and overwrites the previous one, so only the most recent word is available at any time.
A simple way to keep the words separate (using loops as you requested, with minimal changes to your code), assuming the words are separated by a delimiter, would be
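char wordsFromDictionary[40437][22];
char delimiter=...; // the character that separates words in your file
int i=0; // index of the current word
int j=0; // position inside the current word
char c;
if(dictionaryFile.is_open()){
    // get(c) fails at end of file, which ends the loop cleanly
    while(i<40437 && dictionaryFile.get(c)){
        if(c==delimiter){
            wordsFromDictionary[i][j]='\0'; // terminate the finished word
            i++;
            j=0;
        }
        else if(j<21) { // leave room for the terminating '\0'
            wordsFromDictionary[i][j]=c;
            j++;
        }
    }
    if(i<40437) wordsFromDictionary[i][j]='\0'; // terminate the last word
}
Note that this simply truncates words that are longer than 21 characters, because one slot is reserved for the terminating '\0'.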

Related

How to delimit and write file contents to vector?

So let's say I have a vector of ints and a text file which looks like this:
1|2|3|4|5
How can I add the numbers to the vector?
First, you would open the file using std::ifstream. There are a few ways you could then read these out, but one example would be to use std::getline with a custom "end of line" character, being your | in this case:
std::vector<int> myVect;
std::ifstream reader("./file.txt"); //Replace with path to your file
for(int i = 0; i < 5; i++) {
    std::string item;
    std::getline(reader, item, '|'); //The third argument tells it to read until a '|' char
    int number = std::stoi(item); //Convert from string to int
    myVect.push_back(number);
}
This example relies on you knowing how many elements you want to get, but can be modified to work with an unknown size.
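One possible way to handle an unknown number of elements is to loop for as long as std::getline succeeds. This is only a sketch: the function name readDelimitedInts is hypothetical, and it assumes every non-empty item is a valid integer.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read delimiter-separated integers until the stream runs out.
// (readDelimitedInts is a hypothetical helper name.)
std::vector<int> readDelimitedInts(std::istream& in, char delimiter = '|') {
    std::vector<int> values;
    std::string item;
    while (std::getline(in, item, delimiter)) {
        if (!item.empty()) { // skip empty items, e.g. from a doubled delimiter
            // std::stoi throws std::invalid_argument if the item is not numeric
            values.push_back(std::stoi(item));
        }
    }
    return values;
}
```

The function accepts any std::istream, so the std::ifstream from the example above can be passed in directly.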

Extracting a particular data from a CSV file in c++

I have written a program to read a CSV file, but I'm having trouble extracting data from it in C++. I want to count the number of columns in the 1st row, starting from the 5th column and ending at the last column. I have written the following code to read a CSV file, but I am not sure how to count the columns as described.
I would appreciate it if anyone could tell me how to go about it.
char* substring(char* source, int startIndex, int endIndex)
{
int size = endIndex - startIndex + 1;
char* s = new char[size+1];
strncpy(s, source + startIndex, size); //you can read the documentation of strncpy online
s[size] = '\0'; //make it null-terminated
return s;
}
char** readCSV(const char* csvFileName, int& csvLineCount)
{
ifstream fin(csvFileName);
if (!fin)
{
return nullptr;
}
csvLineCount = 0;
char line[1024];
while(fin.getline(line, 1024))
{
csvLineCount++;
};
char **lines = new char*[csvLineCount];
fin.clear();
fin.seekg(0, ios::beg);
for (int i=0; i<csvLineCount; i++)
{
fin.getline(line, 1024);
lines[i] = new char[strlen(line)+1];
strcpy(lines[i], line);
};
fin.close();
return lines;
}
I have attached a few lines from the CSV file:-
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,
,Afghanistan,33.0,65.0,0,0,0,0,0,0,0,
,Albania,41.1533,20.1683,0,0,0,0
What I need is, in the 1st row, the number of dates after Long.
To answer your question about the number of dates after Long in the 1st row: it is not that difficult. This is how I would do it:
#include <iostream>
#include <string>
#include <fstream>
#include <regex>
#define FILENAME "test.csv" //Your filename as Macro
//(The compiler just sees "test.csv" instead of FILENAME)
void read(){
    std::string n;
    //date format pattern m/dd/yy
    std::regex pattern1("\\b\\d{1}[/]\\d{2}[/]\\d{2}\\b");
    //date format pattern mm/dd/yy
    std::regex pattern2("\\b\\d{2}[/]\\d{2}[/]\\d{2}\\b");
    std::smatch result1, result2;
    std::ifstream file(FILENAME, std::ios::in);
    if ( ! file.is_open() )
    {
        std::cout << "Could not open file!" << '\n';
        return; //nothing to read
    }
    //read one comma-separated field at a time; the loop ends when getline fails
    //https://en.cppreference.com/w/cpp/string/basic_string/getline
    while(std::getline(file,n,',')){
        //str() is the whole match; the patterns have no capture groups
        if(std::regex_search(n,result1,pattern1))
            std::cout << result1.str() << std::endl;
        if(std::regex_search(n,result2,pattern2))
            std::cout << result2.str() << std::endl;
    }
    file.close();
}
int main ()
{
    read();
    return 0;
}
The file test.csv contains the following for testing:
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20, ,Afghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Albania,41.1533,20.1683,0,0,0,0
Province/State,Country/Region,Lat,Long,1/25/20,12/26/20,1/27/20, ,Bfghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Blbania,41.1533,20.1683,0,0,0,0
It actually is pretty simple:
getline takes the open file and stops at a so-called delimiter character, in your case a comma ','.
(That is the best way I have found to read CSV. You can replace the delimiter with any other single character, for example ';' or ' '.)
After this you have all the data nicely separated, one value beneath another, without the commas.
Now you can "filter" out what you need. I use regex, but use whatever you want.
(Just FYI: in questions tagged c++ you shouldn't use C-style functions like strncpy.)
I gave you one pattern for dates like 1/23/20 (m/dd/yy) and a second one for dates in October, November or December like 12/22/20 (mm/dd/yy); keeping them on 2 lines makes the regex patterns easier to read and understand.
You may have to extend the regex patterns if other data in the file happens to match your date format; regular expressions are not as complicated as they look.
From that point you can put all the printed values into, for example, a vector (a more convenient array) to handle, pass or return the data. That's up to you.
If you need more explanation I am happy to help and/or expand this example, just leave a comment.
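To get the actual count the question asks for, one option is to read only the first line and count the fields after the first four columns. The sketch below makes a few assumptions: the function name countColumnsAfter is hypothetical, fields never contain quoted commas, and the file uses plain '\n' line endings.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Count the non-empty fields in the first line of `in` that come after the
// first `skip` columns. (countColumnsAfter is a hypothetical helper name.)
std::size_t countColumnsAfter(std::istream& in, std::size_t skip) {
    std::string header;
    if (!std::getline(in, header)) return 0; // empty input
    std::istringstream fields(header);
    std::string field;
    std::size_t index = 0;
    std::size_t count = 0;
    while (std::getline(fields, field, ',')) {
        if (index >= skip && !field.empty()) ++count;
        ++index;
    }
    return count;
}
```

Called with a std::ifstream opened on the CSV file and skip set to 4, this counts the date columns that follow Long in the header row.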
You basically want to search for the separator within your line (often it is ';').
If you print out your lines, they should look like this:
a;b;c;d;e;f;g;h
There are several ways to achieve what you want; I would look for a function that splits on a character. Something along the lines of the example below should work. If you use std::string, you can use its find member function instead of a hand-written loop.
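int rows(const char* line, char separator, int count) {
    // walk to the next separator or to the end of the line
    unsigned i = 0;
    while (line[i] != '\0' && line[i] != separator) i++;
    count++; // one more field found
    // recurse on the text after the separator, if there is any
    if (line[i] == separator && line[i+1] != '\0') return rows(line + i + 1, separator, count);
    else return count;
}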
The recursion can obviously be replaced by one loop ;)
int countSign(const char* line, char sign){
    int count = 0;
    for (unsigned i = 0; line[i] != '\0'; i++) {
        // compare single characters directly; strcmp is for whole strings
        if (line[i] == sign) count++;
    }
    return count;
}
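If the line is already held in a std::string, the same count can be obtained with std::count. This is a small sketch rather than a full CSV parser: countFields is a hypothetical name, and quoted separators are not handled.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Number of separator-delimited fields in a line: one more than the number
// of separators, or zero for an empty line.
// (countFields is a hypothetical helper name.)
std::size_t countFields(const std::string& line, char separator) {
    if (line.empty()) return 0;
    return static_cast<std::size_t>(
               std::count(line.begin(), line.end(), separator)) + 1;
}
```

Note that a trailing separator counts as introducing one more (empty) field here.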

Writing and reading in and from a binary file in c++

I am a beginner in working with files. What I want to do in my code is to get a name from the user, and hide it in a .bmp picture. And also be able to get the name again from the file. But I want to change the characters into ASCII codes first ( that's what my assignment says)
What I tried to do is to change the name's characters to ASCII codes, and then add them to the end of the bmp picture, which I open in binary mode. After adding them, I want to read them from the file and be able to get the name again.
This is what I've done so far, but I am not getting a proper result. All I get is some meaningless characters. Is this code even right?
int main()
{
cout<<"Enter your name"<< endl;
char * Text= new char [20];
cin>> Text; // getting the name
int size=0;
int i=0;
while( Text[i] !='\0')
{
size++;
i++;
}
int * BText= new int [size];
for(int i=0; i<size; i++)
{
BText[i]= (int) Text[i]; // having the ASCII codes of the characters.
}
fstream MyFile;
MyFile.open("Picture.bmp, ios::in | ios::binary |ios::app");
MyFile.seekg (0, ios::end);
ifstream::pos_type End = MyFile.tellg(); //End shows the end of the file before adding anything
// adding each of the ASCII codes to the end of the file.
int j=0;
while(j<size)
{
MyFile.write(reinterpret_cast <const char *>(&BText[j]), sizeof BText[j]);
j++;
}
MyFile.close();
char * Text2= new char[size*8];
MyFile.open("Picture.bmp, ios:: in , ios:: binary");
// putting the pointer to the place where the main file ended and start reading from there.
MyFile.seekg(End);
MyFile.read(Text2,size*8);
cout<<Text2<<endl;
MyFile.close();
system("pause");
return 0;
}
There are many flaws in your code. An important one is:
MyFile.open("Picture.bmp, ios::in | ios::binary |ios::app");
Must be
MyFile.open("Picture.bmp", ios::in | ios::binary |ios::app);
^ ^
| |
+-----------+
Second, use std::string instead of C-style strings:
char * Text= new char [20];
should be
std::string Text;
Also, use std::vector to make an array:
int * BText= new int [size];
should be
std::vector<int> BText(size);
And so on...
You write int values (typically 32 bits each) but read char values (8 bits each).
Why not write the string as-is? There's no need to convert it to an integer array.
And also, you don't terminate the array you read into.
Your write loop is more complicated than it needs to be; you can pass the complete array in one call:
MyFile.write(reinterpret_cast <const char *>(BText), size * sizeof (*BText));
Also, converting your characters to ints pads each one with extra zero bytes (three of them with a 4-byte int), which you don't take into account when reading the data back.
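The suggestion to write the string as-is could look roughly like the sketch below. A char already holds its character code (ASCII on most platforms), so no separate int array is needed. The names appendName and readName are hypothetical, and the caller is assumed to remember the returned offset and the name's length.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Append `name` byte-for-byte to the end of the file and return the offset
// at which it starts, or -1 on failure. (appendName is a hypothetical name.)
std::streamoff appendName(const std::string& path, const std::string& name) {
    std::ofstream out(path, std::ios::binary | std::ios::app);
    if (!out) return -1;
    out.seekp(0, std::ios::end); // make sure tellp() reports the current end
    const std::streamoff start = out.tellp();
    out.write(name.data(), static_cast<std::streamsize>(name.size()));
    return out ? start : -1;
}

// Read `length` bytes starting at `start` and return them as a string.
// (readName is a hypothetical name.)
std::string readName(const std::string& path, std::streamoff start,
                     std::size_t length) {
    std::ifstream in(path, std::ios::binary);
    if (!in || !in.seekg(start, std::ios::beg)) return std::string();
    std::string name(length, '\0');
    in.read(&name[0], static_cast<std::streamsize>(length));
    name.resize(static_cast<std::size_t>(in.gcount())); // keep what was read
    return name;
}
```

Because std::string tracks its own length, nothing has to be terminated by hand after reading.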

How to tokenize (words) classifying punctuation as space

Based on this question which was closed rather quickly:
Trying to create a program to read a users input then break the array into seperate words are my pointers all valid?
Rather than closing I think some extra work could have gone into helping the OP to clarify the question.
The Question:
I want to tokenize user input and store the tokens into an array of words.
I want to use punctuation (.,-) as delimiters and thus remove it from the token stream.
In C I would use strtok() to break an array into tokens and then manually build an array.
Like this:
The main Function:
char **findwords(char *str);
int main()
{
int test;
char words[100]; //an array of chars to hold the string given by the user
char **word; //pointer to a list of words
int index = 0; //index of the current word we are printing
char c;
cout << "die monster !";
//a loop to place the charecters that the user put in into the array
do
{
c = getchar();
words[index] = c;
}
while (words[index] != '\n');
word = findwords(words);
while (word[index] != 0) //loop through the list of words until the end of the list
{
printf("%s\n", word[index]); // while the words are going through the list print them out
index ++; //move on to the next word
}
//free it from the list since it was dynamically allocated
free(word);
cin >> test;
return 0;
}
The line tokenizer:
char **findwords(char *str)
{
int size = 20; //original size of the list
char *newword; //pointer to the new word from strok
int index = 0; //our current location in words
char **words = (char **)malloc(sizeof(char *) * (size +1)); //this is the actual list of words
/* Get the initial word, and pass in the original string we want strtok() *
* to work on. Here, we are seperating words based on spaces, commas, *
* periods, and dashes. IE, if they are found, a new word is created. */
newword = strtok(str, " ,.-");
while (newword != 0) //create a loop that goes through the string until it gets to the end
{
if (index == size)
{
//if the string is larger than the array increase the maximum size of the array
size += 10;
//resize the array
char **words = (char **)malloc(sizeof(char *) * (size +1));
}
//asign words to its proper value
words[index] = newword;
//get the next word in the string
newword = strtok(0, " ,.-");
//increment the index to get to the next word
++index;
}
words[index] = 0;
return words;
}
Any comments on the above code would be appreciated.
But, additionally, what is the best technique for achieving this goal in C++?
Have a look at boost tokenizer for something that's much better in a C++ context than strtok().
How to tokenize a stream in C++ is already covered by a lot of questions.
Example: How to read a file and get words in C++
What is harder to find is how to get the same functionality as strtok():
Basically, strtok() allows you to split the string on a whole set of user-defined characters, while a C++ stream only allows you to use white space as a separator. Fortunately, what counts as white space is defined by the locale, so we can modify the locale to treat other characters as space, and this then allows us to tokenize the stream in a more natural fashion.
#include <locale>
#include <string>
#include <sstream>
#include <iostream>
// This is my facet that will treat the ,.- as space characters and thus ignore them.
class WordSplitterFacet: public std::ctype<char>
{
public:
typedef std::ctype<char> base;
typedef base::char_type char_type;
WordSplitterFacet(std::locale const& l)
: base(table)
{
std::ctype<char> const& defaultCType = std::use_facet<std::ctype<char> >(l);
// Copy the default value from the provided locale
static char data[256];
for(int loop = 0;loop < 256;++loop) { data[loop] = loop;}
defaultCType.is(data, data+256, table);
// Modifications to default to include extra space types.
table[','] |= base::space;
table['.'] |= base::space;
table['-'] |= base::space;
}
private:
base::mask table[256];
};
We can then use this facet in a locale like this:
std::ctype<char>* wordSplitter(new WordSplitterFacet(std::locale()));
<stream>.imbue(std::locale(std::locale(), wordSplitter));
The next part of your question is how would I store these words in an array. Well, in C++ you would not. You would delegate this functionality to the std::vector/std::string. By reading your code you will see that your code is doing two major things in the same part of the code.
It is managing memory.
It is tokenizing the data.
There is a basic principle, Separation of Concerns, which says your code should only try to do one of two things. It should either do resource management (memory management in this case) or it should do business logic (tokenization of the data). By separating these into different parts of the code you make the code easier to use and easier to write. Fortunately, in this example all the resource management is already done by std::vector and std::string, which lets us concentrate on the business logic.
As has been shown many times the easy way to tokenize a stream is using operator >> and a string. This will break the stream into words. You can then use iterators to automatically loop across the stream tokenizing the stream.
std::vector<std::string> data;
for(std::istream_iterator<std::string> loop(<stream>); loop != std::istream_iterator<std::string>(); ++loop)
{
// In here loop is an iterator that has tokenized the stream using
// operator >> (which for std::string reads one space-separated word).
data.push_back(*loop);
}
We can combine this with a standard algorithm to simplify the code:
std::copy(std::istream_iterator<std::string>(<stream>), std::istream_iterator<std::string>(), std::back_inserter(data));
Now combining all the above into a single application
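#include <algorithm> // std::copy
#include <iterator> // std::istream_iterator, std::ostream_iterator, std::back_inserter
#include <vector>
int main()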
{
// Create the facet.
std::ctype<char>* wordSplitter(new WordSplitterFacet(std::locale()));
// Here I am using a string stream.
// But any stream can be used. Note you must imbue a stream before it is used.
// Otherwise the imbue() will silently fail.
std::stringstream teststr;
teststr.imbue(std::locale(std::locale(), wordSplitter));
// Now that it is imbued we can use it.
// If this was a file stream then you could open it here.
teststr << "This, stri,plop";
std::cout << "die monster !";
std::vector<std::string> data;
std::copy(std::istream_iterator<std::string>(teststr), std::istream_iterator<std::string>(), std::back_inserter(data));
// Copy the array to cout one word per line
std::copy(data.begin(), data.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}
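For comparison, here is a sketch of a plainer alternative that does not touch the locale: it splits a std::string with find_first_of and find_first_not_of. This is a different technique from the facet above, not a replacement for it; the function name splitWords and its default delimiter set are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split `text` into words, treating every character in `delimiters` as a
// separator and dropping empty tokens. (splitWords is a hypothetical name.)
std::vector<std::string> splitWords(const std::string& text,
                                    const std::string& delimiters = " ,.-") {
    std::vector<std::string> words;
    std::size_t start = text.find_first_not_of(delimiters);
    while (start != std::string::npos) {
        const std::size_t end = text.find_first_of(delimiters, start);
        // When end is npos, substr simply takes the rest of the string.
        words.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delimiters, end);
    }
    return words;
}
```

The facet approach works on any stream as it is read, while this version needs the whole input in a string first, so which one fits better depends on where the data comes from.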

Weird characters appear after writing to a textfile

I am currently trying to read a file, add an extra backslash (\) wherever it finds a backslash, and write the result to another file. The problem is that weird characters are being printed in path.txt. I suspect that the space characters in the file logdata are the root of this problem. I need advice on how to solve this.
Here is the code:
// read a file
char str[256];
fstream file_op("C:\\logdata",ios::in);
file_op >> str;
file_op.close();
// finds the slash, and add additional slash
char newPath[MAX_PATH];
int newCount = 0;
for(int i=0; i < strlen(str); i++)
{
if(str[i] == '\\')
{
newPath[newCount++] = str[i];
}
newPath[newCount++] = str[i];
}
// write it to a different file
ofstream out("c:\\path.txt", ios::out | ios::binary);
out.write(newPath, strlen(newPath));
out.close();
Every char string in C has to end with the character \0, which indicates that the string ends right there.
Your newPath array is not correctly terminated after the for-loop finishes, so strlen(newPath) keeps going until a \0 happens to appear somewhere later in memory.
Try doing the following right after exiting the for-loop:
newPath[newCount]=0;
A safer way to handle strings in C++ is to use the std::string class instead of plain char arrays.
Try putting a string terminator in the buffer, after the loop :
newPath[newCount] = 0;
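The std::string suggestion could be sketched as follows. The function name doubleBackslashes is made up for illustration; since std::string keeps track of its own length, no manual terminator is required.

```cpp
#include <string>

// Return a copy of `path` in which every backslash is doubled.
// (doubleBackslashes is a hypothetical helper name.)
std::string doubleBackslashes(const std::string& path) {
    std::string result;
    result.reserve(path.size() * 2); // worst case: every character is a backslash
    for (char c : path) {
        if (c == '\\') result += '\\'; // emit the extra backslash first
        result += c;
    }
    return result;
}
```

The result can be written with out.write(result.data(), result.size()). Regarding the spaces: file_op >> str stops at the first whitespace character, so a path containing spaces would be cut short; std::getline(file_op, line) with a std::string reads the whole line instead.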