Permanently storing a buffer value - c++

I'm trying to parse the lines of a text file and store them in a vector<string>. I'm coming from a Java background and am confused about how C++ handles assigning the value of a buffer. Here is my code:
string line;
vector<string> adsList;
ifstream inputFile;
inputFile.open("test.txt");
while (getline(inputFile, line))
{
    adsList.push_back(line);
}
In Java, when adding to a data structure, a copy of the object is made and then that copy is inserted. In C++, my understanding is that the data structures only hold references so that any operation is very fast. What is the proper way to achieve what I want to do in C++? I have also tried the following code:
vector<string> adsList;
string line;
ifstream inputFile;
inputFile.open("test.txt");
while (getline(inputFile, line))
{
    string *temp = new string;
    *temp = line;
    adsList.push_back(*temp);
}
My reasoning was that creating a new string object and storing that would preserve the value after each loop iteration. C++ seems to handle this in completely the opposite way from Java, and I am having a hard time wrapping my head around it.
edit: here is what test.txt looks like:
item1 item1 item1
item2 item2 item2
item3 item3 item3
item4 item4 item4
I'm trying to store each line as a string and then store the string inside my vector. So the front of the vector would have a string with value "item1 item1 item1".

push_back() makes a copy, so your first code sample does exactly what you want it to do. In fact, all standard C++ containers store copies by default. You'd have to use a container of pointers to not get copies.

Your understanding regarding references is incorrect: Java stores references, while C++ stores whatever you ask it to, be it pointers or copies (note that you can't store references in STL containers; the equivalent is pointers).
vector::push_back stores a copy of the item passed to it, so you don't have to create a pointer and new some memory on the heap in order to store the string.
(Internally there is some heap allocation going on, but that's an implementation detail of std::string.)
One option you do have in C++ is to store pointers instead, and those you do have to heap-allocate; otherwise, when the current stack frame is popped off, the pointers would point to defunct memory... but that is another topic.
See here for a simple working example of your code:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> adsList;
    std::string line;
    std::ifstream inputFile;
    inputFile.open("test.txt");

    // read a line from the file - store it in 'line'
    while (std::getline(inputFile, line))
    {
        // store a *copy* of line in the vector
        adsList.push_back(line);
    }

    // for each element in the adsList vector, get a *reference* (note the '&')
    for (std::string& s : adsList)
    {
        std::cout << s << std::endl;
    }

    return 0;
}

Since you didn't post the entire code, I suggest you try this to see if it is reading the file:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

int main() {
    fstream inputFile("test.txt", fstream::in);
    string l;
    vector<string> v;
    while (getline(inputFile, l)) v.push_back(l);

    // Display the content of the vector:
    for (size_t i = 0; i < v.size(); ++i) {
        cout << v[i] << endl;
    }
    return 0;
}

Your initial assumption is incorrect. A copy is (generally) stored in the vector (ignoring move operations, which were introduced in C++11). Generally, this is the way you want to be operating.
If you are truly worried about speed and want to store references (pointers, actually) to things, you'll want to utilize something like std::unique_ptr or std::shared_ptr. For example:
std::vector<std::unique_ptr<std::string>> adsList;
std::string line;
std::ifstream inputFile;
inputFile.open("test.txt");
while (std::getline(inputFile, line)) {
    adsList.push_back(std::unique_ptr<std::string>(new std::string(line)));
}
Generally this is only done if you must be able to modify the values in the container and have the modifications reflected to the original object - in this case you'd use a std::shared_ptr. By far the most common scenario is your first code example.
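For completeness, here is a small sketch of that shared_ptr scenario (the names are made up for the example): two vectors end up sharing the same string objects, so a modification made through one is visible through the other.
#include <iostream>
#include <memory>
#include <string>
#include <vector>

int main()
{
    std::vector<std::shared_ptr<std::string>> adsList;
    adsList.push_back(std::make_shared<std::string>("item1 item1 item1"));

    // Copying the vector copies the pointers, not the strings they point to.
    std::vector<std::shared_ptr<std::string>> sameAds = adsList;
    *sameAds[0] = "edited line";

    std::cout << *adsList[0] << '\n';   // prints "edited line" - both vectors see the change
    return 0;
}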

Related

Getting input from a text file and storing it into an array, but the text file contains more than 20,000 strings

I'm trying to read strings from a text file and store them in a huge array, but the text file contains more than 20,000 strings. How can I do that?
I cannot use vectors.
Is it possible to do it without using a hash table?
Afterward, I will try to find the most frequently used words using sorting.
Your requirement is to NOT use any standard container, like for example std::vector or std::unordered_map.
In this case we need to create a dynamic container ourselves. That is not complicated, and we can even use it for storing strings. So I will not even use std::string in my example.
I created a demo for you with ~700 lines of code.
We will first define the term "capacity". This is the number of elements that could be stored in the container, i.e. the currently available space. It says nothing about how many elements are actually stored in the container.
But there is one functionality that is the most important for a dynamic container: it must be able to grow. And that is necessary whenever we want to add more elements to the container than its capacity allows.
So, if we want to add an additional element at the end of the container and the number of elements is >= its capacity, then we need to reallocate a bigger block of memory and copy all the old elements into the new memory space. In such a case we usually double the capacity, which prevents frequent reallocations and copying.
Let me show you one example for a push_back function, which could be implemented like this:
template <typename T>
void DynamicArray<T>::push_back(const T& d) {          // Add a new element at the end
    if (numberOfElements >= capacity) {                // Check if the capacity of this dynamic array is big enough
        capacity *= 2;                                 // Obviously not, so double the capacity
        T* temp = new T[capacity];                     // Allocate new, bigger memory
        for (unsigned int k = 0; k < numberOfElements; ++k)
            temp[k] = data[k];                         // Copy data from old memory to new memory
        delete[] data;                                 // Release the old memory
        data = temp;                                   // And assign the newly allocated memory to the old pointer
    }
    data[numberOfElements++] = d;                      // Finally, store the given data at the end of the container
}
This is the basic approach. I use templates in order to be able to store any type in the dynamic array.
You could get rid of the templates by deleting all the template stuff and replacing "T" with your intended data type.
But I would not do that. See how easily we can create a "String" class: we just alias a dynamic array of chars as "String".
using String = DynamicArray<char>;
will give us basic string functionality. And if we want to have a dynamic array of strings later, we can write:
using StringArray = DynamicArray<String>;
and this gives us a DynamicArray<DynamicArray<char>>. Cool.
For these special classes we can overload some operators, which makes handling them even simpler.
Please look in the provided code.
And, to be able to use the container in a typical C++ environment, we can add full iterator capability. This needs some typing effort but is not complicated, and it makes the container much easier to use.
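As an illustration only (the full version is in the linked code), here is a minimal sketch of what such iterator support could look like, assuming the DynamicArray stores its elements contiguously in data with numberOfElements in use. Plain pointers are valid iterators for contiguous storage, so begin()/end() can simply return them, which already enables range-based for loops and the standard algorithms.
template <typename T>
class DynamicArray {
public:
    using iterator = T*;
    using const_iterator = const T*;

    iterator begin() { return data; }                      // first element
    iterator end()   { return data + numberOfElements; }   // one past the last element
    const_iterator begin() const { return data; }
    const_iterator end()   const { return data + numberOfElements; }

    // ... push_back and the rest of the container as described above ...

private:
    T* data = nullptr;
    unsigned int numberOfElements = 0;
    unsigned int capacity = 0;
};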
You also wanted to create a hash map, for counting words.
For that we will create a key/value pair: the key is the String that we defined above, and the value is the frequency counter.
We implement a hash function, which should be carefully selected so that it works well with strings, has high entropy, and distributes keys well over the buckets of the hash map.
The hash map itself is a dynamic container, and we will also add iterator functionality to it.
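As an illustration of what such a hash function could look like, here is a common general-purpose string hash (FNV-1a, 64 bit); the hash map in the linked code may use something different, so treat this only as a sketch.
// FNV-1a hash over a character buffer of the given length.
unsigned long long fnv1a(const char* s, unsigned int len)
{
    unsigned long long h = 14695981039346656037ULL;   // FNV offset basis
    for (unsigned int i = 0; i < len; ++i) {
        h ^= static_cast<unsigned char>(s[i]);
        h *= 1099511628211ULL;                         // FNV prime
    }
    return h;
}
// The bucket index for a table with 'bucketCount' buckets would then be
// fnv1a(key, keyLength) % bucketCount.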
For all this I drafted some 700 lines of code for you. You can take it as an example for your further studies, and it can easily be enhanced with additional functionality.
But a caveat: I did only some basic tests, and I even used raw pointers for owned memory. That can be done in a school project to learn some dynamic memory management, but not in real code.
Additionally, you could replace all of this code by simply using std::string, std::vector and std::unordered_map. Nobody would reinvent the wheel like this in practice, but it may give you some ideas on how to implement similar things.
Because Stack Overflow limits the answer size to 32000 characters, I will put the main part on GitHub.
Please click here.
I will just show you main, so that you can see how easily the solution can be used:
int main() {
    // Open the file and check if it could be opened
    std::ifstream ifs{ "r:\\test.txt" };
    if (ifs) {
        // Define a dynamic array for strings
        StringArray stringArray{};

        // Use the overwritten extraction operator and read all strings from the file into the dynamic array
        ifs >> stringArray;

        // Create a dynamic hash map
        HashMap hm{};

        // Now count the frequency of words
        for (const String& s : stringArray)
            hm[s]++;

        // Put the resulting key/value pairs into a dynamic array
        DynamicArray<Item> items(hm.begin(), hm.end());

        // Sort in descending order by frequency
        std::sort(items.begin(), items.end(), [](const Item& i1, const Item& i2) { return i1.count > i2.count; });

        // Show the result on screen
        for (const auto& [string, count] : items)
            std::cout << std::left << std::setw(20) << string << '\t' << count << '\n';
    }
    else std::cerr << "\n\nError: Could not open source file\n\n";
}
You do not need to keep the whole file in memory to count frequency of words. You only need to keep a single entry and some data structure to count the frequencies, for example a std::unordered_map<std::string,unsigned>.
Not tested:
std::unordered_map<std::string, unsigned> processFileEntries(std::ifstream& file) {
    std::unordered_map<std::string, unsigned> freq;
    std::string word;
    while (file >> word) {
        ++freq[word];
    }
    return freq;
}
For more efficient reading or more elaborate processing you could also read the file in chunks (e.g. 100 words at a time), process a chunk, and then continue with the next one.
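A rough sketch of that chunked approach, with an arbitrary chunk size of 100 words (the function name and signature are made up for the example):
#include <cstddef>
#include <fstream>
#include <string>
#include <unordered_map>
#include <vector>

// Read up to 100 words into a buffer, process them, then reuse the buffer.
void processInChunks(std::ifstream& file, std::unordered_map<std::string, unsigned>& freq)
{
    const std::size_t chunkSize = 100;
    std::vector<std::string> chunk;
    chunk.reserve(chunkSize);

    std::string word;
    while (file >> word) {
        chunk.push_back(word);
        if (chunk.size() == chunkSize) {
            for (const auto& w : chunk) ++freq[w];   // process the full chunk
            chunk.clear();
        }
    }
    for (const auto& w : chunk) ++freq[w];           // process any leftover words
}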
Assuming you're using C-Style / raw arrays you could do something like:
const size_t number_of_entries = count_entries_in_file();

// Make sure we actually have entries
assert(number_of_entries > 0);

std::string* file_entries = new std::string[number_of_entries];

// fill file_entries with the file's entries
// ...

// release the heap memory again, so we don't create a leak
delete[] file_entries;
file_entries = nullptr;
You can use a std::map to get the frequency of each word in your text file. One example for reference is given below:
#include <iostream>
#include <fstream>
#include <map>
#include <sstream>
#include <string>

int main()
{
    std::ifstream inputFile("input.txt");
    std::map<std::string, unsigned> freqMap;
    std::string line, word;

    if (inputFile)
    {
        while (std::getline(inputFile, line))   // go line by line
        {
            std::istringstream ss(line);
            while (ss >> word)                  // go word by word
            {
                ++freqMap[word];                // increment the count corresponding to the word
            }
        }
    }
    else
    {
        std::cout << "input file cannot be opened" << std::endl;
    }

    // print the frequency of each word in the file
    for (const auto& myPair : freqMap)
    {
        std::cout << myPair.first << ": " << myPair.second << std::endl;
    }
    return 0;
}

How to dynamically allocate memory for nested pointer struct?

I am trying to dynamically allocate memory for a nested struct that happens to be a pointer. I have written some made-up code below to try and illustrate my problem.
These two structs are found in two separate header files; the code is all under one namespace.
Tiles.h
struct Tiles
{
    int* m_noOfSections;
    int* m_noOfTilesInSecs;
    char* m_TileName;
};
// functions omitted
Flooring.h
struct Flooring
{
    Tiles* m_Tiles;
    int m_noOfTiles;
    char* m_FlooringName;
};
void read(Tiles&);
I am working on a function definition for Flooring.h where I have to dynamically allocate an array of Tiles; the size of the array is determined earlier in this function from user input.
I've tried using the following code but ran into issues:
Flooring.cpp
void read(Flooring& Flr)
{
    Tiles* tiles;
    tiles = new Tiles[Flr.m_noOfTiles];
    for (int i = 0; i < Flr.m_noOfTiles; i++) {
        cout << i + 1 << ": ";
        read(tiles[i]);
    }
}
Note: The read(tiles[i]); [declaration: void read(Tiles&)] function assigns values to the data members of Tiles. I have tested the functions in the Tiles files and they are working as intended. So I have not included the code for those. I believe the issue lies in my Flooring.cpp implementation file.
My expectation is that the above read function would assign values to:
tiles[i].m_noOfSections, tiles[i].m_noOfTilesinSecs, tiles[i].m_tileName
One of the issues is that I do not have a chance to input tileName, when running the code it skips the part where I would normally input a tileName.
The output would be as follows:
Enter the number of sets of tiles: Enter number of sections: [the user is able to input a value here, but not earlier, when asked to enter the number of sets of tiles]
Tiles.cpp
void read(char* tileName)
{
    cout << "Enter tile name: ";
    read(tileName, 15, "error message");
}
The function definition for the read function with three parameters can be found below. This function was pre-defined in the assignment. I have reviewed it and I don't see any problems with it, but I will post it regardless, in case it helps.
void read(char* str, int len, const char* errorMessage)
{
    bool ok;
    do
    {
        ok = true;
        cin.getline(str, len + 1, '\n');
        if (cin.fail()) {
            cin.clear();
            cin.ignore(1000, '\n');
            ok = false;
        }
    } while (!ok && cout << errorMessage);
}
I hope that is enough information, apologies if my formatting isn't adequate, or if my terminology isn't appropriate, I am still quite new to all sorts of programming. Please do let me know if I have forgotten to include some information.
A new expression doesn't assign anything; it only value-initializes the elements if you supply initializers, or calls a constructor and passes it arguments.
tiles = new Tiles[Flr.m_noOfTiles];
creates an array of Flr.m_noOfTiles Tiles objects whose pointer members contain garbage (not even nullptr) and whose pointed-to memory is never allocated. Initialization could be done by supplying an initializer list:
tiles = new Tiles[Flr.m_noOfTiles] { value1, value2, value3 };
You have to allocate memory for every pointer member yourself, and deallocate it when it is no longer needed, in the proper order, from the most nested to the least nested structure.
This may not be a trivial task, depending on the operations you need, and it causes quite a hassle in the code. It is the reason why C++ has constructors and destructors. So:
Is there a reason why you don't use RAII? https://en.cppreference.com/w/cpp/language/raii
Is there a reason why you don't use the standard collections? They already implement RAII.
Is there a reason why you don't use smart pointers?
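To make the smart-pointer point concrete, here is a hedged sketch of how the allocation in read could look with std::unique_ptr, reusing the declarations from the question (this is only an illustration, not a drop-in replacement for the assignment's required interface):
#include <iostream>
#include <memory>
#include "Tiles.h"
#include "Flooring.h"

void read(Flooring& Flr)
{
    // unique_ptr owns the array and deletes it automatically when it goes out of
    // scope (or when ownership is handed over), so there is no manual delete[]
    // to forget, even if read(tiles[i]) throws.
    std::unique_ptr<Tiles[]> tiles(new Tiles[Flr.m_noOfTiles]);

    for (int i = 0; i < Flr.m_noOfTiles; i++) {
        std::cout << i + 1 << ": ";
        read(tiles[i]);
    }

    // If Flooring is supposed to keep the array, ownership would still have to be
    // transferred (for example Flr.m_Tiles = tiles.release();) before it is freed here.
}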
I would suggest using std::vector and std::string, with which the storage of the elements is managed automatically; manual memory allocation can be avoided, and you can concentrate on the implementation.
That means, have a good start with the following:
#include <iostream>
#include <vector>   // std::vector
#include <string>   // std::string

struct Tiles
{
    std::vector<int> m_noOfSections;
    std::vector<int> m_noOfTilesInSecs;
    std::string m_TileName;
};

struct Flooring
{
    std::vector<Tiles> m_Tiles;
    // int m_noOfTiles; // --> obsolete, as you can get the size by calling `m_Tiles.size()`
    std::string m_FlooringName;
};
As a side note, in Flooring (and maybe also in Tiles) it looks like you want to map a name to a group of Tiles. Depending on your requirements, have a look at the other data structures the standard library provides for this scenario, such as std::pair, std::map, std::multimap, std::unordered_map, and std::unordered_multimap.
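As a rough illustration of that mapping idea (the names and the simplified Tiles layout here are invented for the example, not part of the assignment):
#include <map>
#include <string>
#include <vector>

// Simplified Tiles for the sake of the example.
struct Tiles
{
    std::vector<int> m_noOfTilesInSecs;
    std::string m_TileName;
};

int main()
{
    // Key: a flooring (or group) name; value: the group of tiles that belongs to it.
    std::map<std::string, std::vector<Tiles>> tileGroups;
    tileGroups["Kitchen"].push_back(Tiles{{4, 6}, "Ceramic"});
    tileGroups["Bathroom"].push_back(Tiles{{2}, "Marble"});
    return 0;
}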

Why Does a Vector of Class Objects Delete the Objects When Adding Objects with push_back

When I ask why the vector deletes the objects, I'm not referring to the mechanics of what a vector does when its capacity is exceeded by adding elements to it. I know that once a vector reaches capacity, the memory that was originally allocated is deleted and new, larger memory is allocated to accommodate the increased number of elements. For a vector of class objects that means the destructors will be called.
What I am wondering is why exactly when I try to access the members of my class I get the error "error reading characters of string" after exceeding the capacity of my class object vector with push_back().
I tried changing push_back to emplace_back() but that didn't help. I got it to work by declaring a vector of a specific size and using at() to assign values but at that point why am I even using a vector to begin with?
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <utility>
#include "Book.h"
using namespace std;

int main() {
    string holder[5];   // used to hold strings from the file
    int temp;           // used to hold a number from the file
    ifstream infile;
    vector<pair<Book, int>> bookInfo;
    infile.open("bookDatabase.txt");

    // i < 8 because the vector needs to start off holding 8 book objects.
    for (int i = 0; i < 8; i++) {
        // Second for loop designed based on the specific format of the file.
        for (int j = 0; j < 5; j++)
            getline(infile, holder[j]);
        infile >> temp;
        infile.ignore();    // ignore newline

        // Call the class constructor for 5 string inputs.
        Book tempBook(holder[0], holder[1], holder[2], holder[3], holder[4]);

        // Here is where, in the debugger, I see that the string members of my
        // book class all read "error reading characters of string"
        bookInfo.push_back(make_pair(tempBook, temp));

        // Read in the empty line that separates information from book to book.
        string tempString;
        getline(infile, tempString);
    }
    infile.close();
    return 0;
}
My Book object has only string members and basic getter and setter functions as well as a function that prints the book's basic information. It has no pointer members and works perfectly.
I'm also certain that the creation of the pairs and the file input is being done correctly.
What I'm wondering is why I lose my book objects when I exceed vector capacity. I'm allowed to create a vector of class objects so why isn't there a mechanism to preserve the contents of the objects whenever the vector needs to move into bigger memory when capacity is exceeded? Or am I doing something wrong?
I figured it out, and I'm going to post my findings for posterity's sake. I apologize to the wonderful souls that were trying to help me for not posting the code of the Book class, but I only just got to a position where I could post code.
I'm not sure what was causing my program to misbehave elsewhere, but my vector issue was just me being confused. The debugger acted oddly from my point of view, so I assumed something was wrong, but the "error reading characters of string" came from the old memory that had already been freed. The members were successfully copied to the new storage (shown in the debugger as __that; not sure if that terminology is right), and after tackling it again my vectors worked precisely as they were supposed to.
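For anyone hitting the same confusion, here is a small standalone demonstration (not the original Book code) that push_back preserves element values across reallocations: the capacity and the element addresses change, but the stored strings are copied/moved into the new storage and stay intact.
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> v;
    for (int i = 0; i < 100; ++i) {
        // When size has reached capacity, the *next* push_back reallocates and
        // copies/moves every element into the new, larger storage.
        if (v.size() == v.capacity())
            std::cout << "reallocating at size " << v.size() << '\n';
        v.push_back("book #" + std::to_string(i));
    }
    std::cout << v.front() << " ... " << v.back() << '\n';   // all values intact
    return 0;
}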

creating large number of object pointers

I have defined a class like this:
class myClass {
private:
    int count;
    string name;
public:
    myClass(int, string);
    ...
    ...
};

myClass::myClass(int c, string n)
{
    count = c;
    name = n;
}
...
...
I also have a *.txt file in which each line contains a name:
David
Jack
Peter
...
...
Now I read the file line by line and create a new object pointer for each line and store all objects in a vector. The function is like this:
vector<myClass*> myFunction(string fileName)
{
    vector<myClass*> r;
    myClass* obj;
    ifstream infile(fileName);
    string line;
    int count = 0;
    while (getline(infile, line))
    {
        obj = new myClass(count, line);
        r.push_back(obj);
        count++;
    }
    return r;
}
For small *.txt files I have no problem. However, sometimes my *.txt files contain more than 1 million lines, and in those cases the program becomes dramatically slow. Do you have any suggestions to make it faster?
First, find faster I/O than the standard streams.
Second, can you use string views instead of strings? They are C++17, but implementations for C++11 and earlier are available everywhere.
Third,
myClass::myClass(int c, string n) {
    count = c;
    name = n;
}
should read
myClass::myClass(int c, std::string n) :
    count(c),
    name(std::move(n))
{}
which would make a difference for long names. None for short ones due to "small string optimization".
Fourth, stop making vectors of pointers; create vectors of values.
Fifth, failing that, find a more efficient way to allocate/deallocate the objects.
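To make the fourth point concrete, here is a hedged sketch of the same function storing myClass by value (it assumes a constructor taking int and std::string as in the question, and the reserve size is just a guess to cut down on reallocations):
#include <fstream>
#include <string>
#include <vector>

std::vector<myClass> myFunction(const std::string& fileName)
{
    std::vector<myClass> r;
    r.reserve(1000000);                     // optional: rough guess at the final size
    std::ifstream infile(fileName);
    std::string line;
    int count = 0;
    while (std::getline(infile, line))
    {
        // Construct the element in place; moving 'line' lets myClass take over
        // the string's buffer instead of copying the text.
        r.emplace_back(count, std::move(line));
        ++count;
    }
    return r;
}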
One thing you can do is directly move the string you've read from the file into the objects you're creating:
myClass::myClass(int c, string n)
: count{c}, name{std::move(n)}
{ }
You could also benchmark:
myClass::myClass(int c, string&& n)
: count{c}, name{std::move(n)}
{ }
The first version above will make a copy of line as the function is called, then let the myClass object take over the dynamically allocated buffer used for that copy. The second version (with the string&& n argument) will let the myClass object rip out line's buffer directly: that means less copying of textual data, but it also means line is likely to be stripped of its buffer as each line of the file is read in. Hopefully your implementation will normally be able to tell from the input how large a capacity line needs for the next line, and avoid any extra allocations/copying. As always, measure when you have reason to care.
You'd likely get a small win by reserving space for your vector up front, though the fact that you're storing pointers in the vector instead of storing myClass objects by value makes any vector resizing relatively cheap. Countering that, storing pointers does mean you're doing an extra dynamic allocation.
Another thing you can do is increase the stream buffer size: see pubsetbuf and the example therein.
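For illustration, a minimal sketch of that idea; the buffer size here is an arbitrary choice, and on many implementations pubsetbuf only takes effect if it is called before the file is opened.
#include <fstream>
#include <string>
#include <vector>

int main()
{
    std::vector<char> buf(1 << 20);   // 1 MiB buffer (size chosen arbitrarily)
    std::ifstream infile;

    // Install the custom buffer before opening the file, otherwise many
    // implementations will ignore it.
    infile.rdbuf()->pubsetbuf(buf.data(), static_cast<std::streamsize>(buf.size()));
    infile.open("test.txt");

    std::string line;
    while (std::getline(infile, line)) {
        // process line
    }
    return 0;
}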
If speed is extremely important, you should memory-map the file and store pointers into the memory-mapped region, instead of copying from the file stream buffer into distinct dynamically allocated memory regions inside distinct strings. This could easily make a dramatic difference - perhaps as much as an order of magnitude - but a lot depends on the speed of your disk etc., so benchmark both if you have reason to care.

Trying to read file which contains ints and store in int vector, but I keep getting "Segmentation fault (core dumped)" error

So I made this practice file for my project to try and read a file containing integer numbers, and store them in an int vector. My problem is whenever I run the program, it will give me "Segmentation fault (core dumped)" during the call to the readFile() function.
Do not mind the extra imports, I just copy and paste the same imports to all my practice files. Also the cout << "hi" << endl; is just to see when the program has the error.
#include <string>
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <cassert>
#include <vector>
using namespace std;
vector<int> readFile(string fileName);
int main()
{
    vector<int> intvec = readFile("ints.txt");
    cout << "hi" << endl;
    for (int i = 0; i < intvec.size(); i++)
    {
        cout << intvec[i] << endl;
    }
    return 0;
}

vector<int> readFile(string fileName)
{
    ifstream inFile(fileName);
    vector<int> inputVec;
    int i = 0;
    while (!inFile.eof())
    {
        inFile >> inputVec[i];
        i++;
    }
    inFile.close();
    return inputVec;
}
You would want to do this
while (inFile >> i)
{
    inputVec.push_back(i);
}
In your code, you define inputVec without giving it an initial size, so its size is 0. So when you write inputVec[i], you're actually trying to access an index outside the vector's bounds. It's like accessing the 5th element in an array of size 4.
By using push_back, you can add elements to the vector and it'll adjust the size dynamically.
C++ std::vectors need to be resized before you index into them. Try the "push_back" function instead; it adds the element at the end and resizes the vector to fit.
BTW, unlike e.g. JavaScript, you can't use "vector[i] = value" to automatically resize a C/C++ array/vector.
The square brackets [] operator on C arrays and std::vector is completely different from the one in JavaScript. JavaScript arrays are associative maps, and using array[value] causes it to create the key "value" automatically. But this is comparatively slow. The square brackets operator in C/C++ doesn't work like that; it is a much faster direct memory access.
If you have an array called "myArray" and you ask for myArray[10], the computer just looks at whatever is in RAM 10 elements beyond the start of myArray (multiplied by the size of the elements, so myArray[10] would be 40 bytes past the start of an array with 4-byte values such as int or float).
It's designed for pure speed, so there's no bounds checking added to this. It's entirely up to the C/C++ programmer to ensure that you don't read or write outside the bounds with the square brackets operator, but the payoff is much faster programs. You can add your own bounds checking to your own code, or you can just be careful not to read past the range you've allocated.
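If you do want bounds checking, std::vector offers at(), which verifies the index and throws std::out_of_range instead of silently reading or writing arbitrary memory. A small illustration:
#include <iostream>
#include <stdexcept>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3, 4};

    // v[10] = 5;           // undefined behaviour: writes past the end, may crash or corrupt memory
    try {
        v.at(10) = 5;       // checked access: throws instead of touching invalid memory
    } catch (const std::out_of_range& e) {
        std::cout << "out of range: " << e.what() << '\n';
    }
    return 0;
}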
Using the std::copy algorithm and iterators, a vector can be filled with the following:
#include <fstream>
#include <iterator>
#include <algorithm>
#include <vector>

int main()
{
    std::vector<int> intvec;
    std::ifstream ifs("ints.txt");
    std::copy(std::istream_iterator<int>(ifs),
              std::istream_iterator<int>(),
              std::back_inserter(intvec));
}
Note: The std::back_inserter automatically calls push_back for each item in the stream.