I'd like to print a std::list that holds more than 200,000 elements.
Saving it to a file is not efficient.
The file is too big, so it's not easy to handle.
struct Type_t
{
    int a;
    char *b;
    ...
};

std::list<Type_t> list;
"p list" is available, but I want to partially print or save into smaller file.
I am getting input from a text file and storing it in an array, but the text file contains more than 20,000 strings. I'm trying to read the strings from the text file and store them in a very large array. How can I do that?
I cannot use vectors.
Is it possible to do it without using a hash table?
Afterward, I will try to find the most frequently used words using sorting.
Your requirement is to NOT use any standard container, such as std::vector or std::unordered_map.
In that case we need to create a dynamic container ourselves. That is not complicated, and we can even use it for storing strings, so I will not use std::string in my example either.
I created a demo for you with ~700 lines of code.
Let us first define the term "capacity": the number of elements that could be stored in the container, i.e. the currently available space. It says nothing about how many elements are actually stored in the container.
The single most important property of a dynamic container is that it must be able to grow. Growing is necessary whenever we want to add more elements to the container than its capacity allows.
So, if we want to append an element and the number of elements is >= the capacity, we need to allocate a bigger block of memory and copy all the old elements into it. On such an event we usually double the capacity, which keeps reallocations and copies infrequent.
Let me show you an example of a push_back function, which could be implemented like this:
template <typename T>
void DynamicArray<T>::push_back(const T& d) { // Add a new element at the end
if (numberOfElements >= capacity) { // Check, if capacity of this dynamic array is big enough
capacity *= 2; // Obviously not, we will double the capacity
T* temp = new T[capacity]; // Allocate new and more memory
for (unsigned int k = 0; k < numberOfElements; ++k)
temp[k] = data[k]; // Copy data from old memory to new memory
delete[] data; // Release old memory
data = temp; // And assign newly allocated memory to old pointer
}
data[numberOfElements++] = d; // And finally, store the given data at the end of the container
}
This is a basic approach. I use templates in order to be able to store any type in the dynamic array.
You could get rid of the templates by deleting all the template stuff and replacing "T" with your intended data type.
But I would not do that. See how easily we can create a "String" class: we just alias a dynamic array of chars as "String".
using String = DynamicArray<char>;
will give us basic string functionality. And if we want to have a dynamic array of strings later, we can write:
using StringArray = DynamicArray<String>;
and this gives us a DynamicArray<DynamicArray<char>>. Cool.
For these special classes we can overload some operators, which makes handling them even simpler.
Please look at the provided code.
And, to be able to use the container in a typical C++ environment, we can add full iterator capability.
This takes some typing effort, but it is not complicated, and it makes life much simpler.
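For a contiguous array this is actually little work. A minimal sketch, assuming begin() and end() are declared in the class and a plain T* is used as the iterator type:

template <typename T>
T* DynamicArray<T>::begin() { return data; }                   // points to the first element

template <typename T>
T* DynamicArray<T>::end() { return data + numberOfElements; }  // points one past the last element

With that, range-based for loops and the standard algorithms work on the container out of the box.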
You also wanted to create a hash map for counting words.
For that we create a key/value pair: the key is the String we defined above and the value is the frequency counter.
We implement a hash function, which should be carefully chosen so that it works well with strings, has high entropy, and distributes keys evenly over the hash map's buckets.
The hash map itself is a dynamic container. We will also add iterator functionality to it.
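As an illustration, a minimal sketch of such a hash function (FNV-1a, one common choice for strings); the data and len parameters stand in for whatever accessors the String type above provides:

#include <cstddef>
#include <cstdint>

// FNV-1a: simple, fast, and reasonably well distributed for short strings
std::size_t hashString(const char* data, std::size_t len)
{
    std::uint64_t h = 0xcbf29ce484222325ULL;       // FNV offset basis
    for (std::size_t i = 0; i < len; ++i)
    {
        h ^= static_cast<unsigned char>(data[i]);  // mix in one byte
        h *= 0x100000001b3ULL;                     // FNV prime
    }
    return static_cast<std::size_t>(h);            // bucket index would then be hash % bucketCount
}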
For all this I drafted some 700 lines of code for you. You can take this as an example for your further studies.
It can also be easily enhanced with additional functionality.
But a caveat: I did only some basic tests, and I used raw pointers for owned memory. That is acceptable in a school project to learn dynamic memory management, but not in production code.
Additionally, you can replace all of this code by simply using std::string, std::vector and std::unordered_map. In real code nobody would reinvent the wheel like this.
But it may give you some ideas on how to implement similar things.
Because Stack Overflow limits the answer size to 32,000 characters, I will put the main part on GitHub.
Please click here.
I will just show you main so that you can see how easily the solution can be used:
int main() {
// Open file and check, if it could be opened
std::ifstream ifs{ "r:\\test.txt" };
if (ifs) {
// Define a dynamic array for strings
StringArray stringArray{};
// Use overwritten extraction operator and read all strings from the file to the dynamic array
ifs >> stringArray;
// Create a dynamic hash map
HashMap hm{};
// Now count the frequency of words
for (const String& s : stringArray)
hm[s]++;
// Put the resulting key/value pairs into a dynamic array
DynamicArray<Item> items(hm.begin(), hm.end());
// Sort in descending order by the frequency
std::sort(items.begin(), items.end(), [](const Item& i1, const Item& i2) { return i1.count > i2.count; });
// Show the result on screen
for (const auto& [string, count] : items)
std::cout << std::left << std::setw(20) << string << '\t' << count << '\n';
}
else std::cerr << "\n\nError: Could not open source file\n\n";
}
You do not need to keep the whole file in memory to count the frequency of words. You only need to keep a single entry at a time, plus some data structure that counts the frequencies, for example a std::unordered_map<std::string, unsigned>.
Not tested:
std::unordered_map<std::string,unsigned> processFileEntries(std::ifstream& file) {
    std::unordered_map<std::string,unsigned> freq;
    std::string word;
    while ( file >> word ) {
        ++freq[word];
    }
    return freq;
}
For more efficient reading or more elaborate processing you could also read the file in chunks (e.g. 100 words at a time), process each chunk, and then continue with the next one.
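A rough sketch of that idea (processInChunks and chunkSize are illustrative names; the per-chunk step here simply merges the counts into the map):

#include <cstddef>
#include <fstream>
#include <string>
#include <unordered_map>
#include <vector>

void processInChunks(std::ifstream& file,
                     std::unordered_map<std::string, unsigned>& freq,
                     std::size_t chunkSize = 100)
{
    std::vector<std::string> chunk;
    chunk.reserve(chunkSize);
    std::string word;
    while (file >> word) {
        chunk.push_back(word);
        if (chunk.size() == chunkSize) {    // chunk is full: process it
            for (const auto& w : chunk)
                ++freq[w];
            chunk.clear();
        }
    }
    for (const auto& w : chunk)             // process the last, partial chunk
        ++freq[w];
}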
Assuming you're using C-style / raw arrays, you could do something like:
const size_t number_of_entries = count_entries_in_file();
//Make sure we actually have entries
assert(number_of_entries > 0);
std::string* file_entries = new std::string[number_of_entries];
//fill file_entries with the file's entries
//...
//release heap memory again, so we don't create a leak
delete[] file_entries;
file_entries = nullptr;
You can use a std::map to get the frequency of each word in your text file. One example for reference is given below:
#include <iostream>
#include <map>
#include <string>
#include <sstream>
#include <fstream>
int main()
{
std::ifstream inputFile("input.txt");
std::map<std::string, unsigned> freqMap;
std::string line, word;
if(inputFile)
{
while(std::getline(inputFile, line))//go line by line
{
std::istringstream ss(line);
while(ss >> word)//go word by word
{
++freqMap[word]; //increment the count value corresponding to the word
}
}
}
else
{
std::cout << "input file cannot be opened"<<std::endl;
}
//print the frequency of each word in the file
for(auto myPair: freqMap)
{
std::cout << myPair.first << ": " << myPair.second << std::endl;
}
return 0;
}
The output of the above program can be seen here.
In my code I have something like this:
struct SomeStruct
{
int test1;
int test2;
float test3;
float test4;
};
std::vector<SomeStruct> SomeStructs;
I am looking for a way to get a part of that vector in a contiguous manner, so that I can access it with a pointer or a C array.
Suppose I want a pointer that accesses only the test2 members of the structs.
I want to pass that part of the vector to a C API. Is it possible?
I'm trying to avoid creating a new std::vector/c-array.
Here is how it looks in memory (kind of):
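Roughly, assuming a typical layout with no padding between the members:

[ test1 | test2 | test3 | test4 ][ test1 | test2 | test3 | test4 ][ test1 | test2 | ...
          ^^^^^                            ^^^^^

Each test2 is separated from the next one by sizeof(SomeStruct) bytes.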
No, what you are asking for is impossible. Quick review:
We can clearly see that the test2 entries are not laid out contiguously in memory: each one is just a member of a struct, so the other members sit in between each occurrence of test2.
You want them to be contiguous, so you need a different memory layout than you currently have.
You don't want to create a new vector or array, so you are stuck with your current memory layout, which is not the one you need.
Your options are:
Change the data layout to, say, four vectors of members instead of one vector of structs (a sketch follows below).
Create a new vector when you need one.
Don't use a C API that requires contiguous memory.
For option 3 it is worth noting that some C APIs, notably BLAS, support "strided" arrays, meaning there is a fixed-size gap between consecutive elements, which would solve this issue for you.
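As a sketch of option 1 (the SomeStructsSoA name is illustrative; the point is that every member now lives in its own contiguous array):

#include <vector>

// "Struct of arrays": each member gets its own vector, so each member
// array is contiguous and can be handed to a C API directly.
struct SomeStructsSoA
{
    std::vector<int>   test1;
    std::vector<int>   test2;
    std::vector<float> test3;
    std::vector<float> test4;

    void push_back(int t1, int t2, float t3, float t4)
    {
        test1.push_back(t1);
        test2.push_back(t2);
        test3.push_back(t3);
        test4.push_back(t4);
    }
};

// Usage with some C function taking an int array:
//   some_c_api(s.test2.data(), s.test2.size());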
There's no way to access just the contents of each test2 from a pointer or array, because the test2 members are not contiguous in memory even if the structs in the vector are. There's other data in the struct, so you need to skip over that to read each test2.
When you find yourself asking a question like this, try thinking about other data structures you can use that would make the problem easier to solve. For this case, perhaps a std::unordered_map would be a good choice.
std::unordered_map<int, SomeStruct> map;
// make a struct...
map.emplace(a_struct.test2, a_struct);
// add a bunch more structs...
// get an iterator over the map's entries (i.e. something like a pointer to all test2 keys)
auto key_iterator = map.begin();
To implement this in an idiomatic manner you will need to roll your own custom iterator that returns that field specifically.
Take a look at c++ how to create iterator over one field of a struct vector
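A minimal sketch of such an iterator (names are illustrative; it simply projects the test2 member on dereference and steps struct by struct):

#include <iostream>
#include <vector>

struct SomeStruct { int test1; int test2; float test3; float test4; };

class Test2Iterator
{
    const SomeStruct* p_;
public:
    explicit Test2Iterator(const SomeStruct* p) : p_(p) {}
    const int& operator*() const { return p_->test2; }              // yield only test2
    Test2Iterator& operator++() { ++p_; return *this; }             // step to the next struct
    bool operator!=(const Test2Iterator& rhs) const { return p_ != rhs.p_; }
};

int main()
{
    std::vector<SomeStruct> v{ {1, 10, 0.f, 0.f}, {2, 20, 0.f, 0.f} };
    for (Test2Iterator it(v.data()), end(v.data() + v.size()); it != end; ++it)
        std::cout << *it << '\n';                                    // prints 10, then 20
}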
C and C++ interoperability is a vastly different problem. To simplify things you may just want to implement it in C.
If I understood you correctly, I think it's possible.
struct SomeStruct
{
int test1;
int test2;
float test3;
float test4;
};
int size = 3;// whatever size
vector<SomeStruct> myStruct(size);
myStruct[0].test1 = 0;
myStruct[1].test1 = 1;
myStruct[2].test1 = 2;
/* myPtest1 will allow you to get the test1 part of myStruct in a
contiguous memory manner*/
int *myPtest1 = new int[myStruct.size()];
for (size_t i = 0; i < myStruct.size(); i++)
    myPtest1[i] = myStruct[i].test1;
// for illustration.
cout << myPtest1[0] << endl; // 0
cout << myPtest1[1] << endl; // 1
cout << myPtest1[2] << endl; // 2
delete[] myPtest1; // release the copy when you no longer need it
You can now pass myPtest1 to your API; it gives you access to only the test1 part of the myStruct vector.
You can do the same for the rest of the SomeStruct members.
If you want to access it in a for loop you can do:
for (const auto& someStruct : SomeStructs)
{
    const int& test2 = someStruct.test2;
    // do what you need to do
}
If you need the j-th element:
const int& test2 = SomeStructs[j].test2;
This way you are not making extra copies; you are directly referencing items in the vector. Remove const if you need to change the value.
So I'm creating a closed-hashing hash table for a class and I have a structure
struct Example {
string key;
string data;
Example() { key = "000"; }
};
and a class which contains a member that points to a vector of structures, a constructor, and a function I'll be using to illustrate the problem.
class hash_table {
private:
vector<Example>* hash;
public:
hash_table(int size);
void dummy_method();
};
The constructor is meant to dynamically allocate the vector with a number of structures based on user/file input.
hash_table :: hash_table ( int size=10 )
{
//initialize vector
vector<Example> * hash = new vector<Example>(size);
//test objects
for(int i=0;i<size;i++)
cout<<(*hash)[i].key<<endl;
}
The above code appears to initialize the 10 members, since it prints out "000" ten times.
However, once I call dummy_method:
void hash_table::dummy_method() {
cout<<(*hash)[0].key<<endl;
}
I get a segmentation fault
I'm pretty sure this isn't even the correct way to do this, but I've been looking/tinkering forever and I can't seem to find a solution. I absolutely have to use a pointer to a vector of structures however, and I'm pretty sure I'm supposed to be dynamically allocating each of those structures as well (somehow). Thanks for any help.
(Also, yes, we actually HAVE to use "using namespace std", hence no std:: prefixes anywhere.)
The statement vector<Example> * hash = new vector<Example>(size); initializes a local variable named hash, not the hash_table::hash member (which is left uninitialized). Dereferencing that uninitialized member in dummy_method is what causes the segmentation fault.
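One minimal fix is to assign to the member instead of declaring a second, local variable (the destructor should then delete the vector, or the member could be a smart pointer):

hash_table::hash_table(int size)
{
    // initialize the member, not a new local variable
    hash = new vector<Example>(size);

    // test objects
    for (int i = 0; i < size; i++)
        cout << (*hash)[i].key << endl;
}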
I am trying to find the difference in size between a struct with a vector of objects and a struct with a vector of object pointers.
The code I have written shows that the sizes of both structs are the same, even though, in theory at least, they should differ based on their contents.
What would be the correct way of finding the size of a struct based on its contents?
#include <iostream>
#include <string>
#include <vector>
using namespace std;
struct Song{
string Name;
string Artist;
};
struct Folder{
vector<Song*> list;
};
struct Library{
vector<Song> songs;
};
int main(int argc, const char * argv[]) {
Library library;
Folder favorites;
Folder recentPurchaces;
library.songs.push_back(Song{"Human After All", "Daft Punk"});
library.songs.push_back(Song{"All of my love", "Led Zepplin"});
favorites.list.push_back(&library.songs[0]);
favorites.list.push_back(&library.songs[1]);
cout << "Size of library: " << sizeof(library) << endl;
cout << "Size of favorites: " << sizeof(favorites) << endl;
return 0;
}
in theory at least, they should differ based on their contents.
No, the sizes shouldn't be different: std::vector<T> stores the data in dynamically allocated storage, while the struct stores only the "anchor" part, consisting of a couple of pointers. The number of items inside the vectors, as well as the size of the items inside the vector itself, are not counted in determining the size of this footprint.
In order to compute the actual size in memory, you need to write a function that adds up the sizes of the individual items inside each container, plus the container's capacity times the size of one container item.
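For example, a rough estimate for the two structs from the question might look like this (a sketch; it ignores the heap memory that the std::string members may allocate for long song names):

// adds the object's own footprint to the heap block its vector manages
size_t approxSize(const Library& lib)
{
    return sizeof(lib) + lib.songs.capacity() * sizeof(Song);
}

size_t approxSize(const Folder& folder)
{
    return sizeof(folder) + folder.list.capacity() * sizeof(Song*);
}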
Most likely, std::vector just keeps a pointer to a dynamically allocated array of elements, and the size of that pointer is the same regardless of whether it's Song* or Song**. The allocated size of the pointed-to memory would, of course, likely be different.
Put another way, sizeof() is not a good way to measure how much memory a std::vector requires.
Why would you expect the sizes of the structures to be different? std::vector stores its data via dynamic allocation ("on the heap"), so there is no reason for its implementation to contain anything other than a few pointers.
Exact implementation details depend on the standard library implementation, of course, but a typical std::vector<T, Alloc> could just contain something like this:
template <class T, class Alloc = allocator<T>>
class vector
{
T *_Begin, *_End, *_EndOfCapacity;
Alloc _Allocator;
// No other data members
public:
/* ... */
};
I was given the task of programming something like a dictionary, and the way I allocate memory for the meanings is simply to allocate space for 100 meanings in the constructor, which works perfectly fine.
However, the professor didn't approve of that and asked me to rewrite the code so that I allocate memory for a relevant number of meanings. I basically have no idea how to do that; how would the constructor know in advance how many meanings there will be?
What would you suggest? I am posting just the part of the code that is relevant to the problem.
#include"expression.h"
//---------------------METHODS-------------------------------------------
Expression::Expression(int m_ctr)
{
count_meanings = m_ctr; // Initialize the meanings counter
meanings = new char * [100]; // Allocate memory for 100 meanings
}
Expression::~Expression()
{
delete [] meanings; // Free the allocated memory
delete [] word_with_several_meanings; // Free the allocated memory
}
void Expression::word(char *p2c)
{
word_with_several_meanings = new char[strlen(p2c)+1];
strcpy(word_with_several_meanings, p2c); // copy the string, method: DEEP copy
}
void Expression::add_meaning(char *p2c)
{
meanings[count_meanings] = new char[strlen(p2c)+1];
strcpy(meanings[count_meanings++], p2c); // copy the string, method: DEEP copy
}
char * Expression::get_word()
{
return word_with_several_meanings;
}
char * Expression::get_meaning(int n_meaning)
{
return * (meanings + n_meaning);
}
int Expression::get_total_number_of_meanings()
{
return count_meanings;
}
int main(void)
{
Expression expr;
expr.word("bank");
expr.add_meaning("a place to get money from");
expr.add_meaning("a place to sit");
cout << expr.get_word() << endl;
for(int i = 0; i<expr.get_total_number_of_meanings(); i++)
cout << " " << expr.get_meaning(i) << endl;
The C++ way of doing that is to use:
std::string to store a single string (instead of raw char* C-like strings)
std::vector to store a sequence of strings (like the "meanings" in your dictionary)
So, you can have a vector<string> data member inside your class, and you can dynamically add meanings (i.e. strings) to it, using vector::push_back().
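For example, a sketch of the class built on those containers (member names chosen to mirror your original interface) could look like this:

#include <string>
#include <vector>

class Expression
{
    std::string word_;                      // the word itself
    std::vector<std::string> meanings_;     // grows on demand, no fixed limit of 100
public:
    void word(const std::string& w) { word_ = w; }
    void add_meaning(const std::string& m) { meanings_.push_back(m); }
    const std::string& get_word() const { return word_; }
    const std::string& get_meaning(int n) const { return meanings_[n]; }
    int get_total_number_of_meanings() const { return static_cast<int>(meanings_.size()); }
};

The memory management is handled entirely by the containers, so the manual new[]/delete[] calls and the hand-written destructor go away.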
If you - for some reason - want to stay at the raw C level, you could use a linked-list data structure that stores a raw C string pointer inside each node. When you add a new meaning, you create a new node pointing to that string and append the node to the list. A singly-linked list with a node definition like this may suffice:
struct MeaningListNode
{
char * Meaning; // Meaning raw C string
struct MeaningListNode* Next; // Pointer to next meaning node, or nullptr for last
};
But, frankly speaking, the vector<string> approach seems much simpler and better to me.