Improve Time Efficiency of Driver Program

Sorry that the title is vague. Essentially I am trying to improve the time (and overall) efficiency of a C++ driver program which:
Reads in a file line by line using ifstream.
It is vital to my program that the lines are processed separately, so I currently have 4 separate calls to getline.
Reads each line into a vector of integers using istringstream.
Finally, converts the vector into a linked list of integers. Is there a way or a function that can read the integers from the file directly into the linked list?
Here is the driver code:
int main(int argc, char *argv[])
{
ifstream infile(argv[1]);
vector<int> vals_add;
vector<int> vals_remove;
//Driver Code
if(infile.is_open()){
string line;
int n;
getline(infile, line);
istringstream iss (line);
getline(infile, line);
istringstream iss2 (line);
while (iss2 >> n){
vals_add.push_back(n);
}
getline(infile, line);
istringstream iss3 (line);
getline(infile, line);
istringstream iss4 (line);
while (iss4 >> n){
vals_remove.push_back(n);
}
int array_add[vals_add.size()];
copy(vals_add.begin(), vals_add.end(), array_add);
int array_remove[vals_remove.size()];
copy(vals_remove.begin(), vals_remove.end(), array_remove);
Node *ptr = CnvrtVectoList(array_add, sizeof(array_add)/sizeof(int));
print(ptr);
cout << "\n";
for(int i = 0; i < vals_remove.size(); i++){
deleteNode(&ptr, vals_remove[i]);
}
print(ptr);
cout << "\n";
}
}
Here is a small example input:
7
6 18 5 20 48 2 97
8
3 6 9 12 28 5 7 10
Where lines 2 and 4 MUST be processed as separate lists, and lines 1 and 3 are the size of the lists (they must dynamically allocate memory so the size must remain exact to the input).

There are multiple points that can be improved.
First off, remove unnecessary code: you’re not using iss and iss3. Next, your array_add and array_remove seem to be redundant. Use the vectors directly.
If you have a rough idea of how many values you’ll read on average, reserve space in the vectors to avoid repeated resizing and copying (actually you seem to have these numbers in your input; use this information instead of throwing it away!). You can also replace your while reading loops with std::copy and std::istream_iterators.
You haven’t shown how CnvrtVectoList is implemented but in general linked lists aren’t particularly efficient to work with due to lack of locality: they throw data all over the heap. Contiguous containers (= vectors) are almost always more efficient, even when you need to remove elements in the middle. Try using a vector instead and time the performance carefully.
Lastly, can you sort the values? If so, then you can implement the deletion of values a lot more efficiently using iterative calls to std::lower_bound, or a single call to std::set_difference.
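As a rough illustration of the std::set_difference idea, here is a minimal sketch. It assumes the order of the values does not need to be preserved, since both inputs are sorted first; remove_values is a hypothetical helper name, not something from the question.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Returns the elements of `values` that do not occur in `to_remove`.
// Both vectors are taken by value because they are sorted locally;
// the original order of `values` is therefore lost.
std::vector<int> remove_values(std::vector<int> values, std::vector<int> to_remove)
{
    std::sort(values.begin(), values.end());
    std::sort(to_remove.begin(), to_remove.end());

    std::vector<int> kept;
    kept.reserve(values.size());  // upper bound on the result size
    std::set_difference(values.begin(), values.end(),
                        to_remove.begin(), to_remove.end(),
                        std::back_inserter(kept));
    return kept;
}
```

With the example input from the question, remove_values({6, 18, 5, 20, 48, 2, 97}, {3, 6, 9, 12, 28, 5, 7, 10}) yields 2 18 20 48 97. Note that std::set_difference treats duplicates as a multiset: a value that occurs twice in values and once in to_remove is kept once.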
If (and only if!) the overhead is actually in the reading of the numbers from a file, restructure your IO code and don’t read lines separately (that way you’ll avoid many redundant allocations). Instead, scan directly through the input file (optionally using a buffer or memory mapping) and manually keep track of how many newline characters you’ve encountered. You can then use the strtol family of functions (the integer counterpart of strtod) to scan numbers from the input read buffer.
Or, if you can assume that the input is correct, you can avoid reading separate lines by using the information provided in the file:
int add_num;
infile >> add_num;
// copy_n needs the element count as its second argument
std::copy_n(std::istream_iterator<int>(infile), add_num, std::inserter(your_list, std::end(your_list)));
int del_num;
infile >> del_num;
std::vector<int> to_delete(del_num);
std::copy_n(std::istream_iterator<int>(infile), del_num, to_delete.begin());
// iterate over the values to delete, not over the count
for (auto const n : to_delete) {
deleteNode(&ptr, n);
}
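The buffer-scanning alternative mentioned earlier (read the whole input into memory, count the newlines yourself, and convert the digits with a strtol-style function) might look like the sketch below. It uses std::strtol because the values are integers. parse_rows is a hypothetical name, and the sketch assumes every number fits in an int and that the buffer is an in-memory std::string.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Parses whitespace-separated integers from `text`, starting a new row at
// every '\n'. Parsing stops at the first character that is neither
// whitespace nor the start of a number.
std::vector<std::vector<int>> parse_rows(const std::string &text)
{
    std::vector<std::vector<int>> rows(1);
    const char *p = text.c_str();  // null-terminated, so strtol cannot overrun
    const char *const end = p + text.size();
    while (p < end) {
        if (*p == '\n') {          // newline: open the next row
            rows.emplace_back();
            ++p;
            continue;
        }
        if (std::isspace(static_cast<unsigned char>(*p))) {
            ++p;                   // skip blanks between numbers
            continue;
        }
        char *next = nullptr;
        const long value = std::strtol(p, &next, 10);
        if (next == p) break;      // no digits consumed: not a number
        rows.back().push_back(static_cast<int>(value));
        p = next;
    }
    if (rows.back().empty()) rows.pop_back();  // row opened by a trailing '\n'
    return rows;
}
```

For the example input, rows[1] and rows[3] hold the two lists and rows[0] and rows[2] hold their sizes. Whether this beats the stream-based version should be measured rather than assumed.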

First of all: why do you use some custom list data structure? It's very likely that it is half-baked, i.e. doesn't have support for allocators, and thus would be much harder to adapt to perform well. Just use std::list for a doubly-linked list, or std::forward_list for a singly-linked list. Easy.
There are several requirements that you seem to imply:
The values of type T (for example: an int) are to be stored in a linked list - either std::list<T> or std::forward_list<T> (not a raw list of Nodes).
The data shouldn't be unnecessarily copied - i.e. the memory blocks shouldn't be reallocated.
The parsing should be parallelizable, although this makes sense only on fast data sources where the I/O won't dwarf CPU time.
The idea is then:
Use a custom allocator to carve memory in contiguous segments that can store multiple list nodes.
Parse the entire file into linked lists that use the above allocator. The lists will allocate memory segments on demand. A new list is started on each newline.
Return the 2nd and 4th list (i.e. lists of elements in the 2nd and 4th line).
It's worth noting that the lines that contain element counts are unnecessary. Of course, that data could be passed to the allocator to pre-allocate enough memory segments, but this disallows parallelization, since parallel parsers don't know where the element counts are - these get found only after the parallel-parsed data is reconciled. Yes, with a small modification, this parsing can be completely parallelized. How cool is that!
Let's start simple and minimal: parse the file to produce two lists. The example below uses a std::istringstream over the internally generated text view of the dataset, but parse could also be passed a std::ifstream of course.
// https://github.com/KubaO/stackoverflown/tree/master/questions/linked-list-allocator-58100610
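#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <forward_list>
#include <iostream>
#include <memory>
#include <sstream>
#include <string>
#include <vector>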
using element_type = int;
template <typename allocator> using list_type = std::forward_list<element_type, allocator>;
template <typename allocator>
std::vector<list_type<allocator>> parse(std::istream &in, allocator alloc)
{
using list_t = list_type<allocator>;
std::vector<list_t> lists;
element_type el;
list_t *list = {};
do {
in >> el;
if (in.good()) {
if (!list) list = &lists.emplace_back(alloc);
list->push_front(std::move(el));
}
while (in.good()) {
int c = in.get();
if (!isspace(c)) {
in.unget();
break;
}
else if (c=='\n') list = {};
}
} while (in.good() && !in.eof());
for (auto &list : lists) list.reverse();
return lists;
}
And then, to test it:
const std::vector<std::vector<element_type>> test_data = {
{6, 18, 5, 20, 48, 2, 97},
{3, 6, 9, 12, 28, 5, 7, 10}
};
template <typename allocator = std::allocator<element_type>>
void test(const std::string &str, allocator alloc = {})
{
std::istringstream input{str};
auto lists = parse(input, alloc);
assert(lists.size() == 4);
lists.erase(lists.begin()+2); // remove the 3rd list
lists.erase(lists.begin()+0); // remove the 1st list
for (int i = 0; i < test_data.size(); i++)
assert(std::equal(test_data[i].begin(), test_data[i].end(), lists[i].begin()));
}
std::string generate_input()
{
std::stringstream s;
for (auto &data : test_data) {
s << data.size() << "\n";
for (const element_type &el : data) s << el << " ";
s << "\n";
}
return s.str();
}
Now, let's look at a custom allocator:
class segment_allocator_base
{
protected:
static constexpr size_t segment_size = 128;
using segment = std::vector<char>;
struct free_node {
free_node *next;
free_node() = delete;
free_node(const free_node &) = delete;
free_node &operator=(const free_node &) = delete;
free_node *stepped_by(size_t element_size, int n) const {
auto *p = const_cast<free_node*>(this);
return reinterpret_cast<free_node*>(reinterpret_cast<char*>(p) + (n * element_size));
}
};
struct segment_store {
size_t element_size;
free_node *free = {};
explicit segment_store(size_t element_size) : element_size(element_size) {}
std::forward_list<segment> segments;
};
template <typename T> static constexpr size_t size_for() {
constexpr size_t T_size = sizeof(T);
constexpr size_t element_align = std::max(alignof(free_node), alignof(T));
// round T_size up to the next multiple of element_align, so that every slot can also hold a free_node
return (T_size + element_align - 1) / element_align * element_align;
}
struct pimpl {
std::vector<segment_store> stores;
template <typename T> segment_store &store_for() {
constexpr size_t element_size = size_for<T>();
for (auto &s : stores)
if (s.element_size == element_size) return s;
return stores.emplace_back(element_size);
}
};
std::shared_ptr<pimpl> dp{new pimpl};
};
template<typename T>
class segment_allocator : public segment_allocator_base
{
segment_store *d = {};
static constexpr size_t element_size = size_for<T>();
static free_node *advanced(free_node *p, int n) { return p->stepped_by(element_size, n); }
static free_node *&advance(free_node *&p, int n) { return (p = advanced(p, n)); }
void mark_free(free_node *free_start, size_t n)
{
auto *p = free_start;
for (; n; n--) p = (p->next = advanced(p, 1));
advanced(p, -1)->next = d->free;
d->free = free_start;
}
public:
using value_type = T;
using pointer = T*;
template <typename U> struct rebind {
using other = segment_allocator<U>;
};
segment_allocator() : d(&dp->store_for<T>()) {}
segment_allocator(segment_allocator &&o) = default;
segment_allocator(const segment_allocator &o) = default;
segment_allocator &operator=(const segment_allocator &o) {
dp = o.dp;
d = o.d;
return *this;
}
template <typename U> segment_allocator(const segment_allocator<U> &o) :
segment_allocator_base(o), d(&dp->store_for<T>()) {}
pointer allocate(const size_t n) {
if (n == 0) return {};
if (d->free) {
// look for a sufficiently long contiguous region
auto **base_ref = &d->free;
auto *base = *base_ref;
do {
auto *p = base;
for (auto need = n; need; need--) {
auto *const prev = p;
auto *const next = prev->next;
advance(p, 1);
if (need > 1 && next != p) {
base_ref = &(prev->next);
base = next;
break;
} else if (need == 1) {
*base_ref = next; // remove this region from the free list
return reinterpret_cast<pointer>(base);
}
}
} while (base);
}
// generate a new segment, guaranteed to contain enough space
size_t count = std::max(n, segment_size);
auto &segment = d->segments.emplace_front(count);
auto *const start = reinterpret_cast<free_node*>(segment.data());
if (count > n)
mark_free(advanced(start, n), count - n);
else
d->free = nullptr;
return reinterpret_cast<pointer>(start);
}
void deallocate(pointer ptr, std::size_t n) {
mark_free(reinterpret_cast<free_node*>(ptr), n);
}
using propagate_on_container_copy_assignment = std::true_type;
using propagate_on_container_move_assignment = std::true_type;
};
For the little test data we've got, the allocator will only allocate a segment... once!
To test:
int main()
{
auto test_input_str = generate_input();
std::cout << test_input_str << std::endl;
test(test_input_str);
test<segment_allocator<element_type>>(test_input_str);
return 0;
}
Parallelization would leverage the allocator above, starting multiple threads and in each invoking parse on its own allocator, each parser starting at a different point in the file. When the parsing is done, the allocators would have to merge their segment lists, so that they'd compare equal. At that point, the linked lists could be combined using usual methods. Other than thread startup overhead, the parallelization would have negligible overhead, and there'd be no data copying involved to combine the data post-parallelization. But I leave this exercise to the reader.

Related

Call C API with vector of strings from C++

In my C++ application, I use an external library that exposes a C API. Some of the C functions take arrays of strings as input and use char** for that:
void c_api_function(char** symbols, int count);
(Note: I think a pointer to const would be more appropriate, but it seems as if const correctness was not important for the library authors.)
The strings must use a specific encoding.
Currently, in order to call the API, I first convert the strings to the correct encoding and store the result in a vector<string>. Then I create a vector<char*> that can be passed to the C API:
std::string encode(std::string const& symbol);
void call_api(std::vector<std::string> const& symbols)
{
std::vector<std::string> encoded_symbols;
for (auto const& s : symbols)
{
encoded_symbols.push_back(encode(s));
}
std::vector<char*> encoded_symbols_ptrs;
for (auto& s : encoded_symbols)
{
encoded_symbols_ptrs.push_back(&s[0]);
}
c_api_function(encoded_symbols_ptrs.data(), (int)encoded_symbols_ptrs.size());
}
I don't like this approach, because I need two vectors. The first vector ensures that the strings are kept alive, the second vector can be passed to the API. Is there a way that only uses a single container, but still uses automatic memory management? If necessary, I can freely change the signature of the encode function, for example, using std::unique_ptr as return value.

//
// This is how to do it with as few allocations as possible (best performance).
//
// My guess: if a string contains only ASCII, encode() has nothing to do and input == output,
// so no allocation is necessary and the input string can be passed to c_api_function() directly.
// I would recommend changing encode() so that it does not generate an output string when there is nothing to do:
// if encode() returns true, encoding was performed (input != output) and the result is stored in 'out';
// if encode() returns false, then input == output and 'out' isn't used.
//
#include <stdexcept>
#include <string>
#include <vector>
bool encode(const std::string& symbol_in, std::string& out);
void call_api(const std::vector<std::string>& symbol_in) {
const size_t c = symbol_in.size(); // size() typically costs a subtraction (end - begin), so call it once
if (c > 0x7FFFFFFF) { // guard the conversion from size_t to int further down
throw std::overflow_error("..we have a problem here..");
}
auto it_in = symbol_in.cbegin(); // const iterator for input
auto it_end = symbol_in.cend(); // const iterator for input end
std::vector<std::string> encoded_symbols(c); // one slot per input; a slot stays empty when encode() returns false
std::vector<const char*> encoded_symbols_raw(c); // array of C raw pointers
auto it_out = encoded_symbols.begin(); // iterator for std::string objects output
auto raw_out = encoded_symbols_raw.begin(); // iterator for raw output
for (; it_in != it_end; ++it_in, ++it_out, ++raw_out) {
if (encode(*it_in, *it_out)) { // if *it_out contains encoding result:
*raw_out = it_out->c_str(); // set std::string buffer as raw pointer
}
else {
*raw_out = it_in->c_str(); // no encoding needed - just pass input string buffer
}
}
c_api_function((char**)encoded_symbols_raw.data(), (int)c);
}

Since encoded_symbols only exists within your call_api() function, I would prefer to have an object that contains all the encoded symbols and a member function which acts upon them. As an executable example:
#include <iostream>
#include <memory>
#include <string>
#include <vector>
void c_api_function(char** symbols, int count)
{
for(int x = 0; x < count; ++x)
{
std::cout << symbols[x] << '\n';
}
}
std::string encode(std::string const& symbol)
{
return symbol;
}
class EncodedSymbols
{
friend void call_api(EncodedSymbols& symbols);
friend void call_api(std::vector<std::string> const& symbols);
public:
EncodedSymbols(const EncodedSymbols& other) = delete;
EncodedSymbols(EncodedSymbols&& other) = default;
~EncodedSymbols()
{
if (!encoded_symbol_array) return; // a moved-from object owns nothing
for(int x = 0; x < number_of_encoded_symbols; ++x)
{
delete[] encoded_symbol_array[x];
}
}
static EncodedSymbols create_from(const std::vector<std::string>& symbols)
{
EncodedSymbols obj;
obj.encoded_symbol_array = std::make_unique<char*[]>(symbols.size());
obj.number_of_encoded_symbols = symbols.size();
for(int x = 0; x < symbols.size(); ++x)
{
const std::string encoded = encode(symbols[x]);
obj.encoded_symbol_array[x] = new char[encoded.length() + 1];
std::copy(encoded.begin(), encoded.end(), obj.encoded_symbol_array[x]);
obj.encoded_symbol_array[x][encoded.length()] = '\0';
}
return obj;
}
void call_api()
{
c_api_function(encoded_symbol_array.get(), number_of_encoded_symbols);
}
private:
EncodedSymbols() = default;
std::unique_ptr<char*[]> encoded_symbol_array;
int number_of_encoded_symbols = 0;
};
void call_api(EncodedSymbols& symbols)
{
c_api_function(symbols.encoded_symbol_array.get(),
symbols.number_of_encoded_symbols);
}
void call_api(std::vector<std::string> const& symbols)
{
auto encoded = EncodedSymbols::create_from(symbols);
c_api_function(encoded.encoded_symbol_array.get(),
encoded.number_of_encoded_symbols);
}
int main()
{
std::vector<std::string> symbols{"one", "two"};
auto encoded_symbols = EncodedSymbols::create_from(symbols);
encoded_symbols.call_api();
call_api(encoded_symbols);
call_api(symbols);
return 0;
}
If there are other functions in your C library that act upon encoded symbols then (in my mind) it makes more sense to put them in a class. All the manual memory management can be hidden behind a nice interface.
If you prefer, you can also have a bare function which acts upon an EncodedSymbols instance. I have included that variant too.
As a third alternative, you could keep your current function prototype and use the EncodedSymbols type for RAII. I have shown that too in my example.

Random access to array of raw buffers of different sizes?

I have an array of arrays: struct chunk { char * data; size_t size; }; chunk * chunks;. The data size in each chunk is dynamic and differs between chunks. Linear access to the data is easy with a nested for loop:
for (chunk * chunk_it = chunks; chunk_it != chunks + count; ++chunk_it) {
for (char * it = chunk_it->data; it != chunk_it->data + chunk_it->size; ++it) {
/* use it here */
}
}
I want to turn this into random access to chunks->data using operator[] as an interface, spanning multiple chunks.
It works by linearly searching for the right chunk, then just calculating the offset of the data I want.
template <class T>
void random_access(size_t n) {
chunk * c = chunks;
for (int i = 0; i < count; ++i) {
c = chunks + i;
size_t size = c->size;
if (n >= size) {
n -= size; // n lies beyond this chunk: skip it and make n relative to the next one
} else {
break; // found
}
}
T * data = reinterpret_cast<T *>(c->data + n);
// use data here
}
Is there a more efficient way to do this? It would be crazy to do this every time I need a T from chunks. I plan on iterating over all chunk data linearly, but I want to use the data outside of the function, and thus need to return it at the inner loop (hence why I want to turn it inside out). I also thought of using a function pointer at the inner loop, but rather not as just doing chunk_iterator[n] is much nicer.

I understand your data structure is more complicated but could you not do something like this?
I build a contiguous block of the chunk data and record the position and size of each one in the chunks array:
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>
class chunk_manager
{
struct chunk
{
std::size_t position;
std::size_t size;
chunk(std::size_t position, std::size_t size)
: position(position), size(size) {}
};
public:
void add_chunk(std::string const& chunk)
{
m_chunks.emplace_back(m_data.size(), chunk.size());
m_data.append(chunk);
}
char* random_access(std::size_t n) { return &m_data[n]; }
std::size_t size_in_bytes() const { return m_data.size(); }
private:
std::vector<chunk> m_chunks;
std::string m_data;
};
int main()
{
chunk_manager cm;
cm.add_chunk("abc");
cm.add_chunk("def");
cm.add_chunk("ghi");
for(auto n = 0ULL; n < cm.size_in_bytes(); ++n)
std::cout << cm.random_access(n) << '\n';
}
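If copying the data into one contiguous block is not an option, another approach (a sketch, not part of the answer above) is to leave the chunks where they are and precompute the cumulative end offset of each chunk once. A lookup then becomes a binary search with std::upper_bound instead of a linear scan. The chunk struct is taken from the question; chunk_index and its members are hypothetical names, and the sketch indexes single bytes rather than objects of type T.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct chunk { char *data; std::size_t size; };

// Read-only index over an existing array of chunks. The chunks must outlive
// the index and must not change size after it has been built.
class chunk_index
{
public:
    chunk_index(const chunk *chunks, std::size_t count) : m_chunks(chunks)
    {
        m_ends.reserve(count);
        std::size_t total = 0;
        for (std::size_t i = 0; i < count; ++i) {
            total += chunks[i].size;
            m_ends.push_back(total);  // m_ends[i] = bytes in chunks 0..i
        }
    }

    std::size_t size() const { return m_ends.empty() ? 0 : m_ends.back(); }

    // Precondition: n < size(). Costs O(log(number of chunks)).
    char &operator[](std::size_t n) const
    {
        // first chunk whose cumulative end lies beyond n
        const auto it = std::upper_bound(m_ends.begin(), m_ends.end(), n);
        const std::size_t i = static_cast<std::size_t>(it - m_ends.begin());
        const std::size_t start = (i == 0) ? 0 : m_ends[i - 1];
        return m_chunks[i].data[n - start];
    }

private:
    const chunk *m_chunks;
    std::vector<std::size_t> m_ends;
};
```

For purely sequential traversal the nested loop from the question remains the cheapest option; the index only pays off when access really is random.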

how to transfer a unique_ptr from a priority queue to a queue

In short, I have a priority_queue that selects the k unordered_set<int> objects satisfying a certain condition. I'd like to return them (the hash sets) as a queue.
Since creating and maintaining a priority_queue involves swapping elements, I store a pointer rather than the unordered_set<int> itself as the priority_queue entry.
Hence the return type should be queue< smart_ptr<unordered_set<int>> >.
If I use shared_ptr the code works fine, but I wish to use unique_ptr since it is cheaper and the clients promise to use it as a unique_ptr.
How can I use unique_ptr to implement the following code?
-------------------Detailed description-------------------------------
I have a function that reads from a file and retains the k lines whose sizes are closest to a reference size. Say k=2, the reference size is 5, and the file contains 6 lines with sizes (number of integers in the line) 3,5,6,20,2,1. The k closest lines are the two lines of size 5 and 6 respectively.
I use a priority_queue of size k with a customized comparator to achieve the goal. I decided to return a queue containing the selected k closest lines, since the clients do not want to know how the comparator is implemented (the comparator is an argument to the priority_queue template).
using ptr_type = shared_ptr<unordered_set<int>>;
// ???????????????????????????????????????
// using ptr_type = unique_ptr<unordered_set<int>>; // unique_ptr does not work
// ???????????????????????????????????????
// Is it possible to transfer unique_ptr entries from a priority_queue to a queue?
using pair_comm_type = pair<int,ptr_type>;
queue<pair_comm_type> f() {
// myFile.txt is a space separated file of integers.
// Different lines may have different lengths (number of integers)
string inputFile = "myFile.txt";
const int TOP_K_LINE = 3;
// to open a file
ifstream fin(inputFile.c_str());
string readBuffer;
// The file opened
// to define a priority_queue
// define customized compare function, such that retained lines have size
// closest to the reference value.
double referenceSize = log10(10.0);
auto comp = [&referenceSize](const pair_comm_type &LHS, const pair_comm_type &RHS)
{ return abs(log10(LHS.first)-referenceSize)
< abs(log10(RHS.first)-referenceSize); };
priority_queue<pair_comm_type, vector<pair_comm_type>, decltype(comp)> myHeap(comp);
// the priority_queue defined
int bufIntValue = -1;
int curMinArraySize = -1; // auxiliary variable, to reduce heap top access
// to read the file line by line
while (getline(fin,readBuffer)) {
// to read int in each line to a hash set
istringstream S(readBuffer);
ptr_type lineBufferPtr(new unordered_set<int>);
while (S>>bufIntValue) lineBufferPtr->insert(bufIntValue);
// one line read
// to decide retain or not based on the length of this line
int arraySize = lineBufferPtr->size();
if (myHeap.size() < TOP_K_LINE) {
// We can add new lines as long as top-k is not reached
myHeap.emplace(arraySize,std::move(lineBufferPtr));
curMinArraySize = myHeap.top().first;
continue;
}
if (arraySize <= curMinArraySize) continue;
myHeap.emplace(arraySize,std::move(lineBufferPtr));
myHeap.pop();
curMinArraySize = myHeap.top().first;
}
// all lines read
fin.close();
// to transfer values from the priority_queue to a queue
// ???????????????????????????????????????
// Is it possible that we can make changes here such that unique_ptr can also work??????
// ???????????????????????????????????????
queue<pair_comm_type> Q;
while (!myHeap.empty()) {
auto temp = myHeap.top();
myHeap.pop();
Q.emplace(temp.first,std::move(temp.second));
}
/*
while (!Q.empty()) {
printf("%d, ",Q.front().first);
Q.pop();
}
printf("\n");
*/
return Q;
}

STL containers are designed to be moved and when you do so it is just as efficient as using pointers. In fact they use pointers internally so you don't have to.
I would consider just using values like this:
using pair_comm_type = pair<int, unordered_set<int>>;
queue<pair_comm_type> f() {
string inputFile = "myFile.txt";
const int TOP_K_LINE = 3;
ifstream fin(inputFile.c_str());
string readBuffer;
double referenceSize = log10(10.0);
auto comp = [&referenceSize](const pair_comm_type &LHS, const pair_comm_type &RHS)
{ return abs(log10(LHS.first)-referenceSize)
< abs(log10(RHS.first)-referenceSize); };
priority_queue<pair_comm_type, vector<pair_comm_type>, decltype(comp)> myHeap(comp);
int bufIntValue = -1;
int curMinArraySize = -1;
while (getline(fin,readBuffer)) {
istringstream S(readBuffer);
// no need to use pointers here
unordered_set<int> lineBufferPtr;
while (S>>bufIntValue)
lineBufferPtr.insert(bufIntValue);
int arraySize = lineBufferPtr.size();
if (myHeap.size() < TOP_K_LINE) {
myHeap.emplace(arraySize,std::move(lineBufferPtr));
curMinArraySize = myHeap.top().first;
continue;
}
if (arraySize <= curMinArraySize) continue;
myHeap.emplace(arraySize,std::move(lineBufferPtr));
myHeap.pop();
curMinArraySize = myHeap.top().first;
}
fin.close();
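// top() returns a const reference, so a plain std::move(myHeap.top())
// would silently copy. The element is popped right away, so it is safe
// to cast the constness away and really move it out.
queue<pair_comm_type> Q;
while (!myHeap.empty()) {
auto temp = std::move(const_cast<pair_comm_type&>(myHeap.top())); // USE MOVES
myHeap.pop();
Q.push(std::move(temp));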
}
return Q;
}

@Galik's solution works.
As for the original question, the simple answer is no: we cannot directly transfer a unique_ptr out of a priority_queue.
The copy constructor of unique_ptr, whose argument is a const reference, is deleted, and priority_queue::top() returns a const reference. Hence we cannot use the return value to construct a new unique_ptr object.
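For completeness, here is a workaround that is sometimes used, sketched under stated assumptions rather than offered as the canonical answer. The elements stored inside the priority_queue are not themselves const objects, so the constness of the reference returned by top() can be cast away and the element moved out. This is only safe if the element is popped immediately afterwards and the comparator only looks at members that survive the move (here the int). The names by_first, heap_type and drain are invented for this illustration.

```cpp
#include <memory>
#include <queue>
#include <unordered_set>
#include <utility>
#include <vector>

using ptr_type = std::unique_ptr<std::unordered_set<int>>;
using pair_comm_type = std::pair<int, ptr_type>;

// Orders entries by the int only; it never dereferences the pointer,
// so it still works on an entry whose unique_ptr has been moved from.
struct by_first {
    bool operator()(const pair_comm_type &lhs, const pair_comm_type &rhs) const
    {
        return lhs.first < rhs.first;
    }
};

using heap_type =
    std::priority_queue<pair_comm_type, std::vector<pair_comm_type>, by_first>;

// Moves every entry out of `heap` into a queue, largest key first.
std::queue<pair_comm_type> drain(heap_type &heap)
{
    std::queue<pair_comm_type> q;
    while (!heap.empty()) {
        // Assumption: the moved-from element is popped right away, so the
        // heap never has to compare anything but the untouched int.
        pair_comm_type item = std::move(const_cast<pair_comm_type &>(heap.top()));
        heap.pop();
        q.push(std::move(item));
    }
    return q;
}
```

Filling the heap works as in the question, for example heap.emplace(arraySize, std::move(lineBufferPtr)).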

Reading in from CSV to template vector

I've been having difficulty all week trying to get one of my projects up and running. I'm required to read in a 10,000 line CSV file from a meteorological database and output certain fields with a few demonstrations (Max blah blah).
I'm meant to design this using a self-made template vector and am not allowed access to the STL libraries.
As I'm just learning and this has been a few weeks in the making, I think I've overcomplicated it for myself and now I'm stuck, not knowing how to progress.
The main issue here is my confusion about how I'm going to not only read into a struct and parse the information so that I only read in what I need, but then also transfer that data into the template vector.
Anyway, without further ado, here is my source code:
#include <iostream>
#include <fstream>
#include "Date.h"
#include "Time.h"
#include "Vector.h"
typedef struct {
Date d;
Time t;
float speed;
} WindLogType;
int main()
{
Vector<WindLogType> windlog;
std::string temp;
std::ifstream inputFile("MetData-31-3.csv");
int timeIndex, windSpeedIndex;
//18 Elements per line
//Need the elements at index 0 & 10
while(!inputFile.eof())
{
getline(inputFile, WindLogType.d,' ');
getline(inputFile, WindLogType.t,',');
for(int i = 0; i < 9; i++)
{
getline(inputFile, temp, ',');
}
getline(inputFile, WindLogType.speed);
windlog.push_back(WindLogType);
}
return 0;
}
Vector.h
#ifndef VECTOR_H
#define VECTOR_H
template <class elemType>
class Vector
{
public:
bool isEmpty() const;
bool isFull() const;
int getLength() const;
int getMaxSize() const;
void sort();
// T* WindLogType;
Vector(int nMaxSize = 64); //Default constructor, array size of 64.
Vector(const Vector&); //Copy constructor
~Vector(); //Destructor
void push_back(int);
int operator[](int);
int at(int i);
private:
int maxSize, length;
elemType* anArray;
void alloc_new();
};
template <class elemType>
bool Vector<elemType>::isEmpty() const
{
return (length == 0);
}
template <class elemType>
bool Vector<elemType>::isFull() const
{
return (length == maxSize);
}
template <class elemType>
int Vector<elemType>::getLength() const
{
return length;
}
template <class elemType>
int Vector<elemType>::getMaxSize() const
{
return maxSize;
}
//Constructor that takes the max size of vector
template <class elemType>
Vector<elemType>::Vector(int nMaxSize)
{
maxSize = nMaxSize;
length = 0;
anArray = new elemType[maxSize];
}
//Destructor
template <class elemType>
Vector<elemType>::~Vector()
{
delete[] anArray;
}
//Sort function
template <class elemType>
void Vector<elemType>::sort()
{
int i, j;
int min;
elemType temp;
for(i = 0; i < length; i++)
{
min = i;
for(j = i+1; j<length; ++j)
{
if(anArray[j] < anArray[min])
min = j;
}
temp = anArray[i];
anArray[i] = anArray[min];
anArray[min] = temp;
}
}
//Check if vector is full, if not add the item to the vector
template <class elemType>
void Vector<elemType>::push_back(int i)
{
if(length+1 > maxSize)
alloc_new();
anArray[length]=i;
length++;
}
template <class elemType>
int Vector<elemType>::operator[](int i)
{
return anArray[i];
}
//Return the vector at position 'i'
template <class elemType>
int Vector<elemType>::at(int i)
{
if(i < length)
return anArray[i];
throw 10;
}
//If the vector is about to get full, create a new temporary
//vector of double size and copy the contents across.
template <class elemType>
void Vector<elemType>::alloc_new()
{
maxSize = length*2;
int* tmp=new int[maxSize];
for(int i = 0; i < length; i++)
tmp[i]= anArray[i];
delete[] anArray;
anArray = tmp;
}
/**
//Copy Constructor, takes a reference to a vector and copies
//the values across to a new vector.
Vector::Vector(const Vector& v)
{
maxSize= v.maxSize;
length = v.length;
anArray = new int[maxSize];
for(int i=0; i<v.length; i++)
{
anArray[i] = v.anArray[i];
}
}**/
#endif
There are some things in the vector class that are completely unnecessary, they were just from a bit of practice.
Here is a sample of the CSV file:
WAST,DP,Dta,Dts,EV,QFE,QFF,QNH,RF,RH,S,SR,ST1,ST2,ST3,ST4,Sx,T
31/03/2016 9:00,14.6,175,17,0,1013.4,1016.9,1017,0,68.2,6,512,22.7,24.1,25.5,26.1,8,20.74
31/03/2016 9:10,14.6,194,22,0.1,1013.4,1016.9,1017,0,67.2,5,565,22.7,24.1,25.5,26.1,8,20.97
31/03/2016 9:20,14.8,198,30,0.1,1013.4,1016.9,1017,0,68.2,5,574,22.7,24,25.5,26.1,8,20.92
31/03/2016 9:30,15.1,215,27,0,1013.4,1016.8,1017,0,66.6,5,623,22.6,24,25.5,26.1,8,21.63
I require the elements in the WAST column and the S column, as WAST contains the date and S contains windspeed.
By no means do I want people to give me just the solution; I need to understand how I would read in and parse this data utilizing the struct and the template vector.
There's no real "error" per se, I just lack the fundamental understanding of where to go next.
Any help would be greatly appreciated!
Thank you

One easy and efficient way would be to have a vector per column, aka column-oriented storage. Column-oriented storage minimizes space requirements and allows you to easily apply linear algebra algorithms (including SIMD-optimized ones) without having to pick individual struct members (as would be the case with row-oriented storage).
You can then parse each line using fscanf, each value into a separate variable. And then push_back the variables into the corresponding columns.
As fscanf does not parse dates, you would need to extract the date string into a char[64] and then parse that into struct tm which then can be converted to time_t.
The above assumes that you know the layout of the CSV and the types of the columns.
Pseudo-code:
vector<time_t> timestamps;
vector<double> wind_speeds;
for(;;) {
// Parse the CSV line into variables.
char date_str[64 + 1];
double wind_speed;
fscanf(file, "%64[^,], ..., %lf,...", date_str, ..., &wind_speed, ...);
time_t timestamp = parse_date(date_str);
// Store the parsed variables into the vectors.
timestamps.push_back(timestamp);
wind_speeds.push_back(wind_speed);
}
double average_wind_speed = std::accumulate(wind_speeds.begin(), wind_speeds.end(), 0.) / wind_speeds.size();

.csv files are a representation of a table, with "," (comma) separating the cells and ";" (semicolon) ending each line.
EDIT: If ";" does not work as the line terminator, the usual "\n" does. The algorithm below applies unchanged with "\n".
In fact, there is no need to write a complicated program; if and while are enough. Here is an idea of how to proceed. I hope it helps you understand a method, as that is what you are asking for.
1- Read every character (store it in a char) and append it to a string (string += char).
1.1- If the character is a ",", increase a counter, compare the string to the desired value (here WAST), and then clear the string.
1.1.1- If the string equals the desired value, save the counter in an integer (this records the position of the column you want).
1.1.2- If not, continue until the end of the line ";" (which means the desired column does not exist) or until you have a match (your string == "WAST").
NB: You can do this with several counters so that you know the WAST position, the S position, etc.
Then:
Initialise a new counter.
2- Compare the new counter to the value saved in 1.1.1.
2.1.1- If the values match, store the characters in a string until you reach the next comma.
2.1.2- If not, read every char until you find the next comma. Then increase your counter and restart from 2.
3- Continue to read the characters until you find a semicolon ";", reset the new counter, and restart at step 2, until you have read the whole file.
To summarise, the first step is to read the column names until you find the one you want or reach the end of the line, and to store its position (counted in commas) in counter1.
Then read every other line and store the string found at that column position, which you locate by comparing a new comma counter against counter1.
It is far from the most powerful algorithm, but it works and is easy to understand.
I tried to avoid writing it in C so that you can understand the steps without seeing the programmed solution. I hope it suits you.
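For readers who do want to see code, the steps above can be sketched roughly as follows. This is a simplified variant, not the character-by-character loop itself: it lets std::getline split on "\n" and on ",", and it assumes no cell contains a quoted comma. The function names split_csv_line and read_column are made up for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits one CSV line on commas. Assumes no quoted fields.
std::vector<std::string> split_csv_line(const std::string& line) {
    std::vector<std::string> cells;
    std::istringstream in(line);
    std::string cell;
    while (std::getline(in, cell, ','))
        cells.push_back(cell);
    return cells;
}

// Returns every value of the column whose header equals `wanted`.
// Returns an empty vector if the header line does not contain it.
std::vector<std::string> read_column(std::istream& input, const std::string& wanted) {
    std::string line;
    if (!std::getline(input, line))
        return {};

    // Step 1: find the position of the wanted column in the header line.
    const std::vector<std::string> header = split_csv_line(line);
    std::size_t position = header.size();
    for (std::size_t i = 0; i < header.size(); ++i) {
        if (header[i] == wanted) {
            position = i;
            break;
        }
    }
    if (position == header.size())
        return {};

    // Steps 2 and 3: on every following line, keep the cell at that position.
    std::vector<std::string> values;
    while (std::getline(input, line)) {
        const std::vector<std::string> cells = split_csv_line(line);
        if (position < cells.size())
            values.push_back(cells[position]);
    }
    return values;
}
```

Passing a std::ifstream opened on the .csv file as input and "WAST" as wanted would give the WAST column as strings, ready to be converted to numbers.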

C++ Making a 2D boolean matrix

I am making a program where I have 2 vectors (clientvec and productslist) and I need to create a 2D boolean matrix where the number of columns is the size of the productslist vector and the number of lines is the size of the clientvec vector, but it gives me this error:
"expression must have a constant value"
Here is the code I used:
unsigned int lines = clientvec.size();
unsigned int columns = productslist.size();
bool matrixPublicity[lines][columns] = {false};
Please help me.
Edit: I am new to C++, so assume I know nothing xD
Edit2: I now know from the answers that I cannot declare an array with non-constant sizes. Now the question is how I can set the sizes after initialization...

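The error message is clear: "expression must have a constant value"
It means the array dimension cannot be a run-time variable. It must be a constant expression: an integer literal, an enumerator, a preprocessor-defined constant, a const integer initialized from a constant, or a constexpr value.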
See for more info:
Why can't I initialize a variable-sized array?
Edit: Since you mentioned you are new to C++, here is a piece of code that might help you:
#include <iostream>
#include <vector>
#include <bitset>
int main()
{
unsigned int lines = 10;
const unsigned int columns = 5;
std::vector<std::bitset<columns>> matrixPublicity;
matrixPublicity.resize(lines);
for(int i=0; i < lines; i++)
{
for(int j=0; j < columns; j++)
std::cout << matrixPublicity[i][j] <<' ';
std::cout<<'\n';
}
}
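Note that in this case columns must be a compile-time constant, because it is used as the template argument of std::bitset.
Edit 2: If the lines do not all have the same size, or the sizes are only known at run time, then you must stick to vector types:
typedef std::vector<bool> matrixLine;
std::vector<matrixLine> matrixPublicity;
Once matrixPublicity itself has been resized to the number of lines, you can use the resize method on the i-th line of the matrix, e.g.
matrixPublicity[1].resize(number_of_columns_in_line_2);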
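Applied to the question, where both sizes come from vectors at run time, a minimal sketch could look like this. The alias BoolMatrix and the function make_matrix are names chosen for the example; in the question the two arguments would be clientvec.size() and productslist.size().

```cpp
#include <cstddef>
#include <vector>

using BoolMatrix = std::vector<std::vector<bool>>;

// Builds a lines x columns matrix with every cell set to false.
// Both sizes may be ordinary run-time values.
BoolMatrix make_matrix(std::size_t lines, std::size_t columns) {
    return BoolMatrix(lines, std::vector<bool>(columns, false));
}

// Counts the cells that are currently true.
std::size_t count_true(const BoolMatrix& matrix) {
    std::size_t total = 0;
    for (const std::vector<bool>& line : matrix)
        for (bool cell : line)
            if (cell) ++total;
    return total;
}
```

Cells are then read and written with the usual matrix[i][j] syntax, for example matrix[1][2] = true;.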

What you are trying to do would be the same as this:
std::vector<unsigned int> v1 { 1, 2, 3, 4, 5 };
std::vector<unsigned int> v2 { 6, 7, 8, 9 };
bool mat[v1.size()][v2.size()] = false;
Without the temporaries, this is how the compiler sees your code, and it is invalid. When you declare an array of any type, its size has to be known at compile time, and an array must be initialized with a braced list rather than a single value.
bool mat[2][3] = false; // still invalid
bool mat[2][3] = { false }; // Okay
const int x = 5;
const int y = 7;
bool mat[x][y] = false; // invalid
bool mat[x][y] = { false }; // okay
// Even this is invalid
std::vector<int> v1{ 1, 2, 3 };
std::vector<int> v2{ 4, 5, 6, 7 };
const std::size_t x1 = v1.size();
const std::size_t y1 = v2.size();
bool mat2[x1][y1] = { false }; // Still won't compile.
The size used to declare an array must be a constant expression.
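When the sizes really are known at compile time, a sketch with constexpr sizes and std::array avoids the problem entirely. The names kLines, kColumns, FixedMatrix and make_fixed_matrix are invented for this example, and the sizes 2 and 3 are arbitrary.

```cpp
#include <array>
#include <cstddef>

// constexpr values are constant expressions, so they may be used as sizes.
constexpr std::size_t kLines = 2;
constexpr std::size_t kColumns = 3;

using FixedMatrix = std::array<std::array<bool, kColumns>, kLines>;

// Value-initialization sets every cell to false.
FixedMatrix make_fixed_matrix() {
    return FixedMatrix{};
}
```

This does not help with sizes taken from v1.size() or v2.size(), which are only known at run time.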

Instead of making an array as you have tried to do, you could write a class template that constructs a matrix-like object for you. Here is what I have come up with. The overall design of this template should fit your situation, but the actual code that generates the internal matrix will depend on your data and on what you intend to do with it.
#include <cstddef>
#include <iostream>
#include <vector>
template <class T, class U>
class Matrix {
private:
    std::vector<T> m_lines;
    std::vector<T> m_cols;
    std::vector<U> m_mat;   // stored row by row
    std::size_t m_size = 0;
    std::size_t m_lineCount = 0;
    std::size_t m_colsCount = 0;
public:
    Matrix() {}
    Matrix( const std::vector<T>& lines, const std::vector<T>& cols ) {
        addVectors( lines, cols );
    }
    void addVectors( const std::vector<T>& v1, const std::vector<T>& v2 ) {
        m_lines = v1;
        m_cols = v2;
        m_lineCount = m_lines.size();
        m_colsCount = m_cols.size();
        m_mat.clear(); // discard any matrix built by an earlier call
        for ( std::size_t i = 0; i < m_lineCount; ++i ) {
            for ( std::size_t j = 0; j < m_colsCount; ++j ) {
                // This will depend on your implementation and how you
                // construct this matrix based off of your existing containers
                m_mat.push_back(m_lines[i] & m_cols[j]);
            }
        }
        m_size = m_mat.size();
    }
    std::size_t size() const { return m_size; }
    std::size_t sizeRows() const { return m_lineCount; }
    std::size_t sizeColumns() const { return m_colsCount; }
    const std::vector<U>& getMatrix() const { return m_mat; }
    const std::vector<T>& getLines() const { return m_lines; }
    const std::vector<T>& getColumns() const { return m_cols; }
    // Returns by value because std::vector<bool> cannot hand out a real bool&.
    U operator[]( std::size_t idx ) const { return m_mat[idx]; }
};
int main() {
    std::vector<unsigned> v1{ 1, 0, 1, 1, 0 };
    std::vector<unsigned> v2{ 0, 1, 1, 1, 0 };
    Matrix<unsigned, bool> mat1( v1, v2 );
    std::size_t column = 0;
    for ( std::size_t u = 0; u < mat1.size(); ++u ) {
        std::cout << mat1[u] << " ";
        // Start a new output line after the last column of each row.
        if ( ++column == mat1.sizeColumns() ) {
            std::cout << "\n";
            column = 0;
        }
    }
    return 0;
}
Output
0 1 1 1 0
0 0 0 0 0
0 1 1 1 0
0 1 1 1 0
0 0 0 0 0
With this class template you can create a matrix of any type U by passing in two vectors of type T. How you construct the matrix is implementation dependent, but the class is reusable for different types.
You could have two vectors of doubles and construct a matrix of unsigned chars, or you could have two vectors of a user-defined class or struct type and generate a matrix of unsigned values. This may help you out in many situations.
Note: This generates a compiler warning but no errors, and it prints and displays properly. The warning from MSVS 2015 is warning C4800: unsigned int: forcing value to bool true or false (performance warning)
It is generated because I am doing a bitwise & operation on two unsigned values and storing the result as a bool. That is why the initial vectors passed to the constructor contain only 1s and 0s; this is meant for demonstration only.
EDIT - I edited the class because I noticed that I had a default constructor but no way to add vectors to an object created with it. I added an extra member variable and an addVectors function, moved the implementation from the other constructor into the new function, and now simply call that function from the constructor.

Creating an array isn't that difficult :)
A matrix (2D/3D/... array) is unfortunately a little different if you want to do it your way!
But first of all you should know about the stack and the heap!
Let's have a look at these two:
Stack:
A stack variable/array/matrix/... is only valid inside the nearest enclosing pair of braces {}, which is normally called a "code block". Its size is defined at "compile time" (the time when the compiler translates your code into machine language). That means the size of your array must already be fixed in the source code.
Example:
#include <iostream>
#define MACRO 128
// Reads the desired size at run time.
int arraySize() {
    int size = 0;
    std::cin >> size;
    return size;
}
int main() {
    // This is valid: the size (here 128) is a literal number,
    // which is known at compile time.
    int intArray[128] = {};
    // This is valid: a macro like 'MACRO' is replaced by a number
    // before compilation, so it is compile-time-only as well.
    int intArray2[MACRO] = {};
    // This is not valid, because the size is only known at run time:
    // int intArray3[arraySize()] = {};
    return 0;
}
Heap:
A heap variable/array/matrix/... is valid until you delete it. That also means that a heap variable is created at run time (from starting your program until you close/stop it)! This allows you to choose its size while the program is running.
Example:
#include <iostream>
// Reads the desired size at run time.
int arraySize() {
    int size = 0;
    std::cin >> size;
    return size;
}
int main() {
    // Creating an array whose size is not known at compile time
    // works like this (int could also be 'bool'):
    int* intArray = new int[arraySize()];
    // - The star means that intArray points to an address in memory
    //   that holds an int. That's why these variables are called "pointers"!
    //   Right now it points to the beginning of the array.
    // - The keyword "new" says that you are allocating memory on the heap.
    // - After "new" comes the element type, which must be the same
    //   type you gave the pointer.
    // - In the brackets goes the size of the array. This time it can be
    //   a return value or the value of a variable.
    // As I mentioned, you have to delete this array on your own.
    // If you don't, the memory is leaked: it stays allocated until the
    // program ends. SO NEVER NEVER NEVER... forget about it.
    delete[] intArray;
    // - Write delete, then the two brackets, then the name of your array.
    // - The brackets say that you want to free the whole array. Why?
    //   Because you can also create/delete single heap variables, not
    //   only arrays, and those use plain "delete".
    return 0;
}
Creating a matrix on the heap is unfortunately not that easy.
But it is essential to know how a 1D-array works before going to further dimensions! That's why I did this tutorial!
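One common way to get a matrix on the heap is to allocate a single 1D block of rows * columns elements and compute the index yourself. The sketch below does that, with std::unique_ptr calling delete[] automatically. HeapMatrix and its member names are invented for this example, and the class does no bounds checking.

```cpp
#include <cstddef>
#include <memory>

// A rows x cols matrix of bool stored in one heap block, row by row.
class HeapMatrix {
public:
    HeapMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols),
          cells_(new bool[rows * cols]()) {}  // "()" sets every cell to false

    // Cell (r, c) lives at index r * cols_ + c of the 1D block.
    bool& at(std::size_t r, std::size_t c) { return cells_[r * cols_ + c]; }
    bool at(std::size_t r, std::size_t c) const { return cells_[r * cols_ + c]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::unique_ptr<bool[]> cells_;  // frees the block with delete[] on destruction
};
```

Here rows and cols can be run-time values such as the size() of a vector, and no manual delete[] is needed.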
Click here to see how to create a matrix on the heap
Click here to learn more about the heap
Click here to choose the best result of this theme
I hope I could help you :)!
