Dynamic Array & Class Inheritance - C++

I am doing a homework assignment for my summer OO class and we need to write two classes. One is called Sale and the other is called Register. I've written my Sale class; here's the .h file:
enum ItemType {BOOK, DVD, SOFTWARE, CREDIT};

class Sale
{
public:
    Sale();                  // default constructor,
                             // sets numerical member data to 0
    void MakeSale(ItemType x, double amt);
    ItemType Item();         // Returns the type of item in the sale
    double Price();          // Returns the price of the sale
    double Tax();            // Returns the amount of tax on the sale
    double Total();          // Returns the total price of the sale
    void Display();          // outputs sale info
private:
    double price;            // price of item or amount of credit
    double tax;              // amount of sales tax
    double total;            // final price once tax is added in.
    ItemType item;           // transaction type
};
For the Register class we need to include a dynamic array of Sale objects in our member data.
So my two questions are:
Do I need to have my Register class inherit from my Sale class (and if so, how)?
Can I have a generic example of a dynamic array?
Edit: We cannot use vectors.

No inheritance is required. A generic example:
std::vector<Sale> sales;
Gotta love templates.

No, inheritance would not be appropriate in this case. You would want to keep track of the number of sales and the size of the array as fields in the Register class. The class definition would look something like this:
class Register {
private:
    int numSales;    // how many slots are actually in use
    int arraySize;   // how many slots are allocated
    Sale* sales;
public:
    Register();
    ~Register();
    void MakeSale(Sale);
};

Register::Register() {
    numSales = 0;
    arraySize = 5;
    sales = new Sale[arraySize];
}

void Register::MakeSale(Sale s) {
    if (numSales == arraySize) {
        // Array is full: allocate a bigger one, copy the elements over
        // element by element (safer than memcpy for class types), and
        // release the old storage.
        arraySize += 5;
        Sale* tempArray = new Sale[arraySize];
        for (int i = 0; i < numSales; ++i)
            tempArray[i] = sales[i];
        delete [] sales;
        sales = tempArray;
    }
    sales[numSales] = s;
    ++numSales;
}

Register::~Register() {
    delete [] sales;
}
This doesn't include bounds checking or whatever other stuff you need to do when you make a sale, but hopefully this should help.
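For completeness, a short usage sketch of the Register above (main() and the sample amount are illustrative only; it relies on the asker's Sale interface):

int main()
{
    Register reg;

    Sale s;
    s.MakeSale(BOOK, 24.99);   // fill in a Sale through the asker's interface
    reg.MakeSale(s);           // Register copies it into its dynamic array,
                               // growing the array by 5 slots whenever it is full
    return 0;
}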

If you CANNOT use vectors, then you can use a std::list. You really should use standard containers as much as possible - chances are that any home-rolled solution will be inferior. The standard library is extensively optimized and tested - do you really feel the need to make the same investment when you have better things to be doing than reinventing the wheel?
std::list should not allocate more space than necessary. However, it has some serious limitations: the fact that a vector (like other forms of dynamic array) is contiguous gives it large performance advantages.
Not being able to use vectors seems like a very arbitrary limitation. The fact that they allocate more space than necessary is a feature, not a bug. It allows the container to amortize the expensive copy and/or move operations involved in a reallocation. An intelligent vector implementation should check for low memory situations and handle those gracefully. If it doesn't, you should submit patches to your standard library implementation or move to a new one. But imposing arbitrary constraints with no explanation, e.g.:
There should never be more than 5 unused slots in this array (i.e. the
number of allocated spaces may be at most 5 larger than the number of
slots that are actually filled with real data).
is bad practice. And where did the number FIVE come from? It's not even a power of two! If you want to hack around this limitation using std::vector::reserve, that's probably your best bet. But all that book-keeping should not need to be done.
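For what it's worth, a sketch of the reserve() workaround (a best-effort hack: reserve() only guarantees a minimum capacity, so staying within 5 unused slots relies on the common behaviour of implementations allocating exactly what was requested; Sale is the asker's class):

#include <vector>

void addSale(std::vector<Sale>& sales, const Sale& s)
{
    // Grow in steps of 5 instead of letting the vector pick its own growth factor.
    if (sales.size() == sales.capacity())
        sales.reserve(sales.size() + 5);
    sales.push_back(s);
}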
And, agreed with everyone else: inheritance is not what you want here.

Related

Optimising usage of a frequently repopulated std::list<Class*> object in C++

OK, this is going to be a rather long question, because the issue might relate to the broader structure/organization of my code, rather than a specific line. Thanks in advance for those willing to take a look!
Model structure I have a computational biological model written in C++ which consists of multiple classes written in separate header and source files. I will try to simplify the problem as much as possible. Relevant here is that there are biol. cells (class Cell; see below if you prefer reading code), which have two compartments, A and B (i.e. two objects of class Compartment), each of which has a genome, G (i.e. single object of class Genome). Finally, the genome defines two lists of genes (pointers to Gene class). GeneList simply stores all genes that make up the genome (i.e. the essence of a genome). The second list, ExpressedGenes, contains only genes that are expressed, and this is what my question revolves around.
//simplification of Cell.hh
class Cell
{
public:
    Compartment *A, *B;
};

//simplification of Compartment.hh
class Compartment
{
public:
    Genome *G;
};

//simplification of Genome.hh
class Genome
{
public:
    std::list<Gene*>* GeneList;
    std::list<Gene*>* ExpressedGenes;
};

//simplification of Gene.cc
class Gene
{
public:
    //...
    int expression;
};
Model description To simulate a cell's behaviour, the core of what the model needs to do is calculate which genes become expressed in the next timestep (expression set to 0 or 1), based on which genes are expressed during the current timestep. But gene expression in, say, compartment A is not only determined by the genes in compartment A's genome, but also by those in compartment B's genome. We imagine that with a small probability the product of an expressed gene can be transported to the other compartment, influencing subsequent gene expression in that compartment. Again simplified, the entire update of a cell then looks like this:
//inside Cell.cc
void Cell::UpdateCompartments()
{
    std::list<Gene*>::iterator it;

    A->G->NativeExpression();   //Create ExpressedGenes of A; see below.
    B->G->NativeExpression();   //Create ExpressedGenes of B.
    Transport();                //Potential movement (by splicing) of elements in ExpressedGenes of A to ExpressedGenes of B, and vice versa.

    A->G->UpdateGeneExpression();   //Use ExpressedGenes of A to update expression for each gene in GeneList of A.
    it = A->G->ExpressedGenes->erase(A->G->ExpressedGenes->begin(), A->G->ExpressedGenes->end());   //Erase ExpressedGenes of A; gene expression has been stored in the gene objects.

    B->G->UpdateGeneExpression();   //Same but for B.
    it = B->G->ExpressedGenes->erase(B->G->ExpressedGenes->begin(), B->G->ExpressedGenes->end());
}

//inside Genome.cc
void Genome::NativeExpression()
{
    std::list<Gene*>::iterator it;

    //Iterate through GeneList; if we find a gene that is expressed, store a pointer to it in ExpressedGenes.
    it = GeneList->begin();
    while (it != GeneList->end())
    {
        if ((*it)->kind==REGULATOR || (*it)->kind==EFFECTOR)
        {
            Gene* gene = dynamic_cast<Gene*>(*it);
            if (gene->expression > 0) ExpressedGenes->push_back(gene);
        }
        it++;
    }
}
Problem statement The problem is that the boring little piece of code above (i.e. the function NativeExpression) costs more time than anything else in the model (see results from the GPROF analysis below). This seems weird to me because ExpressedGenes, being a list of pointers, should not carry so much overhead, right? I am just creating pointers to existing Gene objects and removing them later... As shown by the GPROF output, time is largely spent in the function itself (220.01s), not in the calls it makes (i.e. iterating through the list, checking elements, adding new elements; 28.92s in total). Furthermore, I found no difference in the time profile when I replaced the dynamic_cast with a static_cast. These points lead me to believe that the inefficiency might lie in the broader structure of my classes, or in how I use pointers between them. So, suggestions or other remarks will be much appreciated.
...

Avoid recomputation when data is not changed

Imagine you have a pretty big array of double and a simple function avg(double*,size_t) that computes the average value (just a simple example: both the array and the function could be whatever data structure and algorithm). I would like it so that if the function is called a second time and the array has not changed in the meantime, the return value comes directly from the previous call, without going through the unchanged data again.
Holding the previous value looks simple: I just need a static variable inside the function, right? But what about detecting changes in the array? Do I need to write an interface to access the array which sets a flag to be read by the function? Can something smarter and more portable be done?
As Kerrek SB so astutely put it, this is known as "memoization." I'll cover my personal favorite method at the end (both with a double* array and the much easier DoubleArray), so skip ahead if you just want to see code. However, there are many ways to solve this problem, and I wanted to cover them all, including those suggested by others.
The first part is some theory and alternate approaches. There are fundamentally four parts to the problem:
Prove the function is idempotent (calling a function once is the same as calling it any number of times)
Cache results keyed to the inputs
Search cached results given a new set of inputs
Invalidate cached results which are no longer accurate/current
The first step is easy for you: average is idempotent. It has no side effects.
Caching the results is a fun step. You obviously are going to create some "key" for the inputs that you can compare against the cached "keys." In Kerrek SB's memoization example, the key is a tuple of all of the arguments, compared against other keys with ==. In your system, the equivalent solution would be to have the key be the contents of the entire array. This means each key comparison is O(n), which is expensive. If the function being cached were more expensive to compute than an average, this price might be acceptable; in the case of averaging, however, this key is terribly expensive.
This leads to an open-ended search for good keys. Dieter Lücking's answer was to key on the array pointer. This is O(1), and wicked fast to boot. However, it also assumes that once you've calculated the average for an array, that array's values never change, and that the memory address is never re-used for another array. Solutions for this come later, in the invalidation portion of the task.
Another popular key is HotLick's (1) in the comments. You use a unique identifier for the array (a pointer or, better yet, a unique integer idx that will never be used again) as your key. Each array then has a "dirty bit for avg" that it is expected to set to true whenever a value is changed. Caches first look at the dirty bit. If it is true, they ignore the cached value, calculate the new value, cache the new value, then clear the dirty bit indicating that the cached value is now valid. (This is really invalidation, but it fits well in this part of the answer.)
This technique assumes that there are more calls to avg than updates to the data. If the array is constantly dirty, then avg still has to keep recalculating, but we still pay the price of setting the dirty bit on every write (slowing it down).
This technique also assumes that there is only one function, avg, which needs cached results. If you have many functions, it starts to get expensive to keep all of the dirty bits up to date. The solution is an "epoch" counter. Instead of a dirty bit, you have an integer, which starts at 0. Every write increments it. When you cache a result, you cache not only the identity of the array, but its epoch as well. When you check to see if you have a cached value, you also check to see if the epoch changed. If it did change, you can't prove your old results are current, and have to throw them out.
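A hypothetical sketch of the epoch idea (TrackedArray, avg_uncached and avg_cached are made-up names, not part of any library):

#include <cstddef>
#include <cstdint>

// Every write bumps the epoch, so a cached result tagged with an older
// epoch is known to be stale without comparing any array contents.
struct TrackedArray {
    double*       data;
    std::size_t   size;
    std::uint64_t epoch;

    void set(std::size_t i, double v) { data[i] = v; ++epoch; }
};

double avg_uncached(const double* p, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += p[i];
    return n ? sum / n : 0.0;
}

double avg_cached(const TrackedArray& a) {
    // One cache slot for this one function, keyed by (array identity, epoch).
    static const TrackedArray* cachedArray = 0;
    static std::uint64_t       cachedEpoch = 0;
    static double              cachedValue = 0.0;

    if (cachedArray == &a && cachedEpoch == a.epoch)
        return cachedValue;                         // same array, same epoch: still valid

    cachedValue = avg_uncached(a.data, a.size);     // recompute and re-tag
    cachedArray = &a;
    cachedEpoch = a.epoch;
    return cachedValue;
}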
Storing the results is an interesting task. It is very easy to write a storing algorithm which uses up gobs of memory by remembering hundreds of thousands of old results to avg. Generally speaking, there needs to be a way to let the caching code know that an array has been destroyed, or a way to slowly remove old unused cache results. In the former case, the deallocator of the double arrays needs to let the cache code know that that array is being deallocated. In the latter case, it is common to limit a cache to 10 or 100 entries and evict old cache results.
The last piece is invalidation of caches. I spoke earlier regarding the dirty bit. The general pattern is that a value inside a cache must be marked invalid if the key it was stored under didn't change, but the values in the array did change. This can obviously never happen if the key is a copy of the array, but it can occur when the key is an identifying integer or a pointer.
Generally speaking, invalidation is a way to add a requirement to your caller: if you want to use avg with caching, here's the extra work you are required to do to help the caching code.
Recently I implemented a system with such a cache invalidation scheme. It was very simple, and stemmed from one philosophy: the code which is calling avg is in a better position to determine if the array has changed than avg is itself.
There were two versions of the equivalent of avg: double avg(double* array, int n) and double avg(double* array, int n, CacheValidityObject& validity).
Calling the 2 argument version of avg never cached, because it had no guarantees that array had not changed.
Calling the 3 argument version of avg activated caching. The caller guarantees that, if it passes the same CacheValidityObject to avg without marking it dirty, then the arrays must be the same.
Putting the onus on the caller makes average trivial. CacheValidityObject is a very simple class to hold on to the results:
#include <boost/shared_ptr.hpp>
#include <boost/make_shared.hpp>

class CacheValidityObject
{
public:
    CacheValidityObject();          // creates a new dirty CacheValidityObject

    void invalidate();              // marks this object as dirty

    // this function is used only by the `avg` algorithm. "friend" may
    // be used here, but this example makes it public
    boost::shared_ptr<void>& getData();

private:
    boost::shared_ptr<void> mData;
};

inline void CacheValidityObject::invalidate()
{
    mData.reset();  // blow away any cached data
}
double avg(double* array, int n); // defined as usual
double avg(double* array, int n, CacheValidityObject& validity)
{
    // this function assumes validity.mData is null or a shared_ptr to a double
    boost::shared_ptr<void>& data = validity.getData();
    if (data) {
        // The cached result, stored on the validity object, is still valid
        return *boost::static_pointer_cast<double>(data);
    } else {
        // There was no cached result, or it was invalidated
        double result = avg(array, n);
        data = boost::make_shared<double>(result);  // cache the result
        return result;
    }
}

// usage
{
    double data[100];
    fillWithRandom(data, 100);
    CacheValidityObject dataCacheValidity;

    double a = avg(data, 100, dataCacheValidity);  // caches the average
    double b = avg(data, 100, dataCacheValidity);  // cache hit... uses the cached result

    data[0] = 0;
    dataCacheValidity.invalidate();

    double c = avg(data, 100, dataCacheValidity);  // dirty... caches the new result
    double d = avg(data, 100, dataCacheValidity);  // cache hit... uses the cached result

    // CacheValidityObject::~CacheValidityObject() will destroy the shared_ptr,
    // freeing the memory used to cache the result
}
Advantages
Nearly the fastest caching possible (within a few opcodes)
Trivial to implement
Doesn't leak memory, saving cached values only when the caller thinks it may want to use them again
Disadvantages
Requires the caller to handle caching, instead of doing it implicitly for them.
If you wrap the double* array in a class, you can minimize the disadvantage. Assign each algorithm an index (this can be done at run time), and have the DoubleArray class maintain a map of cached values. Each modification to the DoubleArray invalidates the cached results. This is the easiest version to use, but it doesn't work with a naked array... you need a class to help you out:
class DoubleArray
{
public:
    // all of the getters and setters and constructors.
    // Special note: all setters MUST call invalidate()
    double* getArray();   // accessors used by avg() below
    int getSize();

    CacheValidityObject getCache(int inIdx)
    {
        return mCaches[inIdx];
    }

    void setCache(int inIdx, const CacheValidityObject& inObj)
    {
        mCaches[inIdx] = inObj;
    }

private:
    void invalidate()
    {
        mCaches.clear();
    }

    std::map<int, CacheValidityObject> mCaches;
    double* mArray;
    int mSize;
};

inline int getNextAlgorithmIdx()
{
    static int nextIdx = 1;
    return nextIdx++;
}

static const int avgAlgorithmIdx = getNextAlgorithmIdx();

double avg(DoubleArray& inArray)
{
    CacheValidityObject valid = inArray.getCache(avgAlgorithmIdx);
    // use the 3 argument avg from the previous example
    double result = avg(inArray.getArray(), inArray.getSize(), valid);
    inArray.setCache(avgAlgorithmIdx, valid);
    return result;
}
// usage
DoubleArray array(100);
fillRandom(array);
double a = avg(array); // calculates, and caches
double b = avg(array); // cache hit
array.set(0, 5); // invalidates caches
double c = avg(array); // calculates, and caches
double d = avg(array); // cache hit
#include <cstddef>
#include <limits>
#include <map>
#include <numeric>

// Note: You have to manage cached results - release an entry with avg(p, 0)!
double avg(double* p, std::size_t n) {
    typedef std::map<double*, double> map;
    static map results;
    map::iterator pos = results.find(p);
    if(n) {
        // Calculate the value on a cache miss, otherwise reuse the cached one
        if(pos == results.end()) {
            double value = std::accumulate(p, p + n, 0.0) / n;
            pos = results.insert(map::value_type(p, value)).first;
        }
        return pos->second;
    }
    // n == 0: erase the cached value (if any)
    if(pos != results.end())
        results.erase(pos);
    return std::numeric_limits<double>::quiet_NaN();
}
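A usage sketch for this version (it assumes the avg() just shown; main() and the sample data are only illustrative):

int main()
{
    double data[4] = {1.0, 2.0, 3.0, 4.0};

    double a = avg(data, 4);   // first call: computes 2.5 and caches it under the pointer
    double b = avg(data, 4);   // second call: served from the cache, the data is not re-read
    avg(data, 0);              // n == 0 releases the cached entry for this pointer

    return static_cast<int>(a + b);
}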

Vector of Pointers to Same Class Type Design Feasibility

If I wanted a class that holds a vector of pointers to other objects of the same class, so that the links can form a cycle, how dangerous is that? For example, say I have a text file that looks like this:
city=Detroit
{
sister=Toyota
sister=Dubai
...
}
...
First the file is read into a series of temp classes, ParsedCity, where the name of the city and the names of the sister cities are held. After I have all the cities in the file I create the actual City objects.
class City
{
private:
    std::string name;
    std::vector<City*> sisterCities;
public:
    City(const std::string& aName);
    void CreateRelations(const ParsedCity& pc, std::vector<City>& cities);
    std::string Name() const { return name; }
};

//If this were to represent Detroit, pc would contain a vector of strings
//containing Toyota and Dubai. cities contains the actual City objects that
//sister cities should point to. It holds all cities of the world.
void City::CreateRelations(const ParsedCity& pc, std::vector<City>& cities)
{
    for (unsigned int i = 0; i < pc.ParsedSisterCities().size(); i++)
    {
        for (unsigned int j = 0; j < cities.size(); j++)
        {
            if (pc.ParsedSisterCities()[i] == cities[j].Name())
            {
                sisterCities.push_back(&cities[j]);
                break;
            }
        }
    }
}
My worry is that if more cities are pushed into the main City vector, the vector will resize, relocate its storage somewhere else, and all my Cities will be left with dangling sisterCities pointers. At least this is what I think will happen, based on my knowledge of the vector class. If all the cities in the world and their sister cities were stored in a linked list, would this solve my problem? I would like a guarantee that once a city is built it does not move (in memory. Bad pun?)
This seems like a tricky problem to me, as if I visit a sister city of Detroit, I can visit its sister cities, etc. Then I could end up back at Detroit! If Topeka changes its name to Google, all of Topeka's sister cities should automatically know (as they are all pointing at the same spot in memory where Topeka is located).
Any advice is appreciated!
If you have a vector of pointers, and the vector resizes, the pointees locations in memory are not affected, so all your pointers remain valid.
The biggest problem with this solution is that any recursive algorithm you apply to your data-structure will have to have some mechanism to detect cycles, otherwise you will end up with stack-overflows due to infinite recursion.
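As an illustration, a minimal sketch of such cycle detection (the tiny City stand-in is just enough to compile; it is not the asker's full class):

#include <iostream>
#include <set>
#include <string>
#include <vector>

// Minimal stand-in for the asker's City, just enough to demonstrate the idea.
struct City {
    std::string name;
    std::vector<City*> sisterCities;
};

// Depth-first visit that remembers which City objects it has already seen,
// so Detroit -> Dubai -> Detroit terminates instead of recursing forever.
void visit(const City& c, std::set<const City*>& seen)
{
    if (!seen.insert(&c).second)
        return;                        // already visited: a cycle was closed here
    std::cout << c.name << '\n';
    for (std::vector<City*>::const_iterator it = c.sisterCities.begin();
         it != c.sisterCities.end(); ++it)
        visit(**it, seen);
}

int main()
{
    City detroit, dubai;
    detroit.name = "Detroit";
    dubai.name = "Dubai";
    detroit.sisterCities.push_back(&dubai);
    dubai.sisterCities.push_back(&detroit);   // the cycle from the question

    std::set<const City*> seen;
    visit(detroit, seen);                     // prints each city exactly once
}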
Edit:
I just realized that I misread your question originally. If the cities vector resizes, any pointer to its elements will become invalid. The best alternative would be to store pointers to cities in that vector. To make this more manageable, I suggest you use a boost::ptr_vector. This has the benefit that pointers to cities remain valid even if you delete a city from your vector, or you reorder the cities in your vector (for instance if you want to sort them by name for faster lookup).
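A sketch of that suggestion, relying on the asker's City declaration above (only the container changes; buildCities() is just an illustrative wrapper):

#include <boost/ptr_container/ptr_vector.hpp>

void buildCities()
{
    // The container owns the City objects through pointers, so growing or
    // reordering it never relocates a City itself; any City* handed out as a
    // sister-city link stays valid for the life of the container.
    boost::ptr_vector<City> cities;
    cities.push_back(new City("Detroit"));
    cities.push_back(new City("Dubai"));

    City* detroit = &cities[0];   // still valid after later push_backs
    (void)detroit;
}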
This seems like a tricky problem to me, as if I visit a sister city of
Detroit, I can visit its sister cities, etc. Then I could end up back
at Detroit!
That's a "circular reference". This is not necessarily a problem in C++, since objects are deleted manually. However, it may introduce complications in your design, as you perceive.

How to avoid code duplication or multiple iterations?

Consider the code given below:
struct Person{
    enum Sex {male, female};
    Sex sex;
    int salary;
};

struct PersonSSN : public Person{
    int ssn;
};
I have a container which holds either only Person or only PersonSSN objects (known at compile time), sorted in ascending order of the salary value. I have to write a function myfunc() which does the following.
void myfunc(){
    if the container contains Person:
        print the number of females between two consecutive males.
    else if the container contains PersonSSN:
        print the number of females between two consecutive males
        and the ssn of the males.
}
I have two solutions for this problem but both have some drawbacks.
Solution 1: If I write one function for printing the number of females between males and another function for printing the ssn, I have to iterate through the data twice, which is costly.
Solution 2: I can write two classes, Myfunc and MyfuncSSN derived from Myfunc, with a virtual function process(). But then the code segment which prints the number of females has to be copied from the process() method of the Myfunc class into the MyfuncSSN class, so there is no code re-use.
What is a better solution?
If you are talking about recognizing the type at compile time, then there is only one answer: templates. Depending on which kind of container you use it will vary a bit, but if you use std::list it would be:
#include <list>

template <typename T>
void myfunc(std::list<T>);

template <>
void myfunc(std::list<Person> lst){
    print the number of females between two consecutive males.
}

template <>
void myfunc(std::list<PersonSSN> lst){
    print the number of females between two consecutive males
    and the ssn of the males.
}
EDIT:
If you want to avoid the double iteration, the only thing I can imagine doing is to use a single template function that iterates and prints the number of females between two consecutive males, calling another templated function for the ssn printing:
#include <list>

template <typename T>
void printperson(T p){}

template <>
void printperson(Person p){
    // Do nothing - perhaps you might skip it and use the generic implementation instead
}

template <>
void printperson(PersonSSN p){
    print the ssn of the person p if it is male.
}

template <typename T>
void myfunc(std::list<T>){
    print the number of females between two consecutive males,
    and while doing so call printperson(list_element);
}
This might work for this simple example, but I am sure that for more complicated examples - say you additionally want to print the number of males between females for PersonSSN - it might come up short, as those two operations (while similar) might turn out to be impossible to separate into a part with functionality for different types. Then it will need code duplication or double iteration - I don't think there is a way around it.
Note: you might (as suggested in the comments) switch to const references in the function arguments - I am more used to Qt containers, which use implicit sharing and therefore don't need it.
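For concreteness, a compilable sketch of that single-pass idea (it assumes the Person/PersonSSN structs above, including the sex member; the exact counting and output format are my own assumptions):

#include <iostream>
#include <list>

template <typename T>
void printperson(const T&) {}          // generic hook: do nothing

template <>
void printperson(const PersonSSN& p)   // PersonSSN hook: print the ssn of males
{
    if (p.sex == Person::male)
        std::cout << "  ssn: " << p.ssn << '\n';
}

template <typename T>
void myfunc(const std::list<T>& lst)
{
    int females = 0;
    bool seenMale = false;
    for (typename std::list<T>::const_iterator it = lst.begin(); it != lst.end(); ++it)
    {
        printperson(*it);              // extra work happens only for PersonSSN
        if (it->sex == Person::male)
        {
            if (seenMale)
                std::cout << females << " female(s) between consecutive males\n";
            females = 0;
            seenMale = true;
        }
        else
        {
            ++females;
        }
    }
}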
This example is so wrong on so many different levels :)
Ideally, "Person" would be a class; "name", "sex" and "SSN" would all be members of the base class, and "process()" would be either a method or a virtual method.
Q: Is there any chance of changing Person and PersonSSN into classes, and making "process()" a method?
Q: How does your program "know" whether it's got a "Person" record, or a "PersonSSN" record? Can you make this a parameter into your "process()" function?
ADDENDUM 9.16.2011:
The million$ question is "How can your code distinguish between a 'Person' and a 'PersonSSN'?"
If you use a class, you can use "typeid" (unsatisfactory), or you can tie the class-specific behavior to a class method (preferred, and what was suggested with the "template" suggestion).
You also need at least THREE different classes: the "Person" class (which looks and behaves like a person), a "PersonSSN" class (which has the extra data and possibly extra behavior) ... and an "ueber-class" that knows how to COUNT Persons and PersonSSN's.
So yes, I'm suggesting there should be some class that HAS, or that USES "Persons" and "PersonSSNs".
And yes, you can factor your code so that one class implements "Process-count-consecutive", and another calls the parent's "Process-count-consecutive" and adds a new "print ssn".

Trying to keep age/name pairs matched after sorting

I'm writing a program where the user inputs names and then ages. The program then sorts the list alphabetically and outputs the pairs. However, I'm not sure how to keep the ages matched up with the names after sorting them alphabetically. All I've got so far is...
Edit: Changed the code to this -
#include "std_lib_facilities.h"

struct People{
    string name;
    int age;
};

int main()
{
    vector<People> nameage;
    cout << "Enter name then age until done. Press enter, 0, enter to continue.:\n";
    People name;
    People age;
    while(name != "0"){
        cin >> name;
        nameage.push_back(name);
        cin >> age;
        nameage.push_back(age);
    }
    vector<People>::iterator i = (nameage.end()-1);
    nameage.erase(i);
}
I get compiler errors for the != operator and the cin operators. Not sure what to do.
Rather than two vectors (one for names, and one for ages), have a vector of a new type that contains both:
struct Person
{
    string name;
    double age;
};
vector<Person> people;
edit for comments:
Keep in mind what you're now pushing onto the vector. You must push something of type Person. You can do this in a couple of ways:
Push back a default constructed person and then set the name and age fields:
people.push_back(Person());
people.back().name = name;
people.back().age = age;
Give Person a constructor that takes a name and an age, and push a Person with some values:
struct Person
{
    Person(const string& name_, double age_) : name(name_), age(age_) {}
    string name;
    double age;
};
people.push_back(Person(name, age));
Create a Person, give it some values, and push that into the vector:
Person person;
person.name = name;
person.age = age;
people.push_back(person);
Or more simply:
Person person = { name, age };
people.push_back(person);
(thanks avakar)
In addition to the solutions posted by jeje and luke, you can also insert the pairs into a map (or a multimap, in case duplicate names are allowed).
assert(names.size() == ages.size());
map<string, double> people;
for (size_t i = 0; i < names.size(); ++i)
    people[names[i]] = ages[i];
// The sequence [people.begin(), people.end()) is now sorted
Note that using vector<Person> will be faster if you fill it up only once in advance. map will be faster if you decide to add/remove people dynamically.
You should consider putting names and ages together in a structured record.
Then sort the records.
J.
You could have a vector of structs/classes, where each one has both a name and an age. When sorting, use a custom comparator that only looks at the name field.
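A minimal sketch of that idea, using a C++11 lambda as the comparator (the sample data is made up):

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Person {
    std::string name;
    double age;
};

int main()
{
    std::vector<Person> people = { {"Carol", 31}, {"Alice", 24}, {"Bob", 57} };

    // Compare only the name field; each age stays attached to its Person.
    std::sort(people.begin(), people.end(),
              [](const Person& a, const Person& b) { return a.name < b.name; });

    for (const Person& p : people)
        std::cout << p.name << ' ' << p.age << '\n';
}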
Alternately, build an additional vector of integers [0,names.size()-1]. Sort that, with a custom comparator that instead of comparing a < b compares names[a] < names[b]. After sorting, the integer vector will give you the permutation that you can apply to both the names and ages vectors.
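And a sketch of the index-permutation alternative (again C++11, with made-up data):

#include <algorithm>
#include <iostream>
#include <numeric>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> names = {"Carol", "Alice", "Bob"};
    std::vector<int> ages = {31, 24, 57};

    // Index vector 0 .. names.size()-1, sorted by the name each index refers to.
    std::vector<std::size_t> idx(names.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::sort(idx.begin(), idx.end(),
              [&](std::size_t a, std::size_t b) { return names[a] < names[b]; });

    // idx now holds the permutation; apply it to both vectors when reading out.
    for (std::size_t i : idx)
        std::cout << names[i] << ' ' << ages[i] << '\n';
}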
You either need to swap elements in both vectors at the same time (the FORTRAN way), or store a vector of structs or pairs. The latter approach is more idiomatic for C-like languages.
You should use the pair<> utility template. Reference here.
G'day,
Given how you're trying to model this, my gut feeling is that you haven't approached the problem from an OO perspective. Try using a class instead of a struct.
struct's are soooo K&R! (-:
Think of a Person as an object and they have attributes that are tightly coupled, e.g. Name and Age. Maybe even address, email, Twitter, weight, height, etc.
Then add to your objects the functions that are meaningful, e.g. comparing ages, weights, etc. Writing a < operator for email addresses or Twitter id's is a bit bizarre though.
OOA is just looking at what attributes your "objects" have in real life and that gives you a good starting point for designing your objects.
To get a better idea of OOA have a look at the excellent book "Object Oriented Systems Analysis: Modeling the World in Data" by Sally Shlaer and Stephen Mellor (sanitised Amazon link). Don't faint at the Amazon price though $83.33 indeed! At least it's $0.01 second hand... (-:
HTH
cheers,