I have a dynamic programming algorithm for Knapsack in C++. When it was implemented as a function and accessing variables passed into it, it was taking 22 seconds to run on a particular instance. When I made it the member function of my class KnapsackInstance and had it use variables that were data members of that class, it started taking 37 seconds to run. As far as I know, only accessing member functions goes through the vtable so I'm at a loss to explain what might be happening.
Here's the code of the function
int KnapsackInstance::dpSolve() {
int i; // Current item number
int d; // Current weight
int * tbl; // Array of size weightLeft
int toret;
tbl = new int[weightLeft+1];
if (!tbl) return -1;
memset(tbl, 0, (weightLeft+1)*sizeof(int));
for (i = 1; i <= numItems; ++i) {
for (d = weightLeft; d >= 0; --d) {
if (profitsWeights.at(i-1).second <= d) {
/* Either add this item or don't */
int v1 = profitsWeights.at(i-1).first + tbl[d-profitsWeights.at(i-1).second];
int v2 = tbl[d];
tbl[d] = (v1 < v2 ? v2 : v1);
}
}
}
toret = tbl[weightLeft];
delete[] tbl;
return toret;
}
tbl is one column of the DP table. We start from the first column and go on until the last column. The profitsWeights variable is a vector of pairs, the first element of which is the profit and the second the weight. toret is the value to return.
Here is the code of the original function :-
int dpSolve(vector<pair<int, int> > profitsWeights, int weightLeft, int numItems) {
int i; // Current item number
int d; // Current weight
int * tbl; // Array of size weightLeft
int toret;
tbl = new int[weightLeft+1];
if (!tbl) return -1;
memset(tbl, 0, (weightLeft+1)*sizeof(int));
for (i = 1; i <= numItems; ++i) {
for (d = weightLeft; d >= 0; --d) {
if (profitsWeights.at(i-1).second <= d) {
/* Either add this item or don't */
int v1 = profitsWeights.at(i-1).first + tbl[d-profitsWeights.at(i-1).second];
int v2 = tbl[d];
tbl[d] = (v1 < v2 ? v2 : v1);
}
}
}
toret = tbl[weightLeft];
delete[] tbl;
return toret;
}
This was run on Debian Lenny with g++-4.3.2 and -O3 -DNDEBUG turned on
Thanks
In a typical implementation, a member function receives a pointer to the instance data as a hidden parameter (this). As such, access to member data is normally via a pointer, which may account for the slow-down you're seeing.
On the other hand, it's hard to do more than guess with only one version of the code to look at.
After looking at both pieces of code, I think I'd write the member function more like this:
int KnapsackInstance::dpSolve() {
std::vector<int> tbl(weightLeft+1, 0);
std::vector<pair<int, int> > weights(profitWeights);
int v1;
for (int i = 0; i <numItems; ++i)
for (int d = weightLeft; d >= 0; --d)
if ((weights[i+1].second <= d) &&
((v1 = weights[i].first + tbl[d-weights[i-1].second])>tbl[d]))
tbl[d] = v1;
return tbl[weightLeft];
}
Related
The task (from a Bulgarian judge, click on "Език" to change it to English):
I am given the size of the first (S1 = A) of N corals. The size of every subsequent coral (Si, where i > 1) is calculated using the formula (B*Si-1 + C)%D, where A, B, C and D are some constants. I am told that Nemo is nearby the Kth coral (when the sizes of all corals are sorted in ascending order).
What is the size of the above-mentioned Kth coral ?
I will have T tests and for every one of them I will be given N, K, A, B, C and D and prompted to output the size of the Kth coral.
The requirements:
1 ≤ T ≤ 3
1 ≤ K ≤ N ≤ 107
0 ≤ A < D ≤ 1018
1 ≤ C, B*D ≤ 1018
Memory available is 64 MB
Time limit is 1.9 sec
The problem I have:
For the worst case scenario I will need 107*8B which is 76 MB.
The solution If the memory available was at least 80 MB would be:
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j;
biggie a, b, c, d;
std::vector<biggie>::iterator it_ans;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
std::vector<biggie> lut{ a };
lut.reserve(n);
for (j = 1; j != n; ++j) {
lut.emplace_back((b * lut.back() + c) % d);
}
it_ans = std::next(lut.begin(), k - 1);
std::nth_element(lut.begin(), it_ans, lut.end());
std::cout << *it_ans << '\n';
}
return 0;
}
Question 1: How can I approach this CP task given the requirements listed above ?
Question 2: Is it somehow possible to use std::nth_element to solve it since I am not able to store all N elements ? I mean using std::nth_element in a sliding window technique (If this is possible).
# Christian Sloper
#include <iostream>
#include <queue>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j, j_lim;
biggie a, b, c, d, prev, curr;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
if (k < n - k + 1) {
std::priority_queue<biggie, std::vector<biggie>, std::less<biggie>> q;
q.push(a);
prev = a;
for (j = 1; j != k; ++j) {
curr = (b * prev + c) % d;
q.push(curr);
prev = curr;
}
for (; j != n; ++j) {
curr = (b * prev + c) % d;
if (curr < q.top()) {
q.pop();
q.push(curr);
}
prev = curr;
}
std::cout << q.top() << '\n';
}
else {
std::priority_queue<biggie, std::vector<biggie>, std::greater<biggie>> q;
q.push(a);
prev = a;
for (j = 1, j_lim = n - k + 1; j != j_lim; ++j) {
curr = (b * prev + c) % d;
q.push(curr);
prev = curr;
}
for (; j != n; ++j) {
curr = (b * prev + c) % d;
if (curr > q.top()) {
q.pop();
q.push(curr);
}
prev = curr;
}
std::cout << q.top() << '\n';
}
}
return 0;
}
This gets accepted (Succeeds all 40 tests. Largest time 1.4 seconds, for a test with T=3 and D≤10^9. Largest time for a test with larger D (and thus T=1) is 0.7 seconds.).
#include <iostream>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j;
biggie a, b, c, d;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
biggie prefix = 0;
for (int shift = d > 1000000000 ? 40 : 20; shift >= 0; shift -= 20) {
biggie prefix_mask = ((biggie(1) << (40 - shift)) - 1) << (shift + 20);
int count[1 << 20] = {0};
biggie s = a;
int rank = 0;
for (j = 0; j != n; ++j) {
biggie s_vs_prefix = s & prefix_mask;
if (s_vs_prefix < prefix)
++rank;
else if (s_vs_prefix == prefix)
++count[(s >> shift) & ((1 << 20) - 1)];
s = (b * s + c) % d;
}
int i = -1;
while (rank < k)
rank += count[++i];
prefix |= biggie(i) << shift;
}
std::cout << prefix << '\n';
}
return 0;
}
The result is a 60 bits number. I first determine the high 20 bits with one pass through the numbers, then the middle 20 bits in another pass, then the low 20 bits in another.
For the high 20 bits, generate all the numbers and count how often each high 20 bits pattern occurrs. After that, add up the counts until you reach K. The pattern where you reach K, that pattern covers the K-th largest number. In other words, that's the result's high 20 bits.
The middle and low 20 bits are computed similarly, except we take the by then known prefix (the high 20 bits or high+middle 40 bits) into account. As a little optimization, when D is small, I skip computing the high 20 bits. That got me from 2.1 seconds down to 1.4 seconds.
This solution is like user3386109 described, except with bucket size 2^20 instead of 10^6 so I can use bit operations instead of divisions and think of bit patterns instead of ranges.
For the memory constraint you hit:
(B*Si-1 + C)%D
requires only the value (Si-2) before itself. So you can compute them in pairs, to use only 1/2 of total you need. This only needs indexing even values and iterating once for odd values. So you can just use half-length LUT and compute the odd value in-flight. Modern CPUs are fast enough to do extra calculations like these.
std::vector<biggie> lut{ a_i,a_i_2,a_i_4,... };
a_i_3=computeOddFromEven(lut[1]);
You can make a longer stride like 4,8 too. If dataset is large, RAM latency is big. So it's like having checkpoints in whole data search space to balance between memory and core usage. 1000-distance checkpoints would put a lot of cpu cycles into re-calculations but then the array would fit CPU's L2/L1 cache which is not bad. When sorting, the maximum re-calc iteration per element would be n=1000 now. O(1000 x size) maybe it's a big constant but maybe somehow optimizable by compiler if some constants really const?
If CPU performance becomes problem again:
write a compiling function that writes your source code with all the "constant" given by user to a string
compile the code using command-line (assuming target computer has some accessible from command line like g++ from main program)
run it and get results
Compiler should enable more speed/memory optimizations when those are really constant in compile-time rather than depending on std::cin.
If you really need to add a hard-limit to the RAM usage, then implement a simple cache with the backing-store as your heavy computations with brute-force O(N^2) (or O(L x N) with checkpoints every L elements as in first method where L=2 or 4, or ...).
Here's a sample direct-mapped cache with 8M long-long value space:
int main()
{
std::vector<long long> checkpoints = {
a_0, a_16, a_32,...
};
auto cacheReadMissFunction = [&](int key){
// your pure computational algorithm here, helper meant to show variable
long long result = checkpoints[key/16];
for(key - key%16 times)
result = iterate(result);
return result;
};
auto cacheWriteMissFunction = [&](int key, long long value){
/* not useful for your algorithm as it doesn't change behavior per element */
// backing_store[key] = value;
};
// due to special optimizations, size has to be 2^k
int cacheSize = 1024*1024*8;
DirectMappedCache<int, long long> cache(cacheSize,cacheReadMissFunction,cacheWriteMissFunction);
std::cout << cache.get(20)<<std::endl;
return 0;
}
If you use a cache-friendly sorting-algorithm, a direct cache access would make a lot of re-use for nearly all the elements in comparisons if you fill the output buffer/terminal with elements one by one by following something like a bitonic-sort-path (that is known in compile-time). If that doesn't work, then you can try accessing files as a "backing-store" of cache for sorting whole array at once. Is file system prohibited for use? Then the online-compiling method above won't work either.
Implementation of a direct mapped cache (don't forget to call flush() after your algorithm finishes, if you use any cache.set() method):
#ifndef DIRECTMAPPEDCACHE_H_
#define DIRECTMAPPEDCACHE_H_
#include<vector>
#include<functional>
#include<mutex>
#include<iostream>
/* Direct-mapped cache implementation
* Only usable for integer type keys in range [0,maxPositive-1]
*
* CacheKey: type of key (only integers: int, char, size_t)
* CacheValue: type of value that is bound to key (same as above)
*/
template< typename CacheKey, typename CacheValue>
class DirectMappedCache
{
public:
// allocates buffers for numElements number of cache slots/lanes
// readMiss: cache-miss for read operations. User needs to give this function
// to let the cache automatically get data from backing-store
// example: [&](MyClass key){ return redis.get(key); }
// takes a CacheKey as key, returns CacheValue as value
// writeMiss: cache-miss for write operations. User needs to give this function
// to let the cache automatically set data to backing-store
// example: [&](MyClass key, MyAnotherClass value){ redis.set(key,value); }
// takes a CacheKey as key and CacheValue as value
// numElements: has to be integer-power of 2 (e.g. 2,4,8,16,...)
DirectMappedCache(CacheKey numElements,
const std::function<CacheValue(CacheKey)> & readMiss,
const std::function<void(CacheKey,CacheValue)> & writeMiss):size(numElements),sizeM1(numElements-1),loadData(readMiss),saveData(writeMiss)
{
// initialize buffers
for(size_t i=0;i<numElements;i++)
{
valueBuffer.push_back(CacheValue());
isEditedBuffer.push_back(0);
keyBuffer.push_back(CacheKey()-1);// mapping of 0+ allowed
}
}
// get element from cache
// if cache doesn't find it in buffers,
// then cache gets data from backing-store
// then returns the result to user
// then cache is available from RAM on next get/set access with same key
inline
const CacheValue get(const CacheKey & key) noexcept
{
return accessDirect(key,nullptr);
}
// only syntactic difference
inline
const std::vector<CacheValue> getMultiple(const std::vector<CacheKey> & key) noexcept
{
const int n = key.size();
std::vector<CacheValue> result(n);
for(int i=0;i<n;i++)
{
result[i]=accessDirect(key[i],nullptr);
}
return result;
}
// thread-safe but slower version of get()
inline
const CacheValue getThreadSafe(const CacheKey & key) noexcept
{
std::lock_guard<std::mutex> lg(mut);
return accessDirect(key,nullptr);
}
// set element to cache
// if cache doesn't find it in buffers,
// then cache sets data on just cache
// writing to backing-store only happens when
// another access evicts the cache slot containing this key/value
// or when cache is flushed by flush() method
// then returns the given value back
// then cache is available from RAM on next get/set access with same key
inline
void set(const CacheKey & key, const CacheValue & val) noexcept
{
accessDirect(key,&val,1);
}
// thread-safe but slower version of set()
inline
void setThreadSafe(const CacheKey & key, const CacheValue & val) noexcept
{
std::lock_guard<std::mutex> lg(mut);
accessDirect(key,&val,1);
}
// use this before closing the backing-store to store the latest bits of data
void flush()
{
try
{
std::lock_guard<std::mutex> lg(mut);
for (size_t i=0;i<size;i++)
{
if (isEditedBuffer[i] == 1)
{
isEditedBuffer[i]=0;
auto oldKey = keyBuffer[i];
auto oldValue = valueBuffer[i];
saveData(oldKey,oldValue);
}
}
}catch(std::exception &ex){ std::cout<<ex.what()<<std::endl; }
}
// direct mapped access
// opType=0: get
// opType=1: set
CacheValue const accessDirect(const CacheKey & key,const CacheValue * value, const bool opType = 0)
{
// find tag mapped to the key
CacheKey tag = key & sizeM1;
// compare keys
if(keyBuffer[tag] == key)
{
// cache-hit
// "set"
if(opType == 1)
{
isEditedBuffer[tag]=1;
valueBuffer[tag]=*value;
}
// cache hit value
return valueBuffer[tag];
}
else // cache-miss
{
CacheValue oldValue = valueBuffer[tag];
CacheKey oldKey = keyBuffer[tag];
// eviction algorithm start
if(isEditedBuffer[tag] == 1)
{
// if it is "get"
if(opType==0)
{
isEditedBuffer[tag]=0;
}
saveData(oldKey,oldValue);
// "get"
if(opType==0)
{
const CacheValue && loadedData = loadData(key);
valueBuffer[tag]=loadedData;
keyBuffer[tag]=key;
return loadedData;
}
else /* "set" */
{
valueBuffer[tag]=*value;
keyBuffer[tag]=key;
return *value;
}
}
else // not edited
{
// "set"
if(opType == 1)
{
isEditedBuffer[tag]=1;
}
// "get"
if(opType == 0)
{
const CacheValue && loadedData = loadData(key);
valueBuffer[tag]=loadedData;
keyBuffer[tag]=key;
return loadedData;
}
else // "set"
{
valueBuffer[tag]=*value;
keyBuffer[tag]=key;
return *value;
}
}
}
}
private:
const CacheKey size;
const CacheKey sizeM1;
std::mutex mut;
std::vector<CacheValue> valueBuffer;
std::vector<unsigned char> isEditedBuffer;
std::vector<CacheKey> keyBuffer;
const std::function<CacheValue(CacheKey)> loadData;
const std::function<void(CacheKey,CacheValue)> saveData;
};
#endif /* DIRECTMAPPEDCACHE_H_ */
You can solve this problem using a Max-heap.
Insert the first k elements into the max-heap. The largest element of these k will now be at the root.
For each remaining element e:
Compare e to the root.
If e is larger than the root, discard it.
If e is smaller than the root, remove the root and insert e into the heap structure.
After all elements have been processed, the k-th smallest element is at the root.
This method uses O(K) space and O(n log n) time.
There’s an algorithm that people often call LazySelect that I think would be perfect here.
With high probability, we make two passes. In the first pass, we save a random sample of size n much less than N. The answer will be around index (K/N)n in the sorted sample, but due to the randomness, we have to be careful. Save the values a and b at (K/N)n ± r instead, where r is the radius of the window. In the second pass, we save all of the values in [a, b], count the number of values less than a (let it be L), and select the value with index K−L if it’s in the window (otherwise, try again).
The theoretical advice on choosing n and r is fine, but I would be pragmatic here. Choose n so that you use most of the available memory; the bigger the sample, the more informative it is. Choose r fairly large as well, but not quite as aggressively due to the randomness.
C++ code below. On the online judge, it’s faster than Kelly’s (max 1.3 seconds on the T=3 tests, 0.5 on the T=1 tests).
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <limits>
#include <optional>
#include <random>
#include <vector>
namespace {
class LazySelector {
public:
static constexpr std::int32_t kTargetSampleSize = 1000;
explicit LazySelector() { sample_.reserve(1000000); }
void BeginFirstPass(const std::int32_t n, const std::int32_t k) {
sample_.clear();
mask_ = n / kTargetSampleSize;
mask_ |= mask_ >> 1;
mask_ |= mask_ >> 2;
mask_ |= mask_ >> 4;
mask_ |= mask_ >> 8;
mask_ |= mask_ >> 16;
}
void FirstPass(const std::int64_t value) {
if ((gen_() & mask_) == 0) {
sample_.push_back(value);
}
}
void BeginSecondPass(const std::int32_t n, const std::int32_t k) {
sample_.push_back(std::numeric_limits<std::int64_t>::min());
sample_.push_back(std::numeric_limits<std::int64_t>::max());
const double p = static_cast<double>(sample_.size()) / n;
const double radius = 2 * std::sqrt(sample_.size());
const auto lower =
sample_.begin() + std::clamp<std::int32_t>(std::floor(p * k - radius),
0, sample_.size() - 1);
const auto upper =
sample_.begin() + std::clamp<std::int32_t>(std::ceil(p * k + radius), 0,
sample_.size() - 1);
std::nth_element(sample_.begin(), upper, sample_.end());
std::nth_element(sample_.begin(), lower, upper);
lower_ = *lower;
upper_ = *upper;
sample_.clear();
less_than_lower_ = 0;
equal_to_lower_ = 0;
equal_to_upper_ = 0;
}
void SecondPass(const std::int64_t value) {
if (value < lower_) {
++less_than_lower_;
} else if (upper_ < value) {
} else if (value == lower_) {
++equal_to_lower_;
} else if (value == upper_) {
++equal_to_upper_;
} else {
sample_.push_back(value);
}
}
std::optional<std::int64_t> Select(std::int32_t k) {
if (k < less_than_lower_) {
return std::nullopt;
}
k -= less_than_lower_;
if (k < equal_to_lower_) {
return lower_;
}
k -= equal_to_lower_;
if (k < sample_.size()) {
const auto kth = sample_.begin() + k;
std::nth_element(sample_.begin(), kth, sample_.end());
return *kth;
}
k -= sample_.size();
if (k < equal_to_upper_) {
return upper_;
}
return std::nullopt;
}
private:
std::default_random_engine gen_;
std::vector<std::int64_t> sample_ = {};
std::int32_t mask_ = 0;
std::int64_t lower_ = std::numeric_limits<std::int64_t>::min();
std::int64_t upper_ = std::numeric_limits<std::int64_t>::max();
std::int32_t less_than_lower_ = 0;
std::int32_t equal_to_lower_ = 0;
std::int32_t equal_to_upper_ = 0;
};
} // namespace
int main() {
int t;
std::cin >> t;
for (int i = t; i > 0; --i) {
std::int32_t n;
std::int32_t k;
std::int64_t a;
std::int64_t b;
std::int64_t c;
std::int64_t d;
std::cin >> n >> k >> a >> b >> c >> d;
std::optional<std::int64_t> ans = std::nullopt;
LazySelector selector;
do {
{
selector.BeginFirstPass(n, k);
std::int64_t s = a;
for (std::int32_t j = n; j > 0; --j) {
selector.FirstPass(s);
s = (b * s + c) % d;
}
}
{
selector.BeginSecondPass(n, k);
std::int64_t s = a;
for (std::int32_t j = n; j > 0; --j) {
selector.SecondPass(s);
s = (b * s + c) % d;
}
}
ans = selector.Select(k - 1);
} while (!ans);
std::cout << *ans << '\n';
}
}
I have an assignment where image composition is done using SAD. And another task is to use MSE instead of SAD in the code. Im struggling with it so can anyone help me with this? Here is the code for SAD.
find_motion(my_image_comp *ref, my_image_comp *tgt,
int start_row, int start_col, int block_width, int block_height)
/* This function finds the motion vector which best describes the motion
between the `ref' and `tgt' frames, over a specified block in the
`tgt' frame. Specifically, the block in the `tgt' frame commences
at the coordinates given by `start_row' and `start_col' and extends
over `block_width' columns and `block_height' rows. The function finds
the translational offset (the returned vector) which describes the
best matching block of the same size in the `ref' frame, where
the "best match" is interpreted as the one which minimizes the sum of
absolute differences (SAD) metric. */
{
mvector vec, best_vec;
int sad, best_sad=256*block_width*block_height;
for (vec.y=-8; vec.y <= 8; vec.y++)
for (vec.x=-8; vec.x <= 8; vec.x++)
{
int ref_row = start_row-vec.y;
int ref_col = start_col-vec.x;
if ((ref_row < 0) || (ref_col < 0) ||
((ref_row+block_height) > ref->height) ||
((ref_col+block_width) > ref->width))
continue; // Translated block not containe within reference frame
int r, c;
int *rp = ref->buf + ref_row*ref->stride + ref_col;
int *tp = tgt->buf + start_row*tgt->stride + start_col;
for (sad=0, r=block_height; r > 0; r--,
rp+=ref->stride, tp+=tgt->stride)
for (c=0; c < block_width; c++)
{
int diff = tp[c] - rp[c];
sad += (diff < 0)?(-diff):diff;
}
if (sad < best_sad)
{
best_sad = sad;
best_vec = vec;
}
}
return best_vec;
}
I got the answer myself I think.
its,
for (mse = 0, r = block_height; r > 0; r--,
rp+=ref->stride, tp+=tgt->stride)
for (c=0; c < block_width; c++)
{
int diff = tp[c] - rp[c];
temp = (diff*diff) / (block_height*block_width);
mse += temp;
temp = 0;
}
if (mse < best_mse)
{
best_mse = mse;
best_vec = vec;
}
}
return best_vec;
}
I am a C++ newbie.
Context: I found this third-party snippet of code that seems to work, but based on my (very limited) knowledge of C++ I suspect it will cause problems. The snippet is as follows:
int aVariable;
int anInt = 1;
int anotherInt = 2;
int lastInt = 3;
aVariable = CHAIN(anInt, anotherInt, lastInt);
Where CHAIN is defined as follows (this is part of a library):
int CHAIN(){ Map(&CHAIN, MakeProcInstance(&_CHAIN), MAP_IPTR_VPN); }
int _CHAIN(int i, int np, int p){ return ASMAlloc(np, p, &chainproc); }
int keyalloc[16384], kpos, alloc_locked, tmp[4];
int ASMAlloc(int np, int p, alias proc)
{
int v, x;
// if(alloc_locked) return 0 & printf("WARNING: you can declare compound key statements (SEQ, CHAIN, EXEC, TEMPO, AXIS) only inside main() call, and not during an event.\xa");
v = elements(&keyalloc) - kpos - 4;
if(v < np | !np) return 0; // not enough allocation space or no parameters
Map(&v, p); Dim(&v, np); // v = params array
keyalloc[kpos] = np + 4; // size
keyalloc[kpos+1] = &proc; // function
keyalloc[kpos+2] = kpos + 2 + np; // parameters index
while(x < np)
{
keyalloc[kpos+3+x] = v[x];
x = x+1;
}
keyalloc[kpos+3+np] = kpos + 3 | JUMP;
x = ASMFind(kpos);
if(x == kpos) kpos = kpos + np + 4;
return x + 1 | PROC; // skip block size
}
int ASMFind(int x)
{
int i, j, k; while(i < x)
{
k = i + keyalloc[i]; // next
if(keyalloc[i] == keyalloc[x]) // size
if(keyalloc[i+1] == keyalloc[x+1]) // proc
{
j = x-i;
i = i+3;
while(keyalloc[i] == keyalloc[j+i]) i = i+1; // param
if((keyalloc[i] & 0xffff0000) == JUMP) return x-j;
}
i = k;
}
return x;
}
EDIT:
The weird thing is that running
CHAIN(aVariable);
effectively executes
CHAIN(anInt, anotherInt, lastInt);
Somehow. This is what led me to believe that aVariable is, in fact, a pointer.
QUESTION:
Is it correct to store a parametrized function call into an integer variable like so? Does "aVariable" work just as a pointer, or is this likely to corrupt random memory areas?
You're calling a function (through an obfuscated interface), and storing the result in an integer. It might or might not cause problems, depending on how you use the value / what you expect it to mean.
Your example contains too many undefined symbols for the reader to provide any better answer.
Also, I think this is C, not C++ code.
I've managed to write my algorithm in recursive way:
int fib(int n) {
if(n == 1)
return 3
elseif (n == 2)
return 2
else
return fib(n – 2) + fib(n – 1)
}
Currently I'm trying to convert it to iterative approach without success:
int fib(int n) {
int i = 0, j = 1, k, t;
for (k = 1; k <= n; ++k)
{
if(n == 1) {
j = 3;
}
else if(n == 2) {
j = 2;
}
else {
t = i + j;
i = j;
j = t;
}
}
return j;
}
So how can I rectify my code to reach my goal?
Solving this problem by a general convert-to-iterative is a bad idea. But, that is what you asked.
None of these are good ways to solve fib: there are closed form solutions for fib, and/or iterative solutions that are cleaner, and/or recursive memoized solutions. Rather, I'm showing relatively mechanical techniques for taking a recursive function (that isn't tail-recursive or otherwise simple to solve), and solving it without using the automatic storage stack (recursion).
I have had code that does too deep a recursive nesting and blows the stack in medium-high complexity cases; when refactored to iterative, the problem went away. These are the kinds of solutions required when what you have is a recursive solution you half understand, and you need it to be iterative.
The general means to convert a recursive to an iterative solution is to manage the stack manually.
In this case, I'll also memoize return values.
We cache the return values in retvals.
If we cannot immediately solve a problem, we state what problems we first need to solve in order to solve our problem (in particular, the n-1 and n-2 cases). Then we queue up solving our problem again (by which point, we will have what we need ready go).
int fib( int n ) {
std::map< int, int > retvals {
{1,3},
{2,2}
};
std::vector<int> arg;
arg.push_back(n);
while( !arg.empty() ) {
int n = arg.back();
arg.pop_back();
// have we solved this already? If so, stop.
if (retvals.count(n)>0)
continue;
// are we done? If so, calculate the result:
if (retvals.count(n-1)>0 && retvals.count(n-2)>0) {
retvals[n] = retvals[n-1] + retvals[n-2];
continue;
}
// to calculate n, first calculate n-1 and n-2:
arg.push_back(n); arg.push_back(n-1); arg.push_back(n-2);
}
return retvals[n];
}
No recursion, just a loop.
A "dumber" way to do this is to take the function and make it a pseudo-coroutine.
First, rewrite your recursive code to do one thing per line:
int fib(int n) {
if(n == 1)
return 3
if (n == 2)
return 2
int a = fib(n-2);
int b = fib(n-1);
return a+b;
}
Next, create a struct with all of the functions' state:
struct fib_data {
int n, a, b, r;
};
and add labels at each point where we make a recursive call, and an enum with similar names:
enum Calls {
e1, e2
};
int fib(int n) {
fib_data d;
d.n = n;
if(d.n == 1)
return 3
if (d.n == 2)
return 2
d.a = fib(n-2);
CALL1:
d.b = fib(n-1);
CALL2:
d.r = d.a+d.b;
return d.r;
}
add CALLS to your fib_data.
Next create a stack of fib_data:
enum Calls {
e0, e1, e2
};
struct fib_data {
Calls loc = Calls::e0;
int n, a, b, r;
};
int fib(int n) {
std::vector<fib_data> stack;
stack.push_back({n});
if(stack.back().n == 1)
return 3
if (stack.back().n == 2)
return 2
stack.back().a = fib(stack.back().n-2);
CALL1:
stack.back().b = fib(stack.back().n-1);
CALL2:
stack.back().r = stack.back().a + stack.back().b;
return stack.back().r;
}
now create a loop. Instead of recursively calling, set the return location in your fib_data, push a fib_data onto the stack with an n and an e0 location, then continue the loop. At the top of the loop, switch on the top of the stack's location.
To return: Create a function local variable r to store return values. To return, set r, pop the stack, and continue the loop.
If the stack is empty at the start of the loop, return r from the function.
enum Calls {
e0, e1, e2
};
struct fib_data {
int n, a, b, r;
Calls loc = Calls::e0;
};
int fib(int n) {
std::vector<fib_data> stack;
stack.push_back({n});
int r;
while (!stack.empty()) {
switch(stack.back().loc) {
case e0: break;
case e1: goto CALL1;
case e2: goto CALL2;
};
if(stack.back().n == 1) {
r = 3;
stack.pop_back();
continue;
}
if (stack.back().n == 2){
r = 2;
stack.pop_back();
continue;
}
stack.back().loc = e1;
stack.push_back({stack.back().n-2});
continue;
CALL1:
stack.back().a = r;
stack.back().loc = e2;
stack.push_back({stack.back().n-1});
continue;
CALL2:
stack.back().b = r;
stack.back().r = stack.back().a + stack.back().b;
r = stack.back().r;
stack.pop_back();
continue;
}
}
Then note that b and r do not have to be in the stack -- remove it, and make it local.
This "dumb" transformation emulates what the C++ compiler does when you recurse, but the stack is stored in the free store instead of automatic storage, and can reallocate.
If pointers to the local variables need to persist, using a std::vector for the stack won't work. Replace the pointers with offsets into the standard vector, and it will work.
This should be fib(0) = 0, fib(1) = 1, fib(2) = 1, fib(3) = 2, fib(4) = 3, fib(5) = 5, fib(6) = 8, ... .
fib(n)
{
int f0, f1, t;
if(n < 2)
return n;
n -= 2;
f0 = 1;
f1 = 1;
while(n--){
t = f1+f0;
f0 = f1;
f1 = t;
}
return f1;
}
or you can unfold the loop a bit, and get rid of the temp variable:
int fib(int n)
{
int f0, f1;
if(n < 2)
return n;
f0 = 1-(n&1);
f1 = 1;
while(0 < (n -= 2)){
f0 += f1;
f1 += f0;
}
return f1;
}
This is a classic problem. you can not simply get rid of the recursion if you are given n and you want to calculate down.
the solution is Dynamic programming. basically you want to create an array of size n, then starting from index 0 fill it up until you reach index n-1;
something like this:
int fib(int n)
{
int buffer[n+1];
buffer[0]=3;
buffer[1]=2;
for(int i=2;i<=n; ++i)
{
buffer[i] = buffer[i-1] + buffer[i-2];
}
return buffer[n];
}
alternatively to save memory and not use a big array you can use:
int fib(int n)
{
int buffer [2];
buffer[0] = 3;
buffer[1] = 2;
for(int i=3; i<=n; i++)
{
int tmp = buffer[0] + buffer[1];
buffer[0] = buffer[1];
buffer[1] = temp;
}
return buffer[1];
}
For a sake of completeness here is the iterative solution with O(1) space complexity:
int fib(n)
{
int i;
int a0 = 3;
int a1 = 2;
int tmp;
if (n == 1)
return a0;
for (i = 3; i <=n; i++ )
{
tmp = a0 + a1;
a0 = a1;
a1 = tmp;
}
return a1;
}
I am making a 3D application where a boat has to drive through buoy tracks. I also need to store the tracks in groups or "layouts". The buoys class is basically a list of "buoy layouts" inside of which is a list of "buoy tracks", inside of which is a list of buoys.
I checked the local variable watcher and all memory allocations in the constructor appear to work. Later when the calculateCoordinates function is called it enters a for loop. On the first iteration of the for loop the functions pointer is used and works fine, but then on this line
ctMain[j+1][1] = 0;
the function pointers are set to NULL. I am guessing it has something to with the structs not being allocated or addressed correctly. I am not sure what to do from here. Maybe I am not understanding how malloc is working.
Update
I replaced the M3DVector3d main_track with double ** main_track, thinking maybe malloc is not handling the typedefs correctly. But I am getting the same error when trying to access the main_track variable later in calculateCoordinates.
Update
It ended up being memory corruption caused by accessing a pointer wrong in the line
rotatePointD(&(cTrack->main_track[j]), rotation);
It only led to an error later when I tried to access it.
// Buoys.h
////////////////////////////////////////////
struct buoy_layout_t;
struct buoy_track_t;
typedef double M3DVector3d[3];
class Buoys {
public:
Buoys();
struct buoy_layout_t ** buoyLayouts;
int nTrackLayouts;
int currentLayoutID;
void calculateCoordinates();
};
struct buoy_track_t {
int nMain, nYellow, nDistract;
M3DVector3d * main_track,
yellow_buoys,
distraction_buoys;
double (*f)(double x);
double (*fp)(double x);
double thickness;
M3DVector3d start, end;
};
struct buoy_layout_t {
int nTracks;
buoy_track_t ** tracks;
};
// Buoys.cpp
/////////////////////////////
// polynomial and its derivative, for shape of track
double buoyfun1(double x) {return (1.0/292.0)*x*(x-12.0)*(x-24.0);}
double buoyfun1d(double x) {return (1.0/292.0)*((3.0*pow(x,2))-(72.0*x)+288.0);}
// ... rest of buoy shape functions go here ...
Buoys::Buoys() {
struct buoy_layout_t * cLayout;
struct buoy_track_t * cTrack;
nTrackLayouts = 1;
buoyLayouts = (buoy_layout_t **) malloc(nTrackLayouts*sizeof(*buoyLayouts));
for (int i = 0; i < nTrackLayouts; i++) {
buoyLayouts[i] = (buoy_layout_t *) malloc(sizeof(*(buoyLayouts[0])));
}
currentLayoutID = 0;
// ** Layout 1 **
cLayout = buoyLayouts[0];
cLayout->nTracks = 1;
cLayout->tracks = (buoy_track_t **) malloc(sizeof(*(cLayout->tracks)));
for (int i = 0; i < 1; i++) {
cLayout->tracks[i] = (buoy_track_t *) malloc (sizeof(*(cLayout->tracks)));
}
cTrack = cLayout->tracks[0];
cTrack->main_track = (M3DVector3d *) malloc(30*sizeof(*(cTrack->main_track)));
cTrack->nMain = 30;
cTrack->f = buoyfun1;
cTrack->fp = buoyfun1d;
cTrack->thickness = 5.5;
cTrack->start[0] = 0; cTrack->start[1] = 0; cTrack->start[2] = 0;
cTrack->end[0] = 30; cTrack->end[1] = 0; cTrack->end[2] = -19;
// ... initialize rest of layouts here ...
// ** Layout 2 **
// ** Layout 3 **
// ...
// ** Layout N **
calculateCoordinates();
}
void Buoys::calculateCoordinates()
{
int i, j;
buoy_layout_t * cLayout = buoyLayouts[0];
for (i = 0; i < (cLayout->nTracks); i++) {
buoy_track_t * cTrack = cLayout->tracks[i];
M3DVector3d * ctMain = cTrack->main_track;
double thickness = cTrack->thickness;
double rotation = getAngleD(cTrack->start[0], cTrack->start[2],
cTrack->end[0], cTrack->end[2]);
double full_disp = sqrt(pow((cTrack->end[0] - cTrack->start[0]), 2)
+ pow((cTrack->end[2] - cTrack->start[2]), 2));
// nBuoys is nBuoys per side. So one side has nBuoys/2 buoys.
for (j=0; j < cTrack->nMain; j+=2) {
double id = j*((full_disp)/(cTrack->nMain));
double y = (*(cTrack->f))(id);
double yp = (*(cTrack->fp))(id);
double normal, normal_a;
if (yp!=0) {
normal = -1.0/yp;
}
else {
normal = 999999999;
}
if (normal > 0) {
normal_a = atan(normal);
}
else {
normal_a = atan(normal) + PI;
}
ctMain[j][0] = id + ((thickness/2.0)*cos(normal_a));
ctMain[j][1] = 0;
ctMain[j][2] = y + ((thickness/2.0)*sin(normal_a));
ctMain[j+1][0] = id + ((thickness/2.0)*cos(normal_a+PI));
ctMain[j+1][1] = 0; // function pointers get set to null here
ctMain[j+1][2] = y + ((thickness/2.0)*sin(normal_a+PI));
}
for (j=0; j < cTrack->nMain; j++) {
rotatePointD(&(cTrack->main_track[j]), rotation);
}
}
}
Unless there are requirements for learning pointers or you cannot use STL, given you are using C++ I'd strongly recommend you use more STL, it is your friend. But anyways...
First, the type of ctMain is *M3DVector3D. So you can safely access ctMain[0], but you cannot access ctMain[1], maybe you meant for the type of ctMain to be **M3DVector3D, in which case the line for initialization you had written which is:
cTrack->main_track = (M3DVector3d *) malloc(30*sizeof(*(cTrack->main_track)));
would make sense.
More Notes
Why are you allocating 30 of these here?
cTrack->main_track = (M3DVector3d *) malloc(30*sizeof(*(cTrack->main_track)));
Given the type of main_track, you only need:
cTrack->main_track = (M3DVector3d *) malloc(sizeof(M3DVector3d));
In addition, for organizational purposes, when doing sizeof you may want to give the actual type to check the sizeof, as opposed to the variable (there should be no difference, just organizational), these two changes:
buoyLayouts = (buoy_layout_t **) malloc(nTrackLayouts*sizeof(buoy_layout_t*));
for (int i = 0; i < nTrackLayouts; i++) {
buoyLayouts[i] = (buoy_layout_t *) malloc(sizeof(buoy_layout_t));
}
cLayout->tracks = (buoy_track_t **) malloc(clayout->nTracks * sizeof(buoy_track_t*));
for (int i = 0; i < 1; i++) {
cLayout->tracks[i] = (buoy_track_t *) malloc(sizeof(buoy_track_t));
}