Let's have a look at the following code:
tbb::blocked_range<int> range(0, a.rows);
uint64_t positive = tbb::parallel_reduce(range, 0, // <- initial value
[&](const tbb::blocked_range<int>& r, uint64_t v)->uint64_t {
for (int y = r.begin(); y < r.end(); ++y) {
auto rA = a[y], rB = b[y];
for (int x = 0; x < a.cols; ++x) {
auto A = rA[x], B = rB[x];
for (int l = y; l < a.rows; ++l) {
auto rAA = a[l], rBB = b[l];
for (int m = x; m < a.cols; ++m) {
if (l == y && m == x)
continue;
auto AA = rAA[m], BB = rBB[m];
if ((A == AA) && (B == BB))
v++; // <- value is changed
if ((A != AA) && (B != BB))
v++; // <- value is changed
}
}
}
}
return v;
}, [](uint64_t first, uint64_t second)->uint64_t {
std::cerr << first << " + " << second << '\n'; // <- wrong values occur
return first+second;
}
);
This is a parallel reduce operation where the initial value is 0. Then, in each parallel computation, based on the initial value, we count up (local variable v in the first lambda function). The second lambda function aggregates the results from parallel workers.
Interestingly enough, this code does not work as expected. The output of the second lambda function will show enormous figures that result from integer overflows.
The code works correctly when replacing the second line with:
uint64_t positive = tbb::parallel_reduce(range, (uint64_t)0, // <- initial value
Now I wonder: wouldn't the parameter type of the first lambda (uint64_t v) enforce this conversion? And how can a function that is supposed to operate on uint64_t operate on int instead?
The compiler is GCC 6.
It doesn't matter what parameter type the lambda takes. According to the docs, everything is based on the type of the second argument (the identity):
template<typename Range, typename Value,
typename Func, typename Reduction>
Value parallel_reduce( const Range& range, const Value& identity,
const Func& func, const Reduction& reduction,
[, partitioner[, task_group_context& group]] );
with pseudo-signatures of:
Value Func::operator()(const Range& range, const Value& x)
Value Reduction::operator()(const Value& x, const Value& y)
So a Value is passed into Func and into Reduction and returned. If you want uint64_ts everywhere, you'll need to ensure that Value is uint64_t. Which is why your (uint64_t)0 works but your 0 doesn't (and is actually undefined behavior to boot).
Note that this is the same problem that you would get with just normal accumulate:
std::vector<uint64_t> vs{0x7fffffff, 0x7fffffff, 0x7fffffff};
uint64_t sum = std::accumulate(vs.begin(), vs.end(), 0, std::plus<uint64_t>{});
// ^^^ oops, int 0!
// even though I'm using plus<uint64_t>!
assert(sum == 0x17ffffffd); // fails because actually sum is truncated
// and is just 0x7ffffffd
The task (from a Bulgarian judge; click on "Език" (Language) to switch it to English):
I am given the size of the first of N corals (S_1 = A). The size of every subsequent coral (S_i, where i > 1) is calculated using the formula S_i = (B*S_{i-1} + C) % D, where A, B, C and D are constants. I am told that Nemo is near the K-th coral (when the sizes of all corals are sorted in ascending order).
What is the size of the above-mentioned K-th coral?
I will have T tests, and for every one of them I will be given N, K, A, B, C and D and asked to output the size of the K-th coral.
The requirements:
1 ≤ T ≤ 3
1 ≤ K ≤ N ≤ 10^7
0 ≤ A < D ≤ 10^18
1 ≤ C, B*D ≤ 10^18
Memory available is 64 MB
Time limit is 1.9 sec
The problem I have:
For the worst-case scenario I will need 10^7 * 8 B = 8*10^7 B, which is about 76 MiB.
If at least 80 MB of memory were available, the solution would be:
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j;
biggie a, b, c, d;
std::vector<biggie>::iterator it_ans;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
std::vector<biggie> lut{ a };
lut.reserve(n);
for (j = 1; j != n; ++j) {
lut.emplace_back((b * lut.back() + c) % d);
}
it_ans = std::next(lut.begin(), k - 1);
std::nth_element(lut.begin(), it_ans, lut.end());
std::cout << *it_ans << '\n';
}
return 0;
}
Question 1: How can I approach this CP task given the requirements listed above?
Question 2: Is it somehow possible to use std::nth_element to solve it, since I am not able to store all N elements? I mean using std::nth_element with a sliding-window technique (if this is possible).
# Christian Sloper
#include <functional>
#include <iostream>
#include <queue>
#include <vector>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j, j_lim;
biggie a, b, c, d, prev, curr;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
if (k < n - k + 1) {
std::priority_queue<biggie, std::vector<biggie>, std::less<biggie>> q;
q.push(a);
prev = a;
for (j = 1; j != k; ++j) {
curr = (b * prev + c) % d;
q.push(curr);
prev = curr;
}
for (; j != n; ++j) {
curr = (b * prev + c) % d;
if (curr < q.top()) {
q.pop();
q.push(curr);
}
prev = curr;
}
std::cout << q.top() << '\n';
}
else {
std::priority_queue<biggie, std::vector<biggie>, std::greater<biggie>> q;
q.push(a);
prev = a;
for (j = 1, j_lim = n - k + 1; j != j_lim; ++j) {
curr = (b * prev + c) % d;
q.push(curr);
prev = curr;
}
for (; j != n; ++j) {
curr = (b * prev + c) % d;
if (curr > q.top()) {
q.pop();
q.push(curr);
}
prev = curr;
}
std::cout << q.top() << '\n';
}
}
return 0;
}
This gets accepted (it passes all 40 tests; the largest time is 1.4 seconds, for a test with T=3 and D≤10^9, and the largest time for a test with larger D (and thus T=1) is 0.7 seconds).
#include <iostream>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j;
biggie a, b, c, d;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
biggie prefix = 0;
for (int shift = d > 1000000000 ? 40 : 20; shift >= 0; shift -= 20) {
biggie prefix_mask = ((biggie(1) << (40 - shift)) - 1) << (shift + 20);
int count[1 << 20] = {0};
biggie s = a;
int rank = 0;
for (j = 0; j != n; ++j) {
biggie s_vs_prefix = s & prefix_mask;
if (s_vs_prefix < prefix)
++rank;
else if (s_vs_prefix == prefix)
++count[(s >> shift) & ((1 << 20) - 1)];
s = (b * s + c) % d;
}
int i = -1;
while (rank < k)
rank += count[++i];
prefix |= biggie(i) << shift;
}
std::cout << prefix << '\n';
}
return 0;
}
The result fits in 60 bits (10^18 < 2^60). I first determine the high 20 bits with one pass through the numbers, then the middle 20 bits in another pass, then the low 20 bits in another.
For the high 20 bits, generate all the numbers and count how often each high-20-bits pattern occurs. After that, add up the counts (from the smallest pattern upward) until you reach K. The pattern where you reach K covers the K-th smallest number; in other words, that's the result's high 20 bits.
The middle and low 20 bits are computed similarly, except we take the by then known prefix (the high 20 bits or high+middle 40 bits) into account. As a little optimization, when D is small, I skip computing the high 20 bits. That got me from 2.1 seconds down to 1.4 seconds.
This solution is like user3386109 described, except with bucket size 2^20 instead of 10^6 so I can use bit operations instead of divisions and think of bit patterns instead of ranges.
For the memory constraint you hit:
S_i = (B*S_{i-1} + C) % D
requires only the value immediately before it (S_{i-1}). So you can store only every second value (the even-indexed ones) and compute each odd-indexed value on the fly from the even one before it, which halves the memory you need. Modern CPUs are fast enough to do extra calculations like these.
std::vector<biggie> lut{ a_i,a_i_2,a_i_4,... };
a_i_3=computeOddFromEven(lut[1]);
You can use a longer stride, like 4 or 8, too. If the dataset is large, RAM latency is high, so this is like placing checkpoints across the whole data space to balance memory usage against CPU usage. Checkpoints 1000 elements apart would spend a lot of CPU cycles on re-calculation, but the checkpoint array would then fit in the CPU's L2/L1 cache, which is not bad. When sorting, the maximum number of re-calculation steps per element would then be 1000. O(1000 x size) is a big constant, though the compiler may be able to optimize some of it if some values are true constants.
If CPU performance becomes a problem again:
write a function that writes your source code, with all the "constants" given by the user, into a string
compile that code from the command line (assuming the target computer has a compiler, like g++, accessible from the command line of the main program)
run it and get the results
The compiler can enable more speed/memory optimizations when those values are true compile-time constants rather than depending on std::cin.
If you really need a hard limit on RAM usage, then implement a simple cache whose backing store is your heavy computation: brute force at O(N^2), or O(L x N) with checkpoints every L elements as in the first method (where L = 2, 4, ...).
Here's a sample direct-mapped cache with 8M long-long value space:
int main()
{
std::vector<long long> checkpoints = {
a_0, a_16, a_32,...
};
auto cacheReadMissFunction = [&](int key){
// your pure computational algorithm here, helper meant to show variable
long long result = checkpoints[key/16];
for (int i = 0; i < key % 16; ++i) // step forward from the checkpoint at index key - key%16
result = iterate(result);
return result;
};
auto cacheWriteMissFunction = [&](int key, long long value){
/* not useful for your algorithm as it doesn't change behavior per element */
// backing_store[key] = value;
};
// due to special optimizations, size has to be 2^k
int cacheSize = 1024*1024*8;
DirectMappedCache<int, long long> cache(cacheSize,cacheReadMissFunction,cacheWriteMissFunction);
std::cout << cache.get(20)<<std::endl;
return 0;
}
If you use a cache-friendly sorting algorithm, a direct-mapped cache would get a lot of re-use for nearly all the elements in comparisons, if you fill the output buffer/terminal with elements one by one by following something like a bitonic-sort path (which is known at compile time). If that doesn't work, you can try using files as the cache's backing store for sorting the whole array at once. Is the file system prohibited? Then the online-compiling method above won't work either.
Implementation of a direct mapped cache (don't forget to call flush() after your algorithm finishes, if you use any cache.set() method):
#ifndef DIRECTMAPPEDCACHE_H_
#define DIRECTMAPPEDCACHE_H_
#include<vector>
#include<functional>
#include<mutex>
#include<iostream>
/* Direct-mapped cache implementation
* Only usable for integer type keys in range [0,maxPositive-1]
*
* CacheKey: type of key (only integers: int, char, size_t)
* CacheValue: type of value that is bound to key (same as above)
*/
template< typename CacheKey, typename CacheValue>
class DirectMappedCache
{
public:
// allocates buffers for numElements number of cache slots/lanes
// readMiss: cache-miss for read operations. User needs to give this function
// to let the cache automatically get data from backing-store
// example: [&](MyClass key){ return redis.get(key); }
// takes a CacheKey as key, returns CacheValue as value
// writeMiss: cache-miss for write operations. User needs to give this function
// to let the cache automatically set data to backing-store
// example: [&](MyClass key, MyAnotherClass value){ redis.set(key,value); }
// takes a CacheKey as key and CacheValue as value
// numElements: has to be integer-power of 2 (e.g. 2,4,8,16,...)
DirectMappedCache(CacheKey numElements,
const std::function<CacheValue(CacheKey)> & readMiss,
const std::function<void(CacheKey,CacheValue)> & writeMiss):size(numElements),sizeM1(numElements-1),loadData(readMiss),saveData(writeMiss)
{
// initialize buffers
for(size_t i=0;i<numElements;i++)
{
valueBuffer.push_back(CacheValue());
isEditedBuffer.push_back(0);
keyBuffer.push_back(CacheKey()-1);// mapping of 0+ allowed
}
}
// get element from cache
// if cache doesn't find it in buffers,
// then cache gets data from backing-store
// then returns the result to user
// then cache is available from RAM on next get/set access with same key
inline
const CacheValue get(const CacheKey & key) noexcept
{
return accessDirect(key,nullptr);
}
// only syntactic difference
inline
const std::vector<CacheValue> getMultiple(const std::vector<CacheKey> & key) noexcept
{
const int n = key.size();
std::vector<CacheValue> result(n);
for(int i=0;i<n;i++)
{
result[i]=accessDirect(key[i],nullptr);
}
return result;
}
// thread-safe but slower version of get()
inline
const CacheValue getThreadSafe(const CacheKey & key) noexcept
{
std::lock_guard<std::mutex> lg(mut);
return accessDirect(key,nullptr);
}
// set element to cache
// if cache doesn't find it in buffers,
// then cache sets data on just cache
// writing to backing-store only happens when
// another access evicts the cache slot containing this key/value
// or when cache is flushed by flush() method
// then returns the given value back
// then cache is available from RAM on next get/set access with same key
inline
void set(const CacheKey & key, const CacheValue & val) noexcept
{
accessDirect(key,&val,1);
}
// thread-safe but slower version of set()
inline
void setThreadSafe(const CacheKey & key, const CacheValue & val) noexcept
{
std::lock_guard<std::mutex> lg(mut);
accessDirect(key,&val,1);
}
// use this before closing the backing-store to store the latest bits of data
void flush()
{
try
{
std::lock_guard<std::mutex> lg(mut);
for (size_t i=0;i<size;i++)
{
if (isEditedBuffer[i] == 1)
{
isEditedBuffer[i]=0;
auto oldKey = keyBuffer[i];
auto oldValue = valueBuffer[i];
saveData(oldKey,oldValue);
}
}
}catch(std::exception &ex){ std::cout<<ex.what()<<std::endl; }
}
// direct mapped access
// opType=0: get
// opType=1: set
CacheValue const accessDirect(const CacheKey & key,const CacheValue * value, const bool opType = 0)
{
// find tag mapped to the key
CacheKey tag = key & sizeM1;
// compare keys
if(keyBuffer[tag] == key)
{
// cache-hit
// "set"
if(opType == 1)
{
isEditedBuffer[tag]=1;
valueBuffer[tag]=*value;
}
// cache hit value
return valueBuffer[tag];
}
else // cache-miss
{
CacheValue oldValue = valueBuffer[tag];
CacheKey oldKey = keyBuffer[tag];
// eviction algorithm start
if(isEditedBuffer[tag] == 1)
{
// if it is "get"
if(opType==0)
{
isEditedBuffer[tag]=0;
}
saveData(oldKey,oldValue);
// "get"
if(opType==0)
{
const CacheValue && loadedData = loadData(key);
valueBuffer[tag]=loadedData;
keyBuffer[tag]=key;
return loadedData;
}
else /* "set" */
{
valueBuffer[tag]=*value;
keyBuffer[tag]=key;
return *value;
}
}
else // not edited
{
// "set"
if(opType == 1)
{
isEditedBuffer[tag]=1;
}
// "get"
if(opType == 0)
{
const CacheValue && loadedData = loadData(key);
valueBuffer[tag]=loadedData;
keyBuffer[tag]=key;
return loadedData;
}
else // "set"
{
valueBuffer[tag]=*value;
keyBuffer[tag]=key;
return *value;
}
}
}
}
private:
const CacheKey size;
const CacheKey sizeM1;
std::mutex mut;
std::vector<CacheValue> valueBuffer;
std::vector<unsigned char> isEditedBuffer;
std::vector<CacheKey> keyBuffer;
const std::function<CacheValue(CacheKey)> loadData;
const std::function<void(CacheKey,CacheValue)> saveData;
};
#endif /* DIRECTMAPPEDCACHE_H_ */
You can solve this problem using a Max-heap.
Insert the first k elements into the max-heap. The largest element of these k will now be at the root.
For each remaining element e:
Compare e to the root.
If e is larger than the root, discard it.
If e is smaller than the root, remove the root and insert e into the heap structure.
After all elements have been processed, the k-th smallest element is at the root.
This method uses O(k) space and O(n log k) time.
There’s an algorithm that people often call LazySelect that I think would be perfect here.
With high probability, we make two passes. In the first pass, we save a random sample of size n much less than N. The answer will be around index (K/N)n in the sorted sample, but due to the randomness, we have to be careful. Save the values a and b at (K/N)n ± r instead, where r is the radius of the window. In the second pass, we save all of the values in [a, b], count the number of values less than a (let it be L), and select the value with index K−L if it’s in the window (otherwise, try again).
The theoretical advice on choosing n and r is fine, but I would be pragmatic here. Choose n so that you use most of the available memory; the bigger the sample, the more informative it is. Choose r fairly large as well, but not quite as aggressively due to the randomness.
C++ code below. On the online judge, it’s faster than Kelly’s (max 1.3 seconds on the T=3 tests, 0.5 on the T=1 tests).
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <limits>
#include <optional>
#include <random>
#include <vector>
namespace {
class LazySelector {
public:
static constexpr std::int32_t kTargetSampleSize = 1000;
explicit LazySelector() { sample_.reserve(1000000); }
void BeginFirstPass(const std::int32_t n, const std::int32_t k) {
sample_.clear();
mask_ = n / kTargetSampleSize;
mask_ |= mask_ >> 1;
mask_ |= mask_ >> 2;
mask_ |= mask_ >> 4;
mask_ |= mask_ >> 8;
mask_ |= mask_ >> 16;
}
void FirstPass(const std::int64_t value) {
if ((gen_() & mask_) == 0) {
sample_.push_back(value);
}
}
void BeginSecondPass(const std::int32_t n, const std::int32_t k) {
sample_.push_back(std::numeric_limits<std::int64_t>::min());
sample_.push_back(std::numeric_limits<std::int64_t>::max());
const double p = static_cast<double>(sample_.size()) / n;
const double radius = 2 * std::sqrt(sample_.size());
const auto lower =
sample_.begin() + std::clamp<std::int32_t>(std::floor(p * k - radius),
0, sample_.size() - 1);
const auto upper =
sample_.begin() + std::clamp<std::int32_t>(std::ceil(p * k + radius), 0,
sample_.size() - 1);
std::nth_element(sample_.begin(), upper, sample_.end());
std::nth_element(sample_.begin(), lower, upper);
lower_ = *lower;
upper_ = *upper;
sample_.clear();
less_than_lower_ = 0;
equal_to_lower_ = 0;
equal_to_upper_ = 0;
}
void SecondPass(const std::int64_t value) {
if (value < lower_) {
++less_than_lower_;
} else if (upper_ < value) {
} else if (value == lower_) {
++equal_to_lower_;
} else if (value == upper_) {
++equal_to_upper_;
} else {
sample_.push_back(value);
}
}
std::optional<std::int64_t> Select(std::int32_t k) {
if (k < less_than_lower_) {
return std::nullopt;
}
k -= less_than_lower_;
if (k < equal_to_lower_) {
return lower_;
}
k -= equal_to_lower_;
if (k < sample_.size()) {
const auto kth = sample_.begin() + k;
std::nth_element(sample_.begin(), kth, sample_.end());
return *kth;
}
k -= sample_.size();
if (k < equal_to_upper_) {
return upper_;
}
return std::nullopt;
}
private:
std::default_random_engine gen_;
std::vector<std::int64_t> sample_ = {};
std::int32_t mask_ = 0;
std::int64_t lower_ = std::numeric_limits<std::int64_t>::min();
std::int64_t upper_ = std::numeric_limits<std::int64_t>::max();
std::int32_t less_than_lower_ = 0;
std::int32_t equal_to_lower_ = 0;
std::int32_t equal_to_upper_ = 0;
};
} // namespace
int main() {
int t;
std::cin >> t;
for (int i = t; i > 0; --i) {
std::int32_t n;
std::int32_t k;
std::int64_t a;
std::int64_t b;
std::int64_t c;
std::int64_t d;
std::cin >> n >> k >> a >> b >> c >> d;
std::optional<std::int64_t> ans = std::nullopt;
LazySelector selector;
do {
{
selector.BeginFirstPass(n, k);
std::int64_t s = a;
for (std::int32_t j = n; j > 0; --j) {
selector.FirstPass(s);
s = (b * s + c) % d;
}
}
{
selector.BeginSecondPass(n, k);
std::int64_t s = a;
for (std::int32_t j = n; j > 0; --j) {
selector.SecondPass(s);
s = (b * s + c) % d;
}
}
ans = selector.Select(k - 1);
} while (!ans);
std::cout << *ans << '\n';
}
}
I am using functors to generate compile time calculated code in the following way (I apologize for the long code, but it is the only way I have found to reproduce the behavior):
#include <array>
#include <tuple>
template <int order>
constexpr auto compute (const double h)
{
std::tuple<std::array<double,order>,
std::array<double,order> > paw{};
auto xtab = std::get<0>(paw).data();
auto weight = std::get<1>(paw).data();
if constexpr ( order == 3 )
{
xtab[0] = - 1.0E+00;
xtab[1] = 0.0E+00;
xtab[2] = 1.0E+00;
weight[0] = 1.0 / 3.0E+00;
weight[1] = 4.0 / 3.0E+00;
weight[2] = 1.0 / 3.0E+00;
}
else if constexpr ( order == 4 )
{
xtab[0] = - 1.0E+00;
xtab[1] = - 0.447213595499957939281834733746E+00;
xtab[2] = 0.447213595499957939281834733746E+00;
xtab[3] = 1.0E+00;
weight[0] = 1.0E+00 / 6.0E+00;
weight[1] = 5.0E+00 / 6.0E+00;
weight[2] = 5.0E+00 / 6.0E+00;
weight[3] = 1.0E+00 / 6.0E+00;
}
for (auto & el : std::get<0>(paw))
el = (el + 1.)/2. * h ;
for (auto & el : std::get<1>(paw))
el = el/2. * h ;
return paw;
}
template <std::size_t n>
class Basis
{
public:
constexpr Basis(const double h_) :
h(h_),
paw(compute<n>(h)),
coeffs(std::array<double,n>())
{}
const double h ;
const std::tuple<std::array<double,n>,
std::array<double,n> > paw ;
const std::array<double,n> coeffs ;
constexpr double operator () (int i, double x) const
{
return 1. ;
}
};
template <std::size_t n,std::size_t p,typename Ltype,typename number=double>
class Functor
{
public:
constexpr Functor(const Ltype L_):
L(L_)
{}
const Ltype L ;
constexpr auto operator()(const auto v) const
{
const auto l = L;
// const auto l = L();
std::array<std::array<number,p+1>,p+1> CM{},CM0{},FM{};
const auto basis = Basis<p+1>(l);
typename std::remove_const<typename std::remove_reference<decltype(v)>::type>::type w{};
for (auto i = 0u; i < p + 1; ++i)
CM0[i][0] += l;
for (auto i = 0u ; i < p+1 ; ++i)
for (auto j = 0u ; j < p+1 ; ++j)
{
w[i] += CM0[i][j]*v[j];
}
for (auto b = 1u ; b < n-1 ; ++b)
for (auto i = 0u ; i < p+1 ; ++i)
for (auto j = 0u ; j < p+1 ; ++j)
{
w[b*(p+1)+i] += CM[i][j]*v[b*(p+1)+j];
w[b*(p+1)+i] += FM[i][j]*v[(b+1)*(p+1)+j];
}
return w ;
}
};
int main(int argc,char *argv[])
{
const auto nel = 4u;
const auto p = 2u;
std::array<double,nel*(p+1)> x{} ;
constexpr auto L = 1.;
// constexpr auto L = [](){return 1.;};
const auto A = Functor<nel,p,decltype(L)>(L);
const volatile auto y = A(x);
return 0;
}
I compile using GCC 8.2.0 with the flags:
-march=native -std=c++1z -fconcepts -Ofast -Wa,-adhln
And when looking at the generated assembly, the calculation is being executed at runtime.
If I swap the two active lines for the commented-out lines immediately below them, I find that the code is indeed executed at compile time and only the value of the volatile variable is placed in the assembly.
I tried to create a smaller example that reproduces the behavior, but with small changes the code is indeed computed at compile time.
I somehow understand why providing constexpr lambdas helps, but I would like to understand why providing a double would not work in this case. Ideally I wouldn't like to provide lambdas because it makes my frontend messier.
This code is part of a very large code base, so please disregard what the code is actually calculating, I created this example to show the behavior and nothing more.
What would be the right way to provide a double to the functor and store it as a const member variable without changing the compile-time behavior?
Why do small modifications in the compute() function (and some other small changes as well) produce compile-time code?
I would like to understand what the actual conditions are for GCC to perform these compile-time calculations, as the actual application I am working on requires it.
Thanks!
I'm not sure I understand when your code is executed at run time and when at compile time; anyway, the rule of the C++ language (not only g++, and ignoring the as-if rule) is that a constexpr function
can be executed at run time, and must be executed at run time, when it computes with values known only at run time (for example: values coming from standard input)
can be executed at compile time, and must be executed at compile time, when the result goes where a compile-time known value is strictly required (for example: initialization of a constexpr variable, non-type template arguments, C-style array dimensions, static_assert() tests)
there is a grey area -- when the compiler knows the values involved in the computation at compile time but the computed value doesn't go where a compile-time value is strictly required -- where the compiler can choose whether to compute at compile time or at run time.
If you're interested in
const volatile auto y = A(x);
it seems to me we are in the grey area, and the compiler can choose whether to compute the initial value of y at compile time or at run time.
If you want y initialized at compile time, I suppose you can obtain this by defining it (and also the preceding variables) constexpr:
constexpr auto nel = 4u;
constexpr auto p = 2u;
constexpr std::array<double,nel*(p+1)> x{} ;
constexpr auto L = 1.;
// constexpr auto L = [](){return 1.;};
constexpr auto A = Functor<nel,p,decltype(L)>(L);
constexpr volatile auto y = A(x);
for (auto i = 0u; i < p + 1; ++i)
CM0[i][0] += l;
when l is a stateless lambda, this converts l to a function pointer, then to bool (an integral type). This two-step conversion is allowed because only one step is "user-defined".
This conversion always produces 1, and does not depend on the state of l.
I am a C++ newbie.
Context: I found this third-party snippet of code that seems to work, but based on my (very limited) knowledge of C++ I suspect it will cause problems. The snippet is as follows:
int aVariable;
int anInt = 1;
int anotherInt = 2;
int lastInt = 3;
aVariable = CHAIN(anInt, anotherInt, lastInt);
Where CHAIN is defined as follows (this is part of a library):
int CHAIN(){ Map(&CHAIN, MakeProcInstance(&_CHAIN), MAP_IPTR_VPN); }
int _CHAIN(int i, int np, int p){ return ASMAlloc(np, p, &chainproc); }
int keyalloc[16384], kpos, alloc_locked, tmp[4];
int ASMAlloc(int np, int p, alias proc)
{
int v, x;
// if(alloc_locked) return 0 & printf("WARNING: you can declare compound key statements (SEQ, CHAIN, EXEC, TEMPO, AXIS) only inside main() call, and not during an event.\xa");
v = elements(&keyalloc) - kpos - 4;
if(v < np | !np) return 0; // not enough allocation space or no parameters
Map(&v, p); Dim(&v, np); // v = params array
keyalloc[kpos] = np + 4; // size
keyalloc[kpos+1] = &proc; // function
keyalloc[kpos+2] = kpos + 2 + np; // parameters index
while(x < np)
{
keyalloc[kpos+3+x] = v[x];
x = x+1;
}
keyalloc[kpos+3+np] = kpos + 3 | JUMP;
x = ASMFind(kpos);
if(x == kpos) kpos = kpos + np + 4;
return x + 1 | PROC; // skip block size
}
int ASMFind(int x)
{
int i, j, k; while(i < x)
{
k = i + keyalloc[i]; // next
if(keyalloc[i] == keyalloc[x]) // size
if(keyalloc[i+1] == keyalloc[x+1]) // proc
{
j = x-i;
i = i+3;
while(keyalloc[i] == keyalloc[j+i]) i = i+1; // param
if((keyalloc[i] & 0xffff0000) == JUMP) return x-j;
}
i = k;
}
return x;
}
EDIT:
The weird thing is that running
CHAIN(aVariable);
effectively executes
CHAIN(anInt, anotherInt, lastInt);
Somehow. This is what led me to believe that aVariable is, in fact, a pointer.
QUESTION:
Is it correct to store a parametrized function call into an integer variable like so? Does "aVariable" work just as a pointer, or is this likely to corrupt random memory areas?
You're calling a function (through an obfuscated interface), and storing the result in an integer. It might or might not cause problems, depending on how you use the value / what you expect it to mean.
Your example contains too many undefined symbols for the reader to provide any better answer.
Also, I think this is C, not C++ code.
I have a std::vector<PLY> that holds a number of structs:
struct PLY {
int x;
int y;
int greyscale;
};
Some of the PLYs could be duplicates in terms of their position x and y but not necessarily in terms of their greyscale value. What is the best way to find those (position) duplicates and replace them with a single PLY instance whose greyscale value is the average greyscale of all duplicates?
E.g.: PLY a{1,1,188} is a duplicate of PLY b{1,1,255}: same (x,y) position, possibly different greyscale.
Based on your description of Ply you need these operators:
auto operator==(const Ply& a, const Ply& b)
{
return a.x == b.x && a.y == b.y;
}
auto operator<(const Ply& a, const Ply& b)
{
// whenever you can be lazy!
return std::make_pair(a.x, a.y) < std::make_pair(b.x, b.y);
}
Very important: if the definition "Two Ply are identical if their x and y are identical" is not generally valid, then defining comparison operators that ignore greyscale is a bad idea. In that case you should define separate function objects or non-operator functions and pass them to the algorithms that need them.
There is a nice rule of thumb that a function should not have more than one loop. So instead of two nested for loops, we define this helper function, which computes the average of consecutive duplicates and also returns the end of the range of consecutive duplicates:
// prereq: [begin, end) has at least one element
// i.e. begin != end
template <class It>
auto compute_average_duplicates(It begin, It end) -> std::pair<int, It>
// (sadly not C++17) concepts:
//requires requires(It i) { {*i} -> Ply; }
{
auto it = begin + 1;
int sum = begin->greyscale;
for (; it != end && *begin == *it; ++it) {
sum += it->greyscale;
}
// you might need rounding instead of truncation:
return std::make_pair(sum / std::distance(begin, it), it);
}
With this we can have our algorithm:
auto foo()
{
std::vector<Ply> v = {{1, 5, 10}, {2, 4, 6}, {1, 5, 2}};
std::sort(std::begin(v), std::end(v));
for (auto i = std::begin(v); i != std::end(v); ++i) {
decltype(i) j;
int average;
std::tie(average, j) = compute_average_duplicates(i, std::end(v));
// C++17 (coming soon in a compiler near you):
// auto [average, j] = compute_average_duplicates(i, std::end(v));
if (i + 1 == j)
continue;
i->greyscale = average;
v.erase(i + 1, j);
// std::vector::erase Invalidates iterators and references
// at or after the point of the erase
// which means i remains valid, and `++i` (from the for) is correct
}
}
You can apply lexicographical sorting first. When summing, take care that the greyscale sum doesn't overflow. With the current approach you will have some round-off error, but it will be small, as I first sum and only then average.
In the second part you need to remove duplicates from the array. I used an additional array of flags so that every element is copied at most once. If you have some forbidden value for x, y or greyscale, you can use it as a marker and thus get along without the additional array.
#include <algorithm>
#include <vector>
using namespace std;
struct PLY {
int x;
int y;
int greyscale;
};
int main()
{
struct comp
{
bool operator()(const PLY &a, const PLY &b) { return a.x != b.x ? a.x < b.x : a.y < b.y; }
};
vector<PLY> v{ {1,1,1}, {1,2,2}, {1,1,2}, {1,3,5}, {1,2,7} };
sort(begin(v), end(v), comp());
vector<bool> ind(v.size(), true);
int s = 0;
for (int i = 1; i < v.size(); ++i)
{
if (v[i].x == v[i - 1].x &&v[i].y == v[i - 1].y)
{
v[s].greyscale += v[i].greyscale;
ind[i] = false;
}
else
{
int d = i - s;
if (d != 1)
{
v[s].greyscale /= d;
}
s = i;
}
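}
// the loop above never averages the last group, so do it here
int last = static_cast<int>(v.size()) - s;
if (last > 1)
{
v[s].greyscale /= last;
}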
s = 0;
for (int i = 0; i < v.size(); ++i)
{
if (ind[i])
{
if (s != i)
{
v[s] = v[i];
}
++s;
}
}
v.resize(s);
}
So you need to check whether, e.g., PLY a1{1,1,1} duplicates PLY a2{2,2,1}.
A simple method is to overload operator== to check a1.x == a2.x and a1.y == a2.y. Then you can write your own function removeDuplicates(std::vector<PLY>& mPLY) that uses this vector's iterators to compare and remove elements. But it is better to use std::list if you need to remove from the middle of the container frequently.
I am using a simple function y(x), and I want to generate an x value from a certain y value. While reverse mapping typically does not give a single x value, I am using the maximum of my y values, which means there will be a unique x value for the y value I input (the maximum). I don't understand how to code this in C++.
If you don't need interpolation, only exact reverse lookup, then it's relatively straightforward:
std::map<YType, XType> lookup;
// (code to read the file goes here)
// for each x {
YType y = f(x);
if ((lookup.count(y) == 0) || (lookup[y] < x)) {
lookup[y] = x;
}
// }
Then your reverse lookup is just lookup[y], which will return 0 (or a default-constructed value where applicable) if y was in fact missing from the data; note that operator[] also inserts that default value into the map.
Be aware that my code is a bit inefficient: it looks up y in the map several times (up to 3). You can optimize this using iterators, but I'm concerned that obscures what's going on if you're not already familiar with them:
typedef std::map<YType, XType> maptype;
typedef std::pair<maptype::iterator, bool> resulttype;
resulttype result = lookup.insert(std::make_pair(y, x));
if (!result.second) {
// key already existed, so value was not inserted. Check for max.
maptype::iterator pos = result.first;
if ((*pos).second < x) {
(*pos).second = x;
}
}
If I understand correctly, you are given a finite range of values x, say x[0], x[1], ..., x[N], and a function f, and you want to find the index k for which f(x[k]) is the largest possible. In that case, a simple search will do:
size_t k = 0;
T m = f(x[k]);
T tmp;
for (size_t i = 1; i <= N; ++i)
{
if ((tmp = f(x[i])) > m)
{
k = i;
m = tmp;
}
}
// Maximum is (x[k], m)
Here T is the type such that f has the signature T f(T).