Related
The task (from a Bulgarian judge, click on "Език" to change it to English):
I am given the size of the first (S1 = A) of N corals. The size of every subsequent coral (Si, where i > 1) is calculated using the formula (B*Si-1 + C)%D, where A, B, C and D are some constants. I am told that Nemo is nearby the Kth coral (when the sizes of all corals are sorted in ascending order).
What is the size of the above-mentioned Kth coral ?
I will have T tests and for every one of them I will be given N, K, A, B, C and D and prompted to output the size of the Kth coral.
The requirements:
1 ≤ T ≤ 3
1 ≤ K ≤ N ≤ 107
0 ≤ A < D ≤ 1018
1 ≤ C, B*D ≤ 1018
Memory available is 64 MB
Time limit is 1.9 sec
The problem I have:
For the worst case scenario I will need 107*8B which is 76 MB.
The solution If the memory available was at least 80 MB would be:
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j;
biggie a, b, c, d;
std::vector<biggie>::iterator it_ans;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
std::vector<biggie> lut{ a };
lut.reserve(n);
for (j = 1; j != n; ++j) {
lut.emplace_back((b * lut.back() + c) % d);
}
it_ans = std::next(lut.begin(), k - 1);
std::nth_element(lut.begin(), it_ans, lut.end());
std::cout << *it_ans << '\n';
}
return 0;
}
Question 1: How can I approach this CP task given the requirements listed above ?
Question 2: Is it somehow possible to use std::nth_element to solve it since I am not able to store all N elements ? I mean using std::nth_element in a sliding window technique (If this is possible).
# Christian Sloper
#include <iostream>
#include <queue>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j, j_lim;
biggie a, b, c, d, prev, curr;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
if (k < n - k + 1) {
std::priority_queue<biggie, std::vector<biggie>, std::less<biggie>> q;
q.push(a);
prev = a;
for (j = 1; j != k; ++j) {
curr = (b * prev + c) % d;
q.push(curr);
prev = curr;
}
for (; j != n; ++j) {
curr = (b * prev + c) % d;
if (curr < q.top()) {
q.pop();
q.push(curr);
}
prev = curr;
}
std::cout << q.top() << '\n';
}
else {
std::priority_queue<biggie, std::vector<biggie>, std::greater<biggie>> q;
q.push(a);
prev = a;
for (j = 1, j_lim = n - k + 1; j != j_lim; ++j) {
curr = (b * prev + c) % d;
q.push(curr);
prev = curr;
}
for (; j != n; ++j) {
curr = (b * prev + c) % d;
if (curr > q.top()) {
q.pop();
q.push(curr);
}
prev = curr;
}
std::cout << q.top() << '\n';
}
}
return 0;
}
This gets accepted (Succeeds all 40 tests. Largest time 1.4 seconds, for a test with T=3 and D≤10^9. Largest time for a test with larger D (and thus T=1) is 0.7 seconds.).
#include <iostream>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j;
biggie a, b, c, d;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
biggie prefix = 0;
for (int shift = d > 1000000000 ? 40 : 20; shift >= 0; shift -= 20) {
biggie prefix_mask = ((biggie(1) << (40 - shift)) - 1) << (shift + 20);
int count[1 << 20] = {0};
biggie s = a;
int rank = 0;
for (j = 0; j != n; ++j) {
biggie s_vs_prefix = s & prefix_mask;
if (s_vs_prefix < prefix)
++rank;
else if (s_vs_prefix == prefix)
++count[(s >> shift) & ((1 << 20) - 1)];
s = (b * s + c) % d;
}
int i = -1;
while (rank < k)
rank += count[++i];
prefix |= biggie(i) << shift;
}
std::cout << prefix << '\n';
}
return 0;
}
The result is a 60 bits number. I first determine the high 20 bits with one pass through the numbers, then the middle 20 bits in another pass, then the low 20 bits in another.
For the high 20 bits, generate all the numbers and count how often each high 20 bits pattern occurrs. After that, add up the counts until you reach K. The pattern where you reach K, that pattern covers the K-th largest number. In other words, that's the result's high 20 bits.
The middle and low 20 bits are computed similarly, except we take the by then known prefix (the high 20 bits or high+middle 40 bits) into account. As a little optimization, when D is small, I skip computing the high 20 bits. That got me from 2.1 seconds down to 1.4 seconds.
This solution is like user3386109 described, except with bucket size 2^20 instead of 10^6 so I can use bit operations instead of divisions and think of bit patterns instead of ranges.
For the memory constraint you hit:
(B*Si-1 + C)%D
requires only the value (Si-2) before itself. So you can compute them in pairs, to use only 1/2 of total you need. This only needs indexing even values and iterating once for odd values. So you can just use half-length LUT and compute the odd value in-flight. Modern CPUs are fast enough to do extra calculations like these.
std::vector<biggie> lut{ a_i,a_i_2,a_i_4,... };
a_i_3=computeOddFromEven(lut[1]);
You can make a longer stride like 4,8 too. If dataset is large, RAM latency is big. So it's like having checkpoints in whole data search space to balance between memory and core usage. 1000-distance checkpoints would put a lot of cpu cycles into re-calculations but then the array would fit CPU's L2/L1 cache which is not bad. When sorting, the maximum re-calc iteration per element would be n=1000 now. O(1000 x size) maybe it's a big constant but maybe somehow optimizable by compiler if some constants really const?
If CPU performance becomes problem again:
write a compiling function that writes your source code with all the "constant" given by user to a string
compile the code using command-line (assuming target computer has some accessible from command line like g++ from main program)
run it and get results
Compiler should enable more speed/memory optimizations when those are really constant in compile-time rather than depending on std::cin.
If you really need to add a hard-limit to the RAM usage, then implement a simple cache with the backing-store as your heavy computations with brute-force O(N^2) (or O(L x N) with checkpoints every L elements as in first method where L=2 or 4, or ...).
Here's a sample direct-mapped cache with 8M long-long value space:
int main()
{
std::vector<long long> checkpoints = {
a_0, a_16, a_32,...
};
auto cacheReadMissFunction = [&](int key){
// your pure computational algorithm here, helper meant to show variable
long long result = checkpoints[key/16];
for(key - key%16 times)
result = iterate(result);
return result;
};
auto cacheWriteMissFunction = [&](int key, long long value){
/* not useful for your algorithm as it doesn't change behavior per element */
// backing_store[key] = value;
};
// due to special optimizations, size has to be 2^k
int cacheSize = 1024*1024*8;
DirectMappedCache<int, long long> cache(cacheSize,cacheReadMissFunction,cacheWriteMissFunction);
std::cout << cache.get(20)<<std::endl;
return 0;
}
If you use a cache-friendly sorting-algorithm, a direct cache access would make a lot of re-use for nearly all the elements in comparisons if you fill the output buffer/terminal with elements one by one by following something like a bitonic-sort-path (that is known in compile-time). If that doesn't work, then you can try accessing files as a "backing-store" of cache for sorting whole array at once. Is file system prohibited for use? Then the online-compiling method above won't work either.
Implementation of a direct mapped cache (don't forget to call flush() after your algorithm finishes, if you use any cache.set() method):
#ifndef DIRECTMAPPEDCACHE_H_
#define DIRECTMAPPEDCACHE_H_
#include<vector>
#include<functional>
#include<mutex>
#include<iostream>
/* Direct-mapped cache implementation
* Only usable for integer type keys in range [0,maxPositive-1]
*
* CacheKey: type of key (only integers: int, char, size_t)
* CacheValue: type of value that is bound to key (same as above)
*/
template< typename CacheKey, typename CacheValue>
class DirectMappedCache
{
public:
// allocates buffers for numElements number of cache slots/lanes
// readMiss: cache-miss for read operations. User needs to give this function
// to let the cache automatically get data from backing-store
// example: [&](MyClass key){ return redis.get(key); }
// takes a CacheKey as key, returns CacheValue as value
// writeMiss: cache-miss for write operations. User needs to give this function
// to let the cache automatically set data to backing-store
// example: [&](MyClass key, MyAnotherClass value){ redis.set(key,value); }
// takes a CacheKey as key and CacheValue as value
// numElements: has to be integer-power of 2 (e.g. 2,4,8,16,...)
DirectMappedCache(CacheKey numElements,
const std::function<CacheValue(CacheKey)> & readMiss,
const std::function<void(CacheKey,CacheValue)> & writeMiss):size(numElements),sizeM1(numElements-1),loadData(readMiss),saveData(writeMiss)
{
// initialize buffers
for(size_t i=0;i<numElements;i++)
{
valueBuffer.push_back(CacheValue());
isEditedBuffer.push_back(0);
keyBuffer.push_back(CacheKey()-1);// mapping of 0+ allowed
}
}
// get element from cache
// if cache doesn't find it in buffers,
// then cache gets data from backing-store
// then returns the result to user
// then cache is available from RAM on next get/set access with same key
inline
const CacheValue get(const CacheKey & key) noexcept
{
return accessDirect(key,nullptr);
}
// only syntactic difference
inline
const std::vector<CacheValue> getMultiple(const std::vector<CacheKey> & key) noexcept
{
const int n = key.size();
std::vector<CacheValue> result(n);
for(int i=0;i<n;i++)
{
result[i]=accessDirect(key[i],nullptr);
}
return result;
}
// thread-safe but slower version of get()
inline
const CacheValue getThreadSafe(const CacheKey & key) noexcept
{
std::lock_guard<std::mutex> lg(mut);
return accessDirect(key,nullptr);
}
// set element to cache
// if cache doesn't find it in buffers,
// then cache sets data on just cache
// writing to backing-store only happens when
// another access evicts the cache slot containing this key/value
// or when cache is flushed by flush() method
// then returns the given value back
// then cache is available from RAM on next get/set access with same key
inline
void set(const CacheKey & key, const CacheValue & val) noexcept
{
accessDirect(key,&val,1);
}
// thread-safe but slower version of set()
inline
void setThreadSafe(const CacheKey & key, const CacheValue & val) noexcept
{
std::lock_guard<std::mutex> lg(mut);
accessDirect(key,&val,1);
}
// use this before closing the backing-store to store the latest bits of data
void flush()
{
try
{
std::lock_guard<std::mutex> lg(mut);
for (size_t i=0;i<size;i++)
{
if (isEditedBuffer[i] == 1)
{
isEditedBuffer[i]=0;
auto oldKey = keyBuffer[i];
auto oldValue = valueBuffer[i];
saveData(oldKey,oldValue);
}
}
}catch(std::exception &ex){ std::cout<<ex.what()<<std::endl; }
}
// direct mapped access
// opType=0: get
// opType=1: set
CacheValue const accessDirect(const CacheKey & key,const CacheValue * value, const bool opType = 0)
{
// find tag mapped to the key
CacheKey tag = key & sizeM1;
// compare keys
if(keyBuffer[tag] == key)
{
// cache-hit
// "set"
if(opType == 1)
{
isEditedBuffer[tag]=1;
valueBuffer[tag]=*value;
}
// cache hit value
return valueBuffer[tag];
}
else // cache-miss
{
CacheValue oldValue = valueBuffer[tag];
CacheKey oldKey = keyBuffer[tag];
// eviction algorithm start
if(isEditedBuffer[tag] == 1)
{
// if it is "get"
if(opType==0)
{
isEditedBuffer[tag]=0;
}
saveData(oldKey,oldValue);
// "get"
if(opType==0)
{
const CacheValue && loadedData = loadData(key);
valueBuffer[tag]=loadedData;
keyBuffer[tag]=key;
return loadedData;
}
else /* "set" */
{
valueBuffer[tag]=*value;
keyBuffer[tag]=key;
return *value;
}
}
else // not edited
{
// "set"
if(opType == 1)
{
isEditedBuffer[tag]=1;
}
// "get"
if(opType == 0)
{
const CacheValue && loadedData = loadData(key);
valueBuffer[tag]=loadedData;
keyBuffer[tag]=key;
return loadedData;
}
else // "set"
{
valueBuffer[tag]=*value;
keyBuffer[tag]=key;
return *value;
}
}
}
}
private:
const CacheKey size;
const CacheKey sizeM1;
std::mutex mut;
std::vector<CacheValue> valueBuffer;
std::vector<unsigned char> isEditedBuffer;
std::vector<CacheKey> keyBuffer;
const std::function<CacheValue(CacheKey)> loadData;
const std::function<void(CacheKey,CacheValue)> saveData;
};
#endif /* DIRECTMAPPEDCACHE_H_ */
You can solve this problem using a Max-heap.
Insert the first k elements into the max-heap. The largest element of these k will now be at the root.
For each remaining element e:
Compare e to the root.
If e is larger than the root, discard it.
If e is smaller than the root, remove the root and insert e into the heap structure.
After all elements have been processed, the k-th smallest element is at the root.
This method uses O(K) space and O(n log n) time.
There’s an algorithm that people often call LazySelect that I think would be perfect here.
With high probability, we make two passes. In the first pass, we save a random sample of size n much less than N. The answer will be around index (K/N)n in the sorted sample, but due to the randomness, we have to be careful. Save the values a and b at (K/N)n ± r instead, where r is the radius of the window. In the second pass, we save all of the values in [a, b], count the number of values less than a (let it be L), and select the value with index K−L if it’s in the window (otherwise, try again).
The theoretical advice on choosing n and r is fine, but I would be pragmatic here. Choose n so that you use most of the available memory; the bigger the sample, the more informative it is. Choose r fairly large as well, but not quite as aggressively due to the randomness.
C++ code below. On the online judge, it’s faster than Kelly’s (max 1.3 seconds on the T=3 tests, 0.5 on the T=1 tests).
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <limits>
#include <optional>
#include <random>
#include <vector>
namespace {
class LazySelector {
public:
static constexpr std::int32_t kTargetSampleSize = 1000;
explicit LazySelector() { sample_.reserve(1000000); }
void BeginFirstPass(const std::int32_t n, const std::int32_t k) {
sample_.clear();
mask_ = n / kTargetSampleSize;
mask_ |= mask_ >> 1;
mask_ |= mask_ >> 2;
mask_ |= mask_ >> 4;
mask_ |= mask_ >> 8;
mask_ |= mask_ >> 16;
}
void FirstPass(const std::int64_t value) {
if ((gen_() & mask_) == 0) {
sample_.push_back(value);
}
}
void BeginSecondPass(const std::int32_t n, const std::int32_t k) {
sample_.push_back(std::numeric_limits<std::int64_t>::min());
sample_.push_back(std::numeric_limits<std::int64_t>::max());
const double p = static_cast<double>(sample_.size()) / n;
const double radius = 2 * std::sqrt(sample_.size());
const auto lower =
sample_.begin() + std::clamp<std::int32_t>(std::floor(p * k - radius),
0, sample_.size() - 1);
const auto upper =
sample_.begin() + std::clamp<std::int32_t>(std::ceil(p * k + radius), 0,
sample_.size() - 1);
std::nth_element(sample_.begin(), upper, sample_.end());
std::nth_element(sample_.begin(), lower, upper);
lower_ = *lower;
upper_ = *upper;
sample_.clear();
less_than_lower_ = 0;
equal_to_lower_ = 0;
equal_to_upper_ = 0;
}
void SecondPass(const std::int64_t value) {
if (value < lower_) {
++less_than_lower_;
} else if (upper_ < value) {
} else if (value == lower_) {
++equal_to_lower_;
} else if (value == upper_) {
++equal_to_upper_;
} else {
sample_.push_back(value);
}
}
std::optional<std::int64_t> Select(std::int32_t k) {
if (k < less_than_lower_) {
return std::nullopt;
}
k -= less_than_lower_;
if (k < equal_to_lower_) {
return lower_;
}
k -= equal_to_lower_;
if (k < sample_.size()) {
const auto kth = sample_.begin() + k;
std::nth_element(sample_.begin(), kth, sample_.end());
return *kth;
}
k -= sample_.size();
if (k < equal_to_upper_) {
return upper_;
}
return std::nullopt;
}
private:
std::default_random_engine gen_;
std::vector<std::int64_t> sample_ = {};
std::int32_t mask_ = 0;
std::int64_t lower_ = std::numeric_limits<std::int64_t>::min();
std::int64_t upper_ = std::numeric_limits<std::int64_t>::max();
std::int32_t less_than_lower_ = 0;
std::int32_t equal_to_lower_ = 0;
std::int32_t equal_to_upper_ = 0;
};
} // namespace
int main() {
int t;
std::cin >> t;
for (int i = t; i > 0; --i) {
std::int32_t n;
std::int32_t k;
std::int64_t a;
std::int64_t b;
std::int64_t c;
std::int64_t d;
std::cin >> n >> k >> a >> b >> c >> d;
std::optional<std::int64_t> ans = std::nullopt;
LazySelector selector;
do {
{
selector.BeginFirstPass(n, k);
std::int64_t s = a;
for (std::int32_t j = n; j > 0; --j) {
selector.FirstPass(s);
s = (b * s + c) % d;
}
}
{
selector.BeginSecondPass(n, k);
std::int64_t s = a;
for (std::int32_t j = n; j > 0; --j) {
selector.SecondPass(s);
s = (b * s + c) % d;
}
}
ans = selector.Select(k - 1);
} while (!ans);
std::cout << *ans << '\n';
}
}
I am trying to do a simple image processing filter where the pixel values will be divided by half to reduce the intensity and I am trying to develop the hardware for the same. hence I am using vivado hls to generate the IP. As explained here https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/Float-numbers-with-hls-stream/m-p/942747 to send floating numbers in a hls stream , an union needs to be used and I did the same. However, the results don't seem to be matching for the red and green components of the image whereas it is matching for the blue component of the image. It is a very simple algorithm where a pixel value will be divided by half.
I have been trying to resolve it but I am not able to see where the problem is. I have attached all the files below, can someone can help me resolve it??
////header file
#include "ap_fixed.h"
#include "hls_stream.h"
typedef union {
unsigned int i;
float r;
float g;
float b;
} conv;
typedef hls::stream <unsigned int> Stream_t;
void ftest(Stream_t& Sin,Stream_t& Sout);
////testbench
#include "stream_check_h.hpp"
int main()
{
Mat img_rev = imread("C:/Users/20181217/Desktop/images/imgs/output_fwd_v3.png");//(256x512)
Mat final_img(img_rev.rows,img_rev.cols,CV_8UC3);
Mat ref_img(img_rev.rows,img_rev.cols,CV_8UC3);
Stream_t S1,S2;
int err_r = 0;
int err_g = 0;
int err_b = 0;
for(int i=0;i<256;i++)
{
for(int j=0;j<512;j++)
{
conv c;
c.r = (float)img_rev.at<Vec3b>(i,j)[0];
c.g = (float)img_rev.at<Vec3b>(i,j)[1];
c.b = (float)img_rev.at<Vec3b>(i,j)[2];
S1 << c.i;
}
}
ftest(S1,S2);
conv c;
for(int i=0;i<256;i++)
{
for(int j=0;j<512;j++)
{
S2 >> c.i;
final_img.at<Vec3b>(i,j)[0]=(unsigned char)c.r;
final_img.at<Vec3b>(i,j)[1]=(unsigned char)c.g;
final_img.at<Vec3b>(i,j)[2]=(unsigned char)c.b;
ref_img.at<Vec3b>(i,j)[0] = (unsigned char)(((float)img_rev.at<Vec3b>(i,j)[0])/2.0);
ref_img.at<Vec3b>(i,j)[1] = (unsigned char)(((float)img_rev.at<Vec3b>(i,j)[1])/2.0);
ref_img.at<Vec3b>(i,j)[2] = (unsigned char)(((float)img_rev.at<Vec3b>(i,j)[2])/2.0);
}
}
Mat diff;
cout<<diff;
diff= abs(final_img-ref_img);
for(int i=0;i<256;i++)
{
for(int j=0;j<512;j++)
{
if((int)diff.at<Vec3b>(i,j)[0] > 0)
{
err_r++;
cout<<"expected value: "<<(int)ref_img.at<Vec3b>(i,j)[0]<<", final_value: "<<(int)final_img.at<Vec3b>(i,j)[0]<<", actual value:"<<(int)img_rev.at<Vec3b>(i,j)[0]<<endl;
}
if((int)diff.at<Vec3b>(i,j)[1] > 0)
err_g++;
if((int)diff.at<Vec3b>(i,j)[2] > 0)
err_b++;
}
}
cout<<"number of errors: "<<err_r<<", "<<err_g<<", "<<err_b;
return 0;
}
////core
#include "stream_check_h.hpp"
void ftest(Stream_t& Sin,Stream_t& Sout)
{
conv cin,cout;
for(int i=0;i<256;i++)
{
for(int j=0;j<512;j++)
{
Sin >> cin.i;
cout.r = cin.r/2.0 ;
cout.g = cin.g/2.0 ;
cout.b = cin.b/2.0 ;
Sout << cout.i;
}
}
}
when I debugged, it showed that the blue components of the pixels are matching. for one red pixel it showed me the following:
expected value: 22, final_value: 14, actual value:45
and the total errors for red, green, and blue are:
number of errors: 126773, 131072, 0
I am not able to see why it is going wrong for red and green. I posted here hoping a fresh set of eyes would help my problem.
Thanks in advance
I'm assuming you're using a 32bit-wide stream with 3 RGB pixels 8bit unsigned (CV_8U3). I believe the problem with the union type in your case is the overlapping of its three members (not just like the one float value in the example you cite). This means that by doing the division, you're actually doing it over the whole 32bit data you're receiving.
I possible workaround I quickly cam up with would be to cast the unsigned int you're getting from the stream into an ap_uint<32> type, then chop it in the R, G, B chunks (with the range() method) and divide. Finally, assemble back the result and stream it back.
unsigned int packet;
Sin >> packet;
ap_uint<32> packet_uint32 = *((ap_uint<32>*)&packet); // casting (not elegant, but works)
ap_int<8> b = packet_uint32.range(7, 0);
ap_int<8> g = packet_uint32.range(15, 8);
ap_int<8> r = packet_uint32.range(23, 16); // In case they are in the wrong bit range/order, just flip the r, g, b assignements
b /= 2;
g /= 2;
r /= 2;
packet_uint32.range(7, 0) = b;
packet_uint32.range(15, 8) = g;
packet_uint32.range(23, 16) = r;
packet = packet_uint32.to_int();
Sout << packet;
NOTE: I've reused the same variables in the code above: HLS shouldn't complain about it and come out with a good RTL anyway. In case it shouldn't, just create new ones.
I am trying to convert vector values that are received form a file into two different formats. After converting into first format and printing the vector out, I want to use the original "read in" values to convert them into the second format.
At the moment, it seems that the second conversion is happening on the already converted values. However, I don't understand why it doesn't convert back to the original value then? And ultimately, how can I use the original values of the vector for the second conversion so the
else {
GetType();
GetXArg();
GetYArg();
}
works for the second time?
Here is the code snippet:
void Force::convToP() //converts to polar
{
if (forceType == 'c')
{
SetType('p');
SetXArg(sqrt(xArg * xArg + yArg * yArg));
SetYArg(atan(yArg / xArg));
}
else {
GetType(); //just return type, xArg and yArg in their original form
GetXArg();
GetYArg();
}
}
void Force::convToC() //converts to cartesian
{
if (forceType == 'p') {
SetType('c');
SetXArg(xArg * cos(yArg));
SetYArg(xArg * sin(yArg));
}
else {
GetType();
GetXArg();
GetYArg();
}
}
and the main function:
while (file >> type >> x >> y) {
Force f(type, x, y);
force.push_back(f);
}
for (int i = 0; i < force.size(); ++i) {
force[i].printforce();
}
cout << "Forces in Polar form: " << endl;
for (int i = 0; i < force.size(); ++i)
{
force[i].convToP();
force[i].printforce();
}
cout << "Forces in Cartesian form: " << endl;
for (int i = 0; i < force.size(); ++i) {
force[i].convToC();
force[i].printforce();
}
finally, the output at the moment is:
p 10 0.5
c 12 14
p 25 1
p 100 0.8
c 50 50
p 20 3.14
c -100 25
p 12 1.14
Forces in Polar form: <-first conversion. All works fine
p 10 0.5
p 18.4391 0.649399
p 25 1
p 100 0.8
p 70.7107 0.61548
p 20 3.14
p 103.078 0.237941
p 12 1.14
Forces in Cartesian form:
c 8.77583 4.20736 <-works fine
c 14.6858 8.8806 <-why doesn't convert back to c 12 14/ how to use the original values of vector
c 13.5076 11.3662
c 69.6707 49.9787
c 57.735 33.3333
c -20 -0.0318509
c 100.173 23.6111
c 5.01113 4.55328
Press any key to continue . . .
Very new to this, been baffled for some time, so would really appreciate any help and advice.
You are modifying xArg and then using the modified value to convert yArg. You need to do both conversions before either modification.
void Force::convToP() //converts to polar
{
if (forceType == 'c')
{
forceType = 'p';
decltype(xArg) newX = sqrt(xArg * xArg + yArg * yArg);
decltype(yArg) newY = atan(yArg / xArg);
xArg = newX;
yArg = newY
}
// no else needed
}
void Force::convToC() //converts to cartesian
{
if (forceType == 'p') {
forceType = 'c';
decltype(xArg) newX = xArg * cos(yArg);
decltype(yArg) newY = xArg * sin(yArg);
xArg = newX;
yArg = newY
}
// no else needed
}
You can verify the correct values with std::complex
#include <complex>
#include <vector>
#include <iostream>
int main() {
std::vector<std::complex<double>> nums
{
std::polar<double>(10, 0.5),
std::complex<double>(12, 14),
std::polar<double>(25, 1),
std::polar<double>(100, 0.8),
std::complex<double>(50, 50),
std::polar<double>(20, 3.14),
std::complex<double>(-100, 25),
std::polar<double>(12, 1.14)
};
for (auto num : nums)
{
std::cout << num << " (" << std::abs(num) << ", " << std::arg(num) << ")\n";
}
}
(8.77583,4.79426) (10, 0.5)
(12,14) (18.4391, 0.86217)
(13.5076,21.0368) (25, 1)
(69.6707,71.7356) (100, 0.8)
(50,50) (70.7107, 0.785398)
(-20,0.0318531) (20, 3.14)
(-100,25) (103.078, 2.89661)
(5.01113,10.9036) (12, 1.14)
It's often helpful to break sub-operations down into small single-concern functions.
The compiler will optimise away all the redundant copies, loads and stores:
#include <cmath>
#include <tuple>
auto computed(double xArg, double yArg)
{
return
std::make_tuple(
std::sqrt(xArg * xArg + yArg * yArg),
std::atan(yArg / xArg));
}
void modify(double& xArg, double& yArg)
{
std::tie(xArg, yArg) = computed(xArg, yArg);
}
Without knowing exactly what GetXArg() etc. do, it appears that you are modifying your original forces when you call force[i].convToP();. So after this loop all the forces in your array are converted to polar. If all you want to do is print the polar and cartesian representations without changing the originals, you should generate copies of the forces:
cout << "Forces in Polar form: " << endl;
for (int i = 0; i < force.size(); ++i)
{
Force tmpForce = force[i];
tmpForce.convToP();
tmpForce.printforce();
}
etc.
Edit: Looks like #Aconcagua beat me to it.
Your conversion function obviously changes the object it operates on. Then you need to be aware that the index operator ([]) returns a reference to the object in the vector. If you do not want to modify the original object, you'll have to make a copy of:
for (int i = 0; i < force.size(); ++i)
{
auto copy = force[i]
copy.convToP();
copy.printforce();
}
Now, you can operate on the unchanged values in the second run. Assuming you don't need the original values afterwards any more, you can just leave the second loop as is, otherwise, make copies again...
All a little easier with range based for loop:
for(auto c : force)
{
c.convToP();
c.printforce();
}
for(auto& c : force)
// ^ if you don't need original values any more, can use reference now
{
c.convToC();
c.printforce();
}
I was writing a program for finding roots for a class, and had finished and got it working perfectly. As I go to turn it in, I see the document requires the .cpp to compile using Visual Studio 2012 - so I try that out. I normally use Dev C++ - and I've come to find it allows me to compile "funky things" such as dynamically declaring arrays without using malloc or new operators.
So, after finding the error associated with how I wrongly defined my arrays - I tried to fix the problem using malloc, calloc, and new/delete and well - it kept giving me memory allocation errors. The whole 46981239487532-byte error.
Now, I tried to "return" the program to the way it used to be and now I can't even get that to work. I'm not even entirely sure how I set up the arrays to work in Dev C++ in the first place. Here the code:
#include <iostream>
#include <stdlib.h>
#include <math.h>
using namespace std;
float newton(float a, float b, float poly[],float n, float *fx, float *derfx);
float horner(float poly[], int n, float x, float *derx);
float bisection(float a, float b, float poly[], float n, float *fx);
int main(int argc, char *argv[])
{
float a, b, derr1 = 0, dummyvar = 0, fr1 = 0, fr0;
float constants[argc-3];
//float* constants = NULL;
//constants = new float[argc-3];
//constants = (float*)calloc(argc-3,sizeof(float));
//In order to get a and b from being a char to floating point, the following lines are used.
//The indexes are set relative to the size of argv array in order to allow for dynamically sized inputs. atof is a char to float converter.
a = atof(argv[argc-2]);
b = atof(argv[argc-1]);
//In order to get a easy to work with array for horners method,
//all of the values excluding the last two are put into a new floating point array
for (int i = 0; i <= argc - 3; i++){
constants[i] = atof(argv[i+1]);
}
bisection(a, b, constants, argc - 3, &fr0);
newton(a, b, constants, argc - 3, &fr1, &derr1);
cout << "f(a) = " << horner(constants,argc-3,a,&dummyvar);
cout << ", f(b) = " << horner(constants,argc-3,b,&dummyvar);
cout << ", f(Bisection Root) = " << fr0;
cout << ", f(Newton Root) = "<<fr1<<", f'(Newton Root) = "<<derr1<<endl;
return 0;
}
// Poly[] is the polynomial constants, n is the number of degrees of the polynomial (the size of poly[]), x is the value of the function we want the solution for.
float horner(float poly[], int n, float x, float *derx)
{
float fx[2] = {0, 0};
fx[0] = poly[0]; // Initialize fx to the largest degree constant.
float derconstant[n];
//float* derconstant = NULL;
//derconstant = new float[n];
//derconstant = (float*)calloc(n,sizeof(float));
derconstant[0] = poly[0];
// Each term is multiplied by the last by X, then you add the next poly constant. The end result is the function at X.
for (int i = 1; i < n; i++){
fx[0] = fx[0]*x + poly[i];
// Each itteration has the constant saved to form the derivative function, which is evaluated in the next for loop.
derconstant[i]=fx[0];
}
// The same method is used to calculate the derivative at X, only using n-1 instead of n.
fx[1] = derconstant[0]; // Initialize fx[1] to the largest derivative degree constant.
for (int i = 1; i < n - 1; i++){
fx[1] = fx[1]*x + derconstant[i];
}
*derx = fx[1];
return fx[0];
}
float bisection(float a, float b, float poly[], float n, float *fx)
{
float r0 =0, count0 = 0;
float c = (a + b)/2; // c is the midpoint from a to b
float fc, fa, fb;
int rootfound = 0;
float *derx;
derx = 0; // Needs to be defined so that my method for horner's method will work for bisection.
fa = horner(poly, n, a, derx); // The following three lines use horner's method to get fa,fb, and fc.
fb = horner(poly, n, b, derx);
fc = horner(poly, n, c, derx);
while ((count0 <= 100000) || (rootfound == 0)) { // The algorithm has a limit of 1000 itterations to solve the root.
if (count0 <= 100000) {
count0++;
if ((c == r0) && (fabs(fc) <= 0.0001)) {
rootfound=1;
cout << "Bisection Root: " << r0 << endl;
cout << "Iterations: " << count0+1 << endl;
*fx = fc;
break;
}
else
{
if (((fc > 0) && (fb > 0)) || ((fc < 0) && (fb < 0))) { // Checks if fb and fc are the same sign.
b = c; // If fc and fb have the same sign, thenb "moves" to c.
r0 = c; // Sets the current root approximation to the last c value.
c = (a + b)/2; // c is recalculated.
}
else
{
a=c; // Shift a to c for next itteration.
r0=c; // Sets the current root approximation to the last c value.
c=(a+b)/2; // Calculate next c for next itteration.
}
fa = horner(poly, n, a, derx); // The following three send the new a,b,and c values to horner's method for recalculation.
fb = horner(poly, n, b, derx);
fc = horner(poly, n, c, derx);
}
}
else
{
cout << "Bisection Method could not find root within 100000 itterations" << endl;
break;
}
}
return 0;
}
float newton(float a, float b, float poly[],float n, float *fx, float *derfx){
float x0, x1;
int rootfound1 = 1, count1 = 0;
x0 = (a + b)/2;
x1 = x0;
float fx0, derfx0;
fx0 = horner(poly, n, x0, &derfx0);
while ((count1 <= 100000) || (rootfound1 == 0)) {
count1++;
if (count1 <= 100000) {
if ((fabs(fx0) <= 0.0001)) {
rootfound1 = 1;
cout << "Newtons Root: " << x1 << endl;
cout << "Iterations: " << count1 << endl;
break;
}
else
{
x1 = x0 - (fx0/derfx0);
x0 = x1;
fx0 = horner(poly, n, x0, &derfx0);
*derfx = derfx0;
*fx = fx0;
}
}
else
{
cout << "Newtons Method could not find a root within 100000 itterations" << endl;
break;
}
}
return 0;
}
So I've spent the past several hours trying to sort this out, and ultimately, I've given in to asking.Everywhere I look has just said to define as
float* constants = NULL;
constants = new float[size];
but this keeps crashing my programs - presumably from allocating too much memory somehow. I've commented out the things I've tried in various ways and combonations. If you want a more tl;dr to the "trouble spots", they are at the very beginning of the main and horner functions.
Here is one issue, in main you allocate space for argc-3 floats (in various ways) for constants but your code in the loop writes past the end of the array.
Change:
for( int i = 0; i<=argc-3; i++){
to
for( int i = 0; i<argc-3; i++){
That alone could be enough to cause your allocation errors.
Edit: Also note if you allocate space for something using new, you need to delete it with delete, otherwise you will continue to use up memory and possibly run out (especially if you do it in a loop of 100,000).
Edit 2: As Galik mentions below, because you are using derconstant = new float[n] to allocate the memory, you need to use delete [] derconstant to free the memory. This is important when you start allocating space for class objects as the delete [] form will call the destructor of each element in the array.
I am building a class in C++ which can be used to store arbitrarily large integers. I am storing them as binary in a vector. I need to be able to print this vector in base 10 so it is easier for a human to understand. I know that I could convert it to an int and then output that int. However, my numbers will be much larger than any primitive types. How can I convert this directly to a string.
Here is my code so far. I am new to C++ so if you have any other suggestions that would be great too. I need help filling in the string toBaseTenString() function.
class BinaryInt
{
private:
bool lastDataUser = true;
vector<bool> * data;
BinaryInt(vector<bool> * pointer)
{
data = pointer;
}
public:
BinaryInt(int n)
{
data = new vector<bool>();
while(n > 0)
{
data->push_back(n % 2);
n = n >> 1;
}
}
BinaryInt(const BinaryInt & from)
{
from.lastDataUser = false;
this->data = from.data;
}
~BinaryInt()
{
if(lastDataUser)
delete data;
}
string toBinaryString();
string toBaseTenString();
static BinaryInt add(BinaryInt a, BinaryInt b);
static BinaryInt mult(BinaryInt a, BinaryInt b);
};
BinaryInt BinaryInt::add(BinaryInt a, BinaryInt b)
{
int aSize = a.data->size();
int bSize = b.data->size();
int newDataSize = max(aSize, bSize);
vector<bool> * newData = new vector<bool>(newDataSize);
bool carry = 0;
for(int i = 0; i < newDataSize; i++)
{
int sum = (i < aSize ? a.data->at(i) : 0) + (i < bSize ? b.data->at(i) : 0) + carry;
(*newData)[i] = sum % 2;
carry = sum >> 1;
}
if(carry)
newData->push_back(carry);
return BinaryInt(newData);
}
string BinaryInt::toBinaryString()
{
stringstream ss;
for(int i = data->size() - 1; i >= 0; i--)
{
ss << (*data)[i];
}
return ss.str();
}
string BinaryInt::toBaseTenString()
{
//Not sure how to do this
}
I know you said in your OP that "my numbers will be much larger than any primitive types", but just hear me out on this.
In the past, I've used std::bitset to work with binary representations of numbers and converting back and forth from various other representations. std::bitset is basically a fancy std::vector with some added functionality. You can read more about it here if it sounds interesting, but here's some small stupid example code to show you how it could work:
std::bitset<8> myByte;
myByte |= 1; // mByte = 00000001
myByte <<= 4; // mByte = 00010000
myByte |= 1; // mByte = 00010001
std::cout << myByte.to_string() << '\n'; // Outputs '00010001'
std::cout << myByte.to_ullong() << '\n'; // Outputs '17'
You can access the bitset by standard array notation as well. By the way, that second conversion I showed (to_ullong) converts to an unsigned long long, which I believe has a max value of 18,446,744,073,709,551,615. If you need larger values than that, good luck!
Just iterate (backwards) your vector<bool> and accumulate the corresponding value when the iterator is true:
int base10(const std::vector<bool> &value)
{
int result = 0;
int bit = 1;
for (vb::const_reverse_iterator b = value.rbegin(), e = value.rend(); b != e; ++b, bit <<= 1)
result += (*b ? bit : 0);
return result;
}
Live demo.
Beware! this code is only a guide, you will need to take care of int overflowing if the value is pretty big.
Hope it helps.