For input and output in C++ I have only ever used scanf/printf and cin/cout. I recently came across this code, which does its I/O in a strange fashion.
This I/O method also seems to make the code run extremely fast: it uses almost the same algorithm as most other solutions, but executes in much less time. Why is this I/O so fast, and how does it work in general?
edit: code
#include <bits/stdtr1c++.h>
#define MAXN 200010
#define MAXQ 200010
#define MAXV 1000010
#define clr(ar) memset(ar, 0, sizeof(ar))
#define read() freopen("lol.txt", "r", stdin)
using namespace std;
const int block_size = 633;
long long res, out[MAXQ]; int n, q, ar[MAXN], val[MAXN], freq[MAXV];
namespace fastio{
int ptr, ye;
char temp[25], str[8333667], out[8333669];
void init(){
ptr = 0, ye = 0;
fread(str, 1, 8333667, stdin);
}
inline int number(){
int i, j, val = 0;
while (str[ptr] < 45 || str[ptr] > 57) ptr++;
while (str[ptr] > 47 && str[ptr] < 58) val = (val * 10) + (str[ptr++] - 48);
return val;
}
inline void convert(long long x){
int i, d = 0;
for (; ;){
temp[++d] = (x % 10) + 48;
x /= 10;
if (!x) break;
}
for (i = d; i; i--) out[ye++] = temp[i];
out[ye++] = 10;
}
inline void print(){
fwrite(out, 1, ye, stdout);
} }
struct query{
int l, r, d, i;
inline query() {}
inline query(int a, int b, int c){
i = c;
l = a, r = b, d = l / block_size;
}
inline bool operator < (const query& other) const{
if (d != other.d) return (d < other.d);
return ((d & 1) ? (r < other.r) : (r > other.r));
} } Q[MAXQ];
void compress(int n, int* in, int* out){
unordered_map <int, int> mp;
for (int i = 0; i < n; i++) out[i] = mp.emplace(in[i], mp.size()).first->second; }
inline void insert(int i){
res += (long long)val[i] * (1 + 2 * freq[ar[i]]++); }
inline void erase(int i){
res -= (long long)val[i] * (1 + 2 * --freq[ar[i]]); }
inline void run(){
sort(Q, Q + q);
int i, l, r, a = 0, b = 0;
for (res = 0, i = 0; i < q; i++){
l = Q[i].l, r = Q[i].r;
while (a > l) insert(--a);
while (b <= r) insert(b++);
while (a < l) erase(a++);
while (b > (r + 1)) erase(--b);
out[Q[i].i] = res;
}
for (i = 0; i < q; i++) fastio::convert(out[i]); }
int main(){
fastio::init();
int n, i, j, k, a, b;
n = fastio::number();
q = fastio::number();
for (i = 0; i < n; i++) val[i] = fastio::number();
compress(n, val, ar);
for (i = 0; i < q; i++){
a = fastio::number();
b = fastio::number();
Q[i] = query(a - 1, b - 1, i);
}
run();
fastio::print();
return 0; }
This solution, http://codeforces.com/contest/86/submission/22526466 (624 ms, 32 MB of RAM), uses a single fread and manually parses the numbers from memory (which is why it uses more memory). Many other solutions are slower and use scanf (http://codeforces.com/contest/86/submission/27561563, 1620 ms, 9 MB) or C++ iostream cin (http://codeforces.com/contest/86/submission/27558562, 3118 ms, 15 MB). Not all of the difference comes from input/output and parsing (the solution methods differ too), but some of it does.
fread(str, 1, 8333667, stdin);
This code uses a single fread library call to read up to 8 MB, which covers the whole file. The file may contain up to 2 (n, t) + 200000 (a_i) + 2*200000 (l, r) numbers of 6 or 7 digits, separated by a line break or (presumably) a single space, so at most about 8 characters per number (6 or 7 digits, since 1000000 is allowed too, plus one space or \n). The maximum input file size is therefore about 0.6 M * 8 bytes ≈ 5 MB.
inline int number(){
int i, j, val = 0;
while (str[ptr] < 45 || str[ptr] > 57) ptr++;
while (str[ptr] > 47 && str[ptr] < 58) val = (val * 10) + (str[ptr++] - 48);
return val;
}
The code then parses decimal integers by hand. According to the ASCII table (http://www.asciitable.com/), codes 48...57 are the decimal digits '0'...'9' (the second while loop), so subtracting 48 from the character code gives the digit; the partially read val is multiplied by 10 and the current digit is added. The condition chr < 45 || chr > 57 in the first while loop looks like it is meant to skip non-digits in the input, but it is not quite correct: it also stops at codes 45, 46, 47 ('-', '.', '/'). The second loop does not consume those characters, so number() returns 0 without advancing ptr, and no number after such a character will ever be read.
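A minimal sketch of a hand-written parser that avoids getting stuck, assuming the whole input already sits in a NUL-terminated buffer (as it would after a fread into a zero-initialized global array). The names BufferReader and next_int are hypothetical and not part of the submission, and overflow of the parsed value is not checked:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: reads decimal integers from a NUL-terminated buffer.
struct BufferReader {
    const char* buf;
    std::size_t pos = 0;

    explicit BufferReader(const char* data) : buf(data) { assert(data != nullptr); }

    // Stores the next integer in `value` and returns true,
    // or returns false when the buffer holds no further number.
    bool next_int(long long& value) {
        // Skip everything that cannot start a number, stopping at the terminator.
        // A '-' only counts as a start if a digit follows it directly.
        while (buf[pos] != '\0' && !is_digit(buf[pos]) &&
               !(buf[pos] == '-' && is_digit(buf[pos + 1]))) {
            ++pos;
        }
        if (buf[pos] == '\0') return false;

        bool negative = false;
        if (buf[pos] == '-') {
            negative = true;
            ++pos;
        }

        long long result = 0;
        while (is_digit(buf[pos])) {
            result = result * 10 + (buf[pos] - '0');
            ++pos;
        }
        value = negative ? -result : result;
        return true;
    }

    static bool is_digit(char c) { return c >= '0' && c <= '9'; }
};
```

Compared with number(), this costs one extra comparison per skipped character, which is the price of not looping forever on a stray '-', '.' or '/'.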
n = fastio::number();
q = fastio::number();
for (i = 0; i < n; i++) val[i] = fastio::number();
for (i = 0; i < q; i++){
a = fastio::number();
b = fastio::number();
The actual reading goes through this fastio::number() function, while other solutions call scanf or the iostream operator >> in a loop:
for (int i = 0; i < N; i++) {
scanf("%d", &(arr[i]));
add(arr[i]);
}
or
for (int i = 1; i <= n; ++i)
cin >> a[i];
Both methods are more universal, but each one is a library call that reads some characters from an internal buffer (of something like 4 KB) or makes an OS system call to refill that buffer, and every call does many checks and has error reporting. For every number in the input, scanf re-parses the same format string passed as its first argument and goes through all the logic described in POSIX (http://pubs.opengroup.org/onlinepubs/7908799/xsh/fscanf.html), including all the error checking. C++ iostream has no format string, but it is still more universal: see 'operator>>(int& __n)' at https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/istream.tcc#L156.
So the standard library functions have more logic inside, more calls, and more branching; they are also more universal and much safer, and they should be used in real-world programming. These "sport programming" contests let users solve a task with standard library functions, which are fast enough as long as you can come up with the algorithm. Task authors are required to write several solutions with the standard I/O functions to check that the time limit is correct and the task can be solved. (The TopCoder system handles I/O better: you do not implement I/O at all, because the data is passed into your function as language structs/collections.)
Sometimes tasks in sport programming have tight memory limits: the input file is several times bigger than the allowed memory usage, so the programmer cannot read the whole file into memory. For example: read the 20 million digits of a single very long number from the input file and add 1 to it, with a memory limit of 2 MB. You cannot hold the full number read in the forward direction; reading correctly in chunks in the backward direction is very hard; so you have to forget the standard method of addition (columnar addition) and build an FSM (finite-state machine) whose state counts sequences of 9s.
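One possible shape of such a state machine, as a sketch only: the state is a single pending non-9 digit plus the length of the run of 9s after it, so memory use stays constant however long the number is. The function names are hypothetical, non-digit characters are simply ignored, and leading zeros are copied through unchanged:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Reads a decimal number digit by digit from `in` and writes number + 1 to `out`.
void add_one_streaming(std::istream& in, std::ostream& out) {
    int pending = -1;        // last non-9 digit not yet written, -1 if none
    std::size_t nines = 0;   // length of the run of 9s seen after `pending`
    char c;
    while (in.get(c)) {
        if (c < '0' || c > '9') continue;   // ignore whitespace and line breaks
        if (c == '9') {
            ++nines;
            continue;
        }
        // A non-9 digit absorbs any carry, so everything before it is final.
        if (pending != -1) out.put(static_cast<char>('0' + pending));
        for (std::size_t i = 0; i < nines; ++i) out.put('9');
        pending = c - '0';
        nines = 0;
    }
    // End of input: the carry turns the trailing 9s into 0s and bumps `pending`.
    assert(pending < 9);
    out.put(static_cast<char>('0' + (pending == -1 ? 1 : pending + 1)));
    for (std::size_t i = 0; i < nines; ++i) out.put('0');
}

// Convenience wrapper for short inputs held in a string.
std::string add_one(const std::string& digits) {
    std::istringstream in(digits);
    std::ostringstream out;
    add_one_streaming(in, out);
    return out.str();
}
```

In a real submission `in` and `out` would be the judge's input and output streams; the string wrapper is only there to make the function easy to try out.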
Related
The task (from a Bulgarian judge, click on "Език" to change it to English):
I am given the size of the first (S_1 = A) of N corals. The size of every subsequent coral (S_i, where i > 1) is calculated using the formula (B*S_{i-1} + C) % D, where A, B, C and D are some constants. I am told that Nemo is nearby the Kth coral (when the sizes of all corals are sorted in ascending order).
What is the size of the above-mentioned Kth coral ?
I will have T tests and for every one of them I will be given N, K, A, B, C and D and prompted to output the size of the Kth coral.
The requirements:
1 ≤ T ≤ 3
1 ≤ K ≤ N ≤ 10^7
0 ≤ A < D ≤ 10^18
1 ≤ C, B*D ≤ 10^18
Memory available is 64 MB
Time limit is 1.9 sec
The problem I have:
For the worst-case scenario I will need 10^7 * 8 B, which is about 76 MB.
If at least 80 MB of memory were available, the solution would be:
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j;
biggie a, b, c, d;
std::vector<biggie>::iterator it_ans;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
std::vector<biggie> lut{ a };
lut.reserve(n);
for (j = 1; j != n; ++j) {
lut.emplace_back((b * lut.back() + c) % d);
}
it_ans = std::next(lut.begin(), k - 1);
std::nth_element(lut.begin(), it_ans, lut.end());
std::cout << *it_ans << '\n';
}
return 0;
}
Question 1: How can I approach this CP task given the requirements listed above ?
Question 2: Is it somehow possible to use std::nth_element to solve it since I am not able to store all N elements ? I mean using std::nth_element in a sliding window technique (If this is possible).
# Christian Sloper
#include <iostream>
#include <queue>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j, j_lim;
biggie a, b, c, d, prev, curr;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
if (k < n - k + 1) {
std::priority_queue<biggie, std::vector<biggie>, std::less<biggie>> q;
q.push(a);
prev = a;
for (j = 1; j != k; ++j) {
curr = (b * prev + c) % d;
q.push(curr);
prev = curr;
}
for (; j != n; ++j) {
curr = (b * prev + c) % d;
if (curr < q.top()) {
q.pop();
q.push(curr);
}
prev = curr;
}
std::cout << q.top() << '\n';
}
else {
std::priority_queue<biggie, std::vector<biggie>, std::greater<biggie>> q;
q.push(a);
prev = a;
for (j = 1, j_lim = n - k + 1; j != j_lim; ++j) {
curr = (b * prev + c) % d;
q.push(curr);
prev = curr;
}
for (; j != n; ++j) {
curr = (b * prev + c) % d;
if (curr > q.top()) {
q.pop();
q.push(curr);
}
prev = curr;
}
std::cout << q.top() << '\n';
}
}
return 0;
}
This gets accepted (Succeeds all 40 tests. Largest time 1.4 seconds, for a test with T=3 and D≤10^9. Largest time for a test with larger D (and thus T=1) is 0.7 seconds.).
#include <iostream>
using biggie = long long;
int main() {
int t;
std::cin >> t;
int i, n, k, j;
biggie a, b, c, d;
for (i = 0; i != t; ++i) {
std::cin >> n >> k >> a >> b >> c >> d;
biggie prefix = 0;
for (int shift = d > 1000000000 ? 40 : 20; shift >= 0; shift -= 20) {
biggie prefix_mask = ((biggie(1) << (40 - shift)) - 1) << (shift + 20);
int count[1 << 20] = {0};
biggie s = a;
int rank = 0;
for (j = 0; j != n; ++j) {
biggie s_vs_prefix = s & prefix_mask;
if (s_vs_prefix < prefix)
++rank;
else if (s_vs_prefix == prefix)
++count[(s >> shift) & ((1 << 20) - 1)];
s = (b * s + c) % d;
}
int i = -1;
while (rank < k)
rank += count[++i];
prefix |= biggie(i) << shift;
}
std::cout << prefix << '\n';
}
return 0;
}
The result is a 60 bits number. I first determine the high 20 bits with one pass through the numbers, then the middle 20 bits in another pass, then the low 20 bits in another.
For the high 20 bits, generate all the numbers and count how often each high 20 bits pattern occurs. After that, add up the counts until you reach K. The pattern where you reach K covers the K-th smallest number. In other words, that's the result's high 20 bits.
The middle and low 20 bits are computed similarly, except we take the by then known prefix (the high 20 bits or high+middle 40 bits) into account. As a little optimization, when D is small, I skip computing the high 20 bits. That got me from 2.1 seconds down to 1.4 seconds.
This solution is like user3386109 described, except with bucket size 2^20 instead of 10^6 so I can use bit operations instead of divisions and think of bit patterns instead of ranges.
For the memory constraint you hit:
(B*S_{i-1} + C) % D
needs only the value immediately before it (S_{i-1}). So you can store only every second value and use half of the total memory: keep the even-indexed values in the LUT and compute each odd-indexed value on the fly from the even one before it. Modern CPUs are fast enough to do extra calculations like these.
std::vector<biggie> lut{ a_i,a_i_2,a_i_4,... };
a_i_3=computeOddFromEven(lut[1]);
You can use a longer stride like 4 or 8 too. If the dataset is large, RAM latency is high, so this is like placing checkpoints across the whole data space to balance memory use against CPU use. Checkpoints 1000 elements apart would spend a lot of CPU cycles on recalculation, but then the array would fit in the CPU's L2/L1 cache, which is not bad. When sorting, each element access would then need up to 1000 recalculation steps, i.e. O(1000 x size) work; that may be a big constant, but the compiler might be able to optimize it somewhat if some of the constants really are compile-time constants.
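The checkpoint idea can be sketched as follows. This is only an illustration of the memory/CPU trade-off, not a full solution to the selection task: CheckpointedSequence and its members are hypothetical names, indices are 0-based, and the constraints above (B*D ≤ 10^18, C ≤ 10^18) are assumed so that b * s + c fits in a signed 64-bit integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: stores every `stride`-th value of the sequence
// and recomputes the values in between on demand.
class CheckpointedSequence {
public:
    CheckpointedSequence(std::size_t n, std::size_t stride, std::int64_t a,
                         std::int64_t b, std::int64_t c, std::int64_t d)
        : n_(n), stride_(stride), b_(b), c_(c), d_(d) {
        assert(stride >= 1 && d >= 1);
        checkpoints_.reserve((n + stride - 1) / stride);
        std::int64_t s = a;
        for (std::size_t i = 0; i < n; ++i) {
            if (i % stride == 0) checkpoints_.push_back(s);
            s = step(s);
        }
    }

    // Value at 0-based index i; costs at most stride - 1 recomputation steps.
    std::int64_t at(std::size_t i) const {
        assert(i < n_);
        std::int64_t s = checkpoints_[i / stride_];
        for (std::size_t k = 0; k < i % stride_; ++k) s = step(s);
        return s;
    }

    // Number of values actually kept in memory.
    std::size_t stored() const { return checkpoints_.size(); }

private:
    std::int64_t step(std::int64_t s) const { return (b_ * s + c_) % d_; }

    std::size_t n_;
    std::size_t stride_;
    std::int64_t b_, c_, d_;
    std::vector<std::int64_t> checkpoints_;
};
```

With stride 2 this keeps half of the values, with stride 16 one sixteenth, and every random access pays for the difference in extra multiplications and modulo operations.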
If CPU performance becomes problem again:
write a compiling function that writes your source code with all the "constant" given by user to a string
compile the code from the command line (assuming the target computer has a compiler such as g++ that the main program can invoke)
run it and get results
Compiler should enable more speed/memory optimizations when those are really constant in compile-time rather than depending on std::cin.
If you really need to add a hard-limit to the RAM usage, then implement a simple cache with the backing-store as your heavy computations with brute-force O(N^2) (or O(L x N) with checkpoints every L elements as in first method where L=2 or 4, or ...).
Here's a sample direct-mapped cache with 8M long-long value space:
int main()
{
std::vector<long long> checkpoints = {
a_0, a_16, a_32,...
};
auto cacheReadMissFunction = [&](int key){
// your pure computational algorithm here, helper meant to show variable
long long result = checkpoints[key/16];
for (int step = 0; step < key % 16; ++step) // advance from the checkpoint to key
result = iterate(result);
return result;
};
auto cacheWriteMissFunction = [&](int key, long long value){
/* not useful for your algorithm as it doesn't change behavior per element */
// backing_store[key] = value;
};
// due to special optimizations, size has to be 2^k
int cacheSize = 1024*1024*8;
DirectMappedCache<int, long long> cache(cacheSize,cacheReadMissFunction,cacheWriteMissFunction);
std::cout << cache.get(20)<<std::endl;
return 0;
}
If you use a cache-friendly sorting-algorithm, a direct cache access would make a lot of re-use for nearly all the elements in comparisons if you fill the output buffer/terminal with elements one by one by following something like a bitonic-sort-path (that is known in compile-time). If that doesn't work, then you can try accessing files as a "backing-store" of cache for sorting whole array at once. Is file system prohibited for use? Then the online-compiling method above won't work either.
Implementation of a direct mapped cache (don't forget to call flush() after your algorithm finishes, if you use any cache.set() method):
#ifndef DIRECTMAPPEDCACHE_H_
#define DIRECTMAPPEDCACHE_H_
#include<vector>
#include<functional>
#include<mutex>
#include<iostream>
/* Direct-mapped cache implementation
* Only usable for integer type keys in range [0,maxPositive-1]
*
* CacheKey: type of key (only integers: int, char, size_t)
* CacheValue: type of value that is bound to key (same as above)
*/
template< typename CacheKey, typename CacheValue>
class DirectMappedCache
{
public:
// allocates buffers for numElements number of cache slots/lanes
// readMiss: cache-miss for read operations. User needs to give this function
// to let the cache automatically get data from backing-store
// example: [&](MyClass key){ return redis.get(key); }
// takes a CacheKey as key, returns CacheValue as value
// writeMiss: cache-miss for write operations. User needs to give this function
// to let the cache automatically set data to backing-store
// example: [&](MyClass key, MyAnotherClass value){ redis.set(key,value); }
// takes a CacheKey as key and CacheValue as value
// numElements: has to be integer-power of 2 (e.g. 2,4,8,16,...)
DirectMappedCache(CacheKey numElements,
const std::function<CacheValue(CacheKey)> & readMiss,
const std::function<void(CacheKey,CacheValue)> & writeMiss):size(numElements),sizeM1(numElements-1),loadData(readMiss),saveData(writeMiss)
{
// initialize buffers
for(size_t i=0;i<numElements;i++)
{
valueBuffer.push_back(CacheValue());
isEditedBuffer.push_back(0);
keyBuffer.push_back(CacheKey()-1);// mapping of 0+ allowed
}
}
// get element from cache
// if cache doesn't find it in buffers,
// then cache gets data from backing-store
// then returns the result to user
// then cache is available from RAM on next get/set access with same key
inline
const CacheValue get(const CacheKey & key) noexcept
{
return accessDirect(key,nullptr);
}
// only syntactic difference
inline
const std::vector<CacheValue> getMultiple(const std::vector<CacheKey> & key) noexcept
{
const int n = key.size();
std::vector<CacheValue> result(n);
for(int i=0;i<n;i++)
{
result[i]=accessDirect(key[i],nullptr);
}
return result;
}
// thread-safe but slower version of get()
inline
const CacheValue getThreadSafe(const CacheKey & key) noexcept
{
std::lock_guard<std::mutex> lg(mut);
return accessDirect(key,nullptr);
}
// set element to cache
// if cache doesn't find it in buffers,
// then cache sets data on just cache
// writing to backing-store only happens when
// another access evicts the cache slot containing this key/value
// or when cache is flushed by flush() method
// then returns the given value back
// then cache is available from RAM on next get/set access with same key
inline
void set(const CacheKey & key, const CacheValue & val) noexcept
{
accessDirect(key,&val,1);
}
// thread-safe but slower version of set()
inline
void setThreadSafe(const CacheKey & key, const CacheValue & val) noexcept
{
std::lock_guard<std::mutex> lg(mut);
accessDirect(key,&val,1);
}
// use this before closing the backing-store to store the latest bits of data
void flush()
{
try
{
std::lock_guard<std::mutex> lg(mut);
for (size_t i=0;i<size;i++)
{
if (isEditedBuffer[i] == 1)
{
isEditedBuffer[i]=0;
auto oldKey = keyBuffer[i];
auto oldValue = valueBuffer[i];
saveData(oldKey,oldValue);
}
}
}catch(std::exception &ex){ std::cout<<ex.what()<<std::endl; }
}
// direct mapped access
// opType=0: get
// opType=1: set
CacheValue const accessDirect(const CacheKey & key,const CacheValue * value, const bool opType = 0)
{
// find tag mapped to the key
CacheKey tag = key & sizeM1;
// compare keys
if(keyBuffer[tag] == key)
{
// cache-hit
// "set"
if(opType == 1)
{
isEditedBuffer[tag]=1;
valueBuffer[tag]=*value;
}
// cache hit value
return valueBuffer[tag];
}
else // cache-miss
{
CacheValue oldValue = valueBuffer[tag];
CacheKey oldKey = keyBuffer[tag];
// eviction algorithm start
if(isEditedBuffer[tag] == 1)
{
// if it is "get"
if(opType==0)
{
isEditedBuffer[tag]=0;
}
saveData(oldKey,oldValue);
// "get"
if(opType==0)
{
const CacheValue && loadedData = loadData(key);
valueBuffer[tag]=loadedData;
keyBuffer[tag]=key;
return loadedData;
}
else /* "set" */
{
valueBuffer[tag]=*value;
keyBuffer[tag]=key;
return *value;
}
}
else // not edited
{
// "set"
if(opType == 1)
{
isEditedBuffer[tag]=1;
}
// "get"
if(opType == 0)
{
const CacheValue && loadedData = loadData(key);
valueBuffer[tag]=loadedData;
keyBuffer[tag]=key;
return loadedData;
}
else // "set"
{
valueBuffer[tag]=*value;
keyBuffer[tag]=key;
return *value;
}
}
}
}
private:
const CacheKey size;
const CacheKey sizeM1;
std::mutex mut;
std::vector<CacheValue> valueBuffer;
std::vector<unsigned char> isEditedBuffer;
std::vector<CacheKey> keyBuffer;
const std::function<CacheValue(CacheKey)> loadData;
const std::function<void(CacheKey,CacheValue)> saveData;
};
#endif /* DIRECTMAPPEDCACHE_H_ */
You can solve this problem using a Max-heap.
Insert the first k elements into the max-heap. The largest element of these k will now be at the root.
For each remaining element e:
Compare e to the root.
If e is larger than the root, discard it.
If e is smaller than the root, remove the root and insert e into the heap structure.
After all elements have been processed, the k-th smallest element is at the root.
This method uses O(K) space and O(N log K) time.
There’s an algorithm that people often call LazySelect that I think would be perfect here.
With high probability, we make two passes. In the first pass, we save a random sample of size n much less than N. The answer will be around index (K/N)n in the sorted sample, but due to the randomness, we have to be careful. Save the values a and b at (K/N)n ± r instead, where r is the radius of the window. In the second pass, we save all of the values in [a, b], count the number of values less than a (let it be L), and select the value with index K−L if it’s in the window (otherwise, try again).
The theoretical advice on choosing n and r is fine, but I would be pragmatic here. Choose n so that you use most of the available memory; the bigger the sample, the more informative it is. Choose r fairly large as well, but not quite as aggressively due to the randomness.
C++ code below. On the online judge, it’s faster than Kelly’s (max 1.3 seconds on the T=3 tests, 0.5 on the T=1 tests).
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <limits>
#include <optional>
#include <random>
#include <vector>
namespace {
class LazySelector {
public:
static constexpr std::int32_t kTargetSampleSize = 1000;
explicit LazySelector() { sample_.reserve(1000000); }
void BeginFirstPass(const std::int32_t n, const std::int32_t k) {
sample_.clear();
mask_ = n / kTargetSampleSize;
mask_ |= mask_ >> 1;
mask_ |= mask_ >> 2;
mask_ |= mask_ >> 4;
mask_ |= mask_ >> 8;
mask_ |= mask_ >> 16;
}
void FirstPass(const std::int64_t value) {
if ((gen_() & mask_) == 0) {
sample_.push_back(value);
}
}
void BeginSecondPass(const std::int32_t n, const std::int32_t k) {
sample_.push_back(std::numeric_limits<std::int64_t>::min());
sample_.push_back(std::numeric_limits<std::int64_t>::max());
const double p = static_cast<double>(sample_.size()) / n;
const double radius = 2 * std::sqrt(sample_.size());
const auto lower =
sample_.begin() + std::clamp<std::int32_t>(std::floor(p * k - radius),
0, sample_.size() - 1);
const auto upper =
sample_.begin() + std::clamp<std::int32_t>(std::ceil(p * k + radius), 0,
sample_.size() - 1);
std::nth_element(sample_.begin(), upper, sample_.end());
std::nth_element(sample_.begin(), lower, upper);
lower_ = *lower;
upper_ = *upper;
sample_.clear();
less_than_lower_ = 0;
equal_to_lower_ = 0;
equal_to_upper_ = 0;
}
void SecondPass(const std::int64_t value) {
if (value < lower_) {
++less_than_lower_;
} else if (upper_ < value) {
} else if (value == lower_) {
++equal_to_lower_;
} else if (value == upper_) {
++equal_to_upper_;
} else {
sample_.push_back(value);
}
}
std::optional<std::int64_t> Select(std::int32_t k) {
if (k < less_than_lower_) {
return std::nullopt;
}
k -= less_than_lower_;
if (k < equal_to_lower_) {
return lower_;
}
k -= equal_to_lower_;
if (k < sample_.size()) {
const auto kth = sample_.begin() + k;
std::nth_element(sample_.begin(), kth, sample_.end());
return *kth;
}
k -= sample_.size();
if (k < equal_to_upper_) {
return upper_;
}
return std::nullopt;
}
private:
std::default_random_engine gen_;
std::vector<std::int64_t> sample_ = {};
std::int32_t mask_ = 0;
std::int64_t lower_ = std::numeric_limits<std::int64_t>::min();
std::int64_t upper_ = std::numeric_limits<std::int64_t>::max();
std::int32_t less_than_lower_ = 0;
std::int32_t equal_to_lower_ = 0;
std::int32_t equal_to_upper_ = 0;
};
} // namespace
int main() {
int t;
std::cin >> t;
for (int i = t; i > 0; --i) {
std::int32_t n;
std::int32_t k;
std::int64_t a;
std::int64_t b;
std::int64_t c;
std::int64_t d;
std::cin >> n >> k >> a >> b >> c >> d;
std::optional<std::int64_t> ans = std::nullopt;
LazySelector selector;
do {
{
selector.BeginFirstPass(n, k);
std::int64_t s = a;
for (std::int32_t j = n; j > 0; --j) {
selector.FirstPass(s);
s = (b * s + c) % d;
}
}
{
selector.BeginSecondPass(n, k);
std::int64_t s = a;
for (std::int32_t j = n; j > 0; --j) {
selector.SecondPass(s);
s = (b * s + c) % d;
}
}
ans = selector.Select(k - 1);
} while (!ans);
std::cout << *ans << '\n';
}
}
I wrote this code to print the Fibonacci series using recursion, but it does not print correct values for n > 43 (e.g. for n = 100 it shows -980107325).
#include<stdio.h>
#include<conio.h>
void fibonacciSeries(int);
void fibonacciSeries(int n)
{
static long d = 0, e = 1;
long c;
if (n>1)
{
c = d + e;
d = e;
e = c;
printf("%d \n", c);
fibonacciSeries(n - 1);
}
}
int main()
{
long a, n;
long long i = 0, j = 1, f;
printf("How many number you want to print in the fibonnaci series :\n");
scanf("%d", &n);
printf("\nFibonacci Series: ");
printf("%d", 0);
fibonacciSeries(n);
_getch();
return 0;
}
The value of fib(100) is so large that it will overflow even a 64 bit number. To operate on such large values, you need to do arbitrary-precision arithmetic. Arbitrary-precision arithmetic is not provided by C nor C++ standard libraries, so you'll need to either implement it yourself or use a library written by someone else.
For smaller values that do fit your long long, your problem is that you use the wrong printf format specifier. To print a long long, you need to use %lld.
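A small sketch of both points, written as C++ with the C stdio functions. The helper names fib and format_fib are made up for illustration; the limit of 92 assumes the usual 64-bit long long:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Iterative Fibonacci with F(0) = 0, F(1) = 1.
// The result fits in a signed 64-bit long long only for n <= 92.
long long fib(int n) {
    assert(n >= 0 && n <= 92);
    if (n == 0) return 0;
    long long prev = 0, curr = 1;
    for (int i = 1; i < n; ++i) {
        long long next = prev + curr;   // never exceeds F(92) inside this loop
        prev = curr;
        curr = next;
    }
    return curr;
}

// Formats the value with %lld, the conversion that matches long long.
std::string format_fib(int n) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%lld", fib(n));
    return buf;
}
```

Passing a long or long long to %d, as the question's code does, is undefined behaviour in general, so wrong output can appear even before the value itself overflows.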
The code overflows the range of the integer type used, long.
It could use long long, but even that may not handle Fib(100), which needs at least 69 bits.
The code could use long double if 1.0/LDBL_EPSILON > 3.6e20.
Various libraries exist to handle very large integers.
For this task, all that is needed is a way to add two large integers. Consider using a string. An inefficient but simple string addition follows. There are no contingencies for buffer overflow.
#include <stdio.h>
#include <string.h>
#include <assert.h>
char *str_revese_inplace(char *s) {
char *left = s;
char *right = s + strlen(s);
while (right > left) {
right--;
char t = *right;
*right = *left;
*left = t;
left++;
}
return s;
}
char *str_add(char *ssum, const char *sa, const char *sb) {
const char *pa = sa + strlen(sa);
const char *pb = sb + strlen(sb);
char *psum = ssum;
int carry = 0;
while (pa > sa || pb > sb || carry) {
int sum = carry;
if (pa > sa) sum += *(--pa) - '0';
if (pb > sb) sum += *(--pb) - '0';
*psum++ = sum % 10 + '0';
carry = sum / 10;
}
*psum = '\0';
return str_revese_inplace(ssum);
}
int main(void) {
char fib[3][300];
strcpy(fib[0], "0");
strcpy(fib[1], "1");
int i;
for (i = 2; i <= 1000; i++) {
printf("Fib(%3d) %s.\n", i, str_add(fib[2], fib[1], fib[0]));
strcpy(fib[0], fib[1]);
strcpy(fib[1], fib[2]);
}
return 0;
}
Output
Fib( 2) 1.
Fib( 3) 2.
Fib( 4) 3.
Fib( 5) 5.
Fib( 6) 8.
...
Fib(100) 3542248xxxxxxxxxx5075. // Some xx left in for a bit of mystery.
Fib(1000) --> 43466...about 200 more digits...8875
You can print some large Fibonacci numbers using only char, int and <stdio.h> in C.
Here are the headers and definitions:
#include <stdio.h>
#define B_SIZE 10000 // max number of digits
typedef int positive_number;
struct buffer {
size_t index;
char data[B_SIZE];
};
And some functions:
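void init_buffer(struct buffer *buffer, positive_number n) {
// zero every digit first: fly_add_buffer reads digits in front of index
for (size_t i = 0; i < B_SIZE; ++i) buffer->data[i] = 0;
for (buffer->index = B_SIZE; n; buffer->data[--buffer->index] = (char) (n % 10), n /= 10);
}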
void print_buffer(const struct buffer *buffer) {
for (size_t i = buffer->index; i < B_SIZE; ++i) putchar('0' + buffer->data[i]);
}
void fly_add_buffer(struct buffer *buffer, const struct buffer *client) {
positive_number a = 0;
size_t i = (B_SIZE - 1);
for (; i >= client->index; --i) {
buffer->data[i] = (char) (buffer->data[i] + client->data[i] + a);
buffer->data[i] = (char) (buffer->data[i] - (a = buffer->data[i] > 9) * 10);
}
for (; a; buffer->data[i] = (char) (buffer->data[i] + a), a = buffer->data[i] > 9, buffer->data[i] = (char) (buffer->data[i] - a * 10), --i);
if (++i < buffer->index) buffer->index = i;
}
Example usage :
int main() {
struct buffer number_1, number_2, number_3;
init_buffer(&number_1, 0);
init_buffer(&number_2, 1);
for (int i = 0; i < 2500; ++i) {
number_3 = number_1;
fly_add_buffer(&number_1, &number_2);
number_2 = number_3;
}
print_buffer(&number_1);
}
// print 131709051675194962952276308712 ... 935714056959634778700594751875
Is char still the best C type? The given code prints f(2500), a 523-digit number.
Info : f(2e5) has 41,798 digits, see also Factorial(10000) and Fibonacci(1000000).
Well, you could want to try implementing BigInt in C++ or C.
Useful Material:
How to implement big int in C++
For this purpose you need to implement a BigInteger. There is no such built-in support in current C++. You can find some advice on Stack Overflow
Or you also can use some libs like GMP
Also here is some implementation:
E-maxx (description in Russian).
Or find some open implementation on GitHub
Try using a different format specifier with printf, and use an unsigned type to get a wider range.
If you use unsigned long long, you can go up to 18 446 744 073 709 551 615, i.e. up to the 93rd Fibonacci number, 12200160415121876738. After that you will get incorrect results, because the 94th number, 19740274219868223167, is too big for unsigned long long.
Keep in mind that the n-th fibonacci number is (approximately) ((1 + sqrt(5))/2)^n.
This lets you find the largest n for which the result fits in a 32-bit or 64-bit unsigned integer. For signed types, remember that you lose one bit.
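Instead of working from the approximation, the limit can also be found by a direct loop with an overflow check. A sketch, where largest_fib_index is a hypothetical helper and the indexing is F(0) = 0, F(1) = 1:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns the largest n for which F(n) fits in the integer type T.
template <typename T>
int largest_fib_index() {
    static_assert(std::numeric_limits<T>::is_integer, "T must be an integer type");
    const T max = std::numeric_limits<T>::max();
    T prev = 0, curr = 1;   // F(n - 1), F(n)
    int n = 1;
    // Continue only while prev + curr (the next Fibonacci number) still fits.
    while (prev <= max - curr) {
        assert(prev <= curr);   // Fibonacci numbers never decrease
        T next = prev + curr;
        prev = curr;
        curr = next;
        ++n;
    }
    return n;
}
```

For std::uint64_t this gives 93, matching the numbers above; for std::int64_t it gives 92, for std::uint32_t 47 and for std::int32_t 46.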
I am writing a factorial program with strings because I need the factorial of numbers greater than 250.
My attempt:
string factorial(int n){
string fact="1";
for(int i=2; i<=n; i++){
b=atoi(fact)*n;
}
}
But the problem is that atoi does not work. How can I convert my string to an integer?
And most importantly, will a program written this way work for the factorial of 400, for example?
Not sure why you are trying to use a string. Probably to save some space by not using an integer vector? This is my solution, which uses an integer vector to store the factorial and print it. It works well with 400, or any large number for that matter!
//Factorial of a big number
#include<iostream>
#include<vector>
using namespace std;
int main(){
int num;
cout<<"Enter the number :";
cin>>num;
vector<int> res;
res.push_back(1);
int carry=0;
for(int i=2;i<=num;i++){
for(int j=0;j<res.size();j++){
int tmp=res[j]*i;
res[j]=(tmp+carry)%10 ;
carry=(tmp+carry)/10;
}
while(carry!=0){
res.push_back(carry%10);
carry=carry/10;
}
}
for(int i=res.size()-1;i>=0;i--) cout<<res[i];
cout<<endl;
return 0;
}
Enter the number :400
Factorial of 400 :64034522846623895262347970319503005850702583026002959458684445942802397169186831436278478647463264676294350575035856810848298162883517435228961988646802997937341654150838162426461942352307046244325015114448670890662773914918117331955996440709549671345290477020322434911210797593280795101545372667251627877890009349763765710326350331533965349868386831339352024373788157786791506311858702618270169819740062983025308591298346162272304558339520759611505302236086810433297255194852674432232438669948422404232599805551610635942376961399231917134063858996537970147827206606320217379472010321356624613809077942304597360699567595836096158715129913822286578579549361617654480453222007825818400848436415591229454275384803558374518022675900061399560145595206127211192918105032491008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
There's a web site that will calculate factorials for you: http://www.nitrxgen.net/factorialcalc.php. It reports:
The resulting factorial of 250! is 493 digits long.
The result also contains 62 trailing zeroes (which constitutes to 12.58% of the whole number)
3232856260909107732320814552024368470994843717673780666747942427112823747555111209488817915371028199450928507353189432926730931712808990822791030279071281921676527240189264733218041186261006832925365133678939089569935713530175040513178760077247933065402339006164825552248819436572586057399222641254832982204849137721776650641276858807153128978777672951913990844377478702589172973255150283241787320658188482062478582659808848825548800000000000000000000000000000000000000000000000000000000000000
Many systems using C++ double only work up to 1E+308 or thereabouts; the value of 250! is too large to store in such numbers.
Consequently, you'll need to use some sort of multi-precision arithmetic library, either of your own devising using C++ string values, or using some other widely-used multi-precision library (GNU GMP for example).
The code below uses unsigned long long for small inputs and an array of decimal digits for very large results.
#include <iostream>
using namespace std;
int main()
{
long k=1;
while(k!=0)
{
cout<<"\nLarge Factorial Calculator\n\n";
cout<<"Enter a number be calculated:";
if(!(cin>>k)) break; // stop on end of input or invalid input
if (k<=20)
{
// 20! is the largest factorial that fits in a 64-bit unsigned long long
unsigned long long fact=1;
for(int b=k;b>=1;b--)
{
fact=fact*b;
}
cout<<"\nThe factorial of "<<k<<" is "<<fact<<"\n";
}
else
{
// one decimal digit per element, least significant digit last;
// results longer than DIGITS digits are silently truncated
const int DIGITS=10000;
static int numArr[DIGITS];
int total,rem,count,i;
for(i=0;i<DIGITS;i++)
numArr[i]=0;
numArr[DIGITS-1]=1;
for(count=2;count<=k;count++)
{
// multiply every digit by count, carrying from the back to the front
rem=0;
for(i=DIGITS-1;i>=0;i--)
{
total=numArr[i]*count+rem;
numArr[i]=total%10;
rem=total/10;
}
}
cout<<"The factorial of "<<k<<" is \n\n";
bool started=false; // skip leading zeros
for(i=0;i<DIGITS;i++)
{
if(numArr[i]!=0 || started)
{
cout<<numArr[i];
started=true;
}
}
cout<<endl;
}
cout<<"\n\n";
}//while
return 0;
}
Output:
Large Factorial Calculator
Enter a number be calculated:250
The factorial of 250 is
32328562609091077323208145520243684709948437176737806667479424271128237475551112
09488817915371028199450928507353189432926730931712808990822791030279071281921676
52724018926473321804118626100683292536513367893908956993571353017504051317876007
72479330654023390061648255522488194365725860573992226412548329822048491377217766
50641276858807153128978777672951913990844377478702589172973255150283241787320658
18848206247858265980884882554880000000000000000000000000000000000000000000000000
000000000000
You can make atoi compile by adding c_str(), but you will still be a long way from computing a factorial. Currently there is no b in scope, and even if there were, you would still be multiplying int by int. So even if you eventually convert the result to a string before returning, your range is still limited. Until you actually do the multiplication on ASCII digits or use a bignum library, there's no point in having a string around.
Your factorial depends on conversion to int, which will overflow pretty fast, so you won't be able to compute large factorials that way. To properly implement computation on big numbers you need to implement the same logic you use for computation on paper, the rules you were taught in primary school, but treat long long ints as the "atoms" instead of individual digits. And don't do it on strings: it would be painfully slow and full of nasty conversions.
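The "atoms" idea above can be sketched as follows. This is only an illustration under stated assumptions, not the answerer's code: the names multiply_in_place and big_factorial are hypothetical, and the choice of base 10^9 per limb (stored in 32-bit integers, least significant limb first) is one convenient option among many.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Multiply a big number, stored as little-endian base-1e9 limbs, by a small factor.
// Works like multiplication on paper, except each "digit" is a limb below 1e9.
void multiply_in_place(std::vector<std::uint32_t>& limbs, std::uint32_t factor) {
    assert(!limbs.empty());
    const std::uint64_t BASE = 1000000000ULL;
    std::uint64_t carry = 0;
    for (std::uint32_t& limb : limbs) {
        // limb < 1e9 and factor < 2^32, so the product plus carry fits in 64 bits.
        const std::uint64_t cur = static_cast<std::uint64_t>(limb) * factor + carry;
        limb = static_cast<std::uint32_t>(cur % BASE);
        carry = cur / BASE;
    }
    while (carry != 0) {
        limbs.push_back(static_cast<std::uint32_t>(carry % BASE));
        carry /= BASE;
    }
}

// Compute n! and return it as a decimal string.
std::string big_factorial(unsigned n) {
    std::vector<std::uint32_t> limbs{1};
    for (unsigned i = 2; i <= n; ++i) {
        multiply_in_place(limbs, i);
    }
    // The most significant limb is printed as-is, the rest zero-padded to 9 digits.
    std::string text = std::to_string(limbs.back());
    char buf[16];
    for (std::size_t i = limbs.size() - 1; i-- > 0;) {
        std::snprintf(buf, sizeof buf, "%09u", static_cast<unsigned>(limbs[i]));
        text += buf;
    }
    return text;
}
```

Calling big_factorial(250) should yield the 493-digit value quoted earlier; the only string work happens once, when the result is printed.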
If you are going to solve factorial for numbers larger than around 12, you need a different approach than using atoi, since that just gives you a 32-bit integer, and no matter what you do, you are not going to get more than 2 billion (give or take) out of that. Even if you double the size of the number, you'll only get to about 20 or 21.
It's not that hard (relatively speaking) to write a string multiplication routine that takes a small(ish) number, multiplies each digit by it, and ripples the carries through the number (start from the back of the number and fill it up).
Here's my obfuscated code - it is intentionally written such that you can't just take it and hand in as school homework, but it appears to work (matches the number in Jonathan Leffler's answer), and works up to (at least) 20000! [subject to enough memory].
std::string operator*(const std::string &s, int x)
{
int l = (int)s.length();
std::string r;
r.resize(l);
std::fill(r.begin(), r.end(), '0');
int b = 0;
int e = ~b;
const int c = 10;
for(int i = l+e; i != e;)
{
int d = (s[i]-0x30) * x, p = i + b;
while (d && p > e)
{
int t = r[p] - 0x30 + (d % c);
r[p] = (t % c) + 0x30;
d = t / c + d / c;
p--;
}
while (d)
{
r = static_cast<char>((d % c) +0x30)+r;
d /= c;
b++;
}
i--;
}
return r;
}
In C++, the largest integer type is 'long long', and it typically holds 64 bits, so obviously you can't store 250! in an integer type. It is a clever idea to use strings, but what you are basically doing with your code is (I have never used the atoi() function, so I don't know if it even works with strings longer than 1 character, but it doesn't matter):
convert the string to an integer (a string that, if this code worked well, would at one moment contain the value of 249!)
multiply the value of the string
So, after you are done multiplying, you don't even convert the integer back to a string. And even if you did that, at the moment you convert the string back to an integer your program will misbehave, because the integer won't be able to hold the value of the string.
My suggestion is to use some class for big integers. Unfortunately, there isn't one available in the C++ standard library, so you'll have to code it yourself or find one on the internet. But don't worry: even if you code it yourself, if you think a little, you'll see it's not that hard. You can even use your idea with strings, which, even though it is not the best approach, will still yield the results for this problem in reasonable time without using too much memory.
This is a typical high precision problem.
You can use an array of unsigned long long instead of a string.
Like this:
struct node
{
unsigned long long digit[100000];
};
It should be faster than a string.
But you can still use a string unless you are in a hurry.
It may take you a few days to calculate 10000!.
I like using strings because they are easy to write.
#include <bits/stdc++.h>
#pragma GCC optimize (2)
using namespace std;
const int MAXN = 90;
int n, m;
int a[MAXN];
string base[MAXN], f[MAXN][MAXN];
string sum, ans;
template <typename _T>
void Swap(_T &a, _T &b)
{
_T temp;
temp = a;
a = b;
b = temp;
}
string operator + (string s1, string s2)
{
string ret;
int digit, up = 0;
int len1 = s1.length(), len2 = s2.length();
if (len1 < len2) Swap(s1, s2), Swap(len1, len2);
while(len2 < len1) s2 = '0' + s2, len2++;
for (int i = len1 - 1; i >= 0; i--)
{
digit = s1[i] + s2[i] - '0' - '0' + up; up = 0;
if (digit >= 10) up = digit / 10, digit %= 10;
ret = char(digit + '0') + ret;
}
if (up) ret = char(up + '0') + ret;
return ret;
}
string operator * (string str, int p)
{
string ret = "0", f; int digit, mul;
int len = str.length();
for (int i = len - 1; i >= 0; i--)
{
f = "";
digit = str[i] - '0';
mul = p * digit;
while(mul)
{
digit = mul % 10 , mul /= 10;
f = char(digit + '0') + f;
}
for (int j = 1; j < len - i; j++) f = f + '0';
ret = ret + f;
}
return ret;
}
int main()
{
freopen("factorial.out", "w", stdout);
string ans = "1";
for (int i = 1; i <= 5000; i++)
{
ans = ans * i;
cout << i << "! = " << ans << endl;
}
return 0;
}
Actually, I know where the problem arises: at the point where we multiply, as the numbers get multiplied and grow bigger and bigger.
This code is tested and gives the correct result as long as the factorial is smaller than the modulus.
#include <bits/stdc++.h>
using namespace std;
#define mod 72057594037927936 // 2^56 (17 digits)
// #define mod 18446744073709551616 // 2^64 (20 digits) Not supported
long long int prod_uint64(long long int x, long long int y)
{
return x * y % mod;
}
int main()
{
long long int n=14, s = 1;
while (n != 1)
{
s = prod_uint64(s , n) ;
n--;
}
cout << s << endl;
}
Expected output for 14!: 87178291200
The logic should be:
unsigned int factorial(int n)
{
unsigned int b=1;
for(int i=2; i<=n; i++){
b=b*i;
}
return b;
}
However, b may overflow, so you may want to use a bigger integral type.
Or you can use a floating-point type, which is inaccurate but can hold much bigger numbers.
But it seems none of the built-in types are big enough.
I am trying to solve a problem, part of which requires me to calculate (2^n)%1000000007, where n<=10^9. But the following code gives me the output "0" even for an input like n=99.
Is there any way other than a loop that multiplies the output by 2 and takes the modulo every time (this is not what I am looking for, as this will be very slow for large numbers)?
#include<stdio.h>
#include<math.h>
#include<iostream>
using namespace std;
int main()
{
unsigned long long gaps,total;
while(1)
{
cin>>gaps;
total=(unsigned long long)powf(2,gaps)%1000000007;
cout<<total<<endl;
}
}
You need a "big num" library, it is not clear what platform you are on, but start here:
http://gmplib.org/
this is not what I am looking for, as this will be very slow for large numbers
Using a bigint library will be considerably slower than pretty much any other solution.
Don't take the modulo every pass through the loop: rather, only take it when the output grows bigger than the modulus, as follows:
#include <iostream>
int main() {
int modulus = 1000000007;
int n = 88888888;
long res = 1;
for(long i=0; i < n; ++i) {
res *= 2;
if(res > modulus)
res %= modulus;
}
std::cout << res << std::endl;
}
This is actually pretty quick:
$ time ./t
./t 1.19s user 0.00s system 99% cpu 1.197 total
I should mention that the reason this works is that if a and b are equivalent mod m (that is, a % m = b % m), then the equality still holds after multiplying both by any k (that is, the foregoing equality implies (a*k)%m = (b*k)%m).
Chris proposed GMP, but if you need just that and want to do things The C++ Way, not The C Way, and without unnecessary complexity, you may just want to check this out - it generates a few warnings when compiling, but is quite simple and Just Works™.
You can split your 2^n into chunks of 2^m. You need to find:
2^m * 2^m * ... * 2^(less than m)
The number m should be 31 for a 32-bit CPU. Then your answer is:
(chunk1 % k) * (chunk2 % k) * ... % k, where k = 1000000007
You are still O(N). But then you can use the fact that every chunk % k is the same except for the last one, so the product collapses to a power of a single value, which can be computed in far fewer than N steps (O(log N) multiplications with repeated squaring).
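Taking the chunk idea to its conclusion leads to repeated squaring (square-and-multiply), a technique the answers above do not name explicitly. The sketch below is one common way to write it; the function name pow_mod is an assumption, and the modulus is required to fit in 32 bits so that the product of two residues cannot overflow a 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>

// Compute (base^exponent) % modulus with O(log exponent) multiplications.
// Assumption: modulus fits in 32 bits, so result * base and base * base
// stay below 2^64 and never overflow.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exponent, std::uint64_t modulus) {
    assert(modulus > 0 && modulus <= 0xFFFFFFFFULL);
    std::uint64_t result = 1 % modulus;
    base %= modulus;
    while (exponent > 0) {
        if (exponent & 1) {
            result = result * base % modulus;  // this bit of the exponent is set
        }
        base = base * base % modulus;          // square for the next bit
        exponent >>= 1;
    }
    return result;
}
```

For the question's case, pow_mod(2, n, 1000000007) needs about 30 loop iterations even for n = 10^9, and no floating-point or big-number arithmetic is involved.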
I wrote this function. It is very inefficient but it works with very large numbers. It uses my self-made algorithm to store big numbers in arrays using a decimal like system.
mpfr2.cpp
#include "mpfr2.h"
void mpfr2::mpfr::setNumber(std::string a) {
for (int i = a.length() - 1, j = 0; i >= 0; ++j, --i) {
_a[j] = a[i] - '0';
}
res_size = a.length();
}
int mpfr2::mpfr::multiply(mpfr& a, mpfr b)
{
mpfr ans = mpfr();
// One by one multiply n with individual digits of res[]
int i = 0;
for (i = 0; i < b.res_size; ++i)
{
for (int j = 0; j < a.res_size; ++j) {
ans._a[i + j] += b._a[i] * a._a[j];
}
}
for (i = 0; i < a.res_size + b.res_size; i++)
{
int tmp = ans._a[i] / 10;
ans._a[i] = ans._a[i] % 10;
ans._a[i + 1] = ans._a[i + 1] + tmp;
}
for (i = a.res_size + b.res_size; i >= 0; i--)
{
if (ans._a[i] > 0) break;
}
ans.res_size = i+1;
a = ans;
return a.res_size;
}
mpfr2::mpfr mpfr2::mpfr::pow(mpfr a, mpfr b) {
mpfr t = a;
std::string bStr = "";
for (int i = b.res_size - 1; i >= 0; --i) {
bStr += std::to_string(b._a[i]);
}
int i = 1;
while (!0) {
if (bStr == std::to_string(i)) break;
a.res_size = multiply(a, t);
// Debugging
std::cout << "\npow() iteration " << i << std::endl;
++i;
}
return a;
}
mpfr2.h
#pragma once
//#ifndef MPFR2_H
//#define MPFR2_H
// C++ standard includes
#include <iostream>
#include <string>
#define MAX (0x7fffffff/32/4) // 16777215 digits (0x7fffffff itself is 2147483647)
namespace mpfr2 {
class mpfr
{
public:
int _a[MAX];
int res_size;
void setNumber(std::string);
static int multiply(mpfr&, mpfr);
static mpfr pow(mpfr, mpfr);
};
}
//#endif
main.cpp
#include <iostream>
#include <fstream>
// Local headers
#include "mpfr2.h" // Defines local mpfr algorithm library
// Namespaces
namespace m = mpfr2; // Reduce the typing a bit later...
m::mpfr tetration(m::mpfr, int);
int main() {
// Hardcoded tests
int x = 7;
std::ofstream f("out.txt");
m::mpfr t;
for(int b=1; b<x;b++) {
std::cout << "2^^" << b << std::endl; // Hardcoded message
t.setNumber("2");
m::mpfr res = tetration(t, b);
for (int i = res.res_size - 1; i >= 0; i--) {
std::cout << res._a[i];
f << res._a[i];
}
f << std::endl << std::endl;
std::cout << std::endl << std::endl;
}
char c; std::cin.ignore(); std::cin >> c;
return 0;
}
m::mpfr tetration(m::mpfr a, int b)
{
m::mpfr tmp = a;
if (b <= 0) return m::mpfr();
for (; b > 1; b--) tmp = m::mpfr::pow(a, tmp);
return tmp;
}
I created this for tetration and eventually hyperoperations. When the numbers get really big, it can take ages and a lot of memory to calculate. The #define MAX 0x7fffffff/32/4 is the number of decimal digits one number can have. I might make another algorithm later to combine multiple of these arrays into one number. On my system the max array length is 0x7fffffff, aka 2147483647, aka 2^31-1, aka int32_max (which is usually the max of the standard int), so I had to divide int32_max by 32 to make the creation of this array possible. I also divided it by 4 to reduce memory usage in the multiply() function.
- Jubiman
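As a possible alternative to the fixed-size array, the same schoolbook multiplication can be written over a std::vector, which grows with the number and lives on the heap. This is a hedged sketch, not the author's library: the alias Digits and the helpers from_string, digits_to_string and multiply are hypothetical names, and digits are stored least significant first, as in setNumber() above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Digits = std::vector<int>;  // little-endian decimal digits, one per element

// Parse a decimal string such as "12345" into little-endian digits.
Digits from_string(const std::string& s) {
    Digits d;
    for (auto it = s.rbegin(); it != s.rend(); ++it) {
        assert(*it >= '0' && *it <= '9');
        d.push_back(*it - '0');
    }
    return d;
}

// Convert little-endian digits back to a decimal string.
std::string digits_to_string(const Digits& d) {
    std::string s;
    for (auto it = d.rbegin(); it != d.rend(); ++it) {
        s += static_cast<char>('0' + *it);
    }
    return s.empty() ? "0" : s;
}

// Schoolbook multiplication: accumulate all partial products, then carry once.
Digits multiply(const Digits& a, const Digits& b) {
    Digits out(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < b.size(); ++i) {
        for (std::size_t j = 0; j < a.size(); ++j) {
            out[i + j] += b[i] * a[j];
        }
    }
    int carry = 0;
    for (int& digit : out) {
        digit += carry;
        carry = digit / 10;
        digit %= 10;
    }
    // Drop leading zeros but keep at least one digit.
    while (out.size() > 1 && out.back() == 0) {
        out.pop_back();
    }
    return out;
}
```

With this layout each number only uses as much memory as it has digits, so no MAX constant is needed; the running time of multiply is still quadratic in the digit count.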
I'm probably going to ask this incorrectly and make myself look very stupid but here goes:
I'm trying to do some audio manipulation and processing on a .wav file. Now, I am able to read all of the data (including the header) but need the data to be in the frequency domain, and in order to do this I need to use an FFT.
I searched the internet high and low and found one, and the example was taken out of the "Numerical Recipes in C" book, however, I amended it to use vectors instead of arrays. Ok so here's the problem:
I have been given (as an example to use) a series of numbers and a sampling rate:
X = {50, 206, -100, -65, -50, -6, 100, -135}
Sampling Rate : 8000
Number of Samples: 8
And the answer should therefore be:
0Hz A=0 D=1.57079633
1000Hz A=50 D=1.57079633
2000Hz A=100 D=0
3000Hz A=100 D=0
4000Hz A=0 D=3.14159265
The code that I rewrote compiles; however, when I feed these numbers into the function I get a segmentation fault. Is there something wrong with my code, or is the sampling rate too high? (The algorithm doesn't segfault when using a much, much smaller sampling rate.) Here is the code:
#include <iostream>
#include <math.h>
#include <vector>
using namespace std;
#define SWAP(a,b) tempr=(a);(a)=(b);(b)=tempr;
#define pi 3.14159
void ComplexFFT(vector<float> &realData, vector<float> &actualData, unsigned long sample_num, unsigned int sample_rate, int sign)
{
unsigned long n, mmax, m, j, istep, i;
double wtemp,wr,wpr,wpi,wi,theta,tempr,tempi;
// CHECK TO SEE IF VECTOR IS EMPTY;
actualData.resize(2*sample_rate, 0);
for(n=0; (n < sample_rate); n++)
{
if(n < sample_num)
{
actualData[2*n] = realData[n];
}else{
actualData[2*n] = 0;
actualData[2*n+1] = 0;
}
}
// Binary Inversion
n = sample_rate << 1;
j = 0;
for(i=0; (i< n /2); i+=2)
{
if(j > i)
{
SWAP(actualData[j], actualData[i]);
SWAP(actualData[j+1], actualData[i+1]);
if((j/2)<(n/4))
{
SWAP(actualData[(n-(i+2))], actualData[(n-(j+2))]);
SWAP(actualData[(n-(i+2))+1], actualData[(n-(j+2))+1]);
}
}
m = n >> 1;
while (m >= 2 && j >= m) {
j -= m;
m >>= 1;
}
j += m;
}
mmax=2;
while(n > mmax) {
istep = mmax << 1;
theta = sign * (2*pi/mmax);
wtemp = sin(0.5*theta);
wpr = -2.0*wtemp*wtemp;
wpi = sin(theta);
wr = 1.0;
wi = 0.0;
for(m=1; (m < mmax); m+=2) {
for(i=m; (i <= n); i += istep)
{
j = i*mmax;
tempr = wr*actualData[j-1]-wi*actualData[j];
tempi = wr*actualData[j]+wi*actualData[j-1];
actualData[j-1] = actualData[i-1] - tempr;
actualData[j] = actualData[i]-tempi;
actualData[i-1] += tempr;
actualData[i] += tempi;
}
wr = (wtemp=wr)*wpr-wi*wpi+wr;
wi = wi*wpr+wtemp*wpi+wi;
}
mmax = istep;
}
// determine if the fundamental frequency
int fundemental_frequency = 0;
for(i=2; (i <= sample_rate); i+=2)
{
if((pow(actualData[i], 2)+pow(actualData[i+1], 2)) > pow(actualData[fundemental_frequency], 2)+pow(actualData[fundemental_frequency+1], 2)) {
fundemental_frequency = i;
}
}
}
int main(int argc, char *argv[]) {
vector<float> numbers;
vector<float> realNumbers;
numbers.push_back(50);
numbers.push_back(206);
numbers.push_back(-100);
numbers.push_back(-65);
numbers.push_back(-50);
numbers.push_back(-6);
numbers.push_back(100);
numbers.push_back(-135);
ComplexFFT(numbers, realNumbers, 8, 8000, 0);
for(int i=0; (i < realNumbers.size()); i++)
{
cout << realNumbers[i] << "\n";
}
}
The other thing (I know this sounds stupid): I don't really know what is expected of the "int sign" that is being passed to the ComplexFFT function; this is where I could be going wrong.
Does anyone have any suggestions or solutions to this problem?
Thank you :)
I think the problem lies in errors in how you translated the algorithm.
Did you mean to initialize j to 1 rather than 0?
for(i = 0; (i < n/2); i += 2) should probably be for (i = 1; i < n; i += 2).
Your SWAPs should probably be
SWAP(actualData[j - 1], actualData[i - 1]);
SWAP(actualData[j], actualData[i]);
What are the following SWAPs for? I don't think they're needed.
if((j/2)<(n/4))
{
SWAP(actualData[(n-(i+2))], actualData[(n-(j+2))]);
SWAP(actualData[(n-(i+2))+1], actualData[(n-(j+2))+1]);
}
The j >= m in while (m >= 2 && j >= m) should probably be j > m if you intended to do bit reversal.
In the code implementing the Danielson-Lanczos section, are you sure j = i*mmax; was not supposed to be an addition, i.e. j = i + mmax;?
Apart from that, there are a lot of things you can do to simplify your code.
Using your SWAP macro should be discouraged when you can just use std::swap... I was going to suggest std::swap_ranges, but then I realized you only need to swap the real parts, since your data is all reals (your time-series imaginary parts are all 0):
std::swap(actualData[j - 1], actualData[i - 1]);
You can simplify the entire thing using std::complex, too.
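To illustrate the std::complex suggestion, here is a minimal sketch of an in-place radix-2 FFT. It is not the Numerical Recipes routine and not a drop-in fix for the code in the question: the names fft and cd are made up, the input length is assumed to be a power of two, and the convention sign = -1 for the forward transform and sign = +1 for the unscaled inverse is an assumption chosen for this example.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// In-place iterative radix-2 FFT.
// sign = -1: forward transform; sign = +1: inverse transform without the 1/N scaling.
void fft(std::vector<cd>& a, int sign) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);  // length must be a power of two
    assert(sign == 1 || sign == -1);

    // Bit-reversal permutation: swap each element with its bit-reversed index.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) {
            j ^= bit;
        }
        j ^= bit;
        if (i < j) {
            std::swap(a[i], a[j]);
        }
    }

    // Danielson-Lanczos butterflies on blocks of length 2, 4, 8, ..., n.
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double theta = sign * 2.0 * pi / static_cast<double>(len);
        const cd wlen(std::cos(theta), std::sin(theta));
        for (std::size_t start = 0; start < n; start += len) {
            cd w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cd u = a[start + k];
                const cd v = a[start + k + len / 2] * w;
                a[start + k] = u + v;
                a[start + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}
```

For the 8-sample series in the question, the vector would hold eight complex values with zero imaginary parts, and the output bins would be spaced 8000 / 8 = 1000 Hz apart. The transform length here is the number of samples, not the sampling rate.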
I reckon it's down to the resizing of your vector.
One possibility: maybe resizing creates temporary objects on the stack before moving them back to the heap, I think.
The FFT in Numerical Recipes in C uses the Cooley-Tukey Algorithm, so in answer to your question at the end, the int sign being passed allows the same routine to be used to compute both the forward (sign=-1) and inverse (sign=1) FFT. This seems to be consistent with the way you are using sign when you define theta = sign * (2*pi/mmax).