std::vector.push_back() C++

I want to initialize a vector with alphabetical letters using push_back function. Is it the right way to do it?
vector<char> v;
char letter = 'A';
for(int i=0; i<26; i++)
{
v.push_back(letter+i);
}
It works. I am just wondering whether I should cast letter to an int before adding i to it.
Or is there a more efficient way?

Note that your code relies on a character encoding scheme that encodes the letters contiguously, like e.g. ASCII.
If that assumption holds, you could create the vector with the correct size initially, and use std::iota (declared in <numeric>) to initialize all elements:
std::vector<char> v(26); // Create a vector of 26 (value-initialized, i.e. zero) elements
std::iota(begin(v), end(v), 'A'); // Assign a letter to each element in the vector
If you want your code to be portable to systems where letters aren't contiguously encoded (like a system which uses EBCDIC), then you're better off creating a string using the letters explicitly:
std::string alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; // Thanks Nathan Oliver :)
And if you have a string with all the letters, then perhaps you won't even need the vector.
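Putting the two suggestions together, a minimal self-contained sketch might look like this. The helper names alphabetByIota and alphabetFromString are made up for illustration; the first still assumes a contiguous encoding such as ASCII, the second does not.

```cpp
#include <numeric>   // std::iota
#include <string>
#include <vector>

// Assumes 'A'..'Z' are encoded contiguously (true for ASCII, not for EBCDIC).
std::vector<char> alphabetByIota()
{
    std::vector<char> v(26);             // 26 zero-initialized chars
    std::iota(v.begin(), v.end(), 'A');  // 'A', 'A' + 1, 'A' + 2, ...
    return v;
}

// Portable: copies the letters from an explicit string.
std::vector<char> alphabetFromString()
{
    const std::string alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    return std::vector<char>(alphabet.begin(), alphabet.end());
}
```

Either function can be used as std::vector<char> v = alphabetFromString(); the string-based one is the safer default if portability matters.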

Looks pretty good!
I guess std::array would be an option too, compared to std::vector, for similar tasks:
#include <iostream>
#include <array>
#include <vector>
#include <chrono>
void function1() {
std::vector<char> alphabets;
for (unsigned int index = 0; index < 26; ++index) {
alphabets.push_back(index + 'A');
// std::cout << alphabets[index] << "\t";
}
// std::cout << "\n\n";
}
void function2() {
std::vector<char> alphabets;
for (unsigned int index = 0; index < 26; ++index) {
alphabets.emplace_back(index + 'A');
// std::cout << alphabets[index] << "\t";
}
// std::cout << "\n\n";
}
void function3() {
std::array<char, 26> alphabets;
for (unsigned int index = 0; index < 26; ++index) {
alphabets[index] = index + 'A';
// std::cout << alphabets[index] << "\t";
}
// std::cout << "\n\n";
}
int main() {
const auto t1 = std::chrono::high_resolution_clock::now();
for (std::size_t i = 0; i < 1000000; ++i) {
function1();
}
const auto t2 = std::chrono::high_resolution_clock::now();
const auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
std::cout << duration <<
" is the rough runtime of std::vector function with push_back\t💙💙💙\t😳\n\n";
const auto t3 = std::chrono::high_resolution_clock::now();
for (std::size_t i = 0; i < 1000000; ++i) {
function2();
}
const auto t4 = std::chrono::high_resolution_clock::now();
const auto duration2 = std::chrono::duration_cast<std::chrono::microseconds>( t4 - t3 ).count();
std::cout << duration2 <<
" is the rough runtime of std::vector function with emplace_back\t💙💙💙\t😳\n\n";
const auto t5 = std::chrono::high_resolution_clock::now();
for (std::size_t i = 0; i < 1000000; ++i) {
function3();
}
const auto t6 = std::chrono::high_resolution_clock::now();
const auto duration3 = std::chrono::duration_cast<std::chrono::microseconds>( t6 - t5 ).count();
std::cout << duration3 << " is the rough runtime of std::array function\t💙💙💙\t😳\n\n";
return 0;
}
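One variation the benchmark above does not measure is reserving capacity before the push_back loop, which avoids reallocations while the vector grows. The sketch below is only an illustration (the function name alphabetWithReserve is hypothetical, and it assumes a contiguous encoding such as ASCII); whether it is measurably faster for 26 elements would have to be timed.

```cpp
#include <vector>

// Same push_back loop as function1, but with the capacity requested up front.
std::vector<char> alphabetWithReserve()
{
    std::vector<char> letters;
    letters.reserve(26);  // a single allocation instead of repeated growth
    for (int i = 0; i < 26; ++i) {
        // 'A' + i is an int; cast back to char explicitly (assumes ASCII-like encoding)
        letters.push_back(static_cast<char>('A' + i));
    }
    return letters;
}
```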

Related

c++ Is it possible to have a process (outside of main) that returns an array whose size is determined by the input, without using pointers or macros?

Below is a simple program that works fine. It contains a function that returns a string of arbitrary size, where the size is determined by the function's input.
#include <iostream>
#include <string>
using namespace std;
string strFunc(int a){
string toBeReturned;
for(int i=0; i < a; i++){
toBeReturned += '!';
}
return toBeReturned;
}
int main(){
int x = 5;
cout << strFunc(x) << endl;
return 0;
}
If instead I wanted a function (or a single process to call in main) to return a 1-D array (int toBeReturned[size to be determined]) I had to use a function that returns a pointer and then include that function in a macro that constructs the array.
Is there a simpler way of doing this in c++?
If not can someone please explain why this only works for type string? I thought that a string is simply a 1-D array of type 'char'.
Thank you,
Daniel
A function can return any POD or class type by value.
A C++-style std::array is a fixed-sized array wrapped in a class type, and thus can be returned by value. However, a C-style fixed-sized array cannot be returned by value (but it can be stored as a member of a class type, which can then be returned by value, like std::array does).
A C-style array can't be sized dynamically (without using a non-standard compiler extension), which is why you would have to new[] it, return it by pointer, and then delete[] it when you are done using it.
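To illustrate the point about fixed-size arrays, here is a small sketch. The names IntBlock, makeBlock and makeArray are invented for this example; it only shows that an array wrapped in a class type can be returned by value, which is the mechanism std::array relies on.

```cpp
#include <array>

// Hypothetical wrapper: a C-style array stored as a member of a class type.
struct IntBlock {
    int values[4];
};

// Returning the wrapper by value copies the array inside it.
IntBlock makeBlock()
{
    IntBlock block = {{1, 2, 3, 4}};
    return block;
}

// std::array does the same wrapping for you.
std::array<int, 4> makeArray()
{
    std::array<int, 4> arr = {{1, 2, 3, 4}};
    return arr;
}
```

Both functions have a size fixed at compile time; neither helps when the size is only known at run time.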
Since you want your function to return a dynamic-sized array, you should use std::vector instead of a new[]'ed pointer, eg:
#include <iostream>
#include <vector>
using namespace std;
vector<int> strFunc(int a){
vector<int> toBeReturned(a);
for(size_t i = 0; i < a; ++i){
toBeReturned[i] = ...;
}
return toBeReturned;
}
int main(){
int x = 5;
vector<int> returned = strFunc(x);
for(size_t i = 0; i < x; ++i){
cout << returned[i] << ' ' << endl;
}
return 0;
}
You can use a vector of whatever type you need, and pass it into your function by reference.
#include <vector>
#include <chrono>
#include <iostream>
using namespace std;
void by_reference(vector<size_t>& v, size_t s)
{
v.clear();
v.resize(s, 0);
for (size_t i = 0; i < s; i++)
v[i] = i;
}
vector<size_t> by_return(size_t s)
{
vector<size_t> v(s, 0);
for (size_t i = 0; i < s; i++)
v[i] = i;
return v;
}
// Where s is large, by_reference is faster
// Where s is small, by_return is faster
// Use whichever works best for you and your situation
int main(void)
{
std::chrono::high_resolution_clock::time_point start_time, end_time;
std::chrono::duration<float, std::milli> elapsed;
start_time = std::chrono::high_resolution_clock::now();
for (size_t i = 0; i < 1000; i++)
{
vector<size_t> v;
const size_t s = 10000000;
by_reference(v, s);
for (size_t i = 0; i < s; i++)
v[i] = i;
}
end_time = std::chrono::high_resolution_clock::now();
elapsed = end_time - start_time;
cout << "Duration: " << elapsed.count() / 1000.0f << " seconds" << endl;
start_time = std::chrono::high_resolution_clock::now();
for (size_t i = 0; i < 1000; i++)
{
const size_t s = 10000000;
vector<size_t> v = by_return(s);
for (size_t i = 0; i < s; i++)
v[i] = i;
}
end_time = std::chrono::high_resolution_clock::now();
elapsed = end_time - start_time;
cout << "Duration: " << elapsed.count() / 1000.0f << " seconds" << endl;
return 0;
}

Neural network with static std::array is slower than neural network using dynamic C-array

There is a minimalistic (around 200 lines) neural network C library on GitHub called Tinn.
Tinn uses dynamic C arrays for representing weights, biases, and neurons. I tried to implement it partially in C++, but using static std::array. I thought the static std::array would be much faster. However, my measurements show exactly the opposite. Could anybody tell me if I am doing something wrong, or give me a reason why the static array is beaten by the dynamic one even with -O3 optimizations?
Neural network with static arrays MLP_1.h
#pragma once
#include <cmath>
#include <cstddef>
#include <string>
#include <array>
#include <iostream>
#include <fstream>
template<class Type, size_t nIn, size_t nHid, size_t nOut>
class MLP_1
{
public:
static constexpr size_t nInputs = nIn;
static constexpr size_t nHiddens = nHid;
static constexpr size_t nOutputs = nOut;
static constexpr size_t nWeights = nHiddens * (nInputs + nOutputs);
static constexpr size_t nBiases = 2;
static constexpr size_t weightIndexOffset = nHiddens * nInputs;
std::array<Type, nWeights> weights;
std::array<Type, nBiases> biases;
std::array<Type, nHiddens> hiddenNeurons;
std::array<Type, nOut> outputNeurons;
static Type activationFunction(const Type x) noexcept
{
//return x / (1 + std::abs(x)); // faster
return 1.0 / (1.0 + std::exp(-x));
}
void forwardPropagation(const Type* const input) noexcept
{
// Calculate hidden layer neuron values.
for(size_t i = 0; i < nHiddens; ++i)
{
Type sum = 0.0;
for(size_t j = 0; j < nInputs; ++j)
{
const size_t weightIndex = (i * nInputs) + j;
sum += input[j] * weights[weightIndex];
}
hiddenNeurons[i] = activationFunction(sum + biases[0]);
}
// Calculate output layer neuron values.
for(size_t i = 0; i < nOutputs; ++i)
{
Type sum = 0.0;
for(size_t j = 0; j < nHiddens; ++j)
{
const size_t weightIndex = weightIndexOffset + (i * nHiddens) + j;
sum += hiddenNeurons[j] * weights[weightIndex];
}
outputNeurons[i] = activationFunction(sum + biases[1]);
}
}
const Type* const predict(const Type* const input) noexcept
{
forwardPropagation(input);
return outputNeurons.data();
}
const std::array<Type, nOutputs>& predict(const std::array<Type, nInputs>& inputArray)
{
forwardPropagation(inputArray.data());
return outputNeurons;
}
void load(const char* const path) noexcept
{
std::ifstream inputFile(path);
size_t nInputsFile, nHiddensFile, nOutputsFile;
std::string ignoreString;
inputFile >> nInputsFile >> nHiddensFile >> nOutputsFile;
if ((nInputs != nInputsFile) || (nHiddens != nHiddensFile) || (nOutputs != nOutputsFile))
{
std::cout << "Size mismatch.\n";
std::cout << nInputs << ", " << nHiddens << ", " << nOutputs << std::endl;
std::cout << nInputsFile << ", " << nHiddensFile << ", " << nOutputsFile << std::endl;
}
for (auto& bias : biases)
{
Type biasFile;
inputFile >> biasFile;
bias = biasFile;
}
for (auto& weight : weights)
{
Type weightFile;
inputFile >> weightFile;
weight = weightFile;
}
}
void printWeights() const
{
std::cout << "weights: ";
for (const auto& w : weights) { std::cout << w << " "; }
std::cout << "\n";
}
void printBiases() const
{
std::cout << "biases: ";
for (const auto& b : biases) { std::cout << b << " "; }
std::cout << "\n";
}
void print() const
{
printWeights();
printBiases();
}
};
Neural network with dynamic arrays - Tinn.h
#pragma once
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
typedef struct
{
// All the weights.
float* w;
// Hidden to output layer weights.
float* x;
// Biases.
float* b;
// Hidden layer.
float* h;
// Output layer.
float* o;
// Number of biases - always two - Tinn only supports a single hidden layer.
int nb;
// Number of weights.
int nw;
// Number of inputs.
int nips;
// Number of hidden neurons.
int nhid;
// Number of outputs.
int nops;
}
Tinn;
// Returns floating point random from 0.0 - 1.0.
static float frand()
{
return rand() / (float) RAND_MAX;
}
// Activation function.
static float act(const float a)
{
return 1.0f / (1.0f + expf(-a));
}
// Performs forward propagation.
static void fprop(const Tinn t, const float* const in)
{
// Calculate hidden layer neuron values.
for(int i = 0; i < t.nhid; i++)
{
float sum = 0.0f;
for(int j = 0; j < t.nips; j++)
sum += in[j] * t.w[i * t.nips + j];
t.h[i] = act(sum + t.b[0]);
}
// Calculate output layer neuron values.
for(int i = 0; i < t.nops; i++)
{
float sum = 0.0f;
for(int j = 0; j < t.nhid; j++)
sum += t.h[j] * t.x[i * t.nhid + j];
t.o[i] = act(sum + t.b[1]);
}
}
// Randomizes tinn weights and biases.
static void wbrand(const Tinn t)
{
for(int i = 0; i < t.nw; i++) t.w[i] = frand() - 0.5f;
for(int i = 0; i < t.nb; i++) t.b[i] = frand() - 0.5f;
}
// Returns an output prediction given an input.
float* xtpredict(const Tinn t, const float* const in)
{
fprop(t, in);
return t.o;
}
// Constructs a tinn with number of inputs, number of hidden neurons, and number of outputs
Tinn xtbuild(const int nips, const int nhid, const int nops)
{
Tinn t;
// Tinn only supports one hidden layer so there are two biases.
t.nb = 2;
t.nw = nhid * (nips + nops);
t.w = (float*) calloc(t.nw, sizeof(*t.w));
t.x = t.w + nhid * nips;
t.b = (float*) calloc(t.nb, sizeof(*t.b));
t.h = (float*) calloc(nhid, sizeof(*t.h));
t.o = (float*) calloc(nops, sizeof(*t.o));
t.nips = nips;
t.nhid = nhid;
t.nops = nops;
wbrand(t);
return t;
}
// Saves a tinn to disk.
void xtsave(const Tinn t, const char* const path)
{
FILE* const file = fopen(path, "w");
// Save header.
fprintf(file, "%d %d %d\n", t.nips, t.nhid, t.nops);
// Save biases and weights.
for(int i = 0; i < t.nb; i++) fprintf(file, "%f\n", (double) t.b[i]);
for(int i = 0; i < t.nw; i++) fprintf(file, "%f\n", (double) t.w[i]);
fclose(file);
}
// Loads a tinn from disk.
Tinn xtload(const char* const path)
{
FILE* const file = fopen(path, "r");
int nips = 0;
int nhid = 0;
int nops = 0;
// Load header.
fscanf(file, "%d %d %d\n", &nips, &nhid, &nops);
// Build a new tinn.
const Tinn t = xtbuild(nips, nhid, nops);
// Load biases and weights.
for(int i = 0; i < t.nb; i++) fscanf(file, "%f\n", &t.b[i]);
for(int i = 0; i < t.nw; i++) fscanf(file, "%f\n", &t.w[i]);
fclose(file);
return t;
}
// Frees object from heap.
void xtfree(const Tinn t)
{
free(t.w);
free(t.b);
free(t.h);
free(t.o);
}
// Prints an array of floats. Useful for printing predictions.
void xtprint(const float* arr, const int size)
{
for(int i = 0; i < size; i++)
printf("%f ", (double) arr[i]);
printf("\n");
}
void xtprint(const Tinn& tinn)
{
printf("weights: ");
xtprint(tinn.w, tinn.nw);
printf("biases: ");
xtprint(tinn.b, tinn.nb);
}
Main with tests main.cpp
#include <iostream>
#include "MLP_1.h"
#include "Tinn.h"
#include <array>
#include <iterator>
#include <random>
#include <algorithm>
#include <chrono>
constexpr size_t in = 748;
constexpr size_t hid = 20;
constexpr size_t out = 5;
const char* const path = "tinn01.txt";
template< class Iter >
void fill_with_random_values( Iter start, Iter end, int min, int max)
{
static std::random_device rd; // you only need to initialize it once
static std::mt19937 mte(rd()); // this is a relative big object to create
std::uniform_real_distribution<float> dist(min, max);
std::generate(start, end, [&] () { return dist(mte); });
}
void testMLP(MLP_1<float, in, hid, out>& mlp, const std::array<float, in>& array)
{
std::cout << "------MLP------\n";
float sum = 0;
const float* data = array.data();
auto start = std::chrono::system_clock::now();
for (size_t i = 0; i < 60000; ++i)
{
const float* inputRes1 = mlp.predict(data);
sum += inputRes1[0];
}
auto end = std::chrono::system_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
std::cout << "sum:" << sum << "\n";
std::cout << "elapsed time: " << elapsed.count() << "ms" << "\n";
std::cout << "------MLP------\n";
}
void testTinn(Tinn& tinn, const std::array<float, in>& array)
{
std::cout << "------TINN------\n";
float sum = 0;
const float* data = array.data();
auto start = std::chrono::system_clock::now();
for (size_t i = 0; i < 60000; ++i)
{
const float* inputRes1 = xtpredict(tinn, data);
sum += inputRes1[0];
}
auto end = std::chrono::system_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
std::cout << "sum:" << sum << "\n";
std::cout << "elapsed time: " << elapsed.count() << "ms" << "\n";
std::cout << "------TINN------\n";
}
int main()
{
Tinn sTinn = xtbuild(in, hid, out);
xtsave(sTinn, path);
Tinn tinn1 = xtload(path);
MLP_1<float, in, hid, out> mlp;
mlp.load(path);
std::array<float, in> inputTest;
fill_with_random_values(inputTest.begin(), inputTest.end(), -10.0, 10.0);
testMLP(mlp, inputTest);
std::cout << "\n";
testTinn(tinn1, inputTest);
return 0;
}
With g++ -std=c++14 -O0 main.cpp I get:
------MLP------
sum:33171.4
elapsed time: 6524ms
------MLP------
------TINN------
sum:33171.4
elapsed time: 2256ms
------TINN------
With g++ -std=c++14 -O3 main.cpp I get:
------MLP------
sum:19567.4
elapsed time: 758ms
------MLP------
------TINN------
sum:19567.4
elapsed time: 739ms
------TINN------
With dynamic memory allocation, the slow part is allocating and freeing memory. There is no memory allocation in the loop you measure, so there is no reason to expect the dynamically allocated version to be slower. And indeed, with -O3 optimization, the runtimes are almost identical.
One difference between the programs that could affect runtime is the use of different random number generators. std::mt19937 is vastly better than rand(), but might be slower.

C++: Sampling from discrete distribution without replacement

I'd like to sample from a discrete distribution without replacement (i.e., without repetition).
With the function discrete_distribution, it is possible to sample with replacement. And, with this function, I implemented sampling without replacement in a very rough way:
#include <iostream>
#include <random>
#include <string>
#include <vector>
#include <array>
int main()
{
const int sampleSize = 8; // Size of the sample
std::vector<double> weights = {2,2,1,1,2,2,1,1,2,2}; // 10 possible outcomes with different weights
std::random_device rd;
std::mt19937 generator(rd());
/// WITH REPLACEMENT
std::discrete_distribution<int> distribution(weights.begin(), weights.end());
std::array<int, 10> p ={};
for(int i=0; i<sampleSize; ++i){
int number = distribution(generator);
++p[number];
}
std::cout << "Discrete_distribution with replacement:" << std::endl;
for (int i=0; i<10; ++i)
std::cout << i << ": " << std::string(p[i],'*') << std::endl;
/// WITHOUT REPLACEMENT
p = {};
for(int i=0; i<sampleSize; ++i){
std::discrete_distribution<int> distribution(weights.begin(), weights.end());
int number = distribution(generator);
weights[number] = 0; // the weight associated with the sampled value is set to 0
++p[number];
}
std::cout << "Discrete_distribution without replacement:" << std::endl;
for (int i=0; i<10; ++i)
std::cout << i << ": " << std::string(p[i],'*') << std::endl;
return 0;
}
Have you ever coded such sampling without replacement? Probably in a more optimized way?
Thank you.
Cheers,
T.A.
This solution might be a bit shorter. Unfortunately, it needs to create a discrete_distribution<> object in every step, which might be prohibitive when drawing a lot of samples.
#include <iostream>
#include <vector>
#include <boost/random/discrete_distribution.hpp>
#include <boost/random/mersenne_twister.hpp>
using namespace boost::random;
int main(int, char**) {
std::vector<double> w = { 2, 2, 1, 1, 2, 2, 1, 1, 2, 2 };
discrete_distribution<> dist(w);
int n = 10;
boost::random::mt19937 gen;
std::vector<int> samples;
for (auto i = 0; i < n; i++) {
samples.push_back(dist(gen));
w[*samples.rbegin()] = 0;
dist = discrete_distribution<>(w);
}
for (auto iter : samples) {
std::cout << iter << " ";
}
return 0;
}
Improved answer:
After carefully looking for a similar question on this site (Faster weighted sampling without replacement), I found a stunningly simple algorithm for weighted sampling without replacement; it is just a bit complicated to implement in C++. Each item draws a uniform random number r in [0, 1) and computes the key r^(1/w) from its weight w; the items with the largest keys form the sample. Note that this is not the most efficient algorithm, but it seems to me the simplest one to implement.
The method is described in detail in https://doi.org/10.1016/j.ipl.2005.11.003.
In particular, it is not efficient if the sample size is much smaller than the population.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>
#include <iterator>
#include <boost/random/uniform_01.hpp>
#include <boost/random/mersenne_twister.hpp>
using namespace boost::random;
int main(int, char**) {
std::vector<double> w = { 2, 2, 1, 1, 2, 2, 1, 1, 2, 10 };
uniform_01<> dist;
boost::random::mt19937 gen;
std::vector<double> vals;
std::generate_n(std::back_inserter(vals), w.size(), [&dist,&gen]() { return dist(gen); });
std::transform(vals.begin(), vals.end(), w.begin(), vals.begin(), [&](auto r, auto w) { return std::pow(r, 1. / w); });
std::vector<std::pair<double, int>> valIndices;
size_t index = 0;
std::transform(vals.begin(), vals.end(), std::back_inserter(valIndices), [&index](auto v) { return std::pair<double,size_t>(v,index++); });
std::sort(valIndices.begin(), valIndices.end(), [](auto x, auto y) { return x.first > y.first; });
std::vector<int> samples;
std::transform(valIndices.begin(), valIndices.end(), std::back_inserter(samples), [](auto v) { return v.second; });
for (auto iter : samples) {
std::cout << iter << " ";
}
return 0;
}
Easier answer
I just removed some of the STL functions and replaced them with simple for loops.
#include <cmath>
#include <iostream>
#include <vector>
#include <iterator>
#include <boost/random/uniform_01.hpp>
#include <boost/random/mersenne_twister.hpp>
#include <algorithm>
using namespace boost::random;
int main(int, char**) {
std::vector<double> w = { 2, 2, 1, 1, 2, 2, 1, 1, 2, 1000 };
uniform_01<> dist;
boost::random::mt19937 gen(342575235);
std::vector<double> vals;
for (auto iter : w) {
vals.push_back(std::pow(dist(gen), 1. / iter));
}
// Sorting vals, but retain the indices.
// There is unfortunately no easy way to do this with STL.
std::vector<std::pair<int, double>> valsWithIndices;
for (size_t iter = 0; iter < vals.size(); iter++) {
valsWithIndices.emplace_back(iter, vals[iter]);
}
std::sort(valsWithIndices.begin(), valsWithIndices.end(), [](auto x, auto y) {return x.second > y.second; });
std::vector<size_t> samples;
int sampleSize = 8;
for (auto iter = 0; iter < sampleSize; iter++) {
samples.push_back(valsWithIndices[iter].first);
}
for (auto iter : samples) {
std::cout << iter << " ";
}
return 0;
}
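The same key-based idea can also be sketched with only the standard library instead of Boost. This is an illustrative variant, not the code from the answers above: the function name weightedSampleWithoutReplacement is made up, all weights are assumed to be strictly positive, and std::partial_sort is swapped in so that only the k largest keys are ordered.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Returns the indices of k items drawn without replacement.
// Assumption: every weight is > 0.
std::vector<std::size_t> weightedSampleWithoutReplacement(
    const std::vector<double>& weights, std::size_t k, std::mt19937& gen)
{
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    // One (key, index) pair per item, with key = r^(1/w).
    std::vector<std::pair<double, std::size_t>> keyed;
    keyed.reserve(weights.size());
    for (std::size_t i = 0; i < weights.size(); ++i) {
        keyed.emplace_back(std::pow(uniform(gen), 1.0 / weights[i]), i);
    }

    // Only the k largest keys need to be in order.
    k = std::min(k, keyed.size());
    std::partial_sort(
        keyed.begin(), keyed.begin() + static_cast<std::ptrdiff_t>(k), keyed.end(),
        [](const std::pair<double, std::size_t>& a,
           const std::pair<double, std::size_t>& b) { return a.first > b.first; });

    std::vector<std::size_t> sample;
    sample.reserve(k);
    for (std::size_t i = 0; i < k; ++i) {
        sample.push_back(keyed[i].second);
    }
    return sample;
}
```

A call such as weightedSampleWithoutReplacement(w, 8, gen) with a seeded std::mt19937 gen would give eight distinct indices into w.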
The existing answer by Aleph0 works the best of the ones I tested. I benchmarked the original solution, the one added by Aleph0, and a new one that only builds a new discrete_distribution once more than 50% of the items in the existing one have already been sampled (redrawing whenever the distribution produces an item that is already in the sample).
I tested with the sample size equal to the population size, and with weights derived from the index (weights[i] = i + 1). I think the original solution in the question runs in O(n^2), my new one runs in O(n log n), and the one from the paper seems to run in roughly O(n) in these measurements, although the sort it uses is O(n log n).
-------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------
BM_Reuse 25252721 ns 25251731 ns 26
BM_NewDistribution 17338706125 ns 17313620000 ns 1
BM_SomePaper 6789525 ns 6779400 ns 100
Code:
#include <algorithm>
#include <array>
#include <cmath>
#include <benchmark/benchmark.h>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_01.hpp>
#include <iostream>
#include <iterator>
#include <random>
#include <vector>
const int sampleSize = 20000;
using namespace boost::random;
static void BM_ReuseDistribution(benchmark::State &state) {
std::vector<double> weights;
weights.resize(sampleSize);
for (auto _ : state) {
for (int i = 0; i < sampleSize; i++) {
weights[i] = i + 1;
}
std::random_device rd;
std::mt19937 generator(rd());
int o[sampleSize];
std::discrete_distribution<int> distribution(weights.begin(),
weights.end());
int numAdded = 0;
int distSize = sampleSize;
for (int i = 0; i < sampleSize; ++i) {
if (numAdded > distSize / 2) {
distSize -= numAdded;
numAdded = 0;
distribution =
std::discrete_distribution<int>(weights.begin(), weights.end());
}
int number = distribution(generator);
if (!weights[number]) {
i -= 1;
continue;
} else {
weights[number] = 0;
o[i] = number;
numAdded += 1;
}
}
}
}
BENCHMARK(BM_ReuseDistribution);
static void BM_NewDistribution(benchmark::State &state) {
std::vector<double> weights;
weights.resize(sampleSize);
for (auto _ : state) {
for (int i = 0; i < sampleSize; i++) {
weights[i] = i + 1;
}
std::random_device rd;
std::mt19937 generator(rd());
int o[sampleSize];
for (int i = 0; i < sampleSize; ++i) {
std::discrete_distribution<int> distribution(weights.begin(),
weights.end());
int number = distribution(generator);
weights[number] = 0;
o[i] = number;
}
}
}
BENCHMARK(BM_NewDistribution);
static void BM_SomePaper(benchmark::State &state) {
std::vector<double> w;
w.resize(sampleSize);
for (auto _ : state) {
for (int i = 0; i < sampleSize; i++) {
w[i] = i + 1;
}
uniform_01<> dist;
boost::random::mt19937 gen;
std::vector<double> vals;
std::generate_n(std::back_inserter(vals), w.size(),
[&dist, &gen]() { return dist(gen); });
std::transform(vals.begin(), vals.end(), w.begin(), vals.begin(),
[&](auto r, auto w) { return std::pow(r, 1. / w); });
std::vector<std::pair<double, int>> valIndices;
size_t index = 0;
std::transform(
vals.begin(), vals.end(), std::back_inserter(valIndices),
[&index](auto v) { return std::pair<double, size_t>(v, index++); });
std::sort(valIndices.begin(), valIndices.end(),
[](auto x, auto y) { return x.first > y.first; });
std::vector<int> samples;
std::transform(valIndices.begin(), valIndices.end(),
std::back_inserter(samples),
[](auto v) { return v.second; });
}
}
BENCHMARK(BM_SomePaper);
BENCHMARK_MAIN();
Thanks for the question and the other answers; I ran into the same problem. I don't think you need to construct a new distribution every time. Instead, you can update the parameters of the existing one with
dist.param({ wts.begin(), wts.end() });
The complete code is as follows:
// Improved approach using the standard library
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
#include <random>
#include <iomanip>
#include <map>
#include <set>
int main()
{
// Default random number engine
std::default_random_engine rng;
// Engine seeded from the device entropy source to ensure randomness
auto gen = std::mt19937{ std::random_device{}() };
std::vector<int> wts(24); // stores the weights
std::vector<int> in(24); // stores the population
std::set<int> out; // stores the sampling result
std::map<int, int> count; // counts per index
int sampleCount = 0; // number of draws performed
int index = 0; // index that was drawn
int sampleSize = 24; // number of samples to draw
int sampleTimes = 100000; // number of draws in the sampling test
// Assign the weights
for (int i = 0; i < 24; i++)
{
wts.at(i) = 48 - 2 * i;
}
// Fill the population and print it
std::cout << "The population has 24 items:" << std::endl;
// Assign the values
for (int i = 0; i < 24; i++)
{
in.at(i) = i + 1;
std::cout << in.at(i) << " ";
}
std::cout << std::endl;
// Create a discrete distribution with the given weights
std::discrete_distribution<size_t> dist{ wts.begin(), wts.end() };
auto probs = dist.probabilities(); // returns the computed probabilities
// Print the computed probabilities
std::cout << "Probabilities of the items in the population:" << std::endl;
std::copy(probs.begin(), probs.end(), std::ostream_iterator<double>
{ std::cout << std::fixed << std::setprecision(5), " "});
std::cout << std::endl << std::endl;
//========== Sampling test ==========
for (size_t j = 0; j < sampleTimes; j++)
{
index = dist(gen);
//std::cout << index << " "; // print the drawn index
count[index] += 1; // count the drawn index
}
double sum = 0.0; // used to sum the frequencies
// Print the sampling result
std::cout << "Total draws: " << sampleTimes << ", " << "count and frequency of each index:" << std::endl;
for (size_t i = 0; i < 24; i++)
{
std::cout << i << ": count " << count[i] << ", frequency: " << count[i] / double(sampleTimes) << std::endl;
sum += count[i] / double(sampleTimes);
}
std::cout << "Total frequency: " << sum << std::endl << std::endl; // print the total frequency
//========== Sampling test ==========
// Draw from the population into the set until the set reaches the sample size
while (out.size() < sampleSize - 1)
{
index = dist(gen); // draw an index
out.insert(index); // insert it into the set
sampleCount += 1; // increase the number of draws by 1
wts.at(index) = 0; // set the weight of the drawn index to 0
dist.param({ wts.begin(), wts.end() });
probs = dist.probabilities(); // returns the computed probabilities
// Print the computed probabilities
std::cout << "Probabilities of the items in the population:" << std::endl;
std::copy(probs.begin(), probs.end(), std::ostream_iterator<double>
{ std::cout << std::fixed << std::setprecision(5), " "});
std::cout << std::endl << std::endl;
}
// The last draw is done separately so that an all-zero weight array is never assigned to dist, which would cause an error
index = dist(gen); // draw an index
out.insert(index); // insert it into the set
sampleCount += 1; // increase the number of draws by 1
// Print the sampling result
std::cout << "Indices of the " << sampleSize << " samples drawn from the population:" << std::endl;
for (auto iter : out)
{
std::cout << iter << "-";
}
std::cout << std::endl;
// Print the number of draws
std::cout << "Number of draws: " << sampleCount << std::endl;
out.clear(); // clear the output set to prepare for the next sampling run
std::cin.get(); // keep the console window open
return 0;
}

Sorting 32 bit ints is as fast as sorting 64 bit ints

Here are things I believe are facts:
Quick sort should be fairly cache friendly.
A cache line of 64 bytes can contain 16 32-bit ints, or 8 64-bit ints.
Hypothesis:
Sorting a vector of 32-bit integers should be faster than sorting a
vector of 64-bit integers.
But when I run the code below I get the result:
i16 = 7.5168
i32 = 7.3762
i64 = 7.5758
Why am I not getting the results I want?
C++:
#include <iostream>
#include <vector>
#include <cstdint>
#include <algorithm>
#include <chrono>
int main() {
const int vlength = 100'000'000;
const int maxI = 50'000;
std::vector<int16_t> v16;
for (int i = 0; i < vlength; ++i) {
v16.push_back(int16_t(i%maxI));
}
std::random_shuffle(std::begin(v16), std::end(v16));
std::vector<int32_t> v32;
std::vector<int64_t> v64;
for (int i = 0; i < vlength; ++i) {
v32.push_back(int32_t(v16[i]));
v64.push_back(int64_t(v16[i]));
}
auto t1 = std::chrono::high_resolution_clock::now();
std::sort(std::begin(v16), std::end(v16));
auto t2 = std::chrono::high_resolution_clock::now();
std :: cout << "i16 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count() << std :: endl;
t1 = std::chrono::high_resolution_clock::now();
std::sort(std::begin(v32), std::end(v32));
t2 = std::chrono::high_resolution_clock::now();
std :: cout << "i32 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count() << std :: endl;
t1 = std::chrono::high_resolution_clock::now();
std::sort(std::begin(v64), std::end(v64));
t2 = std::chrono::high_resolution_clock::now();
std :: cout << "i64 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count() << std :: endl;
}
EDIT:
In order to avoid the question of how cache friendly sort is, I've also tried the following code:
template <typename T>
inline void function_speed(T& vec) {
for (auto& i : vec) {
++i;
}
}
int main() {
const int nIter = 1000;
std::vector<int16_t> v16(1000000);
std::vector<int32_t> v32(1000000);
std::vector<int64_t> v64(1000000);
auto t1 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < nIter; ++i) {
function_speed(v16);
}
auto t2 = std::chrono::high_resolution_clock::now();
std :: cout << "i16 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count()/double(nIter) << std :: endl;
t1 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < nIter; ++i) {
function_speed(v32);
}
t2 = std::chrono::high_resolution_clock::now();
std :: cout << "i32 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count()/double(nIter) << std :: endl;
t1 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < nIter; ++i) {
function_speed(v64);
}
t2 = std::chrono::high_resolution_clock::now();
std :: cout << "i64 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count()/double(nIter) << std :: endl;
}
Typical result:
i16 = 0.00618648
i32 = 0.00617911
i64 = 0.00606275
I know that proper benchmarking is a science in itself; perhaps I am doing it wrong.
EDIT2:
By avoiding overflowing I am now starting to get more interesting results:
template <typename T>
inline void function_speed(T& vec) {
for (auto& i : vec) {
++i;
i %= 1000;
}
}
Gives results such as:
i16 = 0.0143789
i32 = 0.00958941
i64 = 0.019691
If I instead do:
template <typename T>
inline void function_speed(T& vec) {
for (auto& i : vec) {
i = (i+1)%1000;
}
}
I get:
i16 = 0.00939448
i32 = 0.00913768
i64 = 0.019615
Mistaken assumption; all O(N log N) sorting algorithms have to be cache-unfriendly for the vast majority of the N! possible inputs.
Furthermore, I think an optimizing compiler can remove the sorts outright, since their results are never used, and an unoptimized build will of course be pointless to benchmark.
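One common way to reduce the risk of the optimizer discarding the sort is to make the sorted data feed into something the program later uses, for example a checksum that is printed. The helper below is a hedged sketch, not part of the original code: timedSort and its checksum parameter are hypothetical names, and std::chrono::steady_clock is used here because it is monotonic.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <vector>

// Sorts v, returns the elapsed time in seconds, and writes a value derived
// from the sorted data to checksum so the result is observably used.
template <typename T>
double timedSort(std::vector<T>& v, std::int64_t& checksum)
{
    const auto t1 = std::chrono::steady_clock::now();
    std::sort(v.begin(), v.end());
    const auto t2 = std::chrono::steady_clock::now();

    // Smallest + middle + largest element of the sorted vector.
    checksum = v.empty()
        ? 0
        : static_cast<std::int64_t>(v.front())
            + static_cast<std::int64_t>(v[v.size() / 2])
            + static_cast<std::int64_t>(v.back());

    return std::chrono::duration<double>(t2 - t1).count();
}
```

Printing the checksum next to each timing (and checking that it is the same for the 16-, 32- and 64-bit runs) also doubles as a sanity check that all three vectors held the same values.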

Storing an array in a separate class file

I am coding for the Ludum Dare right now and I was trying to make a separate class that would give me an array as the return type of a function. I have an array set up, but I can't figure out how to make the return type an array so that I can use it in the main function. How would I go about returning an array and setting a variable in the main.cpp to that array?
Here are a couple of examples, each with their own advantages:
#include <iostream>
// C++11 #include <array>
#include <vector>
void myVectorFunc1(std::vector<int>& data)
{
for (unsigned i = 0; i < data.size(); ++i)
data[i] = 9;
data.push_back(1);
data.push_back(2);
data.push_back(3);
}
std::vector<int> myVectorFunc2(void)
{
std::vector<int> data;
data.push_back(1);
data.push_back(2);
data.push_back(3);
return data;
}
/* C++ 11
template<std::size_t S>
void myArrayFunc1(std::array<int, S>& arr)
{
for (auto it = arr.begin(); it != arr.end(); ++it)
*it = 9;
}
std::array<int,5> myArrayFunc2(void)
{
std::array<int,5> myArray = { 0, 1, 2, 3, 4 };
return myArray;
}
*/
int main(int argc, char** argv)
{
// Method 1: Pass a vector by reference
std::vector<int> myVector1(10, 2);
myVectorFunc1(myVector1);
std::cout << "myVector1: ";
for (unsigned i = 0; i < myVector1.size(); ++i)
std::cout << myVector1[i];
std::cout << std::endl;
// Method 2: Return a vector
std::vector<int> myVector2 = myVectorFunc2();
std::cout << "myVector2: ";
for (unsigned i = 0; i < myVector2.size(); ++i)
std::cout << myVector2[i];
std::cout << std::endl;
/* C++11
// Method 3: Pass array by reference
std::array<int, 3> myArray1;
std::cout << "myArray1: ";
myArrayFunc1(myArray1);
for (auto it = myArray1.begin(); it != myArray1.end(); ++it)
std::cout << *it;
std::cout << std::endl;
// Method 4: Return an array
std::cout << "myArray2: ";
std::array<int,5> myArray2 = myArrayFunc2();
for (auto it = myArray2.begin(); it != myArray2.end(); ++it)
std::cout << *it;
std::cout << std::endl;
*/
return 0;
}
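#include <cstdlib>
#include <iostream>
int * func1()
{
// The caller owns the returned buffer and must free() it
int* array = (int *)malloc(sizeof(int) * 2);
array[0] = 1;
array[1] = 5;
return array;
}
int main()
{
int * arrayData = func1();
// sizeof(arrayData) is the size of the pointer, not of the allocation,
// so the element count has to be tracked separately
const int len = 2;
for (int i = 0; i < len; i++)
{
std::cout << arrayData[i] << std::endl;
}
free(arrayData);
}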
Please see https://stackoverflow.com/a/5503643/1903116 for why a function cannot return an array directly. Quoting from that answer:
Functions shall not have a return type of type array or function,
although they may have a return type of type pointer or reference to
such things.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1905.pdf Page 159 - Section 6