I am working with a tensor whose dimensions are 61872578*33. I am trying to store these integer values in a vector, but after a certain period Code::Blocks shows the message std::bad_alloc. My question is: how can I solve this problem? Is there any solution? Here is my code.
#include <iostream>
#include<vector>
#include <stdio.h>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <sstream>
using namespace std;
int main(){
    ofstream bcrs_tensor;
    bcrs_tensor.open("bcrs_tensor_Binary", ios::out | ios::binary);

    int X,Y,Z,M;
    printf("Enter size of 1st dimension X= ");
    scanf("%d",&X);
    printf("\n Enter size of 2nd dimension Y= ");
    scanf("%d",&Y);
    printf("\n Enter size of 3rd dimension Z= ");
    scanf("%d",&Z);
    printf("\n Enter size of 4th dimension M= ");
    scanf("%d",&M);
    printf("\n");

    int new_dimension_1,new_dimension_2,new_x_1,new_x_2;
    new_dimension_1=X*Z;
    new_dimension_2=Y*M;

    int* new_A = new int[ new_dimension_1*new_dimension_2 ];

    /* // filup tensor with zero
    for(int i =0; i<new_dimension_1; i++){
        for(int j= 0; j< new_dimension_2; j++){
            *(new_A + i*new_dimension_2 + j)=0;
        }
    }
    */

    //read tensor value from file
    ifstream read_tensor("Chicago_fourToTwo_d.txt");
    int row,col,val;
    if(read_tensor.is_open()){
        while(read_tensor >> row >> col >> val){
            *(new_A + row*new_dimension_2 + col)=val;
        }
    }

    int x,block_ROW,block_COL;
    for(x=11; x<=new_dimension_1;x++ ){
        if(new_dimension_1%x == 0){
            block_ROW=x;
            printf("block ROW %d\n",block_ROW);
            break;
        }
    }
    for(x=13; x<=new_dimension_2;x++ ){
        if(new_dimension_2%x == 0){
            block_COL=x;
            printf("block COL %d\n",block_COL);
            break;
        }
    }
    cout<<"here"<<endl;

    int a,b,c,d,e,f,non_zero;
    vector<int> block_value,CO_BCRS,RO_BCRS;
    int NZB=0;
    RO_BCRS.push_back(0);
    for(a=0; a<new_dimension_1; a=a+block_ROW){
        for(b=0; b<new_dimension_2; b=b+block_COL){
            non_zero=0;
            for(c=a; c<a+block_ROW; c++){
                for(d=b; d<b+block_COL; d++){
                    printf("[%d][%d]\n",c,d);
                    if(*(new_A + c*new_dimension_2 + d)!=0){
                        non_zero++;
                    }
                }
            }
            if(non_zero!=0){
                for(e=a; e<a+block_ROW; e++){
                    for(f=b; f<b+block_COL; f++){
                        block_value.push_back(*(new_A + e*new_dimension_2 + f));
                    }
                }
                CO_BCRS.push_back(b);
                NZB++;
            }
        }
        RO_BCRS.push_back(NZB);
    }

    cout<<"Block value"<<endl;
    for(vector<int>::iterator itr=block_value.begin();itr!=block_value.end();++itr){
        cout<< " " << *itr ;
    }
    cout<<endl;
    cout<<"CO_BCRS"<<endl;
    for(vector<int>::iterator itr=CO_BCRS.begin();itr!=CO_BCRS.end();++itr){
        cout<< " " << *itr ;
    }
    cout<<endl;
    cout<<"RO_BCRS"<<endl;
    for(vector<int>::iterator itr=RO_BCRS.begin();itr!=RO_BCRS.end();++itr){
        cout<< " " << *itr ;
    }
    cout<<endl;

    //block_value
    int block_value_S=block_value.size();
    cout<<"block_value_S "<< block_value_S <<endl;
    int block_value_val;
    for(int i=0; i<block_value_S;i++){
        block_value_val = block_value[i];
        bcrs_tensor.write((char *) &block_value_val, sizeof(int));
    }
    //CO_BCRS
    int CO_BCRS_S=CO_BCRS.size();
    cout<<"CO_BCRS_S "<< CO_BCRS_S <<endl;
    int CO_BCRS_val;
    for(int i=0; i<CO_BCRS_S;i++){
        CO_BCRS_val = CO_BCRS[i];
        bcrs_tensor.write((char *) &CO_BCRS_val, sizeof(int));
    }
    //RO_BCRS
    int RO_BCRS_S=RO_BCRS.size();
    cout<<"RO_BCRS_S "<< RO_BCRS_S <<endl;
    int RO_BCRS_val;
    for(int i=0; i<RO_BCRS_S;i++){
        RO_BCRS_val = RO_BCRS[i];
        bcrs_tensor.write((char *) &RO_BCRS_val, sizeof(int));
    }
    bcrs_tensor.close();
    return 0;
}
The theoretical limit on the size of a vector is given by std::vector::max_size(). However, that value only reflects limitations of the implementation; in practice you will run out of memory much sooner.
Vectors typically grow their capacity by a factor of 2 (3 or other factors can be used as well). Hence if you call push_back N times, the memory footprint can be as large as 2*N elements. And as vectors store their elements in contiguous memory, you might get a bad_alloc on reallocation, because during the reallocation the old and the new buffer exist at the same time, even though you would still have enough memory for just the elements you want to store. You can ask for only the capacity you actually need by calling std::vector::reserve.
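For example, a minimal sketch of reserving up front (the element count here is made up purely for illustration):
#include <vector>
#include <cstddef>

int main() {
    const std::size_t expected = 10000000;   // however many elements you expect to push
    std::vector<int> v;
    v.reserve(expected);                     // one allocation; capacity() >= expected
    for (std::size_t i = 0; i < expected; ++i)
        v.push_back(static_cast<int>(i));    // no reallocation happens inside this loop
    return 0;
}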
Other than that, you cannot keep more in memory than you have memory. So you basically have only two options: get more RAM, or find a way to store fewer elements in the vector.
If you have a tensor of 61872578 * 33 int values, then on most architectures you need 61872578*33*4 bytes to store it - that's 7.6 GB of RAM just for one tensor.
You have four choices:
0. Change your requirements so that you no longer need a tensor that big.
1. Install enough RAM to store, manipulate, and reallocate all of this in memory.
2. Set up a swap partition or a swap file with enough capacity so that the operating system can put parts of the tensor on disk. Unless your algorithms traverse the data more or less sequentially (or in chunks of millions of sequential elements at a time), this will be unreasonably slow - so slow that you may think your computer or your program is stuck.
3. Rewrite your algorithm so that it works on parts of your tensor at a time:
If you want to run a convolution for a CNN then you can create a sliding window of K rows (assuming the height of your weights-tensor is K), and get rid of an old row right before reading a new one. You will have to write the result tensor as you go instead of having it all in RAM. A similar approach will work with pooling.
If you want to multiply your tensor by another one then break up your tensor into many smaller tensors, then do the multiplication on the smaller tensors. You can find a rough idea at https://en.wikipedia.org/wiki/Block_matrix .
Adding two tensors requires almost no data to be stored in RAM, just sequentially read the numbers from both tensors, add them up, and immediately store the result to disk. The lower layers (C library and OS) will employ reasonable buffering in order to make this run OK.
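To make the streaming idea concrete, here is a minimal sketch of adding two tensors stored as flat binary files of int values, one chunk at a time (the file names are made up for illustration):
#include <cstdio>
#include <cstddef>

int main() {
    const std::size_t BUF = 1 << 20;              // 1M ints per chunk, roughly 4 MB
    static int a[BUF], b[BUF], c[BUF];            // only these small buffers live in RAM
    std::FILE* fa = std::fopen("tensor_a.bin", "rb");
    std::FILE* fb = std::fopen("tensor_b.bin", "rb");
    std::FILE* fc = std::fopen("tensor_sum.bin", "wb");
    if (!fa || !fb || !fc) return 1;
    std::size_t n;
    while ((n = std::fread(a, sizeof(int), BUF, fa)) > 0) {
        if (std::fread(b, sizeof(int), BUF, fb) != n) break;   // files assumed to be the same length
        for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
        std::fwrite(c, sizeof(int), n, fc);
    }
    std::fclose(fa); std::fclose(fb); std::fclose(fc);
    return 0;
}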
Note that options 2 and 3 are workable only if a runtime of minutes is acceptable. If you want sub-second performance then only options 0 and 1 are relevant. I may be overly pessimistic regarding the speed of option 3 - your mileage may vary depending on the throughput of your disk (mostly for option 3) and/or its latency (possibly option 2).
Related
I have tried to solve the following problem: "n numbers are given. For every one of them, calculate the sum 1+(1+2)+(1+2+3)+...+(1+2+3+...+x) and output the results in a separate vector". I have derived a formula for the sum and implemented it in C++. The code seems to work perfectly in VS Code, but when I upload it to the website where I got the problem from, I get 0 points with the explanation: Caught fatal signal 11. I have read some articles, but none of them have helped me crack the problem. Here is the code:
#include <iostream>
#define N 1000000
using namespace std;
int v[N], s[N];
int main()
{
    int n, i;
    cin >> n; // user inputs number of elements
    for(i = 0; i < n; i++)
    {
        cin >> v[i]; // user inputs the elements
        s[i] = v[i]*(v[i] + 1)*(v[i] + 2) / 6; // another vector is calculated using the formula
        cout << s[i] << " ";
    }
    return 0;
}
Your problem is twofold. One, you're not using vectors, you're using fixed-size C-style arrays. And two, the size of those arrays is working against you.
C-style arrays declared inside a function are allocated on the stack, and stack memory is very limited, so two arrays of, say, 1000000 elements each can already cause trouble there. Your arrays are global, so they actually live in static storage, but their size is still fixed at compile time: if the test data ever contains more than N numbers, you write past the end of the arrays, and that kind of invalid memory access is exactly what "Caught fatal signal 11" (a segmentation fault) means.
C++ vectors, on the other hand, are more modern, resize dynamically (useful when you don't know in advance how many elements you'll have), and keep their storage on the heap, which gives you a lot more breathing room when it comes to memory.
Vectors work like this:
#include <iostream>
#include <vector>

std::vector<int> vec;     // creates the (empty) vector
vec.push_back(5);         // adds an element equal to 5; vec size is now 1
vec.push_back(6);         // adds an element equal to 6; vec size is now 2
std::cout << vec[0];      // elements are accessed just like a C-style array
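Applied to your actual problem, a minimal sketch with a vector might look like this (same formula as your code; note that switching to long long is my own addition, because x*(x+1)*(x+2) overflows int for fairly small x):
#include <iostream>
#include <vector>

int main() {
    int n;
    std::cin >> n;                                // number of elements
    std::vector<long long> s;
    if (n > 0) s.reserve(n);                      // optional; the vector grows automatically either way
    for (int i = 0; i < n; ++i) {
        long long x;
        std::cin >> x;
        s.push_back(x * (x + 1) * (x + 2) / 6);   // the closed-form sum
    }
    for (long long v : s) std::cout << v << " ";
    std::cout << "\n";
    return 0;
}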
I am a beginner with C++ syntax. I need to create an m x n 2D array in C++ to use in another project. I have looked at other answers, which involve using tools like vector, etc. Many of these tools do not work in my Visual Studio 15 setup; for example, I cannot define anything with std::vector without getting a message like "vector is not in std". So I have written the following code:
#include "stdafx.h"
#include <iostream>
using namespace std;
int main()
{
    int i; int j; int row[5][10] = {};
    for (int j = 0; j < 10;)
        for (int i = 0; i < 5;)
        {
            row[i][j] = 500;
            int printf(row[i][j]);
            i++;
            j++;
            cout << "Array:" << row[i][j] << endl;
        }
    return 0;
}
Surely this is not the correct syntax, and the output is not what I expect. I want to create an m*n array with all elements set to the same integer, 500 in this case. That is, if m=3 and n=2, I should get
500 500 500
500 500 500
There are a couple of things wrong with your current code.
The first for loop is missing curly brackets
You're redefining int i and int j in your for loops. Not a compilation issue, but still an issue.
You're using printf incorrectly. printf is used to output strings to the console. The correct line would be printf("%d", row[i][j]);
If you want to use a vector, you have to include it using #include <vector>. You can use a vector very similar to an array, but you don't have to worry about size.
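For example, a minimal sketch of the same fill-and-print using a vector of vectors (sizes hard-coded to match your array):
#include <iostream>
#include <vector>

int main() {
    std::vector<std::vector<int>> row(5, std::vector<int>(10, 500)); // 5 rows of 10, all 500
    for (int i = 0; i < 5; i++) {
        for (int j = 0; j < 10; j++)
            std::cout << row[i][j] << " ";
        std::cout << "\n";
    }
    return 0;
}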
You seem to be learning, so I made only minimal corrections to get it working. I suggest you modify it further as per your needs.
#include <iostream>
using namespace std;
int main()
{
    int row[5][10] = {};
    for (int j = 0; j < 10; j++) {
        for (int i = 0; i < 5; i++) {
            row[i][j] = 500;
            cout << row[i][j] << " ";
        }
        cout << endl;
    }
    return 0;
}
Care and feeding of std::vector using OP's program as an example.
#include <iostream>
#include <vector> // needed to get the code that makes the vector work
int main()
{
int m, n; // declare m and n to hold the dimensions of the vector
if (std::cin >> m >> n) // get m and n from user
{ // m and n are good we can continue. Well sort of good. The user could
// type insane numbers that will explode the vector, but at least they
// are numbers.
// Always test ALL user input before you use it. Users are malicious
// and incompetent <expletive deleted>s, so never trust them.
// That said, input validation is a long topic and out of scope for this
// answer, so I'm going to let trapping bad numbers pass in the interests
// of keeping it simple
// declare row as a vector of vectors
std::vector<std::vector<int>> row(m, std::vector<int> (n, 500));
// breaking this down:
// std::vector<std::vector<int>> row
// row is a vector of vectors of ints
// row(m, std::vector<int> (n, 500));
// row's outer vector is m std::vector<int>s constructed with
// n ints all set to 500
for (int j = 0; j < n; j++) // note: j++ has been moved here. This is
// exactly what the third part of a for
// statement is for. Less surprises for
// everyone this way
// note to purists. I'm ignoring the possible advantages of ++j because
// explaining them would muddy the answer.
// Another note: This will output the transpose of row because of the
// ordering of i and j;
{
for (int i = 0; i < m; i++) // ditto i++ here
{
// there is no need to assign anything here. The vector did
// it for us
std::cout << " " << row[i][j]; // moved the line ending so that
// the line isn't ended with
// every column
}
std::cout << '\n'; // end the line on the end of a row
// Note: I also changed the type of line ending. endl ends the line
// AND writes the contents of the output stream to whatever media
// the stream represents (in this case the console) rather than
// buffering the stream and writing at a more opportune time. Too
// much endl can be a performance killer, so use it sparingly and
// almost certainly not in a loop
}
std::cout << std::endl; // ending the line again to demonstrate a better
// placement of endl. The stream is only forced
// to flush once, right at the end of the
// program
// even this may be redundant as the stream will
// flush when the program exits, assuming the
// program does not crash on exit.
}
else
{ // let the user know the input was not accepted. Prompt feedback is good
// otherwise the user may assume everything worked, or in the case of a
// long program, assume that it crashed or is misbehaving and terminate
// the program.
std::cout << "Bad input. Program exiting" << std::endl;
}
return 0;
}
One performance note a vector of vectors does not provide one long block of memory. It provides M+1 blocks of memory that may be anywhere in storage. Normally when a modern CPU reads a value from memory, it also reads values around it off the assumption that if you want the item at location X, you'll probably want the value at location X+1 shortly after. This allows the CPU to load up, "cache", many values at once. This doesn't work if you have to jump around through memory. This means the CPU may find itself spending more time retrieving parts of a vector of vectors than it does processing a vector of vectors. The typical solution is to fake a 2D data structure with a 1D structure and perform the 2D to 1D mapping yourself.
So:
std::vector<int> row(m*n, 500);
Much nicer looking, yes? Access looks a bit uglier, though
std::cout << " " << row[i * n + j];
Fun thing is, the work done behind the scenes converting row[i][j] to a memory address is almost identical to the work for row[i*n+j], so even though you appear to write more, it doesn't take any longer. Add to this the benefit of the CPU successfully predicting and reading ahead, and your program is often vastly faster.
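If the index arithmetic bothers you at the call sites, a common pattern is to hide it behind a small wrapper. A minimal sketch (the Grid name and interface are mine, purely illustrative):
#include <iostream>
#include <vector>
#include <cstddef>

// Row-major 2D view over one contiguous block of memory
struct Grid {
    std::size_t rows, cols;
    std::vector<int> data;
    Grid(std::size_t m, std::size_t n, int fill = 0) : rows(m), cols(n), data(m * n, fill) {}
    int&       at(std::size_t i, std::size_t j)       { return data[i * cols + j]; }
    const int& at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

int main() {
    Grid g(3, 2, 500);                 // 3 x 2, every element 500
    std::cout << g.at(2, 1) << "\n";   // reads the element at row 2, column 1
    return 0;
}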
My code is supposed to extract the odd numbers and the even numbers from a 1D array.
#include <iostream>
using namespace std;
int main() {
    int a[6] = {1,6,3,8,5,10};
    int odd[]={};
    int even[]={};
    for (int i=0; i < 6; i++) {
        cin >> a[i];
    }
    for (int i=0; i < 6; i++) {
        if (a[i] % 2 == 1) {
            odd[i] = a[i];
            cout << odd[i] << endl;
        }
    }
    cout << " " << endl;
    for (int i=0; i < 6; i++) {
        if (a[i] % 2 == 0) {
            even[i] = a[i];
            cout << even[i] << endl;
        }
    }
    return 0;
}
the output is:
1
3
5
2
1
6
This shows that the odd numbers are extracted successfully, but the same method does not work for the even numbers: the output goes wrong where the even number 4 should appear.
Could anyone help me find the cause here? Thanks.
You've got undefined behavior, so the result may be anything: random values, or proverbially even a formatted hard drive.
int odd[] = {} is the same as int odd[/*count of elements inside {}*/] = {/*nothing*/}, so it's int odd[0];
The result is not defined when you access elements beyond the end of an array.
You probably have to think about the correct sizes for the odd/even arrays, or use a data structure that sizes itself automatically.
First, although not causing a problem, you initialize an array with data and then overwrite it. The code
int a[6] = {1,6,3,8,5,10};
can be replaced with
int a[6];
Also, as stated in the comments,
int odd[]={};
isn't valid. You should either allocate a buffer as big as the main buffer (6 ints) or use a vector (although I personally prefer C-style arrays for small sizes, because they avoid heap allocations and extra complexity). With the full-size buffer technique, you need a marker value such as -1 (assuming you only intend to input positive numbers) stored after the list of values, or you need to record the sizes somewhere, so that your output code knows where to stop reading. This prevents it from reading values that were never set.
I don't understand the specific problem you describe when 4 is in the input. Apart from the arrays, your code looks fine.
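A small sketch of the full-size buffer idea described above (using -1 as the end marker, so it assumes the inputs are positive):
#include <iostream>

int main() {
    int a[6];
    int odd[7], even[7];                // one extra slot each for the -1 end marker
    int no = 0, ne = 0;
    for (int i = 0; i < 6; i++) std::cin >> a[i];
    for (int i = 0; i < 6; i++) {
        if (a[i] % 2 == 1) odd[no++] = a[i];
        else               even[ne++] = a[i];
    }
    odd[no] = -1;                       // mark the end of the valid entries
    even[ne] = -1;
    for (int i = 0; odd[i] != -1; i++)  std::cout << odd[i] << std::endl;
    std::cout << " " << std::endl;
    for (int i = 0; even[i] != -1; i++) std::cout << even[i] << std::endl;
    return 0;
}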
You can use std::vector<int> odd; and then call odd.push_back(elem) only when elem is odd.
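For example, a minimal sketch of that approach:
#include <iostream>
#include <vector>

int main() {
    int a[6];
    std::vector<int> odd, even;
    for (int i = 0; i < 6; i++) std::cin >> a[i];
    for (int i = 0; i < 6; i++) {
        if (a[i] % 2 == 1) odd.push_back(a[i]);   // the vectors grow only as needed
        else               even.push_back(a[i]);
    }
    for (int v : odd)  std::cout << v << std::endl;
    std::cout << " " << std::endl;
    for (int v : even) std::cout << v << std::endl;
    return 0;
}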
I am interested in porting some existing code to use thrust to see if I can speed it up on the GPU with relative ease.
What I'm looking to accomplish is a stream compaction operation, where only nonzero elements will be kept. I have this mostly working, per the example code below. The part that I am unsure of how to tackle is dealing with all the extra fill space that is in d_res and thus h_res, after the compaction happens.
The example just uses a 0-99 sequence with all the even entries set to zero. This is just an example, and the real problem will be a general sparse array.
This answer here helped me greatly, although when it comes to reading out the data, the size is just known to be constant:
How to quickly compact a sparse array with CUDA C?
I suspect that I can work around this by counting the number of 0's in d_src and then only allocating d_res to be that size, or by doing the count after the compaction and only copying that many elements. Is that really the right way to do it?
I get the sense that there will be some easy fix for this, via clever use of iterators or some other feature of thrust.
#include <iostream>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>

//Predicate functor
struct is_not_zero
{
    __host__ __device__
    bool operator()(const int x)
    {
        return (x != 0);
    }
};

using namespace std;

int main(void)
{
    size_t N = 100;

    //Host Vector
    thrust::host_vector<int> h_src(N);

    //Fill with some zero and some nonzero data, as an example
    for (int i = 0; i < N; i++){
        if (i % 2 == 0){
            h_src[i] = 0;
        }
        else{
            h_src[i] = i;
        }
    }

    //Print out source data
    cout << "Source:" << endl;
    for (int i = 0; i < N; i++){
        cout << h_src[i] << " ";
    }
    cout << endl;

    //copies to device
    thrust::device_vector<int> d_src = h_src;

    //Result vector
    thrust::device_vector<int> d_res(d_src.size());

    //Copy non-zero elements from d_src to d_res
    thrust::copy_if(d_src.begin(), d_src.end(), d_res.begin(), is_not_zero());

    //Copy back to host
    thrust::host_vector<int> h_res(d_res.begin(), d_res.end());
    //thrust::host_vector<int> h_res = d_res; //Or just this?

    //Show results
    cout << "h_res size is " << h_res.size() << endl;
    cout << "Result after remove:" << endl;
    for (int i = 0; i < h_res.size(); i++){
        cout << h_res[i] << " ";
    }
    cout << endl;

    return 0;
}
Also, I am a novice with thrust, so if the above code has any obvious flaws that go against recommended practices for using thrust, please let me know.
Similarly, speed is always of interest. Reading some of the various thrust tutorials, it seems like little changes here and there can be big speed savers or wasters. So, please let me know if there is a smart way to speed this up.
What you appear to have overlooked is that copy_if returns an iterator which points to the end of the data copied by the stream compaction operation. So all that is required is this:
//copies to device
thrust::device_vector<int> d_src = h_src;
//Result vector
thrust::device_vector<int> d_res(d_src.size());
//Copy non-zero elements from d_src to d_res
auto result_end = thrust::copy_if(d_src.begin(), d_src.end(), d_res.begin(), is_not_zero());
//Copy back to host
thrust::host_vector<int> h_res(d_res.begin(), result_end);
Doing this sizes h_res to hold only the non-zeroes, and copies only the non-zeroes from the output of the stream compaction. No extra computation is required.
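As a small follow-up sketch continuing directly from the snippet above: the number of kept elements is simply the distance between the start of d_res and the returned iterator, and the device buffer can optionally be shrunk to match.
//Number of non-zero elements actually written by copy_if
size_t n_kept = result_end - d_res.begin();
//Optionally drop the unused tail on the device side as well
d_res.resize(n_kept);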
I'm pretty new at C++ and would need some advice on this.
Here is some code I wrote to count the number of times an arbitrary integer x occurs in an array and to output the comparisons made.
However, I've read that by using multi-way branching ("divide and conquer!") techniques, I could make the algorithm run faster.
Could anyone point me in the right direction how should I go about doing it?
Here is my working code for the other method I did:
#include <iostream>
#include <cstdlib>
#include <vector>
using namespace std;
vector <int> integers;
int function(int vectorsize, int count);
int x;
double input;
int main()
{
    cout<<"Enter 20 integers"<<endl;
    cout<<"Type 0.5 to end"<<endl;
    while(true)
    {
        cin>>input;
        if (input == 0.5)
            break;
        integers.push_back(input);
    }
    cout<<"Enter the integer x"<<endl;
    cin>>x;
    function((integers.size()-1),0);
    system("pause");
}

int function(int vectorsize, int count)
{
    if(vectorsize<0) //termination condition
    {
        cout<<"The number of times"<< x <<"appears is "<<count<<endl;
        return 0;
    }
    if (integers[vectorsize] > x)
    {
        cout<< integers[vectorsize] << " > " << x <<endl;
    }
    if (integers[vectorsize] < x)
    {
        cout<< integers[vectorsize] << " < " << x <<endl;
    }
    if (integers[vectorsize] == x)
    {
        cout<< integers[vectorsize] << " = " << x <<endl;
        count = count+1;
    }
    return (function(vectorsize-1,count));
}
Thanks!
If the array is unsorted, just use a single loop to compare each element to x. Unless there's something you're forgetting to tell us, I don't see any need for anything more complicated.
If the array is sorted, there are algorithms (e.g. binary search) that would have better asymptotic complexity. However, for a 20-element array a simple linear search should still be the preferred strategy.
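For the unsorted case, the standard library already provides that single loop; a minimal sketch (the helper name is mine):
#include <algorithm>
#include <vector>

// counts how many elements of v are equal to x in one linear pass
int count_occurrences(const std::vector<int>& v, int x) {
    return static_cast<int>(std::count(v.begin(), v.end(), x));
}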
If your array is sorted, you can use a divide-and-conquer strategy:
Efficient way to count occurrences of a key in a sorted array
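For completeness, a minimal sketch of that idea using the standard library (it assumes the vector is already sorted; the helper name is mine):
#include <algorithm>
#include <vector>

// two binary searches bracket the run of elements equal to x: O(log n)
int count_sorted(const std::vector<int>& sorted, int x) {
    auto range = std::equal_range(sorted.begin(), sorted.end(), x);
    return static_cast<int>(range.second - range.first);
}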
A divide and conquer algorithm is only beneficial if you can either eliminate some work with it, or if you can parallelize the divided work parts across several computation units. In your case, the first option is possible with an already sorted dataset; other answers may have addressed that.
For the second option, the algorithm name is map-reduce, which splits the dataset into several subsets, distributes the subsets to as many threads or processes as are available, and gathers the results to "compile" them (the term is actually "reduce") into a meaningful result. In your setting, it means that each thread will scan its own slice of the array to count the matching items and return its result to the "reduce" thread, which will add them up to produce the final result. This solution is only interesting for large datasets, though.
There are questions dealing with mapreduce and c++ on SO, but I'll try to give you a sample implementation here:
#include <thread>
#include <vector>
#include <boost/thread/barrier.hpp>

constexpr int MAP_COUNT = 4;

int mresults[MAP_COUNT];
boost::barrier endmap(MAP_COUNT + 1);

// "map" step: each worker counts occurrences of x in its own slice of the
// global `integers` vector (both globals come from your original code)
void mfunction(int start, int end, int rank){
    int count = 0;
    for (int i = start; i < end; i++)
        if (integers[i] == x) count++;
    mresults[rank] = count;
    endmap.wait();   // signal that this slice has been counted
}

// "reduce" step: add up the per-worker counts
int rfunction(){
    int count = 0;
    for (int i : mresults) {
        count += i;
    }
    return count;
}

int mapreduce(){
    std::vector<std::thread> mthreads;
    int range = integers.size() / MAP_COUNT;
    for (int i = 0; i < MAP_COUNT; i++){
        // the last worker also takes any leftover elements
        int end = (i == MAP_COUNT - 1) ? (int)integers.size() : (i + 1) * range;
        mthreads.push_back(std::thread(mfunction, i * range, end, i));
    }
    endmap.wait();                   // wait until every worker has stored its count
    for (std::thread &t : mthreads)  // join the workers before returning
        t.join();
    return rfunction();
}
Once the integers vector has been populated, you call the mapreduce function defined above, which should return the expected result. As you can see, the implementation is very specialized:
the map and reduce functions are specific to your problem,
the number of threads used for map is static,
I followed your style and used global variables,
for convenience, I used a boost::barrier for synchronization
However this should give you an idea of the algorithm, and how you could apply it to similar problems.
caveat: code untested.