I have allocated a big double vector, let's say with 100,000 elements. At some point in my code, I want to set all elements to a constant, nonzero value. How can I do this without using a for loop over all elements?
I am also using the BLAS package, if that helps.
You could use std::fill (#include <algorithm>):
std::fill(v.begin(), v.end(), 1);
This is essentially just a loop too, of course...
'fill' is right from what you've said.
Be aware that it's also possible to construct a vector full of a specified value:
std::vector<double> vec(100000, 3.14);
So if "at some point" means "immediately after construction", do this instead. Also, it means you can do this:
std::vector<double>(100000, 3.14).swap(vec);
which might be useful if "at some point" means "immediately after changing the size", and you expect/want the vector to be reallocated ("expect" if you're making it bigger than its prior capacity, "want" if you're making it much smaller and want it trimmed to save memory).
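For example, a minimal sketch of the swap trick; the exact capacity values printed are implementation-dependent:
#include <iostream>
#include <vector>

int main() {
    std::vector<double> vec(100000, 3.14);
    vec.resize(10);                           // shrinks size(), but usually not capacity()
    std::cout << vec.capacity() << '\n';      // likely still around 100000

    std::vector<double>(10, 3.14).swap(vec);  // swap with a right-sized temporary
    std::cout << vec.capacity() << '\n';      // likely trimmed to around 10
}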
You can always use memset() if you don't want to loop.
That is, memset(myarr, 5, arrsize); to fill it with all 5's. But beware of the implicit conversion to unsigned char: memset writes the value into every byte, so this only yields elements equal to 5 for byte-sized element types.
SYNOPSIS
#include <string.h>
void *memset(void *b, int c, size_t len);
DESCRIPTION
The memset() function writes len bytes of value c (converted to an unsigned char) to the byte string b.
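To connect this back to the original question about a vector of double, here is a minimal sketch of why a nonzero memset value doesn't do what you want for multi-byte types:
#include <cstring>
#include <iostream>

int main() {
    double arr[4];
    std::memset(arr, 5, sizeof(arr)); // writes the byte 0x05 into every byte
    std::cout << arr[0] << '\n';      // prints a tiny garbage double, not 5.0
}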
And if the vector is large, you need it to go fast, and you are using GCC, then:
Code generation of block move (memcpy) and block set (memset) was rewritten. GCC can now pick the best algorithm (loop, unrolled loop, instruction with rep prefix or a library call) based on the size of the block being copied and the CPU being optimized for.
Related
I wonder what the ideal way is to fill an array with a default value:
#include <cstring> // for memset
#include <algorithm> // for fill_n
static constexpr int N = 100;
int a[N];
static constexpr int defaultValue = -1;
void* memset( void* dest, int ch, std::size_t count );
memset(a, defaultValue, sizeof(a));
(memset) converts the value ch to unsigned char and copies it into each of the first count characters of the object pointed to by dest. If the object is a potentially-overlapping subobject or is not TriviallyCopyable (e.g., scalar, C-compatible struct, or an array of trivially copyable type), the behavior is undefined. If count is greater than the size of the object pointed to by dest, the behavior is undefined.
or
constexpr OutputIt fill_n( OutputIt first, Size count, const T& value );
fill_n(a, N, defaultValue);
(fill_n) assigns the given value to the first count elements in the range beginning at first if count > 0. Does nothing otherwise.
I am looking for insights, I know how to read the documentation of course!
Edit: defaultValue might not always be -1.
Both functions do different things. Sure, they fill a block of memory, but the way they do it is completely different.
memset operates at the byte level. defaultValue is hacked down to an unsigned char, so a defaultValue greater than what can fit into a single byte gets cut down to size and information is lost. The now-byte-sized value is applied individually to every byte, not every int, in the array. In the case of -1 you get "lucky", because four bytes worth of 0xFF looks the same, 0xFFFFFFFF, as a two's complement -1 in the world of 32-bit integers. No such luck for most other numbers: 1, for example, will not result in an array full of ints set to 1; each int is filled with 0x01010101, or 16843009.
fill_n, on the other hand, respects the array element's type. Every int in the array will be set to defaultValue. In the case of a defaultValue of 1, the array will be full of 1s; a defaultValue of 256 produces an array full of 256s.
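A minimal sketch of that contrast, assuming a 4-byte int:
#include <algorithm>
#include <cstring>
#include <iostream>

int main() {
    int a[4], b[4];
    std::memset(a, 1, sizeof(a));   // every *byte* becomes 0x01
    std::fill_n(b, 4, 1);           // every *int* becomes 1
    std::cout << a[0] << '\n';      // 16843009 (0x01010101)
    std::cout << b[0] << '\n';      // 1
}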
In terms of speed, it probably won't matter much. Memory read or written one byte at a time is a rare sight these days; writing whole ints at a time may be faster. But a good memset implementation knows this and will exploit it, and if it doesn't, the compiler likely will.
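If you'd rather measure than guess, here is a rough timing sketch using <chrono>; the buffer size is arbitrary and the results will vary by compiler and machine:
#include <algorithm>
#include <chrono>
#include <cstring>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v(10000000);
    auto t0 = std::chrono::steady_clock::now();
    std::memset(v.data(), 0xFF, v.size() * sizeof(int)); // bytewise: every int becomes -1
    auto t1 = std::chrono::steady_clock::now();
    std::fill_n(v.data(), v.size(), -1);                 // elementwise: every int becomes -1
    auto t2 = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> m1 = t1 - t0, m2 = t2 - t1;
    std::cout << "memset: " << m1.count() << " ms\n"
              << "fill_n: " << m2.count() << " ms\n";
}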
On my MS VS 2015 compiler, sizeof(int) is 4 (bytes), but sizeof(vector<int>) is 16. As far as I know, a vector is like an empty box when it hasn't been initialized yet, so why is it 16? And why 16 and not some other number?
Furthermore, if we have vector<int> v(25); and then initialize it with int values, the size of v is still 16 even though it holds 25 ints! The size of each int is 4, so sizeof(v) should seemingly be 25*4 bytes, but in effect it is still 16! Why?
The size of each int is 4, so sizeof(v) should seemingly be 25*4 bytes, but in effect it is still 16! Why?
You're confusing sizeof(std::vector) with std::vector::size(): the former returns the size of the vector object itself, not including the size of the elements it holds, while the latter returns the count of elements. You can get the total size of the elements from std::vector::size() * sizeof(int).
so why is it 16? And why 16 and not some other number?
What sizeof(std::vector) is depends on the implementation; it is mostly implemented with three pointers. In some cases (such as debug mode) the size may increase for convenience.
std::vector is typically a structure containing two members: a pointer to its elements and the size of the array (number of elements).
As the size member is sizeof(void *) wide and the pointer is also sizeof(void *), the size of the structure is 2*sizeof(void *), which is 16 when pointers are 8 bytes (a 64-bit build).
The number of elements has nothing to do with the size as the elements are allocated on the heap.
EDIT: As M.M mentioned, the implementation could be different, e.g. a pointer plus start, end, and allocatedSize fields. So in a 32-bit environment that would be 3*sizeof(size_t) + sizeof(void *) = 16, which might be the case here. Even the original layout could work with start hardcoded to 0 and allocatedSize computed by masking end, so it really is implementation dependent. But the point remains the same.
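For illustration, a minimal sketch of a typical begin/end/capacity layout; the struct and its field names are hypothetical, not any real library's internals:
#include <iostream>
#include <vector>

// Hypothetical layout for illustration only; real implementations differ.
struct VectorLayout {
    double *first;   // start of the element block
    double *last;    // one past the last element (determines size())
    double *end_cap; // one past the allocated block (determines capacity())
};

int main() {
    std::cout << sizeof(VectorLayout) << '\n';        // typically 12 on 32-bit, 24 on 64-bit
    std::cout << sizeof(std::vector<double>) << '\n'; // often the same in release builds
}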
sizeof is evaluated at compile time, so it only counts the size of the members declared in the class, which probably include a couple of counters and a pointer. It's what the pointer points to that varies with the element count, but the compiler doesn't know about that.
The size can be explained by three pointers: 1) the begin of the vector, 2) the end of the vector, and 3) the end of the vector's capacity. This is implementation dependent and will differ between implementations.
You seem to be mixing up "array" and "vector". If you have a local array, sizeof will indeed give the size of the array. However, a vector is not an array: it is a class, a container from the STL, which guarantees that its elements are stored within a single contiguous block of memory (which may get relocated if the vector grows).
Now, if you take a look at the std::vector implementation, you'll notice it contains fields (at least in MSVC 14.0):
size_type _Count = 0;
typename _Alloc_types::_Alty _Alval; // allocator object (from base)
_Mylast
_Myfirst
That could sum up to 16 bytes under your implementation (note: experience may vary).
Say I want to assign 5 to all the elements in a 2d array. First I tried memset
int a[3][4];
memset(a, 5, sizeof a);
and
int a[3][4];
memset(a, 5, sizeof(a[0][0])*3*4);
But the result is the same: all the elements become 84215045.
Then I tried fill_n, but it failed to build; it seems fill_n cannot handle a 2d array directly.
So is there any fast way to set all the elements of a 2d array to a certain value in C++?
UPDATE
Thanks #paddy for the answer. Actually fill_n does work; the way I used it, which fails to build with my compiler, was this:
fill_n(a,3*4,5);
#paddy's answer is correct; we can use it this way for a 2d array:
fill_n(a[0],3*4,5);
Then I tried a little more and found we can actually use this for a 3d array too, like this, say for a[3][4][5]:
fill_n(a[0][0],3*4*5,5);
Unfortunately, memset is only useful for setting every byte to a value; that won't work when you want to set multi-byte values. But because of the memory layout of a 2D array, it's actually okay to use std::fill_n or std::fill starting from the first element:
std::fill_n( a[0], 3 * 4, 5 );
std::fill( a[0], a[3], 5 ); // Note a[3] is one past the end of array.
Depending on your compiler, something like this might even be vectorized for faster execution. But even without that, you ought not to worry about speed here: std::fill is plenty fast.
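Putting that together as a complete program (a sketch; it relies on the contiguous layout of the 2D array):
#include <algorithm>
#include <iostream>

int main() {
    int a[3][4];
    std::fill_n(&a[0][0], 3 * 4, 5); // flat view over the contiguous 2D block
    for (const auto &row : a) {
        for (int x : row)
            std::cout << x << ' ';
        std::cout << '\n';
    }
}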
sizeof(a) gives you the size of a pointer only when a has decayed to a pointer (for example, when it was passed as a function parameter). That size depends on the system: 4 bytes on a 32-bit system or OS (32 bits = 4 bytes), 8 on a 64-bit one. For a true local array like int a[3][4], sizeof(a) is the full array size in bytes.
To spell out the size of an int array of 3x4 elements, you can use sizeof(int)*(3*4).
So you could write memset(a, 5, sizeof(int)*(3*4)); but remember that memset still sets every byte, not every int, so this won't give you ints equal to 5 either.
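To see the array/pointer distinction directly (the sizes shown assume a typical 64-bit system with 4-byte int):
#include <iostream>

void f(int p[3][4]) {               // p is really int (*)[4], i.e. a pointer
    std::cout << sizeof(p) << '\n'; // 8 on a typical 64-bit system (compilers may warn here)
}

int main() {
    int a[3][4];
    std::cout << sizeof(a) << '\n'; // 48: the whole array, 3*4*sizeof(int)
    f(a);
}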
Can't seem to find the answer to this anywhere,
How do I memset an array to the maximum value of the array's type?
I would have thought memset(ZBUFFER, 0xFFFF, size) would work, where ZBUFFER is a 16-bit integer array. Instead I get -1s throughout.
Also, the idea is to have this work as fast as possible (it's a z-buffer that needs to be initialized every frame), so if there is a better way (that is still as fast or faster), let me know.
edit:
As clarification, I do need a signed int array.
In C++, you would use std::fill, and std::numeric_limits.
#include <algorithm>
#include <iterator>
#include <limits>
template <typename IT>
void FillWithMax( IT first, IT last )
{
    typedef typename std::iterator_traits<IT>::value_type T;
    T const maxval = std::numeric_limits<T>::max();
    std::fill( first, last, maxval );
}

size_t const size = 32;
short ZBUFFER[size];
FillWithMax( ZBUFFER, &ZBUFFER[0] + size );
This will work with any type.
In C, you'd better stay away from memset, which sets the value of individual bytes. To initialize an array of any type other than char (possibly unsigned), you have to resort to a manual for loop.
-1 and 0xFFFF are the same thing in a 16-bit integer using a two's complement representation. You are only getting -1 because you have declared your array as short instead of unsigned short, or because you are converting the values to signed when you output them.
BTW, your assumption that you can set something other than bytes using memset is wrong: memset(ZBUFFER, 0xFF, size) would have done the same thing.
In C++ you can fill an array with some value with the std::fill algorithm.
std::fill(ZBUFFER, ZBUFFER+size, std::numeric_limits<short>::max());
This is neither faster nor slower than your current approach. It does have the benefit of working, though.
Don't attribute speed to a language; that's a property of implementations. There are C compilers that produce fast, optimal machine code and C compilers that produce slow, suboptimal machine code, and likewise for C++. A "fast, optimal" implementation might be able to optimise code that seems slow, so it doesn't make sense to call one solution faster than another. I'll talk about correctness first, and then about performance, however insignificant it is. It would be a better idea to profile your code to be sure this is in fact the bottleneck, but let's continue.
Let us consider the most sensible option, first: A loop that copies int values. It is clear just by reading the code that the loop will correctly assign SHRT_MAX to each int item. You can see a testcase of this loop below, which will attempt to use the largest possible array allocatable by malloc at the time.
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void) {
    size_t size = SIZE_MAX;
    volatile int *array = malloc(size);
    /* Allocate largest array */
    while (array == NULL && size > 0) {
        size >>= 1;
        array = malloc(size);
    }
    printf("Copying into %zu bytes\n", size);
    for (size_t n = 0; n < size / sizeof *array; n++) {
        array[n] = SHRT_MAX;
    }
    puts("Done!");
    return 0;
}
I ran this on my system, compiled with various optimisations enabled (-O3 -march=core2 -funroll-loops). Here's the output:
Copying into 1073741823 bytes
Done!
Process returned 0 (0x0) execution time : 1.094 s
Press any key to continue.
Note the "execution time"... That's pretty fast! If anything, the bottleneck here is the cache locality of such a large array, which is why a good programmer will try to design systems that don't use so much memory... Well, then let us consider the memset option. Here's a quote from the memset manual:
The memset() function copies c (converted to an unsigned char) into
each of the first n bytes of the object pointed to by s.
Hence, it'll convert 0xFFFF to an unsigned char (truncating it to 0xFF), then assign that converted value to each of the first size bytes. This results in incorrect behaviour: SHRT_MAX is 0x7FFF, which is not made up of identical bytes, so no memset byte value can produce it, and relying on a byte pattern to spell out a multi-byte value is relying upon coincidence. In other words, the main problem here is that memset isn't suitable for your task. Don't use it. Having said that, here's a test, derived from the test above, which will be used to test the speed of memset:
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* for memset */
#include <time.h>
int main(void) {
    size_t size = SIZE_MAX;
    volatile int *array = malloc(size);
    /* Allocate largest array */
    while (array == NULL && size > 0) {
        size >>= 1;
        array = malloc(size);
    }
    printf("Copying into %zu bytes\n", size);
    memset((void *)array, 0xFFFF, size); /* cast drops the volatile qualifier */
    puts("Done!");
    return 0;
}
A trivial byte-copying memset loop will iterate sizeof (int) times more than the loop in my first example. Considering that my implementation uses a fairly optimal memset, here's the output:
Copying into 1073741823 bytes
Done!
Process returned 0 (0x0) execution time : 1.060 s
Press any key to continue.
These timings are likely to vary from run to run, perhaps significantly; I only ran each test once, to get a rough idea. Hopefully you've come to the same conclusion that I have: common compilers are pretty good at optimising simple loops, and it's not worth speculating about micro-optimisations here.
In summary:
Don't use memset to fill ints with values (with an exception for the value 0), because it's not suitable.
Don't postulate about optimisations prior to running tests. Don't run tests until you have a working solution. By working solution I mean "A program that solves an actual problem". Once you have that, use your profiler to identify more significant opportunities to optimise!
This is because of two's complement. You have to change your array type to unsigned short to get the max value, or use 0x7FFF.
for (int i = 0; i < SIZE / sizeof(short); ++i) {
    ZBUFFER[i] = SHRT_MAX;
}
Note this does not initialize the last couple of bytes if (SIZE % sizeof(short)) is nonzero.
In C, you can do it like Adrian Panasiuk said, and you can also unroll the copy loop. Unrolling means copying larger chunks at a time. The extreme end of loop unrolling is copying the whole frame over from a pre-initialized template frame, like this:
void init(void)
{
    for (int i = 0; i < sizeof(ZBUFFER) / sizeof(ZBUFFER[0]); ++i) {
        empty_ZBUFFER[i] = SHRT_MAX;
    }
}
actual clearing:
memcpy(ZBUFFER, empty_ZBUFFER, SIZE);
(You can experiment with different sizes of the empty ZBUFFER, from four bytes and up, and then put a loop around the memcpy; see the sketch below.)
As always, test your findings to see a) whether it's worth optimizing this part of the program at all, and b) what difference the different initialization techniques make. It will depend on a lot of factors. For the last few percent of performance, you may have to resort to assembler code.
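For what it's worth, here is a sketch of that "loop around the memcpy" idea with a smaller template buffer; the buffer and chunk sizes below are made-up tuning knobs, not recommendations:
#include <algorithm>
#include <climits>
#include <cstddef>
#include <cstring>

const std::size_t ZSIZE = 640 * 480; // hypothetical z-buffer size
const std::size_t CHUNK = 1024;      // hypothetical template-buffer size

short ZBUFFER[ZSIZE];
short templ[CHUNK]; // small pre-filled template buffer

void init_template() {
    std::fill_n(templ, CHUNK, SHRT_MAX); // fill the template once
}

void clear_zbuffer() {
    std::size_t done = 0;
    while (done < ZSIZE) {
        std::size_t n = std::min(CHUNK, ZSIZE - done);
        std::memcpy(ZBUFFER + done, templ, n * sizeof(short));
        done += n;
    }
}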
#include <algorithm>
#include <limits>
std::fill_n(ZBUFFER, size, std::numeric_limits<FOO>::max())
where FOO is the type of ZBUFFER's elements.
When you say "memset", do you actually have to use that function? memset does a byte-by-byte assignment, so it won't work for multi-byte element types such as your signed short array.
If you want to set each value to the maximum you would use something like:
std::fill( ZBUFFER, ZBUFFER+len, std::numeric_limits<short>::max() )
where len is the number of elements (not the size in bytes) of your array.
How do I automatically set a dynamically allocated array of floats to zero (0.0) during allocation?
Is this OK
float* delay_line = new float[filter_len];
//THIS
memset(delay_line, 0.0, filter_len); //can I do this for a float??
//OR THIS
for (int i = 0; i < filter_len; i++)
delay_line[i] = 0.0;
Which is the most efficient way?
Thanks
Use sizeof(float) * filter_len for the byte count, unless you are working on some odd implementation where sizeof(float) == sizeof(char).
memset(delay_line, 0, sizeof(float) * filter_len);
Edit: As Stephan202 points out in the comments, 0.0 is a particularly easy floating point value to code for memset since the IEEE standard representation for 0.0 is all zero bits.
memset is operating in the realm of memory, not the realm of numbers. The second parameter, declared as an int, is converted to an unsigned char. If your implementation of C++ uses four bytes per float, the following relationships hold:
If you memset the float with 0, the value will be 0.0.
If you memset the float with 1, the value will be 2.36943e-38.
If you memset the float with 42, the value will be 1.51137e-13.
If you memset the float with 64, the value will be 3.00392.
So zero is a special case.
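A quick sketch that reproduces those values, assuming a 4-byte IEEE-754 float:
#include <cstring>
#include <iostream>

int main() {
    float f;
    for (int byte : {0, 1, 42, 64}) {
        std::memset(&f, byte, sizeof(f)); // every byte of f becomes this value
        std::cout << byte << " -> " << f << '\n'; // 0, 2.36943e-38, 1.51137e-13, 3.00392
    }
}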
If this seems peculiar, recall that memset is declared in <cstring> or <string.h>, and is often used for making things like "***************" or "------------------". That it can also be used to zero memory is a nifty side-effect.
As Milan Babuškov points out in the comments, there is a function bzero (nonstandard and deprecated), available for the moment on Mac and Linux but not Microsoft, which, because it is specially tailored to setting memory to zero, safely omits a few instructions. If you use it, and a puritanical future release of your compiler omits it, it is trivial to implement bzero yourself in a local compatibility patch, unless the future release has re-used the name for some other purpose.
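Such a compatibility patch really is trivial; a sketch (the name bzero_compat is made up here to avoid clashing with a real bzero):
#include <cstring>

// Hypothetical fallback, only needed on platforms that lack bzero.
inline void bzero_compat(void *s, std::size_t n) {
    std::memset(s, 0, n);
}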
use
#include <algorithm>
...
std::fill_n( delay_line, filter_len, 0 );
The elements of a dynamically allocated array can be initialized to the default value of the element type by following the array size by an empty pair of parentheses:
float* delay_line = new float[filter_len]();
Use a std::vector instead:
std::vector<float> delay_line( filter_len );
The vector will be zero initialised.
Now that we're at it: even better would be to use the vector class.
std::vector< float > delay_line( filter_len, 0.0 );
Another option is to use calloc to allocate and zero at the same time:
float *delay_line = (float *)calloc(filter_len, sizeof(float));
The advantage here is that, depending on your malloc implementation, it may be possible to avoid zeroing the array explicitly if it's known to be allocated from memory that's already zeroed (as pages freshly obtained from the operating system usually are).
Keep in mind that you must use free() rather than delete [] on such an array.
Which is the most efficient way
memset may be a tad faster, BUT WHO CARES!?!? Micro-optimization down to this level is a total waste of time, unless you're programming a calculator, and probably not even then.
I think the memset way is clearer, BUT I think you really had better check your man pages for memset... I'd be surprised if your version of the standard library has a memset function which takes a float as the second argument.
PS: The bit pattern representing zero is the same for both integers and floats... this is by design, not just good luck.
Good Luck ;-)
Cheers. Keith.