How do I automatically set a dynamically allocated array of floats to zero (0.0) during allocation?
Is this OK?
float* delay_line = new float[filter_len];
//THIS
memset(delay_line, 0.0, filter_len); //can I do this for a float??
//OR THIS
for (int i = 0; i < filter_len; i++)
delay_line[i] = 0.0;
Which is the most efficient way?
Thanks
Use sizeof(float) * filter_len unless you are working in some odd implementation where sizeof(float) == sizeof(char).
memset(delay_line, 0, sizeof(float) * filter_len);
Edit: As Stephan202 points out in the comments, 0.0 is a particularly easy floating point value to code for memset since the IEEE standard representation for 0.0 is all zero bits.
memset is operating in the realm of memory, not the realm of numbers. The second parameter, declared an int, is cast to an unsigned char. If your implementation of C++ uses four bytes per float, the following relationships hold:
If you memset the float with 0, the value will be 0.0.
If you memset the float with 1, the value will be 2.36943e-38.
If you memset the float with 42, the value will be 1.51137e-13.
If you memset the float with 64, the value will be 3.00392.
So zero is a special case.
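You can see these relationships for yourself with a few lines of test code (a minimal sketch, assuming 4-byte IEEE-754 floats):
#include <cstdio>
#include <cstring>
int main()
{
    const int fillers[] = { 0, 1, 42, 64 };
    for (int i = 0; i < 4; ++i)
    {
        float f;
        std::memset(&f, fillers[i], sizeof f); // every byte of f becomes (unsigned char)fillers[i]
        std::printf("memset with %d -> %g\n", fillers[i], f);
    }
    return 0;
}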
If this seems peculiar, recall that memset is declared in <cstring> or <string.h>, and is often used for making things like "***************" or "------------------". That it can also be used to zero memory is a nifty side-effect.
As Milan Babuškov points out in the comments, there is a function bzero (nonstandard and deprecated), available for the moment on Mac and Linux but not Microsoft, which, because it is specially tailored to setting memory to zero, safely omits a few instructions. If you use it, and a puritanical future release of your compiler omits it, it is trivial to implement bzero yourself in a local compatibility patch, unless the future release has re-used the name for some other purpose.
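Such a local compatibility patch really is tiny; something along these lines would do (a sketch; the name my_bzero is made up precisely to avoid clashing with any existing declaration):
#include <cstddef>
#include <cstring>
// Fallback for platforms without bzero(): just forwards to memset.
inline void my_bzero(void* p, std::size_t n)
{
    std::memset(p, 0, n);
}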
use
#include <algorithm>
...
std::fill_n(delay_line, filter_len, 0.0f);
The elements of a dynamically allocated array can be initialized to the default value of the element type by following the array size by an empty pair of parentheses:
float* delay_line = new float[filter_len]();
Use a std::vector instead:
std::vector<float> delay_line( filter_len );
The vector will be zero initialised.
Now that we're at it: even better would be to use the vector class.
std::vector< float > delay_line( filter_len, 0.0 );
Another option is to use calloc to allocate and zero at the same time:
float *delay_line = (float *)calloc(filter_len, sizeof(float));
The advantage here is that, depending on your malloc implementation, it may be possible to avoid zeroing the array if it's known to be allocated from memory that's already zeroed (as pages freshly obtained from the operating system usually are).
Keep in mind that you must use free() rather than delete [] on such an array.
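For completeness, the whole lifecycle might look like this (a sketch; the function and length are placeholders):
#include <cstdlib>
void example(std::size_t filter_len)
{
    float* delay_line = static_cast<float*>(std::calloc(filter_len, sizeof(float)));
    if (delay_line)
    {
        // ... use delay_line[0] .. delay_line[filter_len - 1], all initially 0.0f ...
        std::free(delay_line); // not delete[]: the memory came from calloc
    }
}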
Which is the most efficient way?
memset may be a tad faster, BUT WHO CARES!?!? Micro-optimization down to this level is a total waste of time, unless you're programming a calculator, and probably not even then.
I think the memset way is clearer, BUT I think you really had better check your man-pages for memset... I'd be surprised if your version of the standard library has a memset function which takes a float as the second argument.
PS: The bit pattern representing zero is the same for both integers and floats... this is by design, not just good luck.
Good Luck ;-)
Cheers. Keith.
Related
I would like to allocate some char buffers0, to be passed to an external non-C++ function, that have a specific alignment requirement.
The requirement is that the buffer be aligned to an N-byte1 boundary, but not to a 2N boundary. For example, if N is 64, then the pointer p to this buffer should satisfy ((uintptr_t)p) % 64 == 0 and ((uintptr_t)p) % 128 != 0 - at least on platforms where pointers have the usual interpretation as a plain address when cast to uintptr_t.
Is there a reasonable way to do this with the standard facilities of C++11?
If not, is there a reasonable way to do this outside the standard facilities2 which works in practice for modern compilers and platforms?
The buffer will be passed to an outside routine (adhering to the C ABI but written in asm). The required alignment will usually be greater than 16, but less than 8192.
Over-allocation or any other minor wasted-resource issues are totally fine. I'm more interested in correctness and portability than wasting a few bytes or milliseconds.
Something that works on both the heap and stack is ideal, but anything that works on either is still pretty good (with a preference towards heap allocation).
0 This could be with operator new[] or malloc or perhaps some other method that is alignment-aware: whatever makes sense.
1 As usual, N is a power of two.
2 Yes, I understand an answer of this type causes language-lawyers to become apoplectic, so if that's you just ignore this part.
Logically, to satisfy "aligned to N, but not 2N", we align to 2N then add N to the pointer. Note that this will over-allocate N bytes.
So, assuming we want to allocate B bytes, if you just want stack space, alignas would work, perhaps.
alignas(N*2) char buffer[B+N];
char *p = buffer + N;
If you want heap space, std::aligned_storage might do:
typedef std::aligned_storage<B+N, N*2>::type ALIGNED_CHAR;  // needs <type_traits>
ALIGNED_CHAR* buffer = new ALIGNED_CHAR;
char *p = reinterpret_cast<char *>(buffer) + N;
I've not tested either out, but the documentation suggests it should be OK.
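A quick runtime sanity check of the stack variant can at least confirm the arithmetic (a sketch; it assumes the compiler honours the over-aligned alignas, and N and B are made-up values):
#include <cassert>
#include <cstddef>
#include <cstdint>
void check()
{
    const std::size_t N = 64;   // required alignment (power of two)
    const std::size_t B = 1024; // bytes actually needed
    alignas(N * 2) char buffer[B + N]; // align to 2N...
    char* p = buffer + N;              // ...then step forward by N
    assert(reinterpret_cast<std::uintptr_t>(p) % N == 0);       // aligned to N
    assert(reinterpret_cast<std::uintptr_t>(p) % (2 * N) != 0); // but not to 2N
}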
You can use _aligned_malloc(nbytes,alignment) (in MSVC) or _mm_malloc(nbytes,alignment) (on other compilers) to allocate (on the heap) nbytes of memory aligned to alignment bytes, which must be an integer power of two.
Then you can use the trick from Ken's answer to avoid alignment to 2N:
void* ptr_alloc = _mm_malloc(nbytes + N, 2 * N);
void* ptr = static_cast<void*>(static_cast<char*>(ptr_alloc) + N);
/* do your number crunching */
_mm_free(ptr_alloc);
We must ensure to keep the pointer returned by _mm_malloc() for later de-allocation, which must be done via _mm_free().
float* tempBuf = new float[maxVoices]();
Will the above result in
1) memory that is 16-byte aligned?
2) memory that is confirmed to be contiguous?
What I want is the following:
float tempBuf[maxVoices] __attribute__ ((aligned));
but as heap memory, that will be effective for Apple Accelerate framework.
Thanks.
The memory will be aligned for float, but not necessarily for CPU-specific SIMD instructions. I strongly suspect on your system sizeof(float) < 16, though, which means it's not as aligned as you want. The memory will be contiguous: &A[i] == &A[0] + i.
If you need something more specific, new std::aligned_storage<Length, Alignment>::type will return a suitably aligned region of memory, presuming of course that you did in fact pass a more specific alignment.
Another alternative is struct alignas(16) FourFloats { float floats[4]; }; - this may map more naturally to the framework. You'd now need to do new FourFloats[(maxVoices+3)/4], as sketched below.
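Spelled out, that alternative might look like this (a sketch; the helper name and the zero-initialising () are additions of mine, and before C++17 the 16-byte alignment from array new is common practice rather than a guarantee):
#include <cstddef>
struct alignas(16) FourFloats { float floats[4]; };
// Allocate enough 16-byte groups to cover maxVoices floats, zero-initialised.
// Keep the FourFloats* around: that is what you eventually delete[].
inline FourFloats* alloc_voice_groups(std::size_t maxVoices)
{
    return new FourFloats[(maxVoices + 3) / 4]();
}
You would hand the framework the address of the first float of the first group; since sizeof(FourFloats) is exactly 16, the floats of consecutive groups are contiguous.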
Yes, new returns contiguous memory.
As for alignment, no such alignment guarantee is provided. Try this:
#include <cstddef>      // std::max_align_t
#include <new>          // placement new
#include <type_traits>  // std::aligned_storage_t, std::is_trivially_destructible

template<class T, size_t A>
T* over_aligned(size_t N){
    static_assert(A <= alignof(std::max_align_t),
        "Over-alignment is implementation-defined."
    );
    static_assert( std::is_trivially_destructible<T>{},
        "Function does not store number of elements to destroy"
    );
    using Helper = std::aligned_storage_t<sizeof(T), A>;
    // enough Helper blocks to cover N objects of type T (not just N bytes)
    auto* ptr = new Helper[(N*sizeof(T) + sizeof(Helper) - 1) / sizeof(Helper)];
    return new(ptr) T[N];
}
Use:
float* f = over_aligned<float,16>(37);
Makes an array of 37 floats, with the buffer aligned to 16 bytes. Or it fails to compile.
If the assert fails, it can still work. Test and consult your compiler documentation. Once convinced, put compiler-specific version guards around the static assert, so when you change compilers you can test all over again (yay).
If you want real portability, you have to fall back to std::align, manage the resource separately from the data pointer, and count the number of T (needed if and only if T has a non-trivial destructor), storing that count "before" the start of your buffer. It gets pretty silly.
It's guaranteed to be aligned properly with respect to the type you're allocating. So if it's an array of 4 floats (each supposedly 4 bytes), it's guaranteed to provide a usable sequence of floats. It's not guaranteed to be aligned to 16 bytes.
Yes, it's guaranteed to be contiguous (otherwise what would be the meaning of a single pointer?)
If you want it to be aligned to some K bytes, you can do it manually with std::align. See MSalter's answer for a more efficient way of doing this.
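For reference, doing it "manually with std::align" amounts to over-allocating by K and letting std::align find the first suitable address inside the block (a sketch; K, count and the raw out-parameter are placeholders):
#include <cstddef>
#include <memory>
// Return a K-byte-aligned pointer with room for `count` floats.
// `raw` receives the block that must later be released with delete[].
float* aligned_float_storage(std::size_t K, std::size_t count, char*& raw)
{
    std::size_t bytes = count * sizeof(float);
    std::size_t space = bytes + K; // over-allocate by the alignment
    raw = new char[space];
    void* p = raw;
    if (!std::align(K, bytes, p, space))
        return nullptr;            // cannot happen with K spare bytes
    return static_cast<float*>(p); // construct/use your floats here
}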
If tempBuf is not nullptr then the C++ standard guarantees that tempBuf points to the zeroth element of at least maxVoices contiguous floats.
(Don't forget to call delete[] tempBuf once you're done with it.)
I am receiving the error "User store segfault # 0x000000007feff598" for a large convolution operation.
I have defined the resultant array as
int t3_isize = 0;
int t3_irowcount = 0;
t3_irowcount=atoi(argv[2]);
t3_isize = atoi(argv[3]);
int iarray_size = t3_isize*t3_irowcount;
uint64_t t_result[iarray_size];
I noticed that if the array size is less than 2^16 - 1, the operation doesn't fail, but for the array size 2^16 or higher, I get the segfault error.
Any idea why this is happening? And how can i rectify this?
“I noticed that if the array size is greater than 2^16 - 1, the operation doesn't fail, but for the array size 2^16 or higher, I get the segfault error”
↑ Seems a bit self-contradictory.
But probably you're just allocating too large an array on the stack. Using dynamic memory allocation (e.g., just switch to using std::vector) you avoid that problem. For example:
std::vector<uint64_t> t_result(iarray_size);
In passing, I would ditch the Hungarian notation-like prefixes. For example, t_ reads like this is a type. The time for Hungarian notation was the late 1980s, and its purpose was to support Microsoft's Programmer's Workbench, a product long since discontinued.
You're probably declaring too large an array for the stack. 2^16 elements of 8 bytes each is quite a lot (512K bytes).
If you just need static allocation, move the array to file scope.
Otherwise, consider using std::vector, which will allocate storage from the heap and manage it for you.
Using malloc() solved the issue.
uint64_t* t_result = (uint64_t*) malloc(sizeof(uint64_t)*iarray_size);
I have allocated a big double vector, lets say with 100000 element. At some point in my code, I want to set all elements to a constant, nonzero value. How can I do this without using a for loop over all elements?
I am also using the blas package, if it helps.
You could use std::fill (#include <algorithm>):
std::fill(v.begin(), v.end(), 1);
This is essentially also just a loop, of course.
'fill' is right from what you've said.
Be aware that it's also possible to construct a vector full of a specified value:
std::vector<double> vec(100000, 3.14);
So if "at some point" means "immediately after construction", do this instead. Also, it means you can do this:
std::vector<double>(100000, 3.14).swap(vec);
which might be useful if "at some point" means "immediately after changing the size", and you expect/want the vector to be reallocated ("expect" if you're making it bigger than its prior capacity, "want" if you're making it much smaller and want it trimmed to save memory).
You can use memset() if you don't want to loop, but be careful: it works on bytes, not elements. memset(myarr, 5, arrsize) sets every byte to 5 (the value is converted to unsigned char), which is not the same as setting every double to 5.0, so for a double array it only does what you want when the value is 0.
SYNOPSIS
#include <string.h>
void *
memset(void *b, int c, size_t len);
DESCRIPTION
The memset() function writes len bytes of value c (converted to an
unsigned char) to the byte string b.
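To make that caveat concrete: memset fills bytes, not elements, so for a double array it only produces the value you want when that value is 0.0 (a small sketch):
#include <algorithm>
#include <cstdio>
#include <cstring>
int main()
{
    double a[4];
    std::memset(a, 5, sizeof a); // every *byte* becomes 0x05
    std::printf("%g\n", a[0]);   // a tiny garbage value, not 5.0
    std::fill(a, a + 4, 5.0);    // element-wise fill
    std::printf("%g\n", a[0]);   // 5
    return 0;
}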
And if the vector is large, and you need it to go fast, and you are using gcc, then:
Code generation of block move (memcpy) and block set (memset) was rewritten. GCC can now pick the best algorithm (loop, unrolled loop, instruction with rep prefix or a library call) based on the size of the block being copied and the CPU being optimized for.
I am working on refactoring some old code and have found a few structs containing zero-length arrays (below). Warnings are suppressed by pragma, of course, but I've failed to create structures containing such structures with "new" (error C2233). The array 'byData' is used as a pointer, so why not use a pointer instead, or an array of length 1? And of course, no comments were added to make me enjoy the process...
Are there any reasons to use such a thing? Any advice on refactoring it?
struct someData
{
    int nData;
    BYTE byData[0];
};
NB It's C++, Windows XP, VS 2003
Yes, this is a C hack.
To create an array of any length:
struct someData* mallocSomeData(int size)
{
    struct someData* result = (struct someData*)malloc(sizeof(struct someData) + size * sizeof(BYTE));
    if (result)
    {
        result->nData = size;
    }
    return result;
}
Now you have an object of someData with an array of a specified length.
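Typical usage, relying on the struct and mallocSomeData() above, would be (a sketch):
#include <cstdlib> // malloc / free
void example()
{
    struct someData* p = mallocSomeData(128); // 128 bytes of payload in byData
    if (p)
    {
        p->byData[0] = 0x01;            // valid indices: 0 .. p->nData - 1
        p->byData[p->nData - 1] = 0xFF;
        free(p);                        // one free() releases header and array together
    }
}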
There are, unfortunately, several reasons why you would declare a zero length array at the end of a structure. It essentially gives you the ability to have a variable length structure returned from an API.
Raymond Chen did an excellent blog post on the subject. I suggest you take a look at this post because it likely contains the answer you want.
Note that his post deals with arrays of size 1 instead of 0, because the zero-length form is a compiler extension rather than part of the standard. His post should still apply to your problem.
http://blogs.msdn.com/oldnewthing/archive/2004/08/26/220873.aspx
EDIT
Note: even though Raymond's post says 0-length arrays are legal in C99, they are in fact still not legal in C99 (what C99 added is the flexible array member, arr[]). Instead of a 0-length array here you should be using a length-1 array.
This is an old C hack to allow flexible-sized arrays.
In the C99 standard this is not necessary, as it supports the arr[] (flexible array member) syntax.
Your intuition about "why not use an array of size 1" is spot on.
The code is doing the "C struct hack" wrong, because declarations of zero length arrays are a constraint violation. This means that a compiler can reject your hack right off the bat at compile time with a diagnostic message that stops the translation.
If we want to perpetrate a hack, we must sneak it past the compiler.
The right way to do the "C struct hack" (which is compatible with C dialects going back to 1989 ANSI C, and probably much earlier) is to use a perfectly valid array of size 1:
struct someData
{
    int nData;
    unsigned char byData[1];
};
Moreover, instead of sizeof struct someData, the size of the part before byData is calculated using:
offsetof(struct someData, byData);
To allocate a struct someData with space for 42 bytes in byData, we would then use:
struct someData *psd = (struct someData *) malloc(offsetof(struct someData, byData) + 42);
Note that this offsetof calculation is in fact the correct calculation even in the case of the array size being zero. You see, sizeof the whole structure can include padding. For instance, if we have something like this:
struct hack {
unsigned long ul;
char c;
char foo[0]; /* assuming our compiler accepts this nonsense */
};
The size of struct hack is quite possibly padded for alignment because of the ul member. If unsigned long is four bytes wide, then quite possibly sizeof (struct hack) is 8, whereas offsetof(struct hack, foo) is almost certainly 5. The offsetof method is the way to get the accurate size of the preceding part of the struct just before the array.
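You can see the difference directly with a small test (a sketch; exact numbers depend on your ABI, and it uses the length-1 flavour so every compiler accepts it):
#include <cstddef>
#include <cstdio>
struct hack1 {
    unsigned long ul;
    char c;
    char foo[1]; /* length-1 flavour of the hack */
};
int main()
{
    // With a 4-byte unsigned long this typically prints 8 and 5.
    std::printf("sizeof(struct hack1)        = %lu\n", (unsigned long) sizeof(struct hack1));
    std::printf("offsetof(struct hack1, foo) = %lu\n", (unsigned long) offsetof(struct hack1, foo));
    return 0;
}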
So that would be the way to refactor the code: make it conform to the classic, highly portable struct hack.
Why not use a pointer? Because a pointer occupies extra space and has to be initialized.
There are other good reasons not to use a pointer, namely that a pointer requires an address space in order to be meaningful. The struct hack is externalizeable: that is to say, there are situations in which such a layout conforms to external storage such as areas of files, packets or shared memory, in which you do not want pointers because they are not meaningful.
Several years ago, I used the struct hack in a shared memory message passing interface between kernel and user space. I didn't want pointers there, because they would have been meaningful only to the original address space of the process generating a message. The kernel part of the software had a view to the memory using its own mapping at a different address, and so everything was based on offset calculations.
It's worth pointing out IMO the best way to do the size calculation, which is used in the Raymond Chen article linked above.
struct foo
{
    size_t count;
    int data[1];
};

size_t foo_size_from_count(size_t count)
{
    return offsetof(foo, data[count]);
}
The offset of the first entry off the end of desired allocation, is also the size of the desired allocation. IMO it's an extremely elegant way of doing the size calculation. It does not matter what the element type of the variable size array is. The offsetof (or FIELD_OFFSET or UFIELD_OFFSET in Windows) is always written the same way. No sizeof() expressions to accidentally mess up.
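Allocation then reads naturally (a sketch reusing foo and foo_size_from_count from above; make_foo is a made-up helper name):
#include <cstdlib>
foo* make_foo(size_t count)
{
    foo* p = (foo*) malloc(foo_size_from_count(count));
    if (p)
        p->count = count; /* data[0] .. data[count - 1] are now usable */
    return p;
}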