I'm developing an application in C++ and I need to know the number of elements of a variable. I've been looking for a way to do this, but I haven't been able to find a solution. The variable is defined this way:
unsigned char *values = (unsigned char *) some_function(some_parameter);
// "some_function" takes "some_parameter" and fills "values" correctly
Thanks in advance for any help you can provide.
Best regards.
Since you told us which function you're using (FreeImage::GetBits), we now know that you're querying the raw data of an image. Its size is the product of the pitch and height of the image, as seen in this formula:
int size = image.GetPitch() * image.GetHeight();
This is the size in bytes, which equals the number of elements when you access the data through a char pointer. But speaking of a "number of elements" in such a case (raw, low-level memory with no high-level types) is a bit misleading, since a reader of the question might think it's about a higher-level array.
In case you wonder: raw image data is typically laid out in rows of size pitch, one pixel after another, from left to right, where the size per pixel depends on the storage format (for example 1 byte for grayscale, 3 bytes for RGB with 8 bits per channel, 1 bit for monochrome bitmaps, and many more formats).
These rows are laid out from top to bottom (in most cases) or sometimes from bottom to top (in the BMP file format, for example). The pitch is at least the width of the image times the size per pixel, so that all pixels fit in such a "scan line", which is what the memory for one line of the image is called. The pitch is rounded up to some alignment so that every line can start at an aligned address within the image's memory. The unused space is called "padding" and is ignored.
Depending on the library, "pitch" sometimes means "pixels per line" in memory rather than "bytes per line", but in this case it's already given in bytes, so you only have to multiply by the image height. Note that the height is typically not padded like the width, since there's no advantage in doing so.
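To make the layout concrete, here is a minimal sketch of how a single pixel is addressed in such raw data (the function name and parameters are hypothetical; a top-down row layout is assumed):

// Hypothetical helper: address of pixel (x, y) in raw image data,
// assuming a top-down row layout and bytesPerPixel bytes per pixel.
unsigned char* pixelAddress(unsigned char* bits, int pitch,
                            int x, int y, int bytesPerPixel)
{
    return bits + y * pitch + x * bytesPerPixel;
}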
You can never deterministically know the length of an array given a pointer to the beginning of the array. You must pass some information along with the array.
That extra information may be in the form of (each variant is sketched in code after the list):
another return value specifying the length
an agreed encoding that encodes the length into the beginning of the array
an agreed encoding that marks the end of the array (e.g. \0 at the end of a string)
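A minimal sketch of each convention (all names here are illustrative, not from any particular API):

#include <cstddef>
#include <cstring>

// 1) A separate length: the function reports the element count via an out-parameter.
unsigned char* get_values(std::size_t* out_count);

// 2) The length encoded at the start of the block ("Pascal string" style).
std::size_t length_prefix(const unsigned char* block)
{
    std::size_t n;
    std::memcpy(&n, block, sizeof n);  // first sizeof(size_t) bytes hold the count
    return n;
}

// 3) A sentinel marking the end, like the '\0' terminator of a C string.
std::size_t length_sentinel(const char* s)
{
    return std::strlen(s);  // counts bytes until the terminating '\0'
}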
I'm converting the deprecated cudaMemcpyToArray and cudaMemcpyFromArray calls into cudaMemcpy2DToArray and cudaMemcpy2DFromArray. Rather than the size taken by the deprecated calls, the new API asks for width, height, and pitch. The descriptions of spitch and dpitch are, correspondingly, "Pitch of source memory" and "Pitch of destination memory". I wonder what those values are: the size of the data items, or something else?
More specifically, if I were to copy W*H floats, should I have pitch=sizeof(float), width=W, height=H, or pitch=sizeof(float)*W, width=sizeof(float)*W, height=H, or something else?
It should be:
pitch=sizeof(float)*W
width = sizeof(float)*W
height = H
The above is for cudaMemcpy2DToArray, and assumes you are transferring from host to device, which would most likely involve an unpitched allocation in host memory as the source.
The pitch of a pitched allocation is the size in bytes of one line of a 2D allocation, including the padding bytes at the end of the line. It is the value returned by cudaMallocPitch, for example. For unpitched allocations, it is still the width of the line, given by W*sizeof(element), where the 2D allocation is W elements wide, each of size sizeof(element).
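Putting it together, a minimal sketch of the transfer described above might look like this (error checking omitted; the destination array and host buffer are assumed to exist):

#include <cuda_runtime.h>

// Copy W*H floats from an unpitched host buffer into a CUDA array.
// For an unpitched source, spitch equals the row width in bytes.
void copyToArray(cudaArray_t dst, const float* hostSrc, int W, int H)
{
    cudaMemcpy2DToArray(dst,
                        0, 0,                  // wOffset, hOffset
                        hostSrc,
                        W * sizeof(float),     // spitch: bytes per source row
                        W * sizeof(float),     // width:  bytes to copy per row
                        H,                     // height: number of rows
                        cudaMemcpyHostToDevice);
}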
This question and the link it refers to may also be of interest.
Assume we want to copy n bytes of data from void* src to void* dst. It is well known that the standard library implementation of memcpy is heavily optimized, using platform-dependent vectorized instructions and various other tricks to copy as fast as possible.
Now assume that p bytes of data after src + n are readable and p bytes of data after dst + n are writeable. Also, assume that it is OK if arbitrary garbage is written in [dst + n, dst + n + p).
Clearly, these assumptions widen the range of our possible actions, possibly leading to an even faster memcpy. For example, we may copy the fewer-than-16 trailing bytes with a small number of unaligned 128-bit instructions (load + store), as sketched after the diagram below. Maybe there are other tricks allowed by such extra assumptions.
      0                      n
src:  abcdabcdabcdabcdabcdabcGARBAGEGA
      v                      v
dst:  ______(actl dst)_______(wrtbl)__
      [    block1     ][   block2   ]
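A minimal sketch of the tail trick under these assumptions (SSE2 intrinsics; copy_tail16 is a hypothetical name, and p >= 16 is assumed):

#include <emmintrin.h>   // SSE2 intrinsics
#include <cstddef>

// Copy the last up-to-16 bytes of an n-byte range with one unaligned
// 16-byte load/store pair. For n < 16 this reads and writes up to
// 16 - n bytes past the logical end, which the assumptions permit.
void copy_tail16(unsigned char* dst, const unsigned char* src, std::size_t n)
{
    if (n == 0) return;
    std::size_t off = n >= 16 ? n - 16 : 0;  // overlap backwards when possible
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + off));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + off), v);
}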
Note that the assumptions are actually rather practical when you need to append a sequence of strings into an allocated buffer whose capacity is enough to hold p + total string size bytes. For example, the following routine may appear somewhere in database internals:
You are given a binary string char* dictionary and an integer array int* offsets which is a monotonic sequence of offsets into dictionary; these two variables represent a dictionary of strings read from disk. You also have an integer array int* indices indicating the order in which the dictionary strings must be written to an output buffer char* buffer.
Using the technique described above, you may safely write each new string without caring about the garbage to the right of it, as it is going to be overwritten by the next string to be appended.
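A minimal sketch of that routine under the stated assumptions (the helper name and the fixed PAD of 16 bytes are illustrative; the buffer must have at least PAD spare bytes after the total payload, and PAD bytes past the dictionary's end must be readable):

#include <cstring>
#include <cstddef>

constexpr std::size_t PAD = 16;  // assumed safe overrun, in bytes

// Append dictionary strings to `out` in the order given by `indices`.
// offsets[i + 1] - offsets[i] is the length of dictionary string i,
// so `offsets` holds count + 1 entries.
char* append_strings(char* out, const char* dictionary, const int* offsets,
                     const int* indices, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        int idx = indices[i];
        const char* s = dictionary + offsets[idx];
        std::size_t len = static_cast<std::size_t>(offsets[idx + 1] - offsets[idx]);
        // Copy in fixed 16-byte chunks; the last chunk may write up to
        // PAD - 1 garbage bytes past the string, which is harmless here
        // because the next append (or the spare tail) overwrites them.
        for (std::size_t pos = 0; pos < len; pos += PAD)
            std::memcpy(out + pos, s + pos, PAD);
        out += len;
    }
    return out;  // one past the last byte that logically belongs to the output
}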
The questions are:
Are there open-source implementations of this technique? Achieving an optimal implementation would clearly require spending a lot of time on (platform-dependent) tuning, so writing such code without consulting existing implementations does not seem like a good idea.
Why is readability of 15 bytes past an allocation not a feature of modern allocators? If a memory allocator just allocated one more uninitialized page in each mmap it does internally, it would provide the desired readability at effectively zero cost, with no need to change program code.
Final remark: this idea is not new; for example, it appears in the source code of ClickHouse. Still, they implemented their own custom templated POD array to handle such allocations.
Now assume that p bytes of data after src + n are readable and p bytes of data after dst + n are writeable. Also, assume that it is OK if arbitrary garbage is written in [dst + n, dst + n + p).
These assumptions are impractical in most cases:
If the target OS is known to allocate blocks with a size multiple of some alignment value such as 16, 4K or even 8K, the compiler and the library can assume extra bytes of data can be read at the end of an unaligned block. So for your purpose, it seems you could save some tests on the last chunk read.
Conversely, a library function should not make the other two assumptions, so a generic implementation of memcpy will not do this, even if it restores the previous contents of the area beyond dst + n, as this would be potentially incorrect in a multi-threaded process.
The programmer could try and round n up by a few bytes in an attempt to shave some cycles, but it is very tricky and error prone.
In most cases, there is very little to gain with this approach. memcpy is already optimized inline for many constant sizes and the difference would be infinitesimal for large sizes.
Yet if you know the actual data size is in the range [1 .. 16], and 16 bytes can be read/written harmlessly, you could optimize the call by specifying a constant maximal size: memcpy(dst, src, 16). You can adjust for other similar cases, check the generated code and measure actual performance, which could be substantially better than memcpy(dst, src, n) with n variable but known to be in the expected range.
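For instance, a minimal sketch of such a specialized call (copy_small is a hypothetical name; it assumes 16 bytes are readable at src and writable at dst):

#include <cstring>

// n is known to be in [1, 16]; 16 readable/writable bytes are assumed.
void copy_small(void* dst, const void* src, std::size_t n)
{
    (void)n;                   // the actual length does not matter here
    std::memcpy(dst, src, 16); // constant size: typically one 16-byte move
}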
If you implement a vectorized memchr or memcmp, where you only read, you can align the vector loads to natural boundaries and mask out the padding in the first/last incomplete vectors. By processing naturally aligned chunks, a single read never touches the next page, so reading a few bytes beyond the buffer is safe even though that extra space is not allocated.
Apparently this is the only way to vectorize memchr, as the function must work even when the count parameter is larger than the actual range, provided the match is found before the actual end. From cppreference:
This function behaves as if it reads the characters sequentially and stops as soon as a matching character is found: if the array pointed to by ptr is smaller than count, but the match is found within the array, the behavior is well-defined
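Here is a minimal sketch of such an aligned, masked read loop (SSE2; __builtin_ctz assumes GCC/Clang; the function name is illustrative):

#include <emmintrin.h>   // SSE2
#include <cstddef>
#include <cstdint>

const void* memchr_sse2(const void* ptr, int ch, std::size_t count)
{
    const unsigned char* p = static_cast<const unsigned char*>(ptr);
    const unsigned char* end = p + count;
    const __m128i needle = _mm_set1_epi8(static_cast<char>(ch));

    // Round the start down to a 16-byte boundary: aligned loads never
    // cross a page, so the out-of-range bytes they touch are safe to read.
    const unsigned char* block = reinterpret_cast<const unsigned char*>(
        reinterpret_cast<std::uintptr_t>(p) & ~static_cast<std::uintptr_t>(15));

    for (; block < end; block += 16) {
        __m128i v = _mm_load_si128(reinterpret_cast<const __m128i*>(block));
        unsigned mask = static_cast<unsigned>(
            _mm_movemask_epi8(_mm_cmpeq_epi8(v, needle)));
        if (block < p)
            mask &= ~0u << (p - block);    // drop matches before the real start
        if (mask) {
            const unsigned char* hit = block + __builtin_ctz(mask);
            return hit < end ? hit : nullptr;  // ignore matches past the range
        }
    }
    return nullptr;
}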
For the writes, I don't think any such optimization is possible, unless the vector instruction set supports masked writes.
Assume you have an array split by the user at some arbitrary index, and separate threads memcpy different data into the two parts. If memcpy wrote out of range, there would be a data race.
I'm doing a performance test of the SHA3 algorithm on a variable: I'm measuring the execution time of the algorithm for different sizes of the variable. For this I am using the char type and increasing its size, but I don't know whether I'm doing it right. I will use the line of code below to explain my doubt.
char text[1000] = "A text";
I know that each char has a size of 1 byte. My question is: when I predefine an array like this, is the size of the variable its declared size, in this case 1000? Or is the size of the variable given by the content inside it, in this case the text, which would be 6 bytes?
Is the test I'm doing right? Or does the allocated memory size not matter for the performance of SHA3? (I ask this because I intend to do the same test with larger values. If I want to, for example, run this test with 20 KBytes, will I have to fill the variable with 20,000 characters?)
I'm using C++.
The amount of memory allocated on the stack by that line of code will be 1,000 bytes. However, what you send to your SHA3 code may only be the number of bytes of the string "A text", depending on how you're calling it, and how it uses the data. If it calculates the length of the string using a function like strlen(), then it will likely only iterate over the 6 characters (and 1 NUL byte) of the string and ignore the remaining 993 bytes. So it really depends on how you're using it and how you're calculating the size for your tests.
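A quick way to see the difference (a minimal, self-contained example):

#include <cstring>
#include <iostream>

int main()
{
    char text[1000] = "A text";
    std::cout << sizeof(text) << '\n';       // 1000: bytes the array occupies
    std::cout << std::strlen(text) << '\n';  // 6: characters before the NUL
}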
Hi, I have a problem drawing with a VBO, so I asked a question about it here. I could not find the answer to my problem, but while discussing one of the answers given there, I now have another question, about strides in VBOs. I am confused about what a stride is and what it does.
In a thread, I found this answer:
If you have all your vertex data in one array (array: read as malloc'ed pointer), all your normals in another array, etc., then your stride is 0. For instance, if vertices, normals, etc. are stored like that:
[vertex0][vertex1][vertex2]...
[normal0][normal1][normal2]...
[texcoord0][texcoord1][texcoord2]...
If your vertex, normal, etc. are packed like that:
[vertex0][normal0][texcoord0][vertex1][normal1][texcoord1][vertex2][normal2][texcoord2]...
then you should set a non-null stride, which corresponds to the offset needed to switch from one element to the next. (This stride is counted in bytes, btw.)
From that explanation I thought the stride actually means the distance between the end of one vertex and the start of the next vertex in the buffer. In the first case it's 0 because all the vertices are stored contiguously; the same goes for the texture coordinates. But then I read another answer about the definition of stride in that same thread:
There tends to be a bit of confusion regarding VBO stride, mostly because of its special meaning for 0.
"Stride" in this context means the distance between the beginning of a value in memory and the beginning of the next value in memory. It is not the distance between the end of one and the beginning of the next.
So in a VBO that is an array of a structure, the stride for each element of that structure will be the sizeof the structure as a whole. Keep in mind that struct padding can affect this.
which says just the opposite of the first answer. Or am I wrong about what the first answer meant? Can anyone please help me resolve this? I would really appreciate an answer with an example. I have given the link to my VBO implementation at the start of this question, which is not solved yet. Thanks.
What the first answer is trying to say is that the "stride" between two elements is the offset in bytes between the beginning of one element and the beginning of the next.
However, if the elements you're passing are contiguous (i.e. there's no space between them), you can pass 0 for the stride parameter.
I would say that it's wrong to claim that "stride is 0" in this case - the stride is sizeof(element), but the value 0 gets special treatment and is taken to mean sizeof(element).
This is most likely done so the poor programmer doesn't have to use two (bug-prone) sizeof parameters in the common case when they are the same.
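To illustrate, a minimal sketch of an interleaved layout where the stride is sizeof(Vertex) (this assumes glVertexAttribPointer is available, e.g. via a loader such as GLEW or GLAD, and the attribute locations 0/1/2 are hypothetical):

// #include <GL/glew.h>  // or another loader that declares glVertexAttribPointer
#include <cstddef>       // offsetof

struct Vertex {
    float position[3];
    float normal[3];
    float texcoord[2];
};

// Interleaved VBO: the stride for every attribute is the size of the
// whole per-vertex struct, measured from one value's start to the next.
void setupAttribPointers()
{
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                          reinterpret_cast<void*>(offsetof(Vertex, position)));
    glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                          reinterpret_cast<void*>(offsetof(Vertex, normal)));
    glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                          reinterpret_cast<void*>(offsetof(Vertex, texcoord)));
}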
As far as I can understand, RAM is organized like a grid of rows and columns of cells, each cell containing 1 byte, and each cell is labeled with a memory address written in hexadecimal. Is this so? Now, when running a C++ program, I suppose it uses RAM as a means of storage. In that case, since the char type in C++ is the basic unit of storage, is the size of a char exactly the same as a cell (1 byte)? Does the size of a char depend on the size of a cell (in case a cell is not 1 byte)? Does it depend on the compiler? Thank you so much.
It is easy to visualize RAM as a grid of rows and columns. This is how most CS classes teach it as well, and for most purposes it works fine at a conceptual level.
One thing you must know while writing C++ programs is the concept of two different memories: the stack and the heap. The stack stores variables when they come into scope; when they go out of scope, they are removed. Think of it as a stack data structure (LIFO: last in, first out).
Now, heap memory is slightly more complicated. It has nothing to do with the scope of a variable: you can set a memory location to contain a particular value and it will stay there until you free it. You allocate heap memory using the 'new' keyword.
For instance: int* abc = new int(2);
This means that the pointer abc points to a heap location with the value '2'. You must explicitly free the memory using the delete keyword once you are done with this memory. Failure to do so would cause memory leaks.
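For example, the complete lifecycle looks like this (a minimal sketch):

int* abc = new int(2);  // heap allocation: *abc is 2
// ... use *abc here ...
delete abc;             // explicit cleanup; forgetting this leaks the memory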
In C, the type of a character constant like 'a' is actually int, typically with a size of 4. In C++, its type is char, with a size of 1. The size of char does NOT depend on the compiler: sizeof(char) is 1 by definition. The sizes of int, float and the like, however, do depend on the compiler and the configuration of your system (16/32/64-bit). Use the statement:
int a = 5;
cout << sizeof(a) << endl;
to determine the size of int in your system.