Checking record size in OCaml?

Is there any way to check the size of a record in OCaml? Something like sizeof in C/C++?

Yes:
# Obj.size (Obj.repr (1,2,3,4,5)) ;;
- : int = 5
But for a record type, the size only depends on the type declaration, so you could just infer it from that.
The actual size occupied in memory, in words, is the number returned by Obj.size plus one. A word is 32 or 64 bits depending on whether you are using a 32-bit or 64-bit build of OCaml. The additional word is the block header, which is used for book-keeping.

Besides the Obj module, there is also the Objsize library by Dmitry Grebeniuk (http://forge.ocamlcore.org/projects/objsize/). It allows you to get more detailed information about values and their sizes.

Related

Why sizeof() of a string variable always return the same number even when content changes?

This is a rather simple problem but is pretty confusing.
string R = "hhhh" ;
cout<< sizeof( R )<<endl;
OUTPUT:
4
Variation:
string R = "hhuuuuuuhh" ;
cout<< sizeof( R )<
OUTPUT2:
4
What is going wrong ? Should I use char array instead ?
Think of sizeof as being evaluated at compile time. It evaluates to the size of the type, not the size of the contents. You can even write sizeof(std::string), which will be exactly the same as sizeof(foo) for any std::string instance foo.
To compute the number of characters in a std::string, use size().
If you have a character array, say char c[6], then the type of c is an array of 6 chars. So sizeof(c) (known at compile time) will be 6, as the C++ standard defines the size of a single char to be 1.
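For instance, a quick sketch of the difference (the exact value sizeof reports for a std::string is implementation-defined and varies between compilers and platforms):
#include <iostream>
#include <string>
int main() {
    std::string r = "hhhh";
    std::cout << sizeof(r) << '\n';  // size of the std::string object itself, e.g. 32 on a typical 64-bit implementation
    std::cout << r.size() << '\n';   // number of characters stored: 4
    r = "hhuuuuuuhh";
    std::cout << sizeof(r) << '\n';  // unchanged: sizeof depends on the type, not the contents
    std::cout << r.size() << '\n';   // 10
    char c[6] = "hello";
    std::cout << sizeof(c) << '\n';  // 6: the array type carries its length
}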
sizeof expression returns the size required for storage of the type the expression evaluates to (see http://en.cppreference.com/w/cpp/language/sizeof). In the case of std::string, this contains a pointer to the data (and possibly a buffer for small strings), but not the data itself, so it doesn't (and can't) depend on the string length.
Your string variable consists of a part most often stored on the stack, which has fixed dimensions. The size of this part is what's returned by sizeof(). Inside this fixed part is a pointer (or reference) to a part stored on the heap, which actually contains your characters and has a varying size. The size of that part is only known at runtime, while sizeof() is computed at compile time.
You may wonder why. Things like this are both the strength and the weakness of C++. C++ is a totally different beast from languages like Python and C#. While those languages can produce all kinds of dynamically changing metadata (like the size or type of a variable), the price is that they're slow. C++, while being a bit spartan, can run rings around such languages. In fact, most 'dynamic' languages are themselves implemented in C/C++.

Limit the GDB output length

I have a structure that describes a bitmap. It looks like this
struct bitmap {
    int XSize;
    int YSize;
    unsigned char *pData;
};
When an instance of this structure is initialized, pData points to thousands of random-looking but non-zero bytes. When I print the instance of the structure, GDB shows a lot of meaningless bytes, which is very time-consuming. When the disp of such a variable is active, I get that output after each step, which delays debugging.
Is there a GDB option that limits the output length?
When the bytes are meaningless, I could change the type of pData to void *, but since the structure is used in a precompiled library, the type can't be changed. Can the type that GDB uses for print and disp be "overridden"?
As Paul has pointed out, the answer to this question gives the correct command to allow unlimited length.
To limit the length you need the command
set print elements n
where n is the maximum number of elements. Setting n to 0 gives the unlimited length.
Setting print elements 4 will limit the number of pData characters shown to 4, but it will also limit all other strings and arrays, which could be quite annoying (e.g. print filename would produce /tmp... when the actual value is /tmp/foobar).
A possibly better approach is to write a Python pretty-printer for struct bitmap (assuming you have sufficiently recent GDB). See this answer on how to do that.

How much "data" you can put into a string?

As programmers, we work with strings a lot. Most of the time, I use them without thinking about them too much. Lately, though, I have been using strings to return copious amounts of information from a function with no problem. My latest example is a binary tree with tens of thousands of entries. I have a recursive function that simply keeps appending to the string with a newline character at the end. This function gave no trouble.
So is there any kind of "limit" on how many characters you can put in a string or are you only limited by the amount of memory available?
The real limit on the size a string object can reach is returned by the member function max_size().
So yeah, it's implementation-specific.
No, the only limit is available contiguous memory. There are no artificial limits imposed on string length; the length of a string is kept in a size_t variable, the maximum value of which is the largest addressable byte in the system (be it 8 or 16 or 32 or 64 bit or whatever).
It's very big, but not unlimited. You can use string::max_size() to get the maximum number of characters that the string object can hold. Note that the returned value may vary from system to system.
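For illustration, a small sketch; the values printed are implementation-specific and will differ between platforms and standard libraries:
#include <iostream>
#include <limits>
#include <string>
int main() {
    std::string s = "hello";
    std::cout << "size():     " << s.size() << '\n';       // characters currently stored: 5
    std::cout << "max_size(): " << s.max_size() << '\n';   // theoretical upper bound, implementation-defined
    std::cout << "size_t max: " << std::numeric_limits<std::size_t>::max() << '\n';
    // In practice you run out of contiguous memory long before reaching max_size().
}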

How to efficiently access 3^20 vectors in 2^30 bits of memory

I want to store a 20-dimensional array where each coordinate can have 3 values,
in a minimal amount of memory (2^30 or 1 Gigabyte).
It is not a sparse array, I really need every value.
Furthermore, I want the values to be integers of arbitrary but fixed precision, say 256 bits or 8 words.
Example:
set_big_array(1,0,0,0,1,2,2,0,0,2,1,1,2,0,0,0,1,1,1,2, some_256_bit_value);
and
get_big_array(1,0,0,0,1,2,2,0,0,2,1,1,2,0,0,0,1,1,1,2, &some_256_bit_value);
Because the value 3 is relatively prime to 2, it's difficult to implement this using efficient bitwise shift, AND, and OR operators.
I want this to be as fast as possible.
Any thoughts?
Seems tricky to me without some compression:
3^20 = 3486784401 values to store
256bits / 8bitsPerByte = 32 bytes per value
3486784401 * 32 = 111577100832 size for values in bytes
111577100832 / (1024^3) ≈ 104 GB
You're trying to fit 104 GB into 1 GB. There'd need to be some pattern to the data that could be used to compress it.
Sorry, I know this isn't much help, but maybe you can rethink your strategy.
There are 3.48e9 variants of a 20-tuple of indexes that are each 0, 1, or 2. If you wish to store a 256-bit value at each index, that means you're talking about 8.92e11 bits, which is about a terabit, or about 100 GB.
I'm not sure what you're trying to do, but that sounds computationally expensive. It may be reasonably feasible as a memory-mapped file, and may be reasonably fast if that file is on an SSD.
What are you trying to do?
So, a practical solution would be to use a 64-bit OS and a large memory-mapped file (preferably on an SSD) and simply compute the address for a given element in the typical way for arrays, i.e. as (sum over i of index_i * 3^i) * 32 bytes in pseudo-math. Or, use a very, very expensive machine with that much memory, or another algorithm that doesn't require this array in the first place.
A few notes on platforms: Windows 7 supports just 192 GB of memory, so using physical memory for a structure like this is possible but really pushing it (more expensive editions support more), if you can find such a machine at all. According to Microsoft's page on the matter, the user-mode virtual address space is 7-8 TB, so mmap/virtual memory should be doable. Alex Ionescu explains why there's such a low limit on virtual memory despite an apparently 64-bit architecture. Wikipedia puts Linux's addressable limit at 128 TB, though probably that's before the kernel/usermode split.
Assuming you want to address such a multidimensional array, you must process each index at least once: that means any algorithm will be O(N), where N is the number of indexes. As mentioned before, you don't need to convert to base-2 addressing or anything else; the only thing that matters is that you can compute the integer offset, and which base the maths happens in is irrelevant. You should use the most compact representation possible and ignore the fact that each dimension's size is not a power of 2.
So, for a 16-dimensional array, that address computation function could be:
int offset = 0;
for (int ii = 0; ii < 16; ii++)
    offset = offset * 3 + indexes[ii];
return &the_array[offset];
As previously said, this is just the common array indexing formula, nothing special about it. Note that even for "just" 16 dimensions, if each item is 32 bytes, you're dealing with a little more than a gigabyte of data.
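Applied to the 20-dimensional case from the question, a minimal sketch of the get/set interface could look like the following. The uint256_t struct and the big_array pointer are placeholders assumed here for illustration; big_array would have to point at roughly 3^20 * 32 bytes of storage, e.g. a memory-mapped file as suggested above.
#include <cstddef>
#include <cstdint>
// Placeholder 256-bit value type: 4 x 64-bit words = 32 bytes.
struct uint256_t { std::uint64_t w[4]; };
static uint256_t *big_array;  // assumed to point at ~3^20 * 32 bytes, e.g. via mmap
// Convert the 20 base-3 indexes into a flat offset: sum over i of idx[i] * 3^i.
static std::size_t flat_offset(const int idx[20]) {
    std::size_t offset = 0;
    for (int i = 19; i >= 0; i--)
        offset = offset * 3 + idx[i];
    return offset;
}
void set_big_array(const int idx[20], const uint256_t &value) {
    big_array[flat_offset(idx)] = value;
}
void get_big_array(const int idx[20], uint256_t *value) {
    *value = big_array[flat_offset(idx)];
}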
Maybe I understand your question wrong, but can't you just use a normal array?
INT256 bigArray[3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3];
OR
INT256 ********************bigArray = malloc(3^20 * 8);
bigArray[1][0][0][1][2][0][1][1][0][0][0][0][1][1][2][1][1][1][1][1] = some_256_bit_value;
etc.
Edit:
Will not work because you would need 3^20 * 32 bytes ≈ 104 GB.
The malloc variant is wrong.
I'll start by doing a direct calculation of the address, then see if I can optimize it
address = 0;
for (i = 15; i >= 0; i--)
{
    address = 3 * address + array[i];
}
address = address * number_of_bytes_needed_for_array_value;
2^30 bits is 2^27 bytes so not actually a gigabyte, it's an eighth of a gigabyte.
It appears impossible to do because of the mathematics, although of course you can create the data at a larger size and then compress it, which may get you down to the required size, though that cannot be guaranteed. (It must fail some of the time, as the compression is lossless.)
If you do not require immediate "random" access, your solution may be a variable-sized code of one or two bits, so your most commonly stored value takes only 1 bit and the other two values take 2 bits each.
If 0 is your most common value then:
0 = 0
10 = 1
11 = 2
or something like that.
In that case you will be able to store your bits in sequence this way.
It could take up to 2^40 bits this way but probably will not.
You could pre-run through your data and see which value occurs most commonly, and use that for your single-bit word.
You can also compress your data after you have serialized it in up to 2^40 bits.
My assumption here is that you will be using disk, possibly with memory mapping, as you are unlikely to have that much memory available.
My assumption is that space is everything and not time.
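As a minimal sketch of that kind of prefix code, assuming purely for illustration that the stored values are the small symbols 0, 1 and 2 with 0 the most common (the question's 256-bit values would need a different model), the bit stream could be built like this:
#include <cstddef>
#include <cstdint>
#include <vector>
// Sketch: pack a sequence of symbols 0/1/2 into a bit stream
// using the prefix code 0 -> "0", 1 -> "10", 2 -> "11".
class TritPacker {
public:
    void push(int symbol) {
        if (symbol == 0) {
            push_bit(0);                      // most common value: 1 bit
        } else {
            push_bit(1);                      // the other two values: 2 bits
            push_bit(symbol == 2 ? 1 : 0);
        }
    }
    const std::vector<std::uint8_t> &bytes() const { return bytes_; }
    std::size_t bit_count() const { return bit_count_; }
private:
    void push_bit(int bit) {
        if (bit_count_ % 8 == 0)
            bytes_.push_back(0);              // start a new byte
        if (bit)
            bytes_.back() |= std::uint8_t(1u << (bit_count_ % 8));
        ++bit_count_;
    }
    std::vector<std::uint8_t> bytes_;
    std::size_t bit_count_ = 0;
};
Decoding walks the stream the same way: read one bit; if it is 0 the symbol is 0, otherwise read a second bit to distinguish 1 from 2. The trade-off, as noted above, is that you lose constant-time random access.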
You might want to take a look at something like STXXL, an implementation of the STL designed for handling very large volumes of data.
You can actually use a pointer-to-array type to have your compiler implement the index calculations for you:
/* Note: there are 19 of the [3]'s below */
my_256bit_type (*foo)[3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3];
foo = allocate_giant_array();
foo[0][1][1][0][2][1][2][2][0][2][1][0][2][1][0][0][2][1][0][0] = some_256bit_value;

Maximum number of characters in a string

Is there a maximum number of characters allowed in a string? If so, what is the limit on the number of characters?
For a std::string str you can get the maximum size with str.max_size().
To get the currently allocated size, use str.capacity().
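A small sketch of the difference; the exact numbers are implementation-specific, since the growth policy and small-string optimization vary between standard libraries:
#include <iostream>
#include <string>
int main() {
    std::string s;
    std::cout << "capacity: " << s.capacity() << "  max_size: " << s.max_size() << '\n';
    for (int i = 0; i < 1000; ++i)
        s += 'x';  // capacity() grows in jumps (only on reallocation), while size() grows one by one
    std::cout << "size: " << s.size() << "  capacity: " << s.capacity() << '\n';
}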
Are we talking "C string" or std::string... the former depends entirely on the size of your buffer. The latter should only be restricted by the amount of available memory.
My understanding is that the maximum number of characters in a C-style string is the capacity of the size_t type. The size_t is defined by the standard to be able to handle the largest size on the given platform. There may be lesser constraints such as the memory available to store the text (as either read only or writable).
As far as std::string (the C++ string) goes, the limit is specified by the maximum value that the std::string::size_type type can accommodate. This varies among platforms and implementations. Again, this quantity may be reduced by the platform's ability to store the string.
Some newbies have been able to declare 10 MB strings for processing files.
Assuming you are talking about an array of characters (and not something like std::string), I believe the limit is 32768, depending on the compiler.
UPDATE:
As has been pointed out to me, this limit only applies when declaring an array on the stack, like so:
char str[32768];
This limit does not apply when declaring the array on the heap like this:
char *str = new char[32769];