How to nicely print a Buffer in OCaml?

I have a Buffer.
Question 1
How can I print out all the bytes inside it, one by one?
Question 2
How can I control the format of the printing?
For example, if I have a buffer containing the bytes 33 33 33 33 33 33 14 40 (each byte shown in hex), how can I print it as \x33\x33\x33\x33\x33\x33\x14\x40?

To apply an imperative function f to every byte in a buffer b, you can use String.iter f (Buffer.contents b).
To print a value with a desired format, you can use Printf.printf.
To get the integer value of a byte in a string you can use Char.code.
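Putting those three pieces together, here is a minimal sketch that produces the \xNN output asked for in Question 2 (the function name `print_bytes_hex` is made up for illustration):

```ocaml
(* Print every byte of a Buffer as a \xNN escape, e.g. \x33\x14\x40. *)
let print_bytes_hex (b : Buffer.t) =
  String.iter
    (fun c -> Printf.printf "\\x%02x" (Char.code c))
    (Buffer.contents b)

let () =
  let b = Buffer.create 8 in
  Buffer.add_string b "\x33\x33\x33\x33\x33\x33\x14\x40";
  print_bytes_hex b;   (* prints \x33\x33\x33\x33\x33\x33\x14\x40 *)
  print_newline ()
```

`%02x` pads each byte to two lowercase hex digits; use `%02X` if you want uppercase.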
As a side comment, many of your recent questions could be answered extremely quickly by reading through the OCaml standard library documentation. I think this would be a good thing for you to do. There's not a lot of deep intellectual content, it's just something you should know about as an OCaml programmer.

Related

Save short int in binary file instead of text file
Let's say I have a vector with 9 integers, so in total I should have 36 bytes.
Some of these integers fit in a short, so I want to store the ones that fit as a short in 2 bytes and the ones that don't in 4.
I noticed that a file with 120 98 99 99 98 257 259 98 0 was 28 bytes, and I wonder what I did wrong.
ofstream out(file, ios::binary);
int len = idx.size(); // idx is the vector<int>
string end = " 0", space = " "; // end is just to finish the saving
for (int i = 0; i < len; i++) {
    if (idx[i] <= SHRT_MAX) {
        short half = idx[i];
        out << half;
    }
    else out << idx[i];
    if (i == len - 1) out << end;
    else out << space;
}
First piece of advice: use the header <cstdint> if you want to work with types of a guaranteed size. Types such as uint16_t are standard and exist for exactly this reason.
Next, this idea of sometimes writing two bytes and sometimes writing four. Keep in mind that when you write data to a file like this, it's just going to look like one big chunk of data; there is no way to magically know when to read two bytes and when to read four. You could store metadata about the file, but that would probably cost more than simply using the same size consistently. Write everything as two bytes, or write everything as four; the choice is up to you, but whatever you pick, stick with it.
Now, moving on to why you have 28 bytes of data written.
You're writing the ASCII representations of your numbers. The file ends up containing the text "120 98 99 99 98 257 259 98 0", which is 28 bytes.
When writing your data, you probably want to do something like
out.write( (char*)&my_data, sizeof(my_data));
Keep in mind, though, that this isn't really a safe way to write binary data. I think you already understand the necessity of making sure you write the size you intend. Sadly, the complications with creating portable files don't end there: you also need to worry about the endianness of the machine your program is running on. Here is an article that I think you might enjoy reading to learn more about the subject.
Disch's Tutorial To Good Binary Files

Why is the length of the SAS character field 32,767?

According to The Little SAS Book, SAS character data can be up to 2^(15)-1 in length.
Where does that one character go? In floating-point arithmetic we usually reserve one bit for the sign of the number. Is something similar happening for SAS character data?
I don't have a definite answer, but I have a supposition.
I think that the limit of 32,767 is not related to the field itself; SAS stores all of its rows (in an uncompressed file) in identically sized blocks, so there is no need for a field-length indicator or a null terminator. I.e., for the following data step, a SAS dataset would contain something like this:
data want;
    length name $8;
    input recnum name $ age;
    datalines;
01 Johnny 13
02 Nancy 12
03 Rachel 14
04 Madison 12
05 Dennis 15
;;;;
run;
You'd have something like this. The headers are of course not written that way but are just packed sequences of bytes.
<dataset header>
Dataset name: Want
Dataset record size: 24 bytes
... etc. ...
<subheaders>
Name character type length=8
Recnum numeric type length=8
Age numeric type length=8
... etc. ...
<first row of data follows>
4A6F686E6E792020000000010000000D
4E616E6379202020000000020000000C
52616368656C2020000000030000000E
4D616469736F6E20000000040000000C
44656E6E69732020000000050000000F
<end of data>
The variables run directly into each other, and SAS knows where one starts and one stops from the information in the subheaders. (This is just a PUT statement of course; I think in the actual file the integers are stored first, if I remember correctly; but the idea is the same.)
Technically the .sas7bdat specification is not publicly disclosed, but several people have worked out most of how the file format works. Some R programmers have written a specification which, while a bit challenging to read, does give some information.
That specification indicates that 4 bytes are used to store the field length, which is far more than enough for 32,767 (4 bytes is enough for over 2 billion), so this isn't the definite answer either. I suppose it may originally have been 2 bytes and changed to 4 at some later point in SAS's development, though .sas7bdat was a totally new file type created relatively recently (version 7, hence sas7bdat; we're on 9 now).
Another possibility, and perhaps the more likely one, is that before 1999 the ANSI C standard only required C compilers to support objects of at least 32,767 bytes, meaning a compiler didn't have to support arrays larger than that. While many compilers did support much larger arrays and objects, it's possible that SAS stuck to the guaranteed minimum to avoid issues across different OS and hardware implementations. See this discussion of the ANSI C standards for some background. It's also possible that a similar limitation in another language (SAS uses several) is at fault here. [Credit to FriedEgg for the beginning of this idea (offline).]

Types bit length and architecture specific implementations

I'm working in C++, and lately I've found that there are slight differences in how much data a type can accommodate across platforms, and that byte order is an issue too.
Suppose I have a binary file in which I've encoded shorts that are 2 bytes in size, like:
FA C8 - data segment 1
BA 32 - data segment 2
53 56 - data segment 3
Now all is well up to this point. Now I want to read this data back. There are two problems:
1. What data type should I choose to store these values?
2. How do I deal with the endianness of the target architecture?
The first problem is related to the second, because I will have to do bit shifts in order to swap the byte order.
I know that I could read the file byte by byte and combine every two bytes. But is there an approach that could ease that pain?
I'm sorry if I'm being ambiguous; the problem is hard to explain, but I hope you get a glimpse of what I'm talking about. I just want to store this data internally.
I would appreciate some advice, or if you could share some of your experience on this topic.
If you use big endian in the file that stores the data, then you can rely on htons(), htonl(), ntohs(), and ntohl() to convert the integers to the right endianness before saving or after reading.
There is no easy way to do this.
Rather than doing that yourself, you might want to look into serialization libraries (for example Protobuf or boost serialization), they'll take care of a lot of that for you.
If you want to do it yourself, use fixed-width types (uint32_t and the like from <cstdint>), and endian conversion functions as appropriate. Either have a "prefix" in your file that determines what endianness it contains (a BOM/Byte Order Mark), or always store in either big or little endian, and systematically convert.
Be extra careful if you need to serialize strings, they have encoding problems of their own too.

What's an AoB (Array of Bytes)

I have encountered this term a couple of times now, and I have googled for explanations, but couldn't find any.
I'm accessing the memory of a running game. I have an address, but I'm also given an AoB, for example
89 8B ? ? 00 00 8B 50 ? 89 93 ? ?.
What do I do with it?
I'd appreciate it if you could give me a guide or something.
Thanks
An array of bytes is best explained in C/C++ as an array of [unsigned] char.
The values you see are just the hexadecimal representations of those bytes, or unsigned chars.
An array of bytes is a contiguous series of values, usually in the range 0 to 255 (0x00 to 0xFF).
The contents must be interpreted by the programmer and can be anything from addresses to pixels for a bitmap.
A common use of AoB, a.k.a. buffer, is for I/O, reading and writing data. The fundamental I/O routines do not care about content, just quantity, source and destination. A program may read large amounts of data into an AOB, then later cast it as some kind of structure or assign fields with data from the buffer. See also "serialization." This is a performance technique with I/O: convert many small reads into one large block read.
Not all data has to be in structures or objects; those are just a convenience.

library for matrices in c++

I have a lot of elements in a matrix, and when I access them manually it takes a pretty long time to eliminate all the bugs arising from wrong indexing. Is there a suitable library that can keep track of, e.g., the neighbors, the numbering, whether an element is on the outer edge, and so on?
e.g.
VA=
11 12 13 14
21 22 23 24
31 32 33 34
41 42 43 44
Now what I would like to do is write a function that says something like
for every neighbor of the element at index 12 (which would be 41)
do something
I would like this to only recognize the elements at index 8 (31) and 13 (42).
Right now I'm using vectors (vector<vector<int>> V;), but the code gets pretty difficult and clumsy both to write and to read, since I have these annoying if statements in every single function.
example:
for (int i = 0; i < MatrixSIZE; i++) {
    if ((i + 1) % rowSize != 0) { // check that it's not in the last column
        // Do something
    }
}
What approach would you suggest?
Can boost::MultiArray help me here in some way? Are there any other similar?
UPDATE:
So I'm looking more for a template that can easily access the elements than for one that does matrix arithmetic.
Try LAPACK, a linear algebra package.
There is this: http://osl.iu.edu/research/mtl/
or this: http://www.robertnz.net/nm_intro.htm
If you Google it a bit, there's quite a few matrix libraries out there for C++.
This might inspire you:
Matrix classes in c++
Is it used in a larger program? If not, R might be better suited for dealing with matrices.
If it's in a larger program, you can use a lib such as MTL.