Fibonacci problem causes Arithmetic overflow - crystal-lang

The problem: create a function with one input. Return the index of an array containing the fibonacci sequence (starting from 0) whose element matches the input to the function.
def fib(n)
  return 0 if n == 0

  last = 0u128
  current = 1u128

  (n - 1).times do
    last, current = current, last + current
  end

  current
end

def usage
  progname =

  STDERR.puts <<-H
  #{progname} <integer>
  Given Fibonacci; determine which fib value would
  exist at <integer> index.
  H

  exit 1
end

if ARGV.empty?
  usage
end

begin
  i = ARGV[0].to_i
  puts fib i
rescue e
  STDERR.puts e
  usage
end
My solution to the problem is in no way elegant and I did it at 2AM when I was quite tired. So I'm not looking for a more elegant solution. What I am curious about is that if I run the resultant application with an input larger than 45 then I'm presented with Arithmetic overflow. I think I've done something wrong with my variable typing. I ran this in Ruby and it runs just fine so I know it's not a hardware issue...
Could someone help me find what I did wrong in this? I'm still digging, too. I just started working with Crystal this week. This is my second application/experiment with it. I really like it, but I am not yet aware of some of its idiosyncrasies.
Updated script to reflect the suggested change and the runtime outcome of said change. With said change, I can now run the program successfully with inputs above 45, but only up to about the low 90s. So that's interesting. I'm gonna run through this and see where I may need to add additional explicit casting. It seems very unintuitive that changing the type at initialization didn't "stick" through the entire runtime, which I tried first and that failed. Something doesn't make sense here to me.
Original Results
$ crystal build
$ ./fib 45
$ ./fib 46
Arithmetic overflow
$ ./fib.rb 460
Latest Results
$ ./fib 92
$ ./fib 93
Arithmetic overflow
./fib <integer>
Given Fibonacci; determine which fib value would
exist at <integer> index.
Edit ^2
Now also decided that maybe ARGV[0] is the problem. So I changed the call to f() to:
begin
  i = ARGV[0]
  puts f i
rescue e
  STDERR.puts e
  usage
end
and added a debug print to show the types of the variables in use:
return 0 if p == 0
puts "p: %s\tfib_now: %s\tfib_last: %s\tfib_hold: %s\ti: %s" % [typeof(p), typeof(fib_now), typeof(fib_last), typeof(fib_hold), typeof(i)]
loop do
p: UInt64 fib_now: UInt64 fib_last: UInt64 fib_hold: UInt64 i: UInt64
Arithmetic overflow
./fib <integer>
Given Fibonacci; determine which fib value would
exist at <integer> index.
Edit ^3
Updated with latest code after bug fix solution by Jonne. Turns out the issue is that I'm hitting the limits of the structure even with 128 bit unsigned integers. Ruby handles this gracefully. Seems that in crystal, it's up to me to gracefully handle it.

The default integer type in Crystal is Int32, so if you don't explicitly specify the type of an integer literal, you get that.
In particular the lines
fib_last = 0
fib_now = 1
turn the variables into the effective type Int32. To fix this, make sure you specify the type of these integers. Given that you don't need negative numbers, UInt64 seems most appropriate here:
fib_last = 0u64
fib_now = 1u64
Also note the literal syntax I'm using here. Your 0.to_i64 calls create an Int32 and then an Int64 out of that. The compiler will be smart enough to do this conversion at compile time in release builds, but I think it's nicer to just use the literal syntax.
Edit answering the updated question
Fibonacci is defined as F(0) = 0, F(1) = 1, F(n) = F(n-2) + F(n-1), so 0, 1, 1, 2, 3, 5.
Your algorithm is off by one. It calculates F(n+1) for a given n > 1, in other words 0, 1, 2, 3, 5; in yet other words, it basically skips F(2).
Here's one that does it correctly:
def fib(n)
  return 0 if n == 0
  last = 0u64
  current = 1u64
  (n - 1).times do
    last, current = current, last + current
  end
  current
end
This correctly gives 7540113804746346429 for F(92) and 12200160415121876738 for F(93). However, it still overflows for F(94) because that would be 19740274219868223167, which is bigger than 2^64 = 18446744073709551616, so it doesn't fit into UInt64. To clarify once more, your version tries to calculate F(94) when being asked for F(93), hence you get the overflow "too early".
So if you want to support calculating F(n) for n > 93, then you need to venture into the experimental Int128/UInt128 support or use BigInt.
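The overflow thresholds discussed above can be double-checked with a small script. This is only a sketch in Python (whose integers never overflow), not Crystal code; the helper name largest_fib_index is made up for illustration, and signed types are modelled simply as having one bit fewer.

```python
def largest_fib_index(bits: int) -> int:
    """Return the largest n such that F(n) < 2**bits, i.e. F(n) fits in `bits` bits."""
    limit = 1 << bits
    last, current, n = 0, 1, 1  # invariant: current == F(n), last == F(n-1)
    while last + current < limit:
        last, current = current, last + current
        n += 1
    return n


# Value bits available in each type (signed types lose one bit to the sign).
for type_name, value_bits in [("Int32", 31), ("Int64", 63), ("UInt64", 64), ("UInt128", 128)]:
    print(type_name, "holds Fibonacci numbers up to F(%d)" % largest_fib_index(value_bits))
```

Under these assumptions the script reports F(46) for Int32 and F(93) for UInt64, which lines up with the off-by-one version failing at inputs 46 and 93 respectively, and it shows that UInt128 only postpones the limit rather than removing it.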

I think one more thing should be mentioned to explain the Ruby/Crystal difference, besides the fact that integer literals default to Int32.
In Ruby, a dynamically typed interpreted language, there is no concept of variable type, only value type. All variables can hold values of any type.
This allows it to transparently turn a Fixnum into a Bignum behind the scenes when it would overflow.
Crystal, on the contrary, is a statically typed compiled language. It looks and feels like Ruby thanks to type inference and type unions, but the variables themselves are typed.
This allows it to catch a large number of errors at compile time and run Ruby-like code at C-like speed.
I think, but don't take my word for it, that Crystal could in theory match Ruby's behavior here, but it would be more trouble than good. It would require all operations on integers to return a type union with BigInt, at which point, why not leave the primitive types alone, and use big integers directly when necessary.
Long story short, if you need to work with very large integer values beyond what a UInt128 can hold, require "big" and declare the relevant variables as BigInt.
edit: see also here for extreme cases, apparently BigInts can overflow too (I never knew) but there's an easy remedy.


calculate merkle root from 2 transaction

I want to prove the calculation of Merkle root for bitcoin, but I can't get the same root as shown on block explorer.
Data from block explorer (
tx1 hash = "c06fbab289f723c6261d3030ddb6be121f7d2508d77862bb1e484f5cd7f92b25"
tx2 hash = "5a4ebf66822b0b2d56bd9dc64ece0bc38ee7844a23ff1d7320a88c5fdb2ad3e2"
Merkle Root = "8fb300e3fdb6f30a4c67233b997f99fdd518b968b9a3fd65857bfe78b2600719"
Now what I do is, first I reverse the order of TX1 and TX2 to become
tx1 hash reverse = "252BF9D75C4F481EBB6278D708257D1F12BEB6DD30301D26C623F789B2BA6FC0"
tx2 hash reverse = "E2D32ADB5F8CA820731DFF234A84E78EC30BCE4EC69DBD562D0B2B8266BF4E5A"
Then I combine both string and hash it using sha256 from web (
so first hash i got answer = "9a5c2897b8d01cb7996867e01b70bb1a4c84190982bd71d55d1efe5320feee22"
second hash = Merkle root = "437b30772522751ee150eb4b0a9c246d28557036cd720f75aecbba32ea59d174"
but answer from block hash is = "8fb300e3fdb6f30a4c67233b997f99fdd518b968b9a3fd65857bfe78b2600719"
I tried also to do it in C++ using the hashlib++ library and the answer is same as above but not equal to the root shown in block explorer.
So where is my understanding wrong? I need help to pinpoint my mistake.
I tried your example and got exactly the correct answer, so I'm not sure you are doing this right. Please see my code below, written in Python, to make it easier for you to calculate:
import binascii
import hashlib

X1a = binascii.unhexlify("c06fbab289f723c6261d3030ddb6be121f7d2508d77862bb1e484f5cd7f92b25")[::-1]
X1b = binascii.unhexlify("5a4ebf66822b0b2d56bd9dc64ece0bc38ee7844a23ff1d7320a88c5fdb2ad3e2")[::-1]
Y = hashlib.sha256(hashlib.sha256(X1a + X1b).digest()).digest()[::-1]
Z = binascii.hexlify(Y)
print(Z.decode())
To explain my code:
The two import lines load the hex-conversion and hashing modules.
The X1a and X1b lines each take your hex string and convert it into a byte array, then reverse it with the [::-1] at the end of the line, so your 11223344 would become 44,33,22,11.
The Y line does a double SHA-256 of the concatenated byte arrays and then also reverses the byte order.
The Z line converts the byte array into a hex string; this is for printing or sending somewhere.
The last line is the printing/output.
This works and returns the output of "8fb300e3fdb6f30a4c67233b997f99fdd518b968b9a3fd65857bfe78b2600719" which is the correct answer as submitted by you initially, also matches the block you referenced.
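The question does not say exactly what was pasted into the web SHA-256 tool, so the following is only a guess at the cause of the mismatch: hashing the hex characters as text instead of the 32 raw bytes they stand for. The sketch below puts both variants side by side; the function names are made up for illustration, and text_hash_parent is an assumed reconstruction of the asker's steps, not something confirmed by the thread.

```python
import hashlib

TX1 = "c06fbab289f723c6261d3030ddb6be121f7d2508d77862bb1e484f5cd7f92b25"
TX2 = "5a4ebf66822b0b2d56bd9dc64ece0bc38ee7844a23ff1d7320a88c5fdb2ad3e2"


def merkle_parent(left_hex: str, right_hex: str) -> str:
    """Double SHA-256 over the raw, byte-reversed hashes (the answer's method)."""
    left = bytes.fromhex(left_hex)[::-1]
    right = bytes.fromhex(right_hex)[::-1]
    digest = hashlib.sha256(hashlib.sha256(left + right).digest()).digest()
    return digest[::-1].hex()


def text_hash_parent(left_hex: str, right_hex: str) -> str:
    """Assumed mistake: hash the reversed hex *text*, then hash the hex text again."""
    left = bytes.fromhex(left_hex)[::-1].hex().upper()
    right = bytes.fromhex(right_hex)[::-1].hex().upper()
    first = hashlib.sha256((left + right).encode("ascii")).hexdigest()
    return hashlib.sha256(first.encode("ascii")).hexdigest()


correct = merkle_parent(TX1, TX2)
suspect = text_hash_parent(TX1, TX2)
print("bytes:", correct)
print("text: ", suspect)
```

The two results differ because a 64-character hex string is 64 bytes of ASCII, while the value it encodes is only 32 bytes; a SHA-256 tool that accepts text will hash the former.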

How to generate every combination from a given amount of integers

I found this code which does exactly what I want: gets a list of integers, and generates every combination.
perm([H|T], Perm) :-
perm(T, SP),
insert(H, SP, Perm).
perm([], []).
insert(X, T, [X|T]).
insert(X, [H|T], [H|NT]) :-
insert(X, T, NT).
Now what I want to do is, if one permutation does not meet some criteria, I want perm to return another result. So, and sorry for the lack of vocabulary, I want the same effect that would happen if I would execute that code, got a solution, and typed ; to get more results. I believe this is a very simple idea but I can't see it right now.
So, pseudocode would be:
enumerate(inputList, outputNodes, OutputArcs) :-
getArcs(OutputPermutation,OutputArcs),%I want to build the OutputArcs, then check for every element to be unique, if it isn't, generate another list with perm, if it IS, return said list as accepted)
areArcsNumberUniques(OutputArcs,OutputArcs),%TODO now is when I do not know how to make the call, here if it is valid, end, if it isn't, call perm again)
So I would need to understand how do I go about this. Also, any other ideas about the problem are welcome, since I'm brute forcing my way because I'm unable to find any type of algorithm or pattern to solve the actual problem (which I've asked about before. This is my attempted solution, just in order to give an actual answer to the exercise...)
edit: query:
enumerate([a-b,b-c], EnumNodos, EnumArcos).
expected output:
EnumNodos = [enum(3,a), enum(1,b), enum(2,c)],
EnumArcos = [enum(2,a-b), enum(1,b-c)]
This would be the end-game goal, where I get a list of arcs in which each arc has a unique value equal to the difference between the values of its nodes (every node also has a unique value).
And so far, since I did not find any way to do this algorithmically, I thought about trying every possibility (basically I cannot get an unique way to do this, trees with different branches seem different to me, and only restriction is that there are N nodes and N-1 arcs).
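In Prolog, the "try another permutation when the check fails" behaviour comes for free: if a goal placed after perm/2 fails, Prolog backtracks into perm/2 and asks it for its next solution, exactly as typing ; would. The same generate-and-test idea can be sketched in Python with a generator. The function name enumerate_labelings and the tuple representation of arcs are assumptions made for this illustration, and the arc value is taken to be the absolute difference of its node values.

```python
from itertools import permutations


def enumerate_labelings(arcs):
    """Yield (node_labels, arc_labels) for every numbering whose arc values are all distinct."""
    # Collect the nodes in order of first appearance.
    nodes = []
    for u, v in arcs:
        for node in (u, v):
            if node not in nodes:
                nodes.append(node)
    # Generate: every assignment of 1..N to the nodes. Test: arc values must be unique.
    for perm in permutations(range(1, len(nodes) + 1)):
        label = dict(zip(nodes, perm))
        arc_values = [abs(label[u] - label[v]) for u, v in arcs]
        if len(set(arc_values)) == len(arc_values):
            yield label, dict(zip(arcs, arc_values))


solutions = list(enumerate_labelings([("a", "b"), ("b", "c")]))
for node_labels, arc_labels in solutions:
    print(node_labels, arc_labels)
```

Each rejected permutation is simply skipped and the next one is tried, which is what backtracking does. The labeling a=3, b=1, c=2 from the expected output is one of the solutions produced, though not necessarily the first, since the order depends on how permutations are generated.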
edit more examples:
5 4
1b 2e
2 3
3c 5f
EnumNodos = [enum(6,a), enum(1,b), enum(2,e), enum(3,c), enum(5,f), enum(4,d)],
EnumArcos = [enum(5,a-b), enum(4,a-e), enum(3,e-f), enum(2,b-c), enum(1,c-d)]
4 3
1b 2e
1 2
3c 4f
EnumNodos = [enum(5,a), enum(1,b), enum(2,e), enum(3,c), enum(4,f)],
EnumArcos = [enum(4,a-b), enum(3,a-e), enum(1,b-c), enum(2,e-f)]
4 3
1b 2e
8 7
1b 2e
6 4
7c 6f
2 2
5d 4g
3 1
8h 3i

Inferring simplest method to convert bit array 1 to bit array 2

Consider the set of all bit arrays of length n. Now consider the set of all 1-to-1 functions that map from this set to this set.
Now select a single function out of the latter set. Is there any algorithm to find a "minimal" method of implementing this function? Assume that we only have access to fundamental bit array operators such as AND OR XOR NOT and left and right bitshifts.
In case you're wondering, the reason I want this is because I'm writing an algorithm to convert from z-curve ordering of bits to hilbert-curve ordering of bits. My current method is to make a lookup table, but I bet there's a better option.
As a simple example, let's say I have a truth table that looks like this:
00 -> 10
01 -> 01
10 -> 00
11 -> 11
Then I should be able to infer that, given an input bit string input, the output bit string output is (in Java syntax)
output = ((~input) << 1) ^ input
Here's the proof in this case (each row shows input -> ~input -> shifted left by 1 and truncated to 2 bits -> XORed with input):
00 -> 11 -> 10 -> 10
01 -> 10 -> 00 -> 01
10 -> 01 -> 10 -> 00
11 -> 00 -> 00 -> 11
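The proof table can be checked mechanically. The sketch below is in Python rather than Java and assumes the result is masked to the array width (2 bits here), which the table does implicitly when the shift drops the top bit; the names transform and TRUTH_TABLE are made up for the example.

```python
WIDTH = 2
MASK = (1 << WIDTH) - 1  # keeps only the low WIDTH bits

TRUTH_TABLE = {0b00: 0b10, 0b01: 0b01, 0b10: 0b00, 0b11: 0b11}


def transform(value: int) -> int:
    """Apply ((~value) << 1) ^ value and truncate to WIDTH bits."""
    return (((~value) << 1) ^ value) & MASK


for source, expected in TRUTH_TABLE.items():
    result = transform(source)
    print(format(source, "02b"), "->", format(result, "02b"), "ok" if result == expected else "MISMATCH")

# A 1-to-1 function must hit every output exactly once.
is_bijection = sorted(transform(v) for v in range(1 << WIDTH)) == list(range(1 << WIDTH))
print("bijection:", is_bijection)
```

Without the mask the expression would produce values wider than 2 bits (and negative numbers in Python), so any search for a minimal expression has to fix the width it is evaluated at.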

R: invalid subscript type 'list'

I'm trying to use the indices of a sorted column of a dataset. I want to reorder the entire dataset by one sorted column.
area.sort<-sort(xsample$area1, index.return=TRUE)[2]
The output is a list, so I can't use it to index through the whole dataset.
Error in xj[i] : invalid subscript type 'list'
Someone suggested using unlist but I can't get rid of the ix*.
Any ideas? Thanks
> area.sort<-unlist(area.sort)
ix1 ix2 ix3 ix4 ix5 ix6 ix7 ix8 ix9 ix10 ix11 ix12 ix13
45 96 92 80 53 54 24 21 63 81 40 66 64
The call to sort with index.return=TRUE returns a list with two components: x and ix. Indexing with [2] returns a subset of the list - still a list.
If you index using [[2]] it should work better. That returns the element in the list.
But indexing using $ix is perhaps a bit clearer.
But then again, if you only need the sorted indices, you should call order instead of sort...

Efficient way of storing Huffman tree

I am writing a Huffman encoding/decoding tool and am looking for an efficient way to store the Huffman tree inside the output file.
Currently there are two different versions I am implementing.
The first one reads the entire file into memory character by character and builds a frequency table for the whole document. This would only require outputting the tree once, and thus efficiency is not that big of a concern, other than if the input file is small.
The other method I am using is to read a chunk of data, about 64 kilobytes in size, run the frequency analysis over that, create a tree and encode it. However, in this case I will need to output my frequency tree before every chunk so that the decoder is able to rebuild its tree and properly decode the encoded file. This is where efficiency does come into play, since I want to save as much space as possible.
In my searches so far I have not found a good way of storing the tree in as little space as possible. I am hoping the StackOverflow community can help me find a good solution!
Since you already have to implement code to handle a bit-wise layer on top of your byte-organized stream/file, here's my proposal.
Do not store the actual frequencies, they're not needed for decoding. You do, however, need the actual tree.
So for each node, starting at root:
If leaf-node: Output 1-bit + N-bit character/byte
If not leaf-node, output 0-bit. Then encode both child nodes (left first then right) the same way
To read, do this:
Read bit. If 1, then read N-bit character/byte, return new node around it with no children
If bit was 0, decode left and right child-nodes the same way, and return new node around them with those children, but no value
A leaf-node is basically any node that doesn't have children.
With this approach, you can calculate the exact size of your output before writing it, to figure out if the gains are enough to justify the effort. This assumes you have a dictionary of key/value pairs that contains the frequency of each character, where frequency is the actual number of occurrences.
Pseudo-code for calculation:
Tree-size = 10 * NUMBER_OF_CHARACTERS - 1
Encoded-size = Sum(for each char,freq in table: freq * len(PATH(char)))
The tree-size calculation takes the leaf and non-leaf nodes into account: with 8-bit characters, each leaf costs 1 + 8 = 9 bits and each internal node costs 1 bit, and there's one less internal node than there are characters, which gives 9 * N + (N - 1) = 10 * N - 1.
Both sizes are in bits, and the two together give you the total number of bits that my approach will occupy for the tree plus the encoded data.
PATH(c) is a function/table that would yield the bit-path from root down to that character in the tree.
Here's a C#-looking pseudo-code to do it, which assumes one character is just a simple byte.
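void EncodeNode(Node node, BitWriter writer) {
    writer.WriteBit(node.IsLeafNode ? 1 : 0);  // 1 = leaf, 0 = non-leaf node
    if (node.IsLeafNode) { writer.WriteByte(node.Value); return; }
    EncodeNode(node.LeftChild, writer);   // left child first
    EncodeNode(node.RightChild, writer);  // then right child
}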
To read it back in:
Node ReadNode(BitReader reader) {
    if (reader.ReadBit() == 1) {
        return new Node(reader.ReadByte(), null, null);
    }
    Node leftChild = ReadNode(reader);
    Node rightChild = ReadNode(reader);
    return new Node(0, leftChild, rightChild);
}
An example (simplified, use properties, etc.) Node implementation:
public class Node {
    public Byte Value;
    public Node LeftChild;
    public Node RightChild;

    public Node(Byte value, Node leftChild, Node rightChild) {
        Value = value;
        LeftChild = leftChild;
        RightChild = rightChild;
    }

    public Boolean IsLeafNode {
        get { return LeftChild == null; }
    }
}
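For anyone who wants to run the scheme, here is a minimal Python sketch of the same pre-order encoding. It is not the answer's code: it uses a string of '0'/'1' characters in place of the BitWriter/BitReader (which are not shown in this thread), and the Node dataclass, encode_node, decode_node and leaf are names chosen for this illustration. The tree built at the end is the five-symbol A/B/C/D/E tree from this answer's example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: Optional[int] = None       # byte value for leaves, None for internal nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None


def encode_node(node: Node) -> str:
    """Pre-order serialization: '1' + 8 value bits for a leaf, '0' + both children otherwise."""
    if node.is_leaf:
        return "1" + format(node.value, "08b")
    return "0" + encode_node(node.left) + encode_node(node.right)


def decode_node(bits: str, pos: int = 0):
    """Return (node, next_position) for the subtree that starts at bits[pos]."""
    if bits[pos] == "1":
        return Node(value=int(bits[pos + 1:pos + 9], 2)), pos + 9
    left, pos = decode_node(bits, pos + 1)
    right, pos = decode_node(bits, pos)
    return Node(left=left, right=right), pos


def leaf(ch: str) -> Node:
    return Node(value=ord(ch))


# ((A, C), (E, (B, D))): paths A=00, C=01, E=10, B=110, D=111
tree = Node(left=Node(left=leaf("A"), right=leaf("C")),
            right=Node(left=leaf("E"), right=Node(left=leaf("B"), right=leaf("D"))))
encoded = encode_node(tree)
decoded, end = decode_node(encoded)
print(len(encoded), "bits, round trip ok:", decoded == tree)
```

The encoded length matches the 10 * N - 1 formula, and a real implementation would pack these bits into bytes instead of keeping them as text.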
Here's a sample output from a specific example.
A: 6
B: 1
C: 6
D: 2
E: 5
Each character is just 8 bits, so the size of the tree will be 10 * 5 - 1 = 49 bits.
The tree could look like this:
        20
    ----------
    |        8
    |     -------
   12     |     3
  -----   |   -----
  A   C   E   B   D
  6   6   5   1   2
So the paths to each character is as follows (0 is left, 1 is right):
A: 00
B: 110
C: 01
D: 111
E: 10
So to calculate the output size:
A: 6 occurrences * 2 bits = 12 bits
B: 1 occurrence * 3 bits = 3 bits
C: 6 occurrences * 2 bits = 12 bits
D: 2 occurrences * 3 bits = 6 bits
E: 5 occurrences * 2 bits = 10 bits
Sum of encoded bits is 12+3+12+6+10 = 43 bits
Add that to the 49 bits from the tree, and the output will be 92 bits, or 12 bytes. Compare that to the 20 * 8 bits (20 bytes) necessary to store the original 20 characters unencoded, and you'll save 8 bytes.
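The size arithmetic above is easy to reproduce. The following Python sketch recomputes it from the frequency table and the paths; tree_bits and encoded_bits are hypothetical helper names, and 8-bit symbols are assumed as in the example.

```python
def tree_bits(symbol_count: int, bits_per_symbol: int = 8) -> int:
    """Leaves cost 1 marker bit plus the symbol; the n - 1 internal nodes cost 1 bit each."""
    return symbol_count * (1 + bits_per_symbol) + (symbol_count - 1)


def encoded_bits(freq: dict, paths: dict) -> int:
    """Sum of occurrences times code length over all symbols."""
    return sum(count * len(paths[symbol]) for symbol, count in freq.items())


freq = {"A": 6, "B": 1, "C": 6, "D": 2, "E": 5}
paths = {"A": "00", "B": "110", "C": "01", "D": "111", "E": "10"}

tree_size = tree_bits(len(freq))
data_size = encoded_bits(freq, paths)
total_bits = tree_size + data_size
total_bytes = (total_bits + 7) // 8          # round up to whole bytes
original_bytes = sum(freq.values())          # one byte per input character
print(tree_size, data_size, total_bits, total_bytes, original_bytes - total_bytes)
```

Running the same helpers on a candidate chunk before writing it is one way to decide whether compressing that chunk is worth the tree overhead.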
The final output, including the tree to begin with, is as follows. Each character in the stream (A-E) is encoded as 8 bits, whereas 0 and 1 is just a single bit. The space in the stream is just to separate the tree from the encoded data and does not take up any space in the final output.
001A1C01E01B1D 0000000000001100101010101011111111010101010
For the concrete example you have in the comments, AABCDEF, you will get this:
A: 2
B: 1
C: 1
D: 1
E: 1
F: 1
            7
    -------------
    |           4
    |       ---------
    3       2       2
  -----   -----   -----
  A   B   C   D   E   F
  2   1   1   1   1   1
A: 00
B: 01
C: 100
D: 101
E: 110
F: 111
Tree: 001A1B001C1D01E1F = 59 bits
Data: 000001100101110111 = 18 bits
Sum: 59 + 18 = 77 bits = 10 bytes
Since the original was 7 characters of 8 bits = 56 bits, the tree adds too much overhead for such small pieces of data.
If you have enough control over the tree generation, you could make it do a canonical tree (the same way DEFLATE does, for example), which basically means you create rules to resolve any ambiguous situations when building the tree. Then, like DEFLATE, all you actually have to store are the lengths of the codes for each character.
That is, if you had the tree/codes Lasse mentioned above:
A: 00
B: 110
C: 01
D: 111
E: 10
Then you could store those as:
2, 3, 2, 3, 2
And that's actually enough information to regenerate the huffman table, assuming you're always using the same character set -- say, ASCII. (Which means you couldn't skip letters -- you'd have to list a code length for each one, even if it's zero.)
If you also put a limitation on the bit lengths (say, 7 bits), you could store each of these numbers using short binary strings. So 2,3,2,3,2 becomes 010 011 010 011 010 -- Which fits in 2 bytes.
If you want to get really crazy, you could do what DEFLATE does, and make another huffman table of the lengths of these codes, and store its code lengths beforehand. Especially since they add extra codes for "insert zero N times in a row" to shorten things further.
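To make the "lengths are enough" claim concrete, here is a small Python sketch that rebuilds codes from code lengths alone. It follows the usual canonical rule (shorter codes first, ties broken by symbol order, each code one greater than the previous); the function name canonical_codes is made up, and the exact tie-breaking rule is an assumption that encoder and decoder would have to share.

```python
def canonical_codes(lengths: dict) -> dict:
    """Assign canonical Huffman codes from a {symbol: code_length} mapping."""
    codes = {}
    code = 0
    previous_length = 0
    # Shorter codes first; symbols with equal length in symbol order.
    for symbol, length in sorted(lengths.items(), key=lambda item: (item[1], item[0])):
        if length == 0:
            continue  # symbol does not occur in the data
        code <<= length - previous_length   # append zeros when the length grows
        codes[symbol] = format(code, "0{}b".format(length))
        code += 1
        previous_length = length
    return codes


codes = canonical_codes({"A": 2, "B": 3, "C": 2, "D": 3, "E": 2})
print(codes)
```

For this particular table the canonical codes happen to coincide with the ones listed above (A=00, C=01, E=10, B=110, D=111). In general the canonical codes differ from the codes of the original tree but have the same lengths, so the compressed size is unchanged.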
The RFC for DEFLATE isn't too bad, if you're already familiar with huffman coding:
branches are 0 leaves are 1. Traverse the tree depth first to get its "shape"
e.g. the shape for this tree
0 - 0 - 1 (A)
|    \- 1 (E)
 \- 0 - 1 (C)
     \- 0 - 1 (B)
         \- 1 (D)
would be 001101011
Follow that with the bits for the characters in the same depth first order AECBD (when reading you'll know how many characters to expect from the shape of the tree). Then output the codes for the message. You then have a long series of bits that you can divide up into characters for output.
If you are chunking it, you could test whether storing the tree for the next chunk is as efficient as just reusing the tree from the previous chunk, and use a tree shape of just "1" as an indicator to reuse the tree from the previous chunk.
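The shape-then-symbols layout can be sketched in a few lines of Python. The tree is represented as nested 2-tuples with single-character strings as leaves, which is an assumption made for brevity; serialize and deserialize are illustrative names, not part of any library.

```python
def serialize(tree):
    """Return (shape_bits, symbols) for a depth-first walk: 0 = branch, 1 = leaf."""
    if isinstance(tree, str):
        return "1", tree
    left_shape, left_symbols = serialize(tree[0])
    right_shape, right_symbols = serialize(tree[1])
    return "0" + left_shape + right_shape, left_symbols + right_symbols


def deserialize(shape: str, symbols: str):
    """Rebuild the nested-tuple tree from its shape bits and the symbols in leaf order."""
    shape_iter = iter(shape)
    symbol_iter = iter(symbols)

    def read():
        if next(shape_iter) == "1":
            return next(symbol_iter)   # leaf: take the next symbol
        left = read()
        right = read()
        return (left, right)

    return read()


tree = (("A", "E"), ("C", ("B", "D")))
shape, symbols = serialize(tree)
print(shape, symbols)
rebuilt = deserialize(shape, symbols)
```

Because the number of 1 bits in the shape equals the number of leaves, the reader knows how many symbols follow without a separate count.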
The tree is generally created from a frequency table of the bytes. So store that table, or just the bytes themselves sorted by frequency, and re-create the tree on the fly. This of course assumes that you're building the tree to represent single bytes, not larger blocks.
UPDATE: As pointed out by j_random_hacker in a comment, you actually can't do this: you need the frequency values themselves. They are combined and "bubble" upwards as you build the tree. This page describes the way a tree is built from the frequency table. As a bonus, it also saves this answer from being deleted by mentioning a way to save out the tree:
The easiest way to output the huffman tree itself is to, starting at the root, dump first the left hand side then the right hand side. For each node you output a 0, for each leaf you output a 1 followed by N bits representing the value.
A better approach
            7
    -------------
    |           4
    |       ---------
    3       2       2
  -----   -----   -----
  A   B   C   D   E   F
  2   1   1   1   1   1    : frequencies
  2   2   3   3   3   3    : tree depth (encoding bits)
Now just derive this table:
depth   number of codes
-----   ---------------
2       2   [A B]
3       4   [C D E F]
You don't need to use the same binary tree; just keep the computed tree depth, i.e. the number of encoding bits. So keep the vector of uncompressed values [A B C D E F] ordered by tree depth, and use relative indexes into this separate vector instead. Now recreate the aligned bit patterns for each depth:
depth   number of codes
-----   ---------------
2       2   [00x 01x]
3       4   [100 101 110 111]
What you immediately see is that only the first bit pattern in each row is significant. You get the following lookup table:
first pattern   depth   first index
-------------   -----   -----------
000             2       0
100             3       2
This LUT is very small (even if your Huffman codes can be 32 bits long, it will only contain 32 rows), and in fact the first pattern is always zero, so you can ignore it completely when performing a binary search of patterns in it (here only one pattern needs to be compared to know whether the bit depth is 2 or 3 and to get the first index at which the associated data is stored in the vector). In general you'll need to perform a fast binary search of input patterns in a search space of 31 values at most, i.e. a maximum of 5 integer compares. These 31 comparisons can be unrolled into fixed code to avoid all loops and any state management when browsing the integer binary lookup tree.
The whole table has a small fixed size (the LUT needs at most 31 rows for Huffman codes not longer than 32 bits, and the 2 other columns above will fill at most 32 rows).
In other words, the LUT above requires 31 ints of 32 bits each, plus 32 bytes to store the bit depth values; but you can avoid storing the depths by making the depth column implicit (along with the first row, for depth 1):
first pattern   (depth)   first index
-------------   -------   -----------
(000)           (1)       (0)
000             (2)       0
100             (3)       2
000             (4)       6
000             (5)       6
...             ...       ...
000             (32)      6
So your LUT contains [000, 100, 000 (30 times)]. To search in it you must find the position where the input bit pattern lies between two patterns: it must be lower than the pattern at the next position in this LUT but still higher than or equal to the pattern at the current position (if both positions contain the same pattern, the current row will not match; the input pattern fits below). You then divide and conquer, using 5 tests at most (the binary search requires a single piece of code with 5 nested if/then/else levels; it has 32 branches, and the branch reached directly indicates the bit depth, which therefore does not need to be stored; you then perform a single directly indexed lookup into the second table to get the first index, and derive the final index into the vector of decoded values by addition).
Once you get a position in the lookup table (by searching the 1st column), you immediately have the number of bits to take from the input and then the start index into the vector. The bit depth you get can be used to derive the adjusted index position directly, by basic bitmasking after subtracting the first pattern.
In summary: never store linked binary trees. You don't need any loop to perform the lookup, which just requires 5 nested ifs comparing patterns at fixed positions in a table of 31 patterns, plus a table of 31 ints containing the start offset within the vector of decoded values (in the first branch of the nested if/then/else tests, the start offset into the vector is implied, as it is always zero; it is also the most frequently taken branch, as it matches the shortest code, which is for the most frequent decoded values).
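A minimal Python sketch of this table-driven decoding, using the small three-column table for the AABCDEF example rather than the 32-row variant with an implied depth column. The constant and function names are made up for illustration, and bisect stands in for the hand-unrolled nested ifs described above.

```python
from bisect import bisect_right

MAX_DEPTH = 3                      # longest code length in this example
VALUES = "ABCDEF"                  # decoded values ordered by code depth
FIRST_PATTERNS = [0b000, 0b100]    # first code of each depth, left-aligned to MAX_DEPTH bits
DEPTHS = [2, 3]
FIRST_INDEXES = [0, 2]             # index in VALUES of the first symbol of each depth


def decode(bits: str) -> str:
    """Decode a string of '0'/'1' characters using the first-pattern lookup table."""
    out = []
    pos = 0
    while pos < len(bits):
        # Peek MAX_DEPTH bits (zero-padded at the end of the stream).
        window = int(bits[pos:pos + MAX_DEPTH].ljust(MAX_DEPTH, "0"), 2)
        # Last row whose first pattern is <= the window.
        row = bisect_right(FIRST_PATTERNS, window) - 1
        depth = DEPTHS[row]
        # Distance from the first code of this depth, dropping the unused low bits.
        offset = (window - FIRST_PATTERNS[row]) >> (MAX_DEPTH - depth)
        out.append(VALUES[FIRST_INDEXES[row] + offset])
        pos += depth               # consume only the bits of the matched code
    return "".join(out)


message = decode("000001100101110111")
print(message)
```

The input string is the 18-bit data stream given earlier for AABCDEF, so the sketch can be checked against that example. With only two depths the binary search degenerates to a single comparison, as the answer notes.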
There are two main ways to store huffman code LUTs as the other answers state. You can either store the geometry of the tree, 0 for a node, 1 for a leaf, then put in all the leaf values, or you can use canonical huffman encoding, storing the lengths of the huffman codes.
The thing is, one method is better than the other depending on the circumstances.
Let's say, the number of unique symbols in the data you wish to compress (aabbbcdddd, there are 4 unique symbols, a, b, c, d) is n.
The number of bits to store the geometry of the tree along side the symbols in the tree is 10n - 1.
Assuming you store the code lengths in order of the symbols they belong to, and that each code length is stored in 8 bits (a code for a 256-symbol alphabet is at most 255 bits long, so its length fits in 8 bits), the size of the code length table will be a flat 2048 bits.
When you have a high number of unique symbols, say 256, it will take 2559 bits to store the geometry of the tree. In this case, the code length table is much more efficient. 511 bits more efficient, to be exact.
But if you only have 5 unique symbols, the tree geometry only takes 49 bits, and in this case, when compared to storing the code length table, storing the tree geometry is almost 2000 bits better.
The tree geometry is most efficient for n < 205, while a code length table is more efficient for n >= 205. So, why not get the best of both worlds, and use both? Have 1 bit at the start of your compressed data represent whether the next however many bits are going to be in the format of a code length table, or the geometry of the huffman tree.
In fact, why not add two bits, and when both of them are 0, there is no table, the data is uncompressed. Because sometimes, you can't get compression! And it would be best to have a single byte at the beginning of your file that is 0x00 telling your decoder not to worry about doing anything. Saves space by not including the table or geometry of a tree, and saves time, not having to unnecessarily compress and decompress data.
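The break-even point can be computed rather than memorized. This Python sketch assumes 8-bit symbols and a fixed 256-entry table of 8-bit code lengths, as in the figures above; the function names and the "tree"/"lengths" labels are invented for the example and are not part of any file format.

```python
CODE_LENGTH_TABLE_BITS = 256 * 8   # one 8-bit length per possible byte value


def tree_geometry_bits(unique_symbols: int) -> int:
    """Bits needed to store the tree shape together with its 8-bit leaf values."""
    return 10 * unique_symbols - 1


def cheaper_header(unique_symbols: int) -> str:
    """Pick whichever header representation is smaller for this symbol count."""
    if tree_geometry_bits(unique_symbols) < CODE_LENGTH_TABLE_BITS:
        return "tree"
    return "lengths"


for n in (5, 204, 205, 256):
    print(n, tree_geometry_bits(n), cheaper_header(n))
```

A selector bit (or two, to also allow "stored uncompressed") written before the header would then tell the decoder which of the two formats follows.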