IP address matching filter function - c++

I am writing code in C++ that runs on both Windows and Mac. I want to write a function that accepts a list of machine IP addresses and a list of IP filters in CIDR format, and checks whether a machine IP matches an IP filter.
For example, if the machine IP is 10.210.177.47 and the filter is 10.210.177.1/32,
the function will check whether 10.210.177.47 falls inside the filter range.
A filter can also be a plain IP address like 10.210.177.45.
I need to write common code that can run on both Windows and Mac.

The easiest solution is to convert the mask length into a bit mask. E.g. a /8 uses the upper 8 bits to identify the network and the lower 24 bits to identify hosts within that network. Hence, by shifting the IP address (expressed as std::uint32_t) right by 24 bits (>> 24), you keep just the network part. For 10.210.177.47 within 10.0.0.0/8, that leaves 10, which matches. For a /24 such as 10.0.0.0/24, shifting right by 8 bits would leave 10.210.177 against 10.0.0, so no match. A plain IP address filter can be treated as a /32.
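The shift-and-compare idea above can be sketched in portable C++ as follows. This is a minimal sketch, not a definitive implementation: the names parse_ipv4, matches_filter and matches_any are made up for illustration, it handles IPv4 only, and it parses the text by hand so that no platform-specific socket headers are needed.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Parse dotted-quad IPv4 text into a host-order 32-bit integer.
// Returns false on malformed input. (Hypothetical helper.)
bool parse_ipv4(const std::string& text, std::uint32_t& out) {
    std::istringstream in(text);
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        int octet = -1;
        if (i > 0 && in.get() != '.') return false;
        if (!(in >> octet) || octet < 0 || octet > 255) return false;
        value = (value << 8) | static_cast<std::uint32_t>(octet);
    }
    if (in.peek() != std::char_traits<char>::eof()) return false;  // trailing junk
    out = value;
    return true;
}

// True if ip_text lies inside filter, where filter is either "a.b.c.d/len"
// or a plain address (treated as /32).
bool matches_filter(const std::string& ip_text, const std::string& filter) {
    std::string net_text = filter;
    unsigned prefix_len = 32;
    const std::string::size_type slash = filter.find('/');
    if (slash != std::string::npos) {
        net_text = filter.substr(0, slash);
        std::istringstream len_in(filter.substr(slash + 1));
        if (!(len_in >> prefix_len) || prefix_len > 32 ||
            len_in.peek() != std::char_traits<char>::eof()) {
            return false;
        }
    }
    std::uint32_t ip = 0, net = 0;
    if (!parse_ipv4(ip_text, ip) || !parse_ipv4(net_text, net)) return false;
    if (prefix_len == 0) return true;       // /0 matches every address
    const unsigned shift = 32 - prefix_len;  // number of host bits to drop
    return (ip >> shift) == (net >> shift);
}

// True if any machine IP matches any filter.
bool matches_any(const std::vector<std::string>& machine_ips,
                 const std::vector<std::string>& filters) {
    for (const std::string& ip : machine_ips)
        for (const std::string& f : filters)
            if (matches_filter(ip, f)) return true;
    return false;
}
```

With this sketch, matches_filter("10.210.177.47", "10.210.177.1/32") is false because a /32 only matches the exact address, while the same IP matches "10.0.0.0/8".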

HD wallet (bip32) addresses derivation path

I am creating an application that needs to generate a new address from a provided XPUB key.
For instance xpub6CUGRUonZSQ4TWtTMmzXdrXDtypWKiKrhko4egpiMZbpiaQL2jkwSB1icqYh2cfDfVxdx4df189oLKnC5fSwqPfgyP3hooxujYzAu3fDVmz
I am using the Electrum wallet and a key provided by this app.
My application allows users to add their own xpub keys, so it will be able to generate new addresses without affecting users' privacy, as long as the xpub keys are only used by my application and not exposed to the public.
So I am looking for a way to generate new addresses correctly. I have found some libraries, but I am not sure about the derivation path. What should it look like?
Consider the following path example
1. Is the derivation path more a convention than a rule?
2. Is m / 44' / 0' / 0' / 0 / 0 (Bitcoin, first account, external chain, first address) a valid path? I found it here https://github.com/bitcoin/bips/blob/master/bip-0044.mediawiki
3. I have also found out that Electrum wallets use another scheme https://bitcoin.stackexchange.com/questions/36955/what-bip32-derivation-path-does-electrum-use/36956 in the following format: m/0/ for receiving addresses and m/1/ for change addresses.
4. What is the maximum number (n) of addresses? How do online tools calculate the balance of an HD wallet? If n is quite large, it will require a lot of processing power to calculate the sum.
So all in all, I wonder which derivation path format I should use in order to have no compatibility problems.
I would be grateful for any help.
Questions 1-3:
The path is the BIP44 convention. Electrum doesn't follow it, so it's not compatible with other wallets that support BIP44.
Question 4:
The number is effectively unlimited. If you are talking about the maximum number for a certain parent key, the answer is:
Each extended key has 2^31 normal child keys, and 2^31 hardened child keys.
- https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki
If your application design leads to a very large quantity of addresses, that's something you need to handle with a better design. If you mean compatibility with other wallets, then according to BIP44,
Address gap limit is currently set to 20. If the software hits 20
unused addresses in a row, it expects there are no used addresses
beyond this point and stops searching the address chain.
https://github.com/bitcoin/bips/blob/master/bip-0044.mediawiki#Address_gap_limit
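To make the path notation concrete, here is a small sketch that turns a textual path such as m / 44' / 0' / 0' / 0 / 0 into the 32-bit child indices that BIP32 uses, where a hardened index has the top bit (2^31) set. The function name parse_derivation_path is made up for illustration, and accepting 'h' as an alternative hardened marker is an assumption about the input format. Note that an xpub can only derive non-hardened children, so an application holding an account-level xpub would normally derive just the last two levels (for example 0/i for receiving addresses).

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hardened child indices occupy the upper half of the 32-bit index range.
const std::uint32_t kHardenedBit = 0x80000000u;

// Convert "m/44'/0'/0'/0/0" (spaces allowed) into BIP32 child indices.
// A leading "m" or "M" is optional. Throws std::invalid_argument on bad input.
std::vector<std::uint32_t> parse_derivation_path(const std::string& path) {
    std::string compact;
    for (char c : path)
        if (c != ' ') compact += c;

    std::istringstream in(compact);
    std::string part;
    std::vector<std::uint32_t> indices;
    bool first = true;
    while (std::getline(in, part, '/')) {
        if (first) {
            first = false;
            if (part == "m" || part == "M") continue;  // master-key marker
        }
        if (part.empty()) throw std::invalid_argument("empty path component");
        const bool hardened = part.back() == '\'' || part.back() == 'h';
        if (hardened) part.pop_back();
        if (part.empty() || part.size() > 10 ||
            part.find_first_not_of("0123456789") != std::string::npos) {
            throw std::invalid_argument("bad index: " + part);
        }
        const unsigned long long n = std::stoull(part);
        if (n >= kHardenedBit) throw std::invalid_argument("index out of range");
        indices.push_back(static_cast<std::uint32_t>(n) |
                          (hardened ? kHardenedBit : 0u));
    }
    return indices;
}
```

For the BIP44 example path this yields five indices, the first three with the hardened bit set (purpose 44', coin type 0', account 0') and the last two plain (chain 0, address 0).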

matching against a list of subnets

There is a list of subnets in the form of net-addr/mask, such as
12.34.45.0/24 192.168.0.0/16 45.0.0.0/10 ...
I wonder what the best way is to tell whether a given IP address is in any of the subnets.
Here is a little background on the matching:
For an IP address x, we convert it to an integer. For example, 11.12.13.14 is converted to 0x0b0c0d0e. For a prefix length m, we convert it to an integer mask whose leading m bits are 1 and whose remaining (32-m) bits are 0.
To check if IP x is in subnet A/m,
we just need to check (x&m) == (A&m), where m is the mask integer.
I am curious which data structure or functions make matching against a range of subnets fast. Of course, we can go through the subnets in a loop, but that's not efficient.
Make a tree where each level represents n bits of the IP address. Store subnets on each level so that the number of mask bits is between n * level and n * (level + 1). So for example with n = 4, you have 16 children per node. So if you are testing against 11.12.13.14 (== 0x0b0c0d0e), you could walk the tree like this:
0 -> b -> 0 -> c -> 0 -> d -> 0 -> e
On each node you keep track of the subnets with the corresponding size. I mean: level 0 should have subnets /1 to /4 (inclusive), level 1 should have subnets /5 to /8, and so on up to /29 to /32. Note that /0 matches everything, so it would be useless to have in the data structure.
To search in the tree, split the IP into groups of n bits (in my example 4). Descend to the first level matching the first n bits and test all subnets on that level. If not found, descend to the next level matching the next n bits.
This way you would have to test at most 32/n levels, each with at most 2^n subnets. For n=4, you would have to test 8 levels, each with at most 16 subnets. This is done in no time.
Clarification: A node is a subnet, for example (in hex, one digit is a nibble, which is 4 bits): 0a.5a.00.00/16. The parent of this node would be a subnet containing this subnet, for example 0a.50.00.00/12. The edge towards a child node could be interpreted as "contains", as in: "this (the parent) subnet contains the subnet represented by the child node". For this tree to contain all the subnets you want, you will likely have to insert nodes that represent a subnet not in your list. Mark these nodes as auxiliary so that, when searching the tree, you know there are more specific subnets under them, but the node itself is not part of the list of subnets you want to check against. Add only the nodes that are directly in the list, plus whatever parent nodes are needed to make them reachable in the tree structure.
Here is a struct on how I see it:
#include <cstdint>

struct subnet_tree_node
{
    std::uint32_t ip;                 // 32-bit IP address of this subnet
    subnet_tree_node *children;       // array of child nodes
    std::uint8_t number_of_children;
    std::uint8_t mask;                // number of prefix bits for this subnet
    std::uint8_t valid;               // whether this node is valid (in the list) or auxiliary
};
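A lookup over such a tree might look like the sketch below. It is a simplified variation rather than the exact structure described above: each node has 16 optional children (one per nibble), a subnet is stored at the node reached after consuming its full nibbles (prefix_len / 4), and no auxiliary flag is needed because a node with an empty subnet list plays that role. The names SubnetTrie, Subnet, prefix_mask and ipv4 are invented for illustration.

```cpp
#include <array>
#include <cstdint>
#include <memory>
#include <vector>

struct Subnet {
    std::uint32_t network;
    unsigned prefix_len;  // 0..32
};

// Build a host-order IPv4 integer from four octets (illustrative helper).
inline std::uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d) {
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) |
           (std::uint32_t(c) << 8) | std::uint32_t(d);
}

// Mask with the leading `len` bits set.
inline std::uint32_t prefix_mask(unsigned len) {
    return len == 0 ? 0u : static_cast<std::uint32_t>(0xFFFFFFFFu << (32 - len));
}

class SubnetTrie {
    struct Node {
        std::vector<Subnet> subnets;                     // subnets ending at this depth
        std::array<std::unique_ptr<Node>, 16> children;  // one child per nibble value
    };
    Node root_;

public:
    void insert(const Subnet& s) {
        Node* node = &root_;
        const unsigned depth = s.prefix_len / 4;  // whole nibbles fixed by the prefix
        for (unsigned d = 0; d < depth; ++d) {
            const unsigned nib = (s.network >> (28 - 4 * d)) & 0xFu;
            if (!node->children[nib]) node->children[nib].reset(new Node());
            node = node->children[nib].get();
        }
        node->subnets.push_back(s);
    }

    // Walk at most 8 levels, testing the few subnets stored on each visited node.
    bool contains(std::uint32_t ip) const {
        const Node* node = &root_;
        for (unsigned d = 0;; ++d) {
            for (const Subnet& s : node->subnets) {
                const std::uint32_t m = prefix_mask(s.prefix_len);
                if ((ip & m) == (s.network & m)) return true;
            }
            if (d == 8) return false;  // all 8 nibbles consumed
            const unsigned nib = (ip >> (28 - 4 * d)) & 0xFu;
            node = node->children[nib].get();
            if (!node) return false;   // no more specific subnets on this branch
        }
    }
};
```

Storing a subnet at depth prefix_len / 4 keeps insertion to a single path; the cost is that each node can hold subnets of four different prefix lengths, which are checked linearly.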
So you've established performance is a problem.
Consider each netmask/addr pair as a pair of IP addresses: First valid, last-valid.
Let us assume last-valid is always odd (this is not true for a /32 network, where first and last are the same address, but that's a really, really strange case).
Construct a sorted vector of these IP addresses. (Complain if the networks overlap or anything stupid.)
Search the vector for your target IP address with some sort of binary chop.
If the IP address is in the vector, it is a) weird; b) in one of the subnets.
If the IP address is not in the vector and the value below it is even, it is in a subnet. If the value below it is odd, it is not in a subnet.
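A sketch of this sorted-vector idea follows. It deliberately swaps the even/odd test on the address value for a test on the position in the vector (an odd number of boundaries below the target means we are between a first and a last address), because that also works for /32 entries. It assumes the subnets do not overlap; the names Cidr, ipv4, build_boundaries and in_any_subnet are made up for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Cidr {
    std::uint32_t network;
    unsigned prefix_len;  // 0..32
};

// Build a host-order IPv4 integer from four octets (illustrative helper).
inline std::uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d) {
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) |
           (std::uint32_t(c) << 8) | std::uint32_t(d);
}

// Flatten non-overlapping subnets into a sorted list:
// first0, last0, first1, last1, ...
std::vector<std::uint32_t> build_boundaries(const std::vector<Cidr>& subnets) {
    std::vector<std::uint32_t> bounds;
    for (const Cidr& s : subnets) {
        const std::uint32_t host_bits =
            s.prefix_len >= 32 ? 0u : (0xFFFFFFFFu >> s.prefix_len);
        const std::uint32_t first = s.network & ~host_bits;
        bounds.push_back(first);
        bounds.push_back(first | host_bits);
    }
    std::sort(bounds.begin(), bounds.end());
    return bounds;
}

// Binary search: O(log n) per lookup.
bool in_any_subnet(const std::vector<std::uint32_t>& bounds, std::uint32_t ip) {
    const auto it = std::lower_bound(bounds.begin(), bounds.end(), ip);
    if (it != bounds.end() && *it == ip) return true;  // exactly a first or last address
    // Count of boundaries strictly below ip: odd means a range is still open.
    return (it - bounds.begin()) % 2 == 1;
}
```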
Do you have any evidence that performance is an issue? There are only 2^24 subnets (well, you can have /28 subnets, but they are usually internal to an organization, so even if the organization has a class A network, there's still only 2^24 of them).
Doing 16 million ands and comparisons is going to take no time.
Keep it simple (until you really have to do better).
Thanks for the discussions here, they got me inspired with this solution.
First, without loss of generality, we assume that no subnet covers another subnet (otherwise we just remove the smaller one).
Each subnet is considered as an interval [subnet_min, subnet_max].
We just need to organize all the subnets into a binary tree, each node being a pair (subnet_min, subnet_max). When searching for an IP, we traverse the tree just like a regular binary search based only on subnet_min, with the purpose of finding the node whose subnet_min is the largest among all the subnet_min values that are <= the given IP. Once we find this node, we check whether the node's subnet_max is >= the given IP. If so, the given IP is covered by that subnet. Otherwise, the IP is not covered by the subnet in this node, and it is not covered by any of the other subnets either.
The last point is guaranteed by the assumption that no two subnets contain each other.
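One way to sketch this in C++ is with std::map, which is typically a balanced binary search tree, keyed on subnet_min. The class name SubnetIntervals and its member names are made up for illustration, and the sketch assumes nested subnets have already been removed as described above (if two entries share the same subnet_min, it simply keeps the larger range).

```cpp
#include <algorithm>
#include <cstdint>
#include <map>

// Build a host-order IPv4 integer from four octets (illustrative helper).
inline std::uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d) {
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) |
           (std::uint32_t(c) << 8) | std::uint32_t(d);
}

class SubnetIntervals {
    // subnet_min -> subnet_max, ordered by subnet_min.
    std::map<std::uint32_t, std::uint32_t> by_min_;

public:
    void add(std::uint32_t network, unsigned prefix_len) {
        const std::uint32_t host_bits =
            prefix_len >= 32 ? 0u : (0xFFFFFFFFu >> prefix_len);
        const std::uint32_t lo = network & ~host_bits;
        const std::uint32_t hi = lo | host_bits;
        std::uint32_t& stored = by_min_[lo];  // value-initialised to 0 if new
        stored = std::max(stored, hi);
    }

    bool covers(std::uint32_t ip) const {
        // upper_bound gives the first subnet_min > ip; step back one to get
        // the largest subnet_min <= ip.
        auto it = by_min_.upper_bound(ip);
        if (it == by_min_.begin()) return false;  // every subnet starts above ip
        --it;
        return it->second >= ip;
    }
};
```

Each lookup is a single O(log n) descent followed by one comparison against subnet_max.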

How to understand network length from geolite2 block

I'm trying to decode the end address from the Maxmind Geolite2 database for the ip_v4 portion. However, I am used to working with classful-style lengths, e.g. /8, /16, etc., and these lengths of 113, 114, 112 aren't making any sense to me, presumably because these are v4 addresses in v6 notation.
eg.
::ffff:1.0.128.0,113
Can anyone point me to how to translate the lengths here so that I can generate the correct mask? I want to understand it mathematically, but for some reason the penny isn't dropping.
To get the IPv4 prefix/mask length, subtract 96 from the IPv6 prefix length. For instance, ::ffff:1.0.128.0/113, from your example, is equivalent to 1.0.128.0/17, or the range 1.0.128.0 to 1.0.255.255.
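The subtraction can be sketched in C++ like this. The names Ipv4Range, mapped_block_to_range, ipv4 and to_dotted are made up for illustration, and the sketch assumes the IPv4 part of the mapped address has already been extracted into a 32-bit integer.

```cpp
#include <cstdint>
#include <string>

struct Ipv4Range {
    std::uint32_t first;
    std::uint32_t last;
    bool valid;  // false if the prefix length cannot describe an IPv4-mapped block
};

// Build a host-order IPv4 integer from four octets (illustrative helper).
inline std::uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d) {
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) |
           (std::uint32_t(c) << 8) | std::uint32_t(d);
}

// Convert an IPv4-mapped block (start address plus IPv6 prefix length)
// into the first and last IPv4 address of the block.
Ipv4Range mapped_block_to_range(std::uint32_t start, unsigned ipv6_prefix_len) {
    if (ipv6_prefix_len < 96 || ipv6_prefix_len > 128) return Ipv4Range{0, 0, false};
    const unsigned v4_len = ipv6_prefix_len - 96;  // e.g. 113 -> 17
    const std::uint32_t host_bits = v4_len >= 32 ? 0u : (0xFFFFFFFFu >> v4_len);
    const std::uint32_t first = start & ~host_bits;
    return Ipv4Range{first, first | host_bits, true};
}

// Format a host-order IPv4 integer as dotted decimal.
std::string to_dotted(std::uint32_t ip) {
    return std::to_string((ip >> 24) & 0xFFu) + "." +
           std::to_string((ip >> 16) & 0xFFu) + "." +
           std::to_string((ip >> 8) & 0xFFu) + "." +
           std::to_string(ip & 0xFFu);
}
```

For the example above, mapped_block_to_range(ipv4(1, 0, 128, 0), 113) gives a range whose last address formats as 1.0.255.255.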

How to discern between network flows

I want to be able to discern between network flows. I am defining a flow as a tuple of three values (sourceIP, destIP, protocol). I am storing these in a C++ map for fast access. However, two packets may carry the same pair of addresses with the source and destination roles swapped, e.g.
[packet 1: source = 1.2.3.4, dest = 5.6.7.8]
[packet 2: source = 5.6.7.8, dest = 1.2.3.4 ]
I would like to create a key that treats these as the same.
I could solve this by creating a secondary key and a primary key, and if the primary key doesn't match I could loop through the elements in my table and see if the secondary key matches, but this seems really inefficient.
I think this might be a perfect opportunity for hashing, but it seems like string hashes are only available through Boost, and we are not allowed to bring in libraries. I am also not sure I know of a hash function that depends only on the elements, not on their ordering.
How can I easily tell flows apart according to these rules?
Compare the values of the source and dest IPs as 64-bit numbers. Use the lower one as the hash key, and put the higher one, the protocol and the direction as the values.
Do lookups the same way, use the lower value as the key.
If you consider that a single client can have more than one connection to a service, you'll see that you actually need four values to uniquely identify a flow: the source and destination IP addresses and the source and destination ports. For example, imagine two developers in the same office are searching StackOverflow at the same time. They'll both connect to stackoverflow.com:80, and they'll both have the same source address. But the source ports will be different (otherwise the company's firewall wouldn't know where to route the returned packets). So you'll need to identify each node by an <address, port> pair.
Some ideas:
As stark suggested, sort the source and destination nodes, concatenate them, and hash the result.
Hash the source, hash the destination, and XOR the result. (Note that this may weaken the hash and allow more collisions.)
Make 2 entries for each flow by hashing
<src_addr, src_port, dst_addr, dst_port> and also
<dst_addr, dst_port, src_addr, src_port>. Add them both to the map and point them both to the same data structure.
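The first idea (order the two endpoints, then use the ordered pair as the key) needs nothing beyond the standard library. Below is a minimal sketch using std::map with a tuple key; the names Endpoint, FlowKey, make_flow_key and ipv4 are made up for illustration, and ports are included as suggested above.

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <utility>

// <address, port> in host byte order.
typedef std::pair<std::uint32_t, std::uint16_t> Endpoint;
// (lower endpoint, higher endpoint, protocol).
typedef std::tuple<Endpoint, Endpoint, std::uint8_t> FlowKey;

// Build a host-order IPv4 integer from four octets (illustrative helper).
inline std::uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d) {
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) |
           (std::uint32_t(c) << 8) | std::uint32_t(d);
}

// Direction-independent key: the smaller endpoint always comes first,
// so A->B and B->A produce the same key.
FlowKey make_flow_key(Endpoint src, Endpoint dst, std::uint8_t protocol) {
    if (dst < src) std::swap(src, dst);
    return FlowKey(src, dst, protocol);
}

// Example container: packet count per bidirectional flow.
typedef std::map<FlowKey, unsigned long> FlowTable;
```

Because std::tuple and std::pair already define operator<, the key works directly in std::map; if the direction matters for the stored data, it can be kept in the mapped value rather than the key.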

Querying GeoLite2 Country CSV in SQL

Does anyone know how to look up an IP4 address from MaxMind's GeoLite2 Country CSV using SQL?
I have been using MaxMind's free GeoIP data for many years, and would like to upgrade to their GeoLite2 data. I have the blocks and locations data loaded into MySQL tables, but am not sure how to determine the address range that an IP4 address falls into. The old format had a start/end number for each block; the new format only seems to have a start number.
I have already hunted through the MaxMind developer docs, and Googled, but can't seem to find any information on how to query the new format. I'm sure it's obvious, and will edit this posting if I figure it out in the interim.
I think that I'd have to find the first blocks entry that is greater than or equal to the IP4 address, and LIMIT 1, perhaps.
I use this data both for web application lookups and for querying directly in SQL for generating reports, so I usually need to make sure I can implement the lookup in both Perl code and pure SQL.
I am upgrading because I'm seeing some funny results for Japanese visitors appearing to be from France on the old data.
many thanks
The address format used in Geolite2 CSV includes a block IP address start, followed by a Prefix length #, which can be converted into the block IP address end.
(Somewhat confusingly, Maxmind is using "Network_Mask_Length" instead of "Prefix_Length", the accepted IPv6 terminology, to label this field.)
Geolite2 CSV's blocks fields layout:
network_start_ip,network_mask_length,geoname_id,registered_country_geoname_id,represented_country_geoname_id,postal_code,latitude,longitude,is_anonymous_proxy,is_satellite_provider
Example (record extracted from Geolite2-Country-Blocks.csv):
::ffff:81.248.136.0,120,3578476,3017382,,,,,0,0
Given the above example, what is the last IPv4 address assigned to the block?
First IP address: 81.248.136.0
Prefix_Length/Network_mask_Length: 120
Last IP address: 81.248.136.255
The following URL might be handy to quickly look up the number of IP addresses available for a specific Prefix_Length:
http://www.gestioip.net/cgi-bin/subnet_calculator.cgi
__philippe
Prefix_Length calculator usage:
(In this case, more of a simple Table Lookup tool than a Calculator, really...;-)
http://www.gestioip.net/cgi-bin/subnet_calculator.cgi
In the calculator, tick the IPv6 button, then click the PL box down arrow.
A list of "Prefix Length" will be presented, with the corresponding available number of IP addresses.
To determine the last IP address of any Geolite2 block, the following relevant range of Prefix_Length/address pairs should most likely suffice:
Prefix Length   #addresses
117             2048
118             1024
119             512
120             256
121             128
122             64
123             32
124             16
125             8
126             4
127             2
128             1
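The table does not have to be looked up: the number of addresses in a block is 2^(128 - Prefix_Length), and the last address is the first address plus that count minus one. A small sketch in C++, with made-up function names and assuming the block start is already available as a 32-bit IPv4 integer and is aligned to the block size:

```cpp
#include <cstdint>

// Number of addresses in an IPv4-mapped block, valid for prefix lengths
// 96..128 (that is, IPv4 /0 to /32). Returns 0 for anything else.
std::uint64_t addresses_in_block(unsigned ipv6_prefix_len) {
    if (ipv6_prefix_len < 96 || ipv6_prefix_len > 128) return 0;
    return std::uint64_t(1) << (128 - ipv6_prefix_len);
}

// Last IPv4 address of the block that starts at `first` (host byte order).
// Assumes the prefix length is in 96..128 and `first` is the block start.
std::uint32_t last_ipv4_in_block(std::uint32_t first, unsigned ipv6_prefix_len) {
    const std::uint64_t count = addresses_in_block(ipv6_prefix_len);
    return first + static_cast<std::uint32_t>(count - 1);
}
```

For the record above, a prefix length of 120 gives 256 addresses, so a block starting at 81.248.136.0 ends at 81.248.136.255.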
Note that the Geolite2 file structure follows a form of hybrid IPv4/IPv6 notation, aka
"IPv4-mapped-IPv6 address".
Those "hybrid" addresses are written with the first 96 bits in the standard IPv6 format, and the remaining 32 bits written in the customary dot-decimal notation of IPv4.
For instance, ::ffff:192.0.2.128 represents the IPv4 address 192.0.2.128
For much more on this very (hairy) subject, check here:
http://en.wikipedia.org/wiki/IPv6
__philippe