HD wallet (BIP32) address derivation path

I am creating an application that needs to generate a new address from a provided XPUB key.
For instance xpub6CUGRUonZSQ4TWtTMmzXdrXDtypWKiKrhko4egpiMZbpiaQL2jkwSB1icqYh2cfDfVxdx4df189oLKnC5fSwqPfgyP3hooxujYzAu3fDVmz
I am using the Electrum wallet and a key provided by this app.
My application allows users to add their own xpub keys, so it can generate new addresses without affecting users' privacy, as long as the xpub keys are used only by my application and not exposed publicly.
So I am looking for a way to generate new addresses correctly. I have found some libraries, but I am not sure about the derivation path: what should it look like?
Consider the following path example. Is the derivation path more a convention than a rule?
Bitcoin, first account, external chain, first address: m / 44' / 0' / 0' / 0 / 0. Is this a valid path? I found it here: https://github.com/bitcoin/bips/blob/master/bip-0044.mediawiki
I have also found out that Electrum uses a different scheme (https://bitcoin.stackexchange.com/questions/36955/what-bip32-derivation-path-does-electrum-use/36956): it uses m/0/ for receiving addresses and m/1/ for change addresses.
What is the maximum number (n) of addresses? And how do online tools calculate the balance of an HD wallet? If n is quite large, summing over all addresses would require a lot of processing power.
So, all in all, which format of derivation path should I use to avoid compatibility problems?
I would be grateful for any help.

Questions 1-3:
It's the BIP44 convention. Electrum doesn't follow it, so it isn't compatible with other wallets that support BIP44.
Question 4:
The number can be effectively unlimited. If you are asking about the maximum for a certain parent key, the answer is:
Each extended key has 2^31 normal child keys, and 2^31 hardened child keys.
https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki
If your application's design leads to a very large number of addresses, that is an issue you need to handle with a better design. If you mean compatibility with other wallets, then according to BIP44:
Address gap limit is currently set to 20. If the software hits 20
unused addresses in a row, it expects there are no used addresses
beyond this point and stops searching the address chain.
https://github.com/bitcoin/bips/blob/master/bip-0044.mediawiki#Address_gap_limit
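For illustration, here is a minimal sketch of deriving the first few receiving addresses from an account-level xpub. It assumes the bip32utils package; the exact method names are an assumption, so verify them against whichever library you choose.

# Minimal sketch, assuming bip32utils; method names may differ in your library.
from bip32utils import BIP32Key

# Account-level extended public key (m/44'/0'/0' under BIP44).
xpub = "xpub6CUGRUonZSQ4TWtTMmzXdrXDtypWKiKrhko4egpiMZbpiaQL2jkwSB1icqYh2cfDfVxdx4df189oLKnC5fSwqPfgyP3hooxujYzAu3fDVmz"
account = BIP32Key.fromExtendedKey(xpub)

external = account.ChildKey(0)      # chain 0 = external (receiving) addresses
for i in range(5):                  # derive the first five receiving addresses
    print(i, external.ChildKey(i).Address())

The same two non-hardened levels (chain, then index) also match Electrum's m/0/i receiving path when the xpub comes from Electrum, though newer Electrum wallets may use different script types, so the address encoding can differ.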


How is the 'finalDigest' calculated in the 'sevLaunchAttestationReportEvent' log entry for confidential VMs in GCP?

I've experimented with launching some confidential VM instances. The simple scenario includes:
Launch an instance named 'Alice'.
Stop and relaunch instance 'Alice'.
Delete instance 'Alice', then create a new VM instance named 'Alice'.
I checked the 'sevLaunchAttestationReportEvent' log entry.
As expected, the 'guestMemoryRegion' digest was identical in all three cases.
However, the 'finalDigest' was different in all three cases. My questions are:
A. How is the 'finalDigest' calculated?
B. What is the purpose of a 'finalDigest' that is different at each launch of an identical VM image?
C. Can the 'finalDigest' be pre-calculated before instantiation?
Thanks.
First of all, a Confidential Virtual Machine runs on hosts based on second-generation AMD EPYC processors. It is optimized for security workloads and includes inline memory encryption, which ensures that data is encrypted while it is in RAM.
You can consult the following documentation to get further information.
Regarding your questions:
A. How is the 'finalDigest' calculated?
To calculate the digest value, a digest algorithm is used. Such algorithms include:
SHA-1
SHA-256
SHA-384
SHA-512
MD5
These are functions that take a large document and compute a "digest" (also called a "hash"); this is typically used in a digital signing process.
B. What is the purpose of a 'finalDigest' that is different at each launch of an identical VM image?
A message digest or hash function is used to turn input of arbitrary length into an output of fixed length, and this output can then be used in place of the original input. The digest can change every time the VM instance is turned on because some changes were executed internally in the instance; the hash algorithm takes those changes into account, and even if a single byte changes, the digest or hash changes completely.
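A quick way to see this avalanche effect is to hash two inputs that differ by a single byte; a minimal sketch using Python's standard hashlib:

import hashlib

# Two inputs differing in one byte produce completely different digests.
print(hashlib.sha256(b"identical VM image").hexdigest())
print(hashlib.sha256(b"identical VM imagf").hexdigest())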
C. Can the 'finalDigest' be pre-calculated before instantiation?
In my opinion this is not feasible, because the digest algorithm is a one-way function, that is, a function that is practically infeasible to invert.
You can get more information about hash functions at this link.

How to discern between network flows

I want to be able to discern between network flows. I am defining a flow as a tuple of three values (sourceIP, destIP, protocol). I am storing these in a C++ map for fast access. However, if two packets have the same pair of addresses with the source and destination swapped, e.g.:
[packet 1: source = 1.2.3.4, dest = 5.6.7.8]
[packet 2: source = 5.6.7.8, dest = 1.2.3.4]
I would like to create a key that treats them as the same flow.
I could solve this by creating a primary key and a secondary key: if the primary key doesn't match, I could loop through the elements in my table and check whether the secondary key matches. But this seems really inefficient.
I think this might be a perfect opportunity for hashing, but it seems like string hashes are only available through Boost, and we are not allowed to bring in libraries. I am also not sure I know of a hash function that depends only on the elements, not on their order.
How can I easily tell flows apart according to these rules?
Compare the values of the source and dest IPs as 64-bit numbers. Use the lower one as the map key, and store the higher one, the protocol, and the direction as the value.
Do lookups the same way: use the lower value as the key.
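A minimal sketch of this layout, shown in Python for brevity (a dict standing in for the C++ map); the names are illustrative:

flows = {}

def record(src_ip, dst_ip, proto):
    # Any consistent ordering works for canonicalization, so plain
    # string comparison of the addresses is enough here.
    lo, hi = (src_ip, dst_ip) if src_ip <= dst_ip else (dst_ip, src_ip)
    direction = "fwd" if src_ip == lo else "rev"
    # Key by the lower address; the higher address, protocol and
    # direction go in the value.
    flows.setdefault(lo, set()).add((hi, proto, direction))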
If you consider that a single client can have more than one connection to a service, you'll see that you actually need four values to uniquely identify a flow: the source and destination IP addresses and the source and destination ports. For example, imagine two developers in the same office are searching StackOverflow at the same time. They'll both connect to stackoverflow.com:80, and they'll both have the same source address. But the source ports will be different (otherwise the company's firewall wouldn't know where to route the returned packets). So you'll need to identify each node by an <address, port> pair.
Some ideas:
As stark suggested, sort the source and destination nodes, concatenate them, and hash the result (see the sketch after this list).
Hash the source, hash the destination, and XOR the result. (Note that this may weaken the hash and allow more collisions.)
Make two entries for each flow by hashing both
<src_addr, src_port, dst_addr, dst_port> and
<dst_addr, dst_port, src_addr, src_port>. Add them both to the map and point them both to the same data structure.
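A minimal sketch of the first idea, extended with the <address, port> pairs described above (again Python for brevity; the names are illustrative):

def flow_key(src_addr, src_port, dst_addr, dst_port, proto):
    # Sort the two <address, port> endpoints so that both directions
    # of the same flow produce the same key.
    a = (src_addr, src_port)
    b = (dst_addr, dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    return (lo, hi, proto)

# Both directions of the same flow map to the same key:
k1 = flow_key("1.2.3.4", 1234, "5.6.7.8", 80, "tcp")
k2 = flow_key("5.6.7.8", 80, "1.2.3.4", 1234, "tcp")
assert k1 == k2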

Machine unique ID [duplicate]

Possible Duplicate:
Generating a unique machine id
I want the processor serial number, a unique ID that no other processor has. I also have the hard disk serial number. I am using C++. Can anyone please help me with this?
I need a unique machine ID, such as the CPU number or motherboard number, in C++, for example via these WMI classes:
Win32_BaseBoard
Win32_Processor
Win32_DiskPartition
Thank you.
According to Wikipedia, starting with the Pentium III the CPUID assembler opcode could report a processor serial number; however, due to privacy concerns it is no longer implemented. See the following article for details: http://en.wikipedia.org/wiki/CPUID#EAX.3D3:_Processor_Serial_Number
The best way is to derive a machine unique ID (MID) from several sources rather than depending on a single parameter.
Check http://sowkot.blogspot.com/2008/08/generating-unique-keyfinger-print-for.html for more information.
Even the method described in the link above can't guarantee the same MID every time (the user might change the hardware).
Based on my experience: at application start/launch, generate the MID and store it in an application-specific area (perhaps the registry), then use it for all other application-related tasks instead of regenerating it every time. In that case a normal GUID generation should suffice.
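As a hedged sketch of deriving the MID from several sources, the following combines two hardware identifiers on Windows and hashes them. It assumes the (now deprecated, but widely available) wmic command-line tool, querying the WMI classes named in the question:

import hashlib
import subprocess

def wmic(args):
    # Query a WMI property via the wmic command-line tool.
    out = subprocess.run(["wmic"] + args, capture_output=True, text=True)
    return out.stdout.strip()

parts = [
    wmic(["baseboard", "get", "SerialNumber"]),   # Win32_BaseBoard
    wmic(["cpu", "get", "ProcessorId"]),          # Win32_Processor
]
machine_id = hashlib.sha256("|".join(parts).encode()).hexdigest()
print(machine_id)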
If you need a unique ID, you don't have to tie it to the hardware; simply generate a new random ID (128 bits or larger)! Store it in whatever persistent storage mechanism you prefer, so that next time you can extract the same ID you generated before.
If you use processor or disk serial numbers, they will be subject to change, because users can upgrade their hardware. Your own unique ID will never change. The only downside is that machines with dual boot will have two or more IDs, one per instance of the OS.
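A minimal sketch of that approach (the file location and format are illustrative assumptions):

import uuid
from pathlib import Path

ID_FILE = Path.home() / ".my_app_machine_id"   # hypothetical storage location

def get_machine_id():
    # Return the stored ID, or generate and persist a new one on first run.
    if ID_FILE.exists():
        return ID_FILE.read_text().strip()
    new_id = str(uuid.uuid4())                  # random 128-bit ID
    ID_FILE.write_text(new_id)
    return new_id

print(get_machine_id())   # same value on every subsequent run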

Check a fingerprint in the database

I am saving the fingerprints in a "blob" field. I wonder whether the only way to compare these prints is to retrieve all the prints saved in the database and build a list to check with the "identify_finger" function, or can I check directly from the database using a SELECT?
I'm working with libfprint. In this code the comparison is done against a list:
def test_identify():
    cur = DB.cursor()
    cur.execute('select id, fp from print')
    id = []
    gallary = []
    for row in cur.fetchall():
        data = pyfprint.pyf.fp_print_data_from_data(str(row['fp']))
        gallary.append(pyfprint.Fprint(data_ptr=data))
        id.append(row['id'])
    n, fp, img = FingerDevice.identify_finger(gallary)
There are two fundamentally different ways to use a fingerprint database. One is to verify the identity of a person who is known through other means, and one is to search for a person whose identity is unknown.
A simple library such as libfprint is suitable for the first case only. Since you're using it to verify someone you can use their identity to look up a single row from the database. Perhaps you've scanned more than one finger, or perhaps you've stored multiple scans per finger, but it will still be a small number of database blobs returned.
A fingerprint search algorithm must be designed from the ground up to narrow the search space, to compare quickly, and to rank the results and deal with false positives. Just as a Google search may come up with pages totally unrelated to what you're looking for, so too will a fingerprint search. There are companies that devote their entire existence to solving this problem.
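For the verification case, a minimal sketch reusing the names from the question (how identify_finger reports a match is an assumption to check against the pyfprint docs):

def verify_user(user_id):
    # The identity is already known, so fetch only that user's prints.
    cur = DB.cursor()
    cur.execute('select fp from print where id = %s', (user_id,))
    gallary = []
    for row in cur.fetchall():
        data = pyfprint.pyf.fp_print_data_from_data(str(row['fp']))
        gallary.append(pyfprint.Fprint(data_ptr=data))
    n, fp, img = FingerDevice.identify_finger(gallary)
    return n is not None   # assumption: n indexes the matched print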
Another way would be to have a MySQL plugin that knows how to work with fingerprint images and can select based on what you are looking for.
I really doubt that there is such a thing.
You could also try to parallelize the fingerprint comparison, i.e. calling:
FingerDevice.identify_finger(gallary)
in parallel, on different cores/machines
You can't check directly from the database using a SELECT, because each scan is different and will produce a different blob. libfprint does the hard work of comparing different scans and judging whether they are from the same person or not.
What zinking and Tudor are saying, I think, is that if you understand how that judgment process works (which is, by the way, by minutiae comparison), you can develop a method of storing the relevant data for the process (the minutiae, maybe?) in the database, and then a method for fetching the relevant values, maybe a kind of index or some type of extension to the database.
In other words, you would have to reimplement the libfprint algorithms in a more complex (and beautiful) way, instead of just accepting the libfprint method of comparing the scan with all stored fingerprints in a loop.
Other solutions for speeding up your program:
Use C:
I only know enough C to write hello-world-style programs, but it was not hard to write pure C code using libfprint's fp_identify_finger_img function, and I can tell you it is much faster than pyfprint.identify_finger.
You can continue doing the enrollment part in Python; I do.
Use a time/location-based SELECT:
If you know your users will scan their fingerprints with higher probability at some times than at others, or at some places than at others (maybe arriving at work at a certain time and scanning their fingers, or leaving, or entering the building by one gate or another), you can collect data at each scan, measure those probabilities, and create parallel tables that sort the users by their probability of arriving at each time and location.
We know that identify_finger tries to identify fingers by looping over the fingerprint objects you provide in a list, so we can use that: give it the objects sorted so that the most likely user for that time and location comes first in the list, and so on.
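A minimal sketch of that idea (the likelihood scores are a hypothetical input you would compute from the statistics you collect):

def build_gallary(rows, likelihood):
    # rows: (user_id, fp_blob) records; likelihood: user_id -> float.
    # Sort so the most probable users for this time/location come first.
    ranked = sorted(rows, key=lambda r: likelihood.get(r[0], 0.0), reverse=True)
    ids, gallary = [], []
    for user_id, fp_blob in ranked:
        data = pyfprint.pyf.fp_print_data_from_data(str(fp_blob))
        gallary.append(pyfprint.Fprint(data_ptr=data))
        ids.append(user_id)
    return ids, gallary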

Geocoding Accuracy

I'm brand new to geocoding and have a relatively large dataset of 100,000+ addresses. When I geocode them using MapMarker Professional, about 10% cannot be geocoded with a high level of precision (I mostly get S2 precision values back, meaning the match was to a primary postal code centroid, the centerpoint of the primary postal code boundary). Each of the addresses has already been standardized, so they should be valid (I have taken a random sample and run them through the USPS ZIP Code Lookup process to verify this). My question is: should I be able to geocode addresses with a higher degree of accuracy than what I'm seeing, or am I expecting too much of the products currently on the market? I've tried geocoding using Google's and Yahoo's services without any better luck. All of the services appear to be able to give me the postal code centroid, but none of them appear to have enough information to give me distinct coordinates for houses in at least 98% of the addresses I send.
Thanks for any guidance you can provide,
Jeremy
Geocoding is an imprecise process. The addresses you are geocoding that don't have good precision are likely in rural areas, where it is not uncommon for addresses to be off by up to a mile.
Geocoders typically only know where an address is by taking the house numbers at the start and end of a street segment and interpolating between them.
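A minimal sketch of that interpolation (the numbers are made up for illustration):

def interpolate(house, lo_num, hi_num, lo_coord, hi_coord):
    # Place the house proportionally between the segment's endpoints.
    t = (house - lo_num) / (hi_num - lo_num)
    return tuple(a + t * (b - a) for a, b in zip(lo_coord, hi_coord))

# House 150 on a segment numbered 100-200:
print(interpolate(150, 100, 200, (40.7000, -74.0000), (40.7010, -74.0020)))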