Hierarchical Deterministic Wallet Derivation Paths
In the very first post of my blog, I will talk about Hierarchical Deterministic (HD, for short) Wallet Derivation Paths. Besides personal use, I have developed applications that control a large set of private keys thanks to derivation paths. Recently, I have become more involved with them through OrdinalSafe, an ordinals-first BTC extension wallet. I decided to write about them because most people use them without noticing or understanding how they work. Let's get started.
Overview
Have you ever wondered how CEXs control thousands of user wallets in the background, or how MetaMask populates multiple accounts from a single mnemonic that you own? It's all about HD Derivation Paths.
The blockchain authorization scheme is based on private-public key pairs. You sign your transaction with your private key, then the blockchain mathematically verifies that the signature was produced by the private key that matches your public key. I won't talk about this in detail to stay on topic. You can find a very detailed explanation here.
In order to access your account, you have to hold your private key at all times, whether it is stored in your wallet or on paper. However, importing your wallet with the raw private key is not practical, as the private key is a random-looking string like this: d9df0b29b0cf0eeca5087bfc8cc4fc076f5b87c5963a5994a611156108750155
In order to ease access, we use mnemonics. A mnemonic is a list of words picked at random from a fixed word list (following the rules of BIP39) that helps you derive a seed. So, wtf is a seed? The seed is also a value that looks like a random string. Why do we need it? During the generation of the seed, we can add an additional password to our mnemonic string, and the seed is generated from these two values. It makes accessing your accounts more secure. Then finally, we can derive our "Extended Private Key". You can think of this as the father of all your private keys. This also derives the "Extended Public Key", the father of all your public keys (except hardened keys, which we will talk about soon). We can omit this derivation right now. Yes, in the end, we have 5 derivations.
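To make the mnemonic + password step concrete, here is a minimal sketch of the seed derivation, assuming the BIP39 rules (PBKDF2 with HMAC-SHA512, 2048 rounds, and the salt "mnemonic" followed by the password). The function name mnemonic_to_seed and the example password are my own, and the mnemonic is a public test phrase, so never use it for real funds.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Derive a 64-byte seed from a mnemonic and an optional password (BIP39)."""
    # BIP39 normalizes both strings to Unicode NFKD before hashing.
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    # 2048 rounds of PBKDF2 with HMAC-SHA512, producing 64 bytes.
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


# A public test mnemonic (assumption: used here for illustration only).
EXAMPLE_MNEMONIC = (
    "abandon abandon abandon abandon abandon abandon "
    "abandon abandon abandon abandon abandon about"
)

seed_plain = mnemonic_to_seed(EXAMPLE_MNEMONIC)
seed_protected = mnemonic_to_seed(EXAMPLE_MNEMONIC, "my extra password")

print(seed_plain.hex())
print(seed_plain != seed_protected)  # the password changes the seed
```

The same mnemonic with a different password gives a completely different seed, which is why the password works as a second factor.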
The generation of the mnemonic is not complex, but I will skip it as well. Let's move to the details.
Extended Private Key
After getting our seed, we put it into the HMAC-SHA512 function together with another key, which is an arbitrary fixed string like "Bitcoin seed". The string is the HMAC key and the seed is the message. The result of this function is 64 bytes of data. The first half of this data is a regular private key. The second half of this data is something called the "chain code". It is also random-looking data.
Congrats, now you have your "Extended Private Key". Father of all your private keys 🤓. This is also called "MasterNode", which refers to this key as the root node of our beloved tree.
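The step above can be sketched in a few lines, assuming HMAC-SHA512 with "Bitcoin seed" as the HMAC key, as BIP32 specifies. The function name master_key_from_seed is my own, and the seed is the short one from the first BIP32 test vector, not a real wallet seed.

```python
import hashlib
import hmac
from typing import Tuple


def master_key_from_seed(seed: bytes) -> Tuple[bytes, bytes]:
    """Split HMAC-SHA512(key="Bitcoin seed", msg=seed) into key and chain code."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    # First 32 bytes: master private key. Last 32 bytes: master chain code.
    # (BIP32 also rejects the rare case where the key is 0 or >= curve order.)
    return digest[:32], digest[32:]


# Seed from BIP32 test vector 1 (illustration only).
seed = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
master_private_key, master_chain_code = master_key_from_seed(seed)

print("private key:", master_private_key.hex())
print("chain code: ", master_chain_code.hex())
```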
Extended Public Key
The Extended Public Key is not derived from the whole Extended Private Key in one step, sadly. It is the public key that corresponds to the first 32 bytes (the private key part) of the Extended Private Key, with the same "chain code" appended.
Our Extended Keys create a tree. This will help you to understand the origins of HD Wallet Derivation Paths.
We can derive 2**32 child keys from our Extended Key. Half of these keys are "Normal Keys" (indexes 0 to 2**31 - 1), and the other half are "Hardened Keys" (indexes 2**31 to 2**32 - 1).
Normal Keys can be derived from both the Extended Private Key and the corresponding Extended Public Key. This means that anyone who knows your Extended Public Key can also compute your Normal Public Keys. However, of course, only you can compute their private keys, by deriving them from the Extended Private Key with the same index. This is handy for watch-only setups: for example, a server can generate fresh receiving addresses from an Extended Public Key without ever holding a private key.
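Hardened Keys can only be derived from the Extended Private Key. We use them when we don't want our public keys to be linkable through an Extended Public Key. Hardening also protects the parent: with normal derivation, someone who knows the parent Extended Public Key and a single child private key can compute the parent private key.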
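In both cases, we put some data and the "chain code" into the HMAC function again to derive the child. The chain code is the HMAC key, and the data depends on the key type. For Normal Keys, the data is Public Key + index. For Hardened Keys, the data is Private Key + index (with a zero byte in front of the private key). Here, + means concatenation.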
After getting the HMAC result, the process is the same. We split the 64-byte result into two halves. We add the first 32 bytes to the private key part of the parent Extended Private Key, then take the result modulo n (n is the order of the underlying curve). The result is our child private key. The modulus keeps the result in the valid range of private keys for the curve.
Child Private Key = ((first 32 bytes of Ext. Priv. Key) + (first 32 bytes of HMAC result)) % n
Now, the last 32 bytes of the HMAC result are also a "chain code", which we will use while deriving new child keys from this child key. Come again? Yes, chain code + private key is also an Extended Private Key 🤭
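Here is a sketch of the child derivation described above, for both Normal and Hardened Keys, assuming the secp256k1 curve and the BIP32 rules. All function names are my own, the curve math is plain Python for illustration (slow and not safe for production), and the parent key and chain code come from the first BIP32 test vector.

```python
import hashlib
import hmac

# secp256k1 parameters: field prime, curve order n, generator point.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
HARDENED = 2**31  # indexes at or above this value are hardened


def point_add(a, b):
    """Add two curve points (None is the point at infinity)."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P


def point_mul(k, point=G):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def compressed_pubkey(private_key: bytes) -> bytes:
    """33-byte compressed public key for a 32-byte private key."""
    x, y = point_mul(int.from_bytes(private_key, "big"))
    return (b"\x03" if y & 1 else b"\x02") + x.to_bytes(32, "big")


def derive_child(parent_key: bytes, parent_chain_code: bytes, index: int):
    """Return (child private key, child chain code) for the given index."""
    if index >= HARDENED:
        data = b"\x00" + parent_key + index.to_bytes(4, "big")
    else:
        data = compressed_pubkey(parent_key) + index.to_bytes(4, "big")
    digest = hmac.new(parent_chain_code, data, hashlib.sha512).digest()
    left, child_chain_code = digest[:32], digest[32:]
    child = (int.from_bytes(left, "big") + int.from_bytes(parent_key, "big")) % N
    return child.to_bytes(32, "big"), child_chain_code


# Master key and chain code from BIP32 test vector 1 (illustration only).
master_key = bytes.fromhex(
    "e8f32e723decf4051aefac8e2c93c9c5b214313817cdb01a1494b917c8436b35")
master_chain = bytes.fromhex(
    "873dff81c02f525623fd1fe5167eac3a55a049de3d314bb42ee227ffed37d508")

hardened_key, hardened_chain = derive_child(master_key, master_chain, HARDENED)  # m / 0'
normal_key, normal_chain = derive_child(hardened_key, hardened_chain, 1)  # m / 0' / 1
print(hardened_key.hex())
print(normal_key.hex())
```

Note how the only difference between the two key types is the data that goes into HMAC. Everything after that is identical.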
Now, it is time to show the magic ✨
Derivation Paths
By now, I assume you understand the core of key derivation, from the mnemonic to public-private keys. Let's explore how we derive other keys.
As you remember, we use an index parameter while deriving. The index is just a number. However, in order to derive the same addresses in different implementations, developers created some standards. The main standards for Bitcoin are BIP44, BIP49, and BIP84.
The notation logic is the same for all of them. The notation is:
m / purpose' / coin_type' / account' / change / address_index
Don't worry about purpose, coin_type etc. They are just numbers that are used to identify the wallet. For example, purpose' is 44' for legacy Bitcoin addresses (BIP44), 49' for SegWit addresses nested in P2SH (BIP49), and 84' for native SegWit addresses (BIP84).
m is the root node. It is the Extended Private Key that we derived from our mnemonic.
/ indicates that this is a child of the previous node. For example, m / 44' / 0' is a child of the m / 44' node, and m / 44' / 0' / 0 is a child of the m / 44' / 0' node.
' indicates that this is a hardened key. For example, in m / 44' / 0' / 0, the 44' and 0' levels are hardened keys, while the last 0 is a normal key. We already know that hardened keys can only be derived from the Extended Private Key. So, this is the reason why we need to know our mnemonic (and not just an Extended Public Key) to derive this path.
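To show how the notation maps to the index numbers used during derivation, here is a small sketch of a path parser. It assumes the BIP32 convention that a hardened index is the written number plus 2**31. The name parse_path is my own.

```python
HARDENED_OFFSET = 2**31  # hardened indexes start here


def parse_path(path: str) -> list:
    """Turn "m / 44' / 0' / 0" into the list of child indexes to derive."""
    parts = [part.strip() for part in path.split("/")]
    if parts[0] != "m":
        raise ValueError("a derivation path must start with m")
    indexes = []
    for part in parts[1:]:
        hardened = part.endswith("'")
        number = int(part.rstrip("'"))
        if not 0 <= number < HARDENED_OFFSET:
            raise ValueError(f"index out of range: {part}")
        indexes.append(number + HARDENED_OFFSET if hardened else number)
    return indexes


print(parse_path("m / 44' / 0' / 0' / 0 / 5"))
# [2147483692, 2147483648, 2147483648, 0, 5]
```

Each number in the list is one derivation step, starting from the root node m.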
Generally, personal wallets use the last part, address_index, to tell addresses apart. For example, wallets that let you generate new addresses from a single mnemonic just increase address_index by one for each new address. If you have 5 addresses, the derivation paths will look like this:
m / 44' / 0' / 0' / 0 / 0
m / 44' / 0' / 0' / 0 / 1
m / 44' / 0' / 0' / 0 / 2
m / 44' / 0' / 0' / 0 / 3
m / 44' / 0' / 0' / 0 / 4
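A wallet could build such a list with a tiny helper like the one below. This is only a sketch that follows the five-level notation given earlier (purpose, coin_type, account, change, address_index). The name bip44_path and its default values are my own assumptions.

```python
def bip44_path(coin_type: int, account: int = 0, change: int = 0,
               address_index: int = 0, purpose: int = 44) -> str:
    """Format a path as m / purpose' / coin_type' / account' / change / address_index."""
    return f"m / {purpose}' / {coin_type}' / {account}' / {change} / {address_index}"


# Five receiving addresses of the first Bitcoin account (coin_type 0).
paths = [bip44_path(coin_type=0, address_index=i) for i in range(5)]
for path in paths:
    print(path)
```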
I believe this drawing will help you to understand the derivation paths better:
Different Standards
I demonstrated the BIP standards with Bitcoin paths. However, other blockchains use different values. For example,
- Ethereum uses m / 44' / 60' / 0' / 0 for its derivation paths.
- Avalanche C-Chain also uses m / 44' / 60' / 0' / 0 for its derivation paths.
- Avalanche X & P-Chain uses m / 44' / 9000' / 0' / 0 for its derivation paths.
As you can notice, the parameter that differs is coin_type. This is the number that is used to identify the blockchain. Avalanche C-Chain shares 60' with Ethereum.
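As a quick illustration, a wallet could read coin_type back out of a path to tell which blockchain it belongs to. This is only a sketch: the table holds just the values mentioned in this post, and the names COIN_TYPES, coin_type_of and describe are my own.

```python
# Only the coin_type values mentioned in this post (not a complete registry).
COIN_TYPES = {
    0: "Bitcoin",
    60: "Ethereum / Avalanche C-Chain",
    9000: "Avalanche X & P-Chain",
}


def coin_type_of(path: str) -> int:
    """Return the coin_type (the second level after m) of a derivation path."""
    parts = [part.strip() for part in path.split("/")]
    if parts[0] != "m" or len(parts) < 3:
        raise ValueError("expected at least m / purpose' / coin_type'")
    return int(parts[2].rstrip("'"))


def describe(path: str) -> str:
    """Name the blockchain for a path, or report an unknown coin_type."""
    return COIN_TYPES.get(coin_type_of(path), "unknown coin_type")


print(describe("m / 44' / 60' / 0' / 0"))
print(describe("m / 44' / 9000' / 0' / 0"))
```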
Conclusion
I hope this article helped you to understand the derivation paths. If you have any questions, feel free to ask me on Twitter.
© Orkun Mahir Kilic