Discussion on the Principles and Technical Details of the Ordinal Inscription Protocol

In the past two weeks, while researching the BTC ecosystem and various inscription projects, I found that very few articles clearly explain the principles and technical details: for example, how transactions are initiated when minting inscriptions, how the sats in a UTXO are tracked, where the inscribed content is placed in the script, and why BRC20 requires two operations for a transfer. I realized that without understanding these technical details, it is difficult to grasp the differences between protocols such as BRC20, BRC420, Atomicals, Stamps, and Runes. This article covers the basics of the BTC blockchain and attempts to answer the above questions.

BTC Block Structure

The essence of blockchain is a multi-user accounting technology, which, in computer science terms, is a distributed database. The records (accounts) for a period of time form a block, and blocks are appended to the chain in chronological order.

We created a table in Excel to illustrate how a blockchain works. An Excel file represents a blockchain, and each sheet represents a block. The blocks are arranged in chronological order from 560331 to the latest, 560336, which packages the most recent transactions.

The main part of the block resembles double-entry bookkeeping, the method most commonly seen in accounting: one side records the addresses debited (inputs from), and the other side records the addresses credited (outputs to). The value is the BTC amount for the respective address. The total amount in the inputs is greater than the total in the outputs; the difference is the transaction fee paid by the user, which is collected by miners (the accountants).

The header area of our table lists the height of the previous block, the hash of the previous block, the creation time of the current block (timestamp), and a random number (the nonce). (The real 80-byte block header stores the previous block hash, the timestamp, and the nonce along with a version, a Merkle root of the transactions, and the difficulty target, but not the height.) So, in a decentralized accounting system, who wins the right to record the next block? It depends on this random number and the corresponding hash value. Miners repeatedly change the nonce and hash the block header; the first miner to find a hash that meets the network's target wins the right to record the block and collects the block reward and transaction fees.

Finally, there is the script area, which can be used for extended applications; for example, an OP_RETURN script can serve as a memo field. Note that in actual blocks the script area is attached to the input and output information rather than being a separate area. For example, the script attached to an input is the unlocking script (ScriptSig), which carries the signature made with the sender's private key to authorize the transfer, while the script attached to an output is the locking script (ScriptPubKey), which sets the conditions for spending the received BTC (generally, the condition is "only the person with the corresponding private key can spend").
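To make the "random number and qualifying hash" idea concrete, here is a minimal proof-of-work sketch. It follows Bitcoin's 80-byte header layout and double SHA-256, but the previous hash, Merkle root, timestamp, and the very easy target are all made-up toy values, and the `bits` field is left as a placeholder rather than encoding the real target.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """Bitcoin hashes block headers with SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def serialize_header(version, prev_hash, merkle_root, timestamp, bits, nonce):
    """Pack the six header fields into 80 bytes (integers are little-endian)."""
    return (struct.pack("<I", version) + prev_hash + merkle_root
            + struct.pack("<III", timestamp, bits, nonce))


def mine(version, prev_hash, merkle_root, timestamp, bits, target):
    """Try nonces until the header hash, read as a number, is <= target."""
    for nonce in range(2**32):
        header = serialize_header(version, prev_hash, merkle_root,
                                  timestamp, bits, nonce)
        value = int.from_bytes(double_sha256(header), "little")
        if value <= target:
            return nonce, value
    return None


# Toy inputs (hypothetical, not taken from any real block).
PREV_HASH = bytes(32)
MERKLE_ROOT = hashlib.sha256(b"toy transactions").digest()
TOY_TARGET = 2**244  # far easier than any real network target

nonce, value = mine(1, PREV_HASH, MERKLE_ROOT, 1_700_000_000, 0, TOY_TARGET)
print(f"nonce={nonce} hash={value:064x}")
```

With this toy target a valid nonce is usually found within a few thousand attempts; the real network sets the target so that all miners together need about ten minutes.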

Snip20240129_3

Snip20240129_4

The above two images show the raw data structures for inputs and outputs. At the execution level, a script accompanies the transaction information as a parameter. The unlocking script (ScriptSig) carries the private-key signature, and this signature data is what is referred to as "witness data."

Segregated Witness and Taproot

Although the Bitcoin network has been running for over 10 years without a major failure, transaction fees have repeatedly spiked to impractical levels. Bitcoin developers have therefore long been discussing how best to scale the network to handle growing transaction volume.

In 2017, this debate reached a climax, splitting the Bitcoin development community into two factions: one supporting the implementation of a feature called SegWit via a soft fork, and the other advocating for direct block size expansion, known as the "big block" faction.

We mentioned earlier that the unlocking script carries the private-key signature, the "witness data." Can this witness data be separated out, so that each block can accommodate more transactions? Segregated Witness (SegWit) was officially activated in August 2017. It divides transaction data into two parts: the basic transaction information (Transaction Data) and the signature information (Witness Data). The signature information is stored in a new data structure called the "witness," which is serialized separately from the rest of the transaction and is not included when the transaction ID is computed.

Snip20240129_5

Technically, SegWit means that witness data no longer counts in full against the 1 MB originally allocated for blocks. Each transaction carries its witness in a separate structure, and the block commits to all witnesses through its coinbase transaction. The witness can hold arbitrary data and is discounted under the new "block weight" rule: a block may weigh at most 4 million weight units, where each non-witness byte costs 4 units and each witness byte costs 1. Old nodes still see blocks of at most 1 MB, so a large amount of extra data fits without a hard fork. As a result, the effective data limit for Bitcoin transactions increased while the fees for signature data dropped. Before the SegWit upgrade, Bitcoin's capacity limit was 1 MB; after SegWit, non-witness transaction data is still limited to about 1 MB, while a block consisting almost entirely of witness data can approach 4 MB.
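The witness discount can be checked with a little arithmetic. The sketch below uses the BIP141 weight formula (weight = base size × 3 + total size, capped at 4,000,000 weight units per block); the transaction sizes are made-up numbers for illustration, and `witness_size` is assumed to include the SegWit marker and flag bytes.

```python
MAX_BLOCK_WEIGHT = 4_000_000  # weight units per block (BIP141)


def tx_weight(base_size: int, witness_size: int) -> int:
    """base_size: bytes without witness data; witness_size: witness bytes."""
    total_size = base_size + witness_size
    return base_size * 3 + total_size


def virtual_size(weight: int) -> int:
    """Fees are usually quoted per virtual byte: weight / 4, rounded up."""
    return (weight + 3) // 4


# Two hypothetical 400-byte transactions.
legacy = tx_weight(base_size=400, witness_size=0)        # no witness data
witness_heavy = tx_weight(base_size=150, witness_size=250)

print(legacy, virtual_size(legacy))                # 1600 400
print(witness_heavy, virtual_size(witness_heavy))  # 850 213
```

Both transactions occupy 400 bytes on disk, but the witness-heavy one is charged for only 213 virtual bytes, which is why inscriptions place their content in the witness.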

Taproot was activated in November 2021 and consists of three Bitcoin Improvement Proposals (BIPs): a new digital signature scheme called "Schnorr signatures" (BIP 340), Taproot (BIP 341), and Tapscript (BIP 342). Taproot aims to bring numerous benefits to Bitcoin users, such as enhanced transaction privacy and reduced transaction fees. It also allows Bitcoin to execute more complex transactions, thereby broadening the application scenarios (adding some new opcodes).

These upgrades are the key enablers of Ordinals NFTs, which store NFT data in the spending script of the Taproot script path (that is, in the witness data space). Taproot made structuring and storing arbitrary witness data much easier, laying the foundation for the "ord" standard. With the data restrictions relaxed, a single transaction can in principle fill almost an entire block with its transaction and witness data, approaching the 4 MB block limit, which greatly expands the types of media that can be placed on-chain.

Some may wonder: since we can place arbitrary strings in the script, are there no restrictions on them? What if the script is executed? Could freely placed content trigger errors that cause the block to be rejected? This brings us to the OP_FALSE instruction. OP_FALSE (also written as "0" in Bitcoin scripts) pushes a false value onto the stack, so an OP_IF that immediately follows never enters its branch, and everything up to the matching OP_ENDIF is skipped. Content wrapped this way works like a "comment" in a high-level language: it is recorded on-chain but never executed.

OP_FALSE
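As a rough illustration of this "comment" wrapper, the sketch below assembles the bytes of an inscription envelope in the layout shown in the Ordinals documentation (OP_FALSE OP_IF, the marker "ord", a content-type field, an empty push, the body, OP_ENDIF). The helper names are my own, and a real reveal script also contains a public key and signature check before the envelope, which is omitted here.

```python
OP_FALSE, OP_IF, OP_ENDIF = 0x00, 0x63, 0x68
OP_PUSHDATA1, OP_PUSHDATA2 = 0x4C, 0x4D
MAX_PUSH = 520  # largest single data push allowed in Bitcoin script


def push(data: bytes) -> bytes:
    """Encode one data push with the shortest matching length prefix."""
    n = len(data)
    if n > MAX_PUSH:
        raise ValueError("a single push is limited to 520 bytes")
    if n <= 75:
        return bytes([n]) + data
    if n <= 255:
        return bytes([OP_PUSHDATA1, n]) + data
    return bytes([OP_PUSHDATA2]) + n.to_bytes(2, "little") + data


def envelope(content_type: str, body: bytes) -> bytes:
    """Wrap content in OP_FALSE OP_IF ... OP_ENDIF so it is never executed."""
    script = bytes([OP_FALSE, OP_IF]) + push(b"ord")
    script += push(b"\x01") + push(content_type.encode())  # tag 1: content type
    script += push(b"")                                    # tag 0: body follows
    for i in range(0, len(body), MAX_PUSH):                # split large files
        script += push(body[i:i + MAX_PUSH])
    return script + bytes([OP_ENDIF])


script = envelope("text/plain;charset=utf-8", b"Hello, world!")
print(script.hex())
```

Because each push is capped at 520 bytes, a large image ends up as a long run of consecutive pushes inside the same unexecuted branch.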

UTXO Transfer Model

The above discussion has focused on the basic principles of BTC from the perspective of computer data structures. Now, let's discuss the UTXO model from a financial model perspective.

UTXO stands for Unspent Transaction Outputs, which can be understood as the funds remaining unspent during a transfer. So why does Bitcoin use this concept? This relates to the accounting methods of account transaction models and account balance models.

Having been in a centralized system for too long, we are very accustomed to the account balance model of bookkeeping. When User A transfers 100 yuan to User B, the bank first checks if User A's bank account has 100 yuan. If so, it deducts 100 yuan from User A's account and adds 100 yuan to User B's account, completing the transfer.

However, Bitcoin's accounting algorithm does not have the concept of balance. The distributed ledger on the blockchain only records individual transactions and does not directly record the current balance of an account (recording balances generally requires dedicated server nodes, which would centralize it). Suppose User A currently has a balance of 1000 yuan; if User A transfers 100 yuan to User B, this transfer will be recorded as:

Transaction 1: User A transfers 100 yuan to User B

Transaction 2: User A transfers 900 yuan to User A (UTXO)

Snip20240129_6

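Although Transaction 2 is recorded as a transaction, functionally it plays the role of an account balance, indicating that after the 100 yuan transfer, User A's account still has 900 yuan left. (Strictly speaking, in Bitcoin these two records are two outputs of a single transaction, and the second one is the change output.)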

So, the question arises: why create such a UTXO? Because the BTC blockchain can only record transactions and cannot record account balances. Without this UTXO, calculating the balance would require summing all incoming and outgoing transactions for an account, which is very time-consuming and resource-intensive. The emergence of UTXO cleverly avoids the pain point of having to backtrack through all transactions when calculating balances.

UTXO has a characteristic: like coins, it cannot be split. So how do we gather enough input amounts during transactions, and how do we provide change? We can use coins as an analogy (in fact, every time you see the word UTXO, it is better to automatically translate it as "coin").

Xiao Ming transfers 1 Bitcoin to Xiao Gang. The entire process is as follows: Xiao Ming needs to collect enough inputs. For example, among the previous transactions sent to Xiao Ming's address, he finds a UTXO with a value of 0.9, which is not enough for 1 Bitcoin. Fortunately, a transaction may have multiple inputs, so Xiao Ming also finds a UTXO with a value of 0.2. Thus, this transfer transaction has two inputs. It also has two outputs: one pointing to Xiao Gang's address with a value of 1 Bitcoin, and the other pointing to Xiao Ming's own address with a value of 0.1 Bitcoin, which is the change (this example ignores the transaction fee).

In other words, Xiao Ming has two coins in his pocket, one worth 0.9 and the other worth 0.2. At this point, if Xiao Ming needs to pay a coin worth 1, he needs to hand both coins to Xiao Gang. After receiving them, Xiao Gang gives Xiao Ming 0.1 as change. Therefore, the essence of this accounting model is to avoid "calculating balances" through the action of "giving change."
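The "collect coins, pay, take change" flow can be sketched in a few lines. This is a simplified illustration: amounts are in sats to avoid floating-point rounding, the largest-first selection rule is just one possible strategy, and the `UTXO` class and address strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UTXO:
    txid: str   # transaction that created this output
    vout: int   # position of the output in that transaction
    value: int  # amount in sats


def build_transfer(utxos, sender, recipient, amount, fee=0):
    """Pick whole UTXOs until they cover amount + fee, then add change."""
    selected, total = [], 0
    for coin in sorted(utxos, key=lambda u: u.value, reverse=True):
        if total >= amount + fee:
            break
        selected.append(coin)
        total += coin.value
    if total < amount + fee:
        raise ValueError("insufficient funds")
    outputs = [(recipient, amount)]
    change = total - amount - fee
    if change > 0:
        outputs.append((sender, change))  # change goes back to the sender
    return selected, outputs


wallet = [UTXO("tx_a", 0, 90_000_000), UTXO("tx_b", 1, 20_000_000)]
inputs, outputs = build_transfer(wallet, "xiao_ming", "xiao_gang", 100_000_000)
print(inputs)
print(outputs)
```

Both coins are consumed in full, and the 0.1 BTC change appears as a brand-new UTXO owned by Xiao Ming.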

The Ordinal Protocol's Ordering System

The Ordinal protocol can be said to be the source of the current BTC ecosystem explosion, breaking down homogeneous BTC into the smallest unit, sat, and then assigning a serial number to each sat. How is this done?

We know that the total amount of BTC is 21 million coins, and one BTC can be split into 100 million parts (sats), so the smallest unit of BTC is the sat. Both BTC and its smallest unit, the sat, are typical fungible tokens (FT). We will now try to assign a serial number (ordinal) to each of these sats.

Earlier, when discussing the block data structure, we mentioned that transaction information needs to specify the input address and amount, as well as the output address and amount. Each block contains two kinds of transactions: the one that pays out the BTC block reward, and the ordinary transactions that pay fees. Ordinary transactions must have inputs and outputs, but the block reward is BTC generated out of thin air and has no input address, so its "input from" field is blank; this is known as the "coinbase transaction." The entire supply of 21 million BTC comes from coinbase transactions, and the coinbase transaction is always the first in a block's transaction list.

The Ordinal protocol stipulates the following:

Numbering: Each sat is numbered in the order in which it is mined.
Transfer: Sats pass from transaction inputs to outputs according to the first-in-first-out rule.

The first rule is relatively simple; it determines that numbering can only be generated from the coinbase transaction in the mining rewards. For example, if the mining reward for the first block is 50 BTC, then the first block will allocate sats in the range of [0;1;2;...;4,999,999,999]; if the second block also has a mining reward of 50 BTC, then the second block will allocate sats in the range of [5,000,000,000;5,000,000,001;...;9,999,999,999].

Snip20240129_7
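The numbering rule can be reproduced from the block subsidy schedule alone. The sketch below assumes the standard schedule (50 BTC at height 0, halving every 210,000 blocks) and counts the "first block" above as height 0; the function names are my own.

```python
COIN = 100_000_000          # sats per BTC
HALVING_INTERVAL = 210_000  # blocks between subsidy halvings


def subsidy(height: int) -> int:
    """Block reward in sats at a given height."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        return 0
    return (50 * COIN) >> halvings


def first_sat(height: int) -> int:
    """Number of the first sat mined in the block at this height."""
    full_eras, remainder = divmod(height, HALVING_INTERVAL)
    total = 0
    for era in range(full_eras):
        total += HALVING_INTERVAL * subsidy(era * HALVING_INTERVAL)
    return total + remainder * subsidy(height)


def sat_range(height: int):
    """Inclusive (first, last) sat numbers created by one block."""
    start = first_sat(height)
    return start, start + subsidy(height) - 1


print(sat_range(0))  # (0, 4999999999)
print(sat_range(1))  # (5000000000, 9999999999)
```

After the first halving the ranges shrink: block 210,000 creates only 2.5 billion new sats.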

The more difficult part to understand is that since UTXO actually contains many sats, each sat in this UTXO looks the same. How do we sort them? This is actually determined by the second rule. Let’s take a simple example:

Assuming the smallest divisible unit of BTC is 1, a total of 10 blocks are produced, with each block's mining reward being 10 BTC, resulting in a total of 100 BTC. We can directly assign a serial number (0-99) to these 100 BTC. If there are no transactions, we only know that the 10 BTC of the first block are numbered (0-9), the 10 BTC of the second block are numbered (10-19), and so on until the 10 BTC of the tenth block are numbered (90-99). Since there are no expenditures, there are no outputs, so we can only assign a range of numbers to every 10 BTC.

Suppose the 10 BTC from the second block (numbers 10-19) are then spent in a transaction with two outputs: one is 3 BTC, and the other is the "change" of 7 BTC, corresponding to transferring 3 BTC to someone else and returning 7 BTC to oneself. Suppose that in the transaction's output list the 7 BTC change comes first and the 3 BTC payment comes second. By the first-in-first-out rule, the change output receives numbers 10-16 and the payment output receives numbers 17-19. In this way, the ordered set of sats contained in each UTXO is determined by following the outputs.

Note that each sat is not a UTXO! Since UTXO is the smallest indivisible transaction unit, sats can only exist within UTXO, and UTXO contains a certain range of sats, and new outputs can only be generated by spending a certain UTXO.
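The first-in-first-out rule can be written as a small function that walks the input ranges in order and hands them to the outputs in order. This is a simplified sketch with inclusive ranges, matching the notation used above; the function name is my own, and any sats left over after the last output are the fee, which Ordinals assigns to the miner.

```python
def assign_sats(input_ranges, output_values):
    """Distribute input sat ranges to outputs in first-in-first-out order.

    input_ranges: list of inclusive (first, last) ranges, in input order.
    output_values: list of output amounts, in output order.
    Returns (ranges per output, leftover ranges that go to fees).
    """
    queue = list(input_ranges)
    outputs = []
    for value in output_values:
        received, needed = [], value
        while needed > 0:
            if not queue:
                raise ValueError("outputs exceed inputs")
            first, last = queue[0]
            size = last - first + 1
            take = min(size, needed)
            received.append((first, first + take - 1))
            needed -= take
            if take == size:
                queue.pop(0)                     # range fully consumed
            else:
                queue[0] = (first + take, last)  # keep the remainder
        outputs.append(received)
    return outputs, queue


# The example above: numbers 10-19 spent into a 7 output and a 3 output.
print(assign_sats([(10, 19)], [7, 3]))
```

The call returns `[(10, 16)]` for the change output and `[(17, 19)]` for the payment output, with nothing left over.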

As for how to express this "numbering," Ordinals supports several notations: the integer notation used above, as well as decimal notation, degree notation, percentile notation, and a pure-letter name.

Snip20240129_8

Once sats have a unified serial number, we can consider inscriptions. As mentioned earlier, any type of file (text, images, or video) can be uploaded into the witness data area, up to the roughly 4 MB block limit. The file is stored as raw bytes, displayed as hexadecimal, in the Taproot script area. Thus, one UTXO corresponds to one Taproot script area, and this UTXO simultaneously contains many sats (the whole is a collection of sat sequences, with the restriction that the amount of Bitcoin in a single UTXO cannot fall below the dust threshold, commonly 546 sats, to prevent dust attacks). For convenience, the Ordinal protocol specifies that "the first sat number of this sequence collection represents the binding relationship" (the original wording from the white paper is the number of the first sat in the first output). For example, a UTXO containing sats numbered (17-19) uses 17 to represent this collection and the inscribed content.

Minting and Transferring Ordinal Assets

As described above, Ordinal NFTs are created by uploading a file into the script of the segregated witness area and binding it to a sat sequence collection, thereby issuing NFT assets on the BTC chain. However, there is another question: since a transaction carries both the unlocking script for its inputs and the locking script for its outputs, where is the content placed? The answer is that both are involved: the locking script of an output commits to the content, and the unlocking script of the input that later spends it reveals the content. Here, we must mention the commit-reveal mechanism in blockchain technology.

The Commit-Reveal mechanism in blockchain is a protocol used to ensure fair and transparent handling of information. This mechanism is typically used in scenarios where hidden information (such as votes or bids) needs to be submitted and then revealed at a later time. The Commit-Reveal mechanism consists of two phases: the commit phase and the reveal phase.

Commit Phase: In this phase, users submit their information (such as voting choices or bid prices) in hidden form. Typically, users compute a hash of the information (a fixed-length digest) and send only this hash to the blockchain. A cryptographic hash function is one-way and collision-resistant: the original information cannot feasibly be inferred from the hash value, and it is infeasible to find different information with the same hash. This keeps the information confidential at the time of submission while binding the user to it.
Reveal Phase: At a predetermined later time, users must reveal their original information and prove that it matches the previously submitted hash value. This is typically done by submitting the original information along with any additional data (such as random numbers or "salt") used to generate the hash value. The network then recomputes the hash of this original information and checks whether it matches the previously submitted hash value. If they match, the original information is accepted as valid.
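The two phases fit in a few lines of code. This is a generic sketch of a salted hash commitment using SHA-256, not the exact construction used by inscriptions (which commit through a Taproot output); the function names and the sample bid are hypothetical.

```python
import hashlib
import secrets


def commit(message: bytes):
    """Commit phase: publish only the digest; keep message and salt private."""
    salt = secrets.token_bytes(16)  # random salt prevents guessing short messages
    digest = hashlib.sha256(salt + message).hexdigest()
    return digest, salt


def verify_reveal(commitment: str, message: bytes, salt: bytes) -> bool:
    """Reveal phase: anyone can recompute the hash and compare."""
    return hashlib.sha256(salt + message).hexdigest() == commitment


commitment, salt = commit(b"bid: 42")
print(verify_reveal(commitment, b"bid: 42", salt))  # True
print(verify_reveal(commitment, b"bid: 43", salt))  # False
```

The committer cannot change the bid after the fact, because any other message produces a different hash.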

As mentioned earlier, the content of the inscription needs to be bound to the sat sequence collection contained in the UTXO. Since UTXO is an output in the block, it must be attached to the locking script of the output. However, Bitcoin full nodes need to maintain and transmit the entire network's UTXO collection locally. Imagine if 10,000 4 MB video files were directly uploaded to the locking scripts of 10,000 UTXOs; all full nodes would require extremely high storage space and ultra-fast internet speed, which could cause the entire chain to collapse. Therefore, the only solution is to place the content in the unlocking script of the input and then let this content "point" to another output.

Thus, the minting of Ordinal assets is divided into two steps (wallets combine these two steps: when constructing transactions, they build the commit-reveal parent-child pair at the same time, giving users the experience of a single step while saving fees).

During minting, users first create the commit transaction (transferring from their address A to their address B), whose output locking script contains a hash value committing to the file rather than the file itself. Because only a hash is stored, it occupies little space in the full nodes' UTXO database. Next, users construct a new transaction (transferring from their address B back to their address A), called the reveal transaction. Its input spends the UTXO containing the file hash from the commit transaction, and the unlocking script of this input must contain the original inscribed file. To quote the white paper, "First, in the commit, create a submission to the taproot output containing the inscription content. Second, in the reveal transaction, use the output generated from the commit transaction to display the inscription content on-chain."

In the transfer phase, Ordinal NFTs differ slightly from BRC20. An Ordinal NFT is transferred as a whole: the NFT bound to a certain UTXO is simply sent to the recipient, much like a regular BTC transfer. BRC20, however, involves transfers of custom amounts, which are divided into two steps: the first step is called the inscription "transaction" (Inscribe "TRANSFER"), and the second step is called the transfer "transaction" (Transfer "TRANSFER"). The inscription transaction is similar to the minting process of an Ordinal NFT and implicitly contains the commit-reveal parent-child transaction pair, while the transfer transaction is similar to a regular Ordinal NFT transfer, directly sending the BRC20 asset bound to a certain UTXO to the recipient. Some wallets construct these three transactions (parent-child-grandchild) at the same time to save time and fees.

Snip20240130_9

In summary, the commit transaction is used to bind the inscribed content (the hash value of the original content) to the numbered sats (UTXO), while the reveal transaction is used to display the content (the original content). This parent-child transaction pair jointly completes the minting of the NFT.

P2TR and an Example

The above technical discussion about minting is not yet complete, as some may wonder how the reveal transaction verifies the inscription information in the commit transaction. Why is it necessary to transfer between one's own addresses A and B when constructing the transaction? There was no need to prepare two wallets when inscribing. This brings us to one of the significant upgrades of Taproot, P2TR.

P2TR (Pay-to-Taproot) is a new type of Bitcoin output introduced by the Taproot upgrade. P2TR outputs allow users to spend Bitcoin using a single public key or more complex scripts (such as multi-signature wallets or smart contracts), achieving higher privacy and flexibility. This is accomplished through the use of Merkleized Abstract Syntax Trees (MAST) and Schnorr signatures, which enable the efficient encoding of multiple spending conditions within a single output.

Creating Spending Conditions
To create a P2TR transaction, users first define a spending condition, such as a single public key or a more complex script that specifies the requirements for spending Bitcoin (e.g., multi-signature wallets or smart contracts).
Generating Taproot Output
Then, users generate a Taproot output that includes a single public key (representing the spending condition). This public key is derived from a combination of the user's public key and the hash of the script using a process called "tweaking." This ensures that the output looks like a standard public key, making it difficult to distinguish from other transactions on the blockchain.
Spending Bitcoin
When users want to spend Bitcoin, they can use their single public key (if the spending condition is met) or reveal the original script and provide the necessary signatures or data to meet the spending condition. This is accomplished using Tapscript, which allows for more efficient and flexible execution of spending conditions.
Verifying Transactions
Miners and nodes then verify the transaction by checking the provided Schnorr signatures and data against the spending conditions. If the conditions are met, the transaction is considered valid, and Bitcoin can be spent.
Enhanced Privacy and Flexibility
Because P2TR transactions only reveal the necessary spending conditions when spending Bitcoin, they maintain a high level of privacy. Additionally, the use of MAST and Schnorr signatures allows for the efficient encoding of multiple spending conditions, enabling more complex and flexible transactions without increasing the overall size of the transaction.
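The "tweaking" step can be made more concrete with the tagged hashes defined in BIP 340 and BIP 341. The sketch below computes the leaf hash of a script and the tweak value for an internal key; it stops short of the elliptic-curve step that adds the tweak to the key to obtain the final output key. The script bytes are a placeholder, and the internal key is the x-coordinate of the secp256k1 generator point, used purely for illustration.

```python
import hashlib

SECP256K1_ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def tagged_hash(tag: str, msg: bytes) -> bytes:
    """BIP 340 tagged hash: SHA256(SHA256(tag) || SHA256(tag) || msg)."""
    tag_hash = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_hash + tag_hash + msg).digest()


def compact_size(n: int) -> bytes:
    """Bitcoin's variable-length integer encoding."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def tapleaf_hash(script: bytes, leaf_version: int = 0xC0) -> bytes:
    """Hash of one script leaf; with a single leaf this is the Merkle root."""
    return tagged_hash("TapLeaf",
                       bytes([leaf_version]) + compact_size(len(script)) + script)


def taptweak(internal_key_x: bytes, merkle_root: bytes) -> int:
    """Tweak t; the output key is the internal key plus t times G (not shown)."""
    t = int.from_bytes(tagged_hash("TapTweak", internal_key_x + merkle_root), "big")
    if t >= SECP256K1_ORDER:
        raise ValueError("tweak out of range")
    return t


# Placeholder inputs, for illustration only.
INTERNAL_KEY_X = bytes.fromhex(
    "79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798")
SCRIPT = bytes.fromhex("0063036f726468")  # empty OP_FALSE OP_IF "ord" OP_ENDIF

tweak = taptweak(INTERNAL_KEY_X, tapleaf_hash(SCRIPT))
print(hex(tweak))
```

Changing a single byte of the script changes the leaf hash, the tweak, and therefore the resulting address, which is how an output can commit to a script without showing it.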

This is how the commit-reveal mechanism is applied in P2TR. Let's illustrate this with a practical example.

Using the blockchain explorer https://www.blockchain.com/, we will examine the minting process of an Ordinal image NFT, including the previous commit-reveal two phases.

First, we see that the hash ID of the commit transaction is (2ddf90ddf7c929c8038888fc2b7591fb999c3ba3c3c7b49d54d01f8db4af585c). Note that this transaction's output does not contain the inscription data (it actually contains the hash value of the hexadecimal image file), and there is no related inscription information on the webpage. The output address (bc1p4mtc.....) is actually a temporary address generated through the "tweaking" process (representing the public key of the script unlocking condition), sharing a private key with the taproot main address (bc1pg2mp...). The second UTXO in this transaction is the returned "change." Thus, the binding of the inscription content to the sats contained in the first UTXO is achieved.

Snip20240131_12

Next, we check the record of the reveal transaction, whose hash ID is (e7454db518ca3910d2f17f41c7b215d6cba00f29bd186ae77d4fcd7f0ba7c0e1). Here, we can see the information of the Ordinals inscription. The input address of this transaction is the temporary output address generated from the previous transaction (bc1p4mtc.....), and the unlocking script of the input contains the original image's hexadecimal file, while the output of 0.00000546 BTC (546 sats) is sent to the user's taproot main address (bc1pg2mp...). Based on the First in First Out principle and the "binding is the first output's first sat number," although the number of sats contained in the two UTXOs changes, the bound sat number remains unchanged. Therefore, we can find the sat containing this inscription in (sat 1893640468329373).

(https://ordinals.com/sat/1893640468329373)

Snip20240131_13

These two transactions (a parent-child pair) are submitted to the memory pool simultaneously by the wallet during minting, so from the user's point of view the fee is paid in a single step, and there is a high probability that miners will include them in the same block (the two transactions in the above example both appear in block 790468). Miners and nodes verify the reveal transaction by checking the Schnorr signature provided in the input and by checking the hash value of the revealed hexadecimal image data against the hash value committed in the output locking script of the commit transaction. If both match, the transaction is considered valid, and the Bitcoin UTXO can be spent. Thus, these two transactions are permanently recorded in the BTC blockchain database, and the NFT image is naturally preserved and displayed. If the two hash values differ, the reveal transaction is invalid and the inscription fails.

BRC20 Protocol and Indexers

For the Ordinal protocol, inscribing a piece of text results in a text NFT (corresponding to Loot on Ethereum), inscribing an image results in an image NFT (corresponding to PFP on Ethereum), and inscribing a piece of music results in an audio NFT. But what if we inscribe a piece of code, and this code is for "issuing FT homogeneous tokens"?

BRC20 uses the Ordinal protocol by setting inscriptions to JSON-formatted data that deploys, mints, and transfers tokens. The JSON contains fields describing the token's attributes, such as its total supply, the maximum amount per mint, and its unique ticker. In the previous article, we discussed that BRC20 tokens are essentially semi-fungible tokens (SFT), meaning that in some cases they can be treated as NFT transactions, while in other cases they can be treated as FT transactions. How is this control over "different situations" achieved? The answer is indexers.
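For reference, the three kinds of BRC20 inscriptions are small JSON documents. The sketch below builds them with the field names used by the BRC-20 convention ("p", "op", "tick", "max", "lim", "amt"); the ticker "demo" and the amounts are made-up values, and the helper function names are my own.

```python
import json


def _dump(payload: dict) -> str:
    # Compact JSON keeps the inscription (and its fee) small.
    return json.dumps(payload, separators=(",", ":"))


def brc20_deploy(tick: str, max_supply: int, mint_limit: int) -> str:
    return _dump({"p": "brc-20", "op": "deploy", "tick": tick,
                  "max": str(max_supply), "lim": str(mint_limit)})


def brc20_mint(tick: str, amount: int) -> str:
    return _dump({"p": "brc-20", "op": "mint", "tick": tick, "amt": str(amount)})


def brc20_transfer(tick: str, amount: int) -> str:
    return _dump({"p": "brc-20", "op": "transfer", "tick": tick, "amt": str(amount)})


print(brc20_deploy("demo", 21_000_000, 1_000))
print(brc20_mint("demo", 1_000))
print(brc20_transfer("demo", 100))
```

Each string would be inscribed as text content; on-chain it is just an inscription, and only software that understands the convention interprets it as a token operation.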

An indexer is essentially an accountant that categorizes and records the received information in a database. In the Ordinal protocol, the indexer determines the changes in the ordered sats across different addresses by tracking inputs and outputs. In the BRC-20 protocol, the indexer has an additional function: to record the changes in token balances in different addresses.
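A toy version of such an accountant shows both the balance bookkeeping and why a BRC20 transfer takes two steps. This is a heavily simplified sketch for a single token: the class and method names are hypothetical, and real indexers also validate deploys, supply caps, and mint limits.

```python
from collections import defaultdict


class Brc20Indexer:
    """Tracks balances of one token by replaying inscription events."""

    def __init__(self):
        self.available = defaultdict(int)  # address -> freely usable balance
        self.transferable = {}             # inscription id -> (owner, amount)

    def on_mint(self, address: str, amount: int) -> None:
        self.available[address] += amount

    def on_inscribe_transfer(self, inscription_id: str, owner: str, amount: int) -> bool:
        """Step 1: the owner inscribes a transfer, earmarking part of the balance."""
        if self.available[owner] < amount:
            return False  # invalid inscription, ignored by the indexer
        self.available[owner] -= amount
        self.transferable[inscription_id] = (owner, amount)
        return True

    def on_inscription_moved(self, inscription_id: str, new_owner: str) -> bool:
        """Step 2: sending the inscription's UTXO credits the recipient once."""
        entry = self.transferable.pop(inscription_id, None)
        if entry is None:
            return False  # unknown or already used transfer inscription
        _, amount = entry
        self.available[new_owner] += amount
        return True


indexer = Brc20Indexer()
indexer.on_mint("address_a", 1000)
indexer.on_inscribe_transfer("inscription_1", "address_a", 100)
indexer.on_inscription_moved("inscription_1", "address_b")
print(dict(indexer.available))  # {'address_a': 900, 'address_b': 100}
```

The Bitcoin network itself only sees inscriptions moving between UTXOs; the balances exist solely in the indexer's database.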

Therefore, we can view the different token forms from the perspective of the accountant: BRC20 protocol tokens actually exist in a three-layer database. The first layer (Layer1) has BTC miners as accountants, with a "chain database" type, producing BTC as FT assets. The second layer (Layer2) has the Ordinal indexer as the accountant, with a "relational database" type, producing numbered sats as NFT assets. The third layer (Layer3) has the BRC20 indexer as the accountant, with a "relational database" type, producing BRC20 assets as FT assets. When we count BRC20 as "pieces," the perspective is that of the Ordinal indexer (recorded by that indexer), so it is naturally an NFT; when we count BRC20 as "individuals" (especially after depositing into centralized exchanges), the perspective is that of the BRC20 indexer (recorded by that indexer or the centralized exchange's server), so it is naturally an FT. Thus, we can conclude that semi-fungible tokens (SFT) exist because of the different levels of accountants.

Isn't blockchain just a distributed database? That’s why there is a group of miners as accountants to jointly maintain this "chain database" (because only a chain database can achieve true decentralization). But after all this, we still return to the old path of centralized "relational databases." This is also the essence of why the initiators of the Ordinal protocol, the initiators of the BRC20 protocol, and the Unisat wallet have been in heated discussions about whether to upgrade the indexer -- differing opinions among accountants.

However, after more than a decade of industry development, a lot of "decentralized" experience has been accumulated. Can indexers replace relational databases with "chain databases"? Can fraud proofs or ZKP be used to ensure security and decentralization? Will the DA demand of the Bitcoin ecosystem overflow into other DAs, promoting the prosperity and integration of multi-chain ecosystems? I seem to see more possibilities.

This article is authored by @hicaptainz

References

https://www.aixinzhijie.com/books/261/master_bitcoin/_book/

https://learnblockchain.cn/article/5717

https://zhuanlan.zhihu.com/p/361854961

https://www.odaily.news/post/5187233

https://learnblockchain.cn/article/5376

https://www.panewslab.com/zh/articledetails/1301r1ibp79c.html

https://docs.ordinals.com/inscriptions.html

https://thebitcoinmanual.com/articles/pay-to-taproot-p2tr/