Intro
Kolme is a Rust-powered blockchain framework that lets you build fast, secure apps with no delays or limits—run your own chain, connect anywhere, and ship products in record time.
Why Kolme?
Kolme lets you build blockchain apps that are fast, secure, and free from the usual limits. Your app gets its own chain, runs at top speed, and connects anywhere, with built-in safeguards you can trust. It’s all about getting stuff done without the wait or hassle.
What Makes Kolme Different
Most blockchain apps split their work across slow smart contracts and clunky external servers, forcing delays, limits, and extra steps. Kolme gives your app its own chain, running everything in one fast, flexible Rust program.
Unlike smart contracts, there's no need to fit your execution into a given CPU, storage, or gas limit. You can also securely load external data while processing, bypassing the need for oracles.
Because the chain data is limited to your own application's operations, offchain processing becomes simpler and faster. And because no external node is required to access the Kolme state, reliability issues from failing nodes and request throttling are no longer concerns.
How Kolme Stays Secure
Kolme splits trust across three groups:
- Listeners watch for real events, such as fund transfers on external blockchains
- The processor produces blocks, and by running as a single node with no fixed block time, can run as fast as a centralized server
- Approvers validate the operations of the processor and sign off on fund transfers before they occur.
The system gains speed by centralizing its processing, while maintaining security with checks and balances from the listeners and approvers. Unlike fully centralized servers, all operations on Kolme remain transparent and auditable.
Engineer overview
Kolme is a Rust library for building blockchain apps with their own chains, driven by three core pieces—listeners, a processor, and approvers—working together to break free from smart contract limits. You write full Rust code, use any server-side tools you want—like databases or whatever else—and run fast with no delays, all while keeping it secure and connected.
Core chain
The chain itself is made up of a series of blocks. Unlike other blockchains, a Kolme chain's blocks:
- Have no specific timing. The processor can produce a new block at any time.
- Always contain exactly one transaction.
The core of the Kolme codebase revolves around managing this chain. The core is made up of only one mutating function: adding a new block to the chain. Each block must be signed with the processor's signing key.
Beyond that, the core layer of Kolme provides a series of data lookups, the ability to subscribe to notifications (such as new blocks), and the ability to simulate running a transaction.
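As a rough illustration of this design, here is a minimal sketch of a chain with a single mutating operation and read-only lookups. The names `Block`, `Chain`, `add_block`, `next_height`, and `get_block` are hypothetical and not taken from the Kolme API. The `signed_by` field is a stand-in for a real cryptographic signature, and starting heights at 0 is an assumption.

```rust
/// A block in this sketch always carries exactly one transaction.
#[derive(Debug, Clone, PartialEq)]
pub struct Block {
    pub height: u64,
    pub transaction: String,
    /// Stand-in for a real signature: the name of the key that signed the block.
    pub signed_by: String,
}

pub struct Chain {
    processor_key: String,
    blocks: Vec<Block>,
}

impl Chain {
    pub fn new(processor_key: &str) -> Self {
        Chain { processor_key: processor_key.to_string(), blocks: Vec::new() }
    }

    /// The only mutating operation: append the next block to the chain.
    pub fn add_block(&mut self, block: Block) -> Result<(), String> {
        if block.signed_by != self.processor_key {
            return Err("block is not signed by the processor".to_string());
        }
        if block.height != self.next_height() {
            return Err(format!(
                "expected height {}, got {}",
                self.next_height(),
                block.height
            ));
        }
        self.blocks.push(block);
        Ok(())
    }

    /// Read-only lookup: the height the next block must have.
    pub fn next_height(&self) -> u64 {
        self.blocks.len() as u64
    }

    /// Read-only lookup: fetch a block by height.
    pub fn get_block(&self, height: u64) -> Option<&Block> {
        self.blocks.get(height as usize)
    }
}
```

Keeping every write behind one function means all other functionality, such as lookups, notifications, and simulation, can be built on top without touching chain integrity.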
Components
On top of this core, Kolme adds in components. Components leverage the core logic and provide additional functionality. Some components are integral parts of the system, such as the processor, listener, and approver components. Other components, like the gossip component (for peer-to-peer communication), are needed by most systems, but are technically optional.
In addition, individual applications can write their own components to handle custom needs. For example, indexers, API servers, bots, cron jobs, and more, can all be implemented as custom components within a Kolme application.
The upside of this is that you can easily extend your application without needing to split logic into multiple code bases, and without the need for error-prone network communications with an external blockchain node.
External Data Handling
Kolme lets you load external data straight into your app. Each data load is recorded as part of the blockchain, allowing other nodes to validate the data itself, as well as rerun the transaction to confirm the processor has produced a valid state. This allows the processor code to run quickly and pull in the data it needs, without compromising security or relying on centralization of trust.
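The idea of recording data loads can be sketched as follows. This is a toy model, not Kolme's actual data-load API: `DataLoad`, `Block`, `produce_block`, and `validate_block` are hypothetical names, the "state" is a single running total, and the standard library's `DefaultHasher` stands in for a real state hash. The sketch covers only the rerun step. Independently checking the loaded data itself is left out.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One external data load, recorded alongside the transaction in a block.
#[derive(Debug, Clone, PartialEq)]
pub struct DataLoad {
    pub request: String,
    pub response: u64,
}

#[derive(Debug, Clone)]
pub struct Block {
    /// The transaction in this toy model: buy `amount` units at the loaded price.
    pub amount: u64,
    /// Data loaded while executing the transaction.
    pub loads: Vec<DataLoad>,
    /// Hash of the state the processor claims to have produced.
    pub state_hash: u64,
}

/// Deterministic app logic: the running total grows by amount * price.
fn execute(prev_total: u64, amount: u64, price: u64) -> u64 {
    prev_total + amount * price
}

fn hash_state(total: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    total.hash(&mut hasher);
    hasher.finish()
}

/// Processor side: fetch live data, execute, and record what was loaded.
pub fn produce_block(prev_total: u64, amount: u64, fetch_price: impl Fn(&str) -> u64) -> Block {
    let request = "price-feed".to_string();
    let response = fetch_price(&request);
    let total = execute(prev_total, amount, response);
    Block {
        amount,
        loads: vec![DataLoad { request, response }],
        state_hash: hash_state(total),
    }
}

/// Validator side: rerun with the recorded load and compare state hashes.
pub fn validate_block(prev_total: u64, block: &Block) -> bool {
    match block.loads.first() {
        Some(load) => {
            hash_state(execute(prev_total, block.amount, load.response)) == block.state_hash
        }
        None => false,
    }
}
```

Because the response is stored in the block, a validator never needs to see the same live value the processor saw in order to reproduce the state.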
High Availability Processor
Kolme’s processor runs in a high availability cluster for production apps. It uses leader election to allow multiple processor nodes to remain running, while ensuring only one of them produces blocks at a time. This provides a hot standby capability to ensure your application isn't taken down by a single machine failure.
Network Syncing
Kolme ties it all together with a gossip network built on libp2p for discovering other nodes. Components use it to receive events like new blocks, query missing blocks if they’re behind, and share transactions with the processor, keeping everything in sync, fast, and reliable.
Innovations
- Triadic Trust Model
- Instant Block Production with External Data
- Component-Based Chain Architecture
- High Availability Processor Cluster
Triadic Trust Model
A system where three distinct groups—listeners, processor, and approvers—handle event detection, execution, and fund approval with customizable quorums, balancing speed and security in a single app chain. Kolme provides a novel mechanism for providing security and transparency without slowing down processing.
Instant Block Production with External Data
The processor’s ability to generate blocks on-demand, pulling in external data (like price feeds) without oracle delays, and bundling it for verification.
Component-Based Chain Architecture
A modular Rust framework where a core chain library pairs with plug-and-play components (listeners, API servers, etc.), all sharing state and notifications.
High Availability Processor Cluster
Running the processor in an HA setup with leader election and hot standbys to keep block production uninterrupted.
FP Block's Goals
Kolme is a framework produced by the FP Block team. After years of building standard dapps on a variety of blockchains, we ended up running into the same issues time and again:
- Slow block times resulting in degraded user experience
- Awkward splits between smart contracts and backend services complicating development
- Unreliable blockchain nodes leading to system outages
- The difficulty of creating a secure environment when external data is needed by the system (oracle challenges)
- Difficulty supporting multiple chains
Our goal with Kolme is to provide a high quality, reusable, open source, and secure framework for building a new generation of dapps.
Monetization
As an engineering consulting firm, FP Block's primary monetization strategy is to use Kolme as an application accelerator. We believe we can build more secure, high quality products in less time. In addition, if there's wider demand, we intend to offer managed hosting for Kolme applications, leveraging our network of partners to provide decentralized security for product companies.
Why Kolme pitch
This is an anonymized pitch written for a customer explaining why Kolme is a good fit for their Solana-based application. It's intended to help the Kolme team understand how to represent Kolme in discussions.
Short Hook (30-45 seconds)
Picture an app that is as fast as your favorite website, as secure as a blockchain, and works seamlessly with Solana wallets. That’s YourApp, powered by our Kolme framework on Solana. It delivers instant transactions, ironclad security, and lower costs, unlocking a new era of decentralized apps. Ready to join us in building the future of YourIndustry?
Detailed Pitch (2-3 minutes)
YourApp is a blockchain application that harnesses Solana’s speed and scalability alongside the Kolme framework—a game-changer for decentralized applications (dapps). Most dapps today are hybrid, splitting logic between on-chain and off-chain systems, which often compromises security. Kolme flips this script: by hosting all application logic in a dedicated sidechain, YourApp ensures no security trade-offs while preserving the flexibility of hybrid designs. Paired with Solana’s high-throughput ecosystem, it creates a platform that’s fast, secure, and ready for validators, partners, and ecosystem participants to shape.
YourApp’s application-specific sidechain under Kolme delivers powerful advantages:
- Independent History: YourApp’s transactions form a distinct blockchain, separate from other apps. This simplifies analysis and empowers partners to build tailored tools and integrations with ease.
- Blazing Speed: Kolme produces blocks at its own pace, sidestepping chain congestion and delays. While traditional blockchains wait hundreds of milliseconds—or even seconds—for blocks, Kolme slashes this to a few milliseconds, matching raw computation speed for near-instant processing.
- Secure Off-Chain Data: Kolme’s native support for verified off-chain data brings real-time data from trusted third-party sources directly into the blockchain history. Unlike on-chain oracles, this approach blocks timing attacks and removes cumbersome multi-step transaction submissions, keeping the user experience seamless and secure.
- Robust Logic: By running all logic in the sidechain, YourApp leverages server-side Rust for complex, reusable code. This eliminates blockchain computation limits, enabling richer features for partners and developers to extend the platform.
- Cost Efficiency: Kolme’s sidechain reduces validator overhead. Our listeners and approvers—specialized validator roles—operate on lightweight hardware, cutting costs and boosting profitability for ecosystem participants.
- User-Friendly Transactions: In-app transactions use app-specific private keys, sparing users repetitive wallet approvals while safeguarding their Solana funds.
Security is non-negotiable. The Kolme Bridge Contract, a Solana smart contract, custodies funds with a triadic security model, requiring a quorum of approvers and the central processor node to authorize transfers. This multiparty, multisignature system ensures lightning-fast operations with ironclad trust, delivering a native Solana experience via users’ preferred wallets.
YourApp on Kolme + Solana redefines dapps with uncompromised security, unmatched speed, and boundless potential. Validators gain a low-cost, high-reward platform. Partners find a flexible foundation to innovate. Ecosystem participants can shape a faster, more secure blockchain future. Join us to make YourApp the benchmark for decentralized operations—as a validator, developer, or advocate. Let’s build the future together.
Failed transactions
There are two different categories of "failed transactions" within Kolme: transactions submitted within Kolme that are rejected by the processor, and transactions that fail on an external chain. We'll cover each of these categories separately.
Kolme chain transaction failures
The process for submitting a transaction for inclusion in a block is:
- Broadcast the proposed, signed transaction to any node in the network
- Node uses the gossip component to share that proposed transaction with other nodes
- One of the processor nodes picks up the transaction
- The processor attempts to execute the transaction
- If the execution is successful:
  - The processor produces a new block
  - The processor gossips that block to all nodes in the network
  - The nodes are able to observe that the transaction has been added and remove it from their mempools
If, however, the execution is unsuccessful, what do we do? Many blockchains will include the transaction as a failed transaction within a block. We could elect to do that as well in Kolme. However, doing so would unnecessarily bloat the size of our chain, something we're trying to avoid.
Instead, we simply drop the transaction, together with a gossiped notification indicating that the transaction failed. At that point, the processor's job is done.
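The two outcomes can be sketched like this. The `Gossip` enum and `Processor` struct are hypothetical illustrations, and the "transaction" is reduced to a withdrawal against a single balance so that there is something that can fail.

```rust
/// Messages the processor gossips after handling a transaction (hypothetical names).
#[derive(Debug, PartialEq)]
pub enum Gossip {
    NewBlock { height: u64, tx_id: u64 },
    FailedTransaction { tx_id: u64, reason: String },
}

pub struct Processor {
    next_height: u64,
    balance: u64,
}

impl Processor {
    pub fn new(balance: u64) -> Self {
        Processor { next_height: 0, balance }
    }

    pub fn next_height(&self) -> u64 {
        self.next_height
    }

    /// Execute a toy transaction: withdraw `amount` from the app balance.
    pub fn handle(&mut self, tx_id: u64, amount: u64) -> Gossip {
        if amount > self.balance {
            // Failure: no block is produced, so the chain does not grow.
            return Gossip::FailedTransaction {
                tx_id,
                reason: format!("insufficient balance: have {}, need {}", self.balance, amount),
            };
        }
        // Success: apply the change and produce exactly one block.
        self.balance -= amount;
        let height = self.next_height;
        self.next_height += 1;
        Gossip::NewBlock { height, tx_id }
    }
}
```

Note that a failed transaction leaves `next_height` unchanged, which is the point of dropping it rather than recording it.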
For security and censorship protection, other nodes in the network should confirm that the attempt to run that transaction fails. The purpose of this is to detect if the processor is unfairly rejecting transactions.
Current plan: This will ultimately be added as part of the watchdog component.
External chain failures
External chain failures cover any kind of failure case with the bridge contracts on external chains.
User deposits
User deposits can fail due to insufficient gas, insufficient user funds, or something else. In all such cases, either no transaction is generated, or a failed transaction is generated. In any event, these transactions should be completely ignored by listeners. They do not generate a new bridge event ID and are not relayed to the Kolme chain.
Invalid funds deposited
A variation on the above is when a user deposits unsupported funds in a bridge contract. In the current design, those funds will simply be lost within the contract. This may sound surprising, but falls in line with standard behavior for unsupported transfers into a contract (e.g., using a MsgBank to transfer funds into a contract in Cosmos does not trigger execute messages).
We have potential alternatives to consider, such as:
- Keeping a list of permitted received coins/tokens, and rejecting transactions without them.
- Providing an administrative "send untracked tokens" feature for recovery of funds.
We'll wait for sufficient demand before implementing such a change.
Failed actions/withdrawals
Kolme blocks can emit actions to be run on external chains. These actions have the potential to fail. Kolme provides a mechanism for one common failure mode (insufficient funds), and a back door for fixing any other failing action.
One thing to note in particular is that actions must be executed in ascending order. That means that if action 56 fails, no later action will be able to proceed. Therefore, unblocking a broken action is a requirement for correct chain operation.
Insufficient funds
It's possible for a Kolme application to generate a fund transfer message that cannot be supported on chain. A simple example would be a multichain application supporting USDC. A client can legitimately deposit 1,000 USDC on chain A, then issue a withdrawal request for chain B, essentially turning Kolme into a token bridge.
We need to hold off on issuing an action until there are sufficient funds on chain B. This may require generating an external bridge transaction, for instance, that will move USDC from one chain to another (probably using the cross-chain transfer protocol).
To allow for this, we have the following model (note not yet implemented!):
- For each chain, we maintain an internal accounting balance of tokens held by the bridge contract. This can be calculated by summing deposits and withdrawals.
- Additionally, we provide listeners with a special message type to synchronize balances. This can be used to account for a token transfer initiated outside the normal deposit flow.
- When a transaction generates a fund transfer action, we check if there are sufficient funds to transfer. If so, we immediately emit the action. Otherwise, we add it to a FIFO queue of pending transfers per chain.
- Each time the balance of funds on a chain changes, we check the pending transfers queue and emit as many actions as we can currently support.
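Since this model is not yet implemented, the following is only a sketch of how the accounting and queue could fit together. All names (`Bridge`, `ChainFunds`, `Transfer`, `credit`, `request_transfer`) are hypothetical. One assumption worth flagging: the sketch keeps strict FIFO order, so a small transfer never jumps ahead of an older, larger one that is still waiting.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone, PartialEq)]
pub struct Transfer {
    pub recipient: String,
    pub amount: u64,
}

/// Internal accounting for one external chain.
#[derive(Default)]
pub struct ChainFunds {
    balance: u64,
    pending: VecDeque<Transfer>,
}

#[derive(Default)]
pub struct Bridge {
    chains: HashMap<String, ChainFunds>,
}

impl Bridge {
    /// A deposit or listener balance sync raised the balance. Returns the actions emitted.
    pub fn credit(&mut self, chain: &str, amount: u64) -> Vec<Transfer> {
        let funds = self.chains.entry(chain.to_string()).or_default();
        funds.balance += amount;
        Self::drain(funds)
    }

    /// A transaction requested a transfer. Emit it now if possible, otherwise queue it.
    pub fn request_transfer(&mut self, chain: &str, transfer: Transfer) -> Vec<Transfer> {
        let funds = self.chains.entry(chain.to_string()).or_default();
        funds.pending.push_back(transfer);
        Self::drain(funds)
    }

    /// Emit queued transfers from the front while the balance covers them.
    fn drain(funds: &mut ChainFunds) -> Vec<Transfer> {
        let mut emitted = Vec::new();
        while let Some(next) = funds.pending.front() {
            let amount = next.amount;
            if amount > funds.balance {
                break;
            }
            funds.balance -= amount;
            if let Some(transfer) = funds.pending.pop_front() {
                emitted.push(transfer);
            }
        }
        emitted
    }

    pub fn balance(&self, chain: &str) -> u64 {
        self.chains.get(chain).map_or(0, |f| f.balance)
    }

    pub fn pending_len(&self, chain: &str) -> usize {
        self.chains.get(chain).map_or(0, |f| f.pending.len())
    }
}
```

In the USDC example, a deposit of 1,000 on chain A followed by a withdrawal request on chain B leaves the transfer queued until chain B has been credited with at least 1,000.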
Hard override
Ideally no other actions should ever fail, assuming all code is written correctly. Given that such an assumption is guaranteed to be proven false, we include a hard override mechanism. This is a manual workaround for a stalled action.
Any approver may issue a message to either skip a bridge action, or replace a bridge action with a new action. The other approvers must confirm this message, with a final confirmation by the processor. Once the processor confirms the change, the new action is emitted by the chain, and the submitter components will attempt to broadcast the new transaction (or skip it).
Note that this is a fully manual process, intended to be used in exceptional circumstances.
Hard fork/reverted transaction
If a blockchain has a hard fork or reverts a transaction we have already observed, we risk solvency of the system. For all bridge events added to Kolme that no longer exist on the destination chain (note not yet implemented):
- A listener can send a message (manually) requesting that an event be reverted.
- Once a quorum of listeners vote in agreement, the processor must confirm and add a new block to the chain.
- That new block will roll back the next expected event ID to the previous one.
- And, if possible, funds remaining in the account will be burned to revert the transaction. Unfortunately, if the funds have already been used, this will be impossible, and may introduce an insolvency issue.
The best defense against this is to ensure that listeners wait for sufficient confirmations on transactions before submitting them to Kolme.
High availability
Kolme is designed to make high availability deployments of services possible, natural, and easy. The Kolme framework is built around horizontal scaling of nodes within a logical group to provide for high availability. Special care must be taken during version upgrades, which are handled separately.
Normal operations
Under normal, non-upgrade conditions, there are four different sets of Kolme nodes that may be running:
- Processor
- Listener
- Approver
- App-specific services (indexers, API servers, etc.)
While there are many ways to run each of these in a high-availability setup, let's review the standard Kolme recommendations.
Listeners and approvers
These are the easiest to run in a high availability setup. Downtime for a listener or approver does not result in any downtime for the application itself. Instead, if sufficient nodes are down from either set to break the quorum, downtime will result in delays in fund transfers (listeners will block deposits, approvers will block withdrawals).
There are two ways to approach this:
Single copy
Don't worry about downtime, and just run one copy per signing key. This relies on the inherent protection provided by a multisignature set. Further, even if multiple nodes go down, the result is simply a delay, not an outage.
One thing to keep in mind with this approach is the quorum rules for a set. If you have a 2 of 2 listener set, for example, downtime from either node would result in blocked deposits. In essence, you have a disaster management OR relationship: if either node A or node B goes down, the entire set cannot proceed.
On the other hand, with a 3 of 5 listener set, you would need to have 3 nodes all fail at the same time before deposits become blocked.
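The arithmetic behind these two examples is simple enough to write down. The `QuorumSet` type below is a hypothetical helper for reasoning about a deployment, not part of Kolme.

```rust
/// A "required of total" signing set, e.g. 3 of 5.
pub struct QuorumSet {
    pub required: usize,
    pub total: usize,
}

impl QuorumSet {
    /// Can the set still reach quorum with this many nodes down?
    pub fn can_proceed(&self, nodes_down: usize) -> bool {
        self.total.saturating_sub(nodes_down) >= self.required
    }

    /// How many simultaneous failures the set survives without blocking.
    pub fn tolerated_failures(&self) -> usize {
        self.total.saturating_sub(self.required)
    }
}
```

A 2 of 2 set tolerates zero failures, while a 3 of 5 set tolerates two and only blocks on the third.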
Replicas
Alternatively, if you want to provide even stronger uptime guarantees, you can run multiple replicas. This has the downside of requiring more hardware, and will result in redundant external blockchain queries (for listeners) and redundant Kolme transactions being generated. However, both of these are ultimately hardware problems: extra queries to an external blockchain require more hardware, and the redundant Kolme transactions will simply be rejected.
Processors
Processors are the trickiest component from a high availability standpoint. Unlike approvers and listeners, we cannot simply let individual processors work in parallel. They'll produce conflicting blocks and could lead to a forked chain. While some blockchains--like Bitcoin--explicitly support chain forking and reconciliation, Kolme does not. Our performance model is based around irrefutable block production.
On the other hand, running just a single processor node isn't an option either. Unlike listeners and approvers, any downtime on the processor front results in the inability to perform any operations on the Kolme chain.
Our recommended high availability deployment, therefore, is:
- Run a cluster of 3 processor nodes in different availability zones (or equivalent for your hosting platform).
- Use the PostgreSQL data store. This provides a construction lock, which the processor will use to ensure it is the only node attempting to produce a block at a given time. Furthermore, it will use table uniqueness rules to ensure only one version of a block is saved to durable storage.
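The two safeguards provided by the data store can be illustrated with an in-memory stand-in. This is not the PostgreSQL store and does not use its API. `BlockStore` and its methods are hypothetical names that mimic a lock plus a uniqueness constraint on block height.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for a shared data store.
#[derive(Default)]
pub struct BlockStore {
    /// Which processor node currently holds the construction lock, if any.
    lock_holder: Option<String>,
    /// Height to block hash. One entry per height, like a unique key.
    blocks: BTreeMap<u64, String>,
}

impl BlockStore {
    /// Try to take the construction lock. Only one node can hold it at a time.
    pub fn try_acquire(&mut self, node: &str) -> bool {
        if self.lock_holder.is_none() {
            self.lock_holder = Some(node.to_string());
        }
        self.lock_holder.as_deref() == Some(node)
    }

    /// Release the lock, but only if this node is the holder.
    pub fn release(&mut self, node: &str) {
        if self.lock_holder.as_deref() == Some(node) {
            self.lock_holder = None;
        }
    }

    /// Save a block. A second block at the same height is rejected.
    pub fn insert_block(&mut self, height: u64, hash: &str) -> Result<(), String> {
        if self.blocks.contains_key(&height) {
            return Err(format!("a block at height {} already exists", height));
        }
        self.blocks.insert(height, hash.to_string());
        Ok(())
    }

    pub fn block_hash(&self, height: u64) -> Option<&str> {
        self.blocks.get(&height).map(|hash| hash.as_str())
    }
}
```

The lock keeps standby nodes from doing redundant work, while the uniqueness rule is the backstop that prevents a fork even if two nodes briefly believe they are the leader.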
Other services
The HA story for other services depends greatly on what they're doing. For example:
- An API server may simply be able to run multiple replicas of the service behind a load balancer. Without any data management, just ephemeral queries against the most recent block, the service would be trivially scalable.
- An indexer may want to take a batch approach: have one process responsible for filling up a SQL database, and a high availability cluster of machines serving data from that shared database.
Startup time
When a node first launches, it will need to catch up to the latest state of the chain. This will significantly impact startup time, which is especially important for high availability, ephemeral services. We have a few methods to speed this up:
- Use persistent volumes with the Fjall data store. When a service restarts, it will be able to continue from its most recently stored block.
- Use fully ephemeral storage and synchronize from other nodes in the network at launch. Fast sync can speed this up by trusting the network's most recent app state.
- While not currently supported, we may choose to add a data storage backend that stores the full chain state in shared storage. See discussions in the storage page.
Version upgrades
Version upgrades represent a problem for Kolme, since the app logic is intrinsically linked with the version of the code. This is different from other smart contract chains, where the app logic lives in a binary blob--the contract code--which is separate from the underlying chain.
Our goal for version upgrades is to be explicit and provide a zero-downtime upgrade process. Each release of the code has a version string associated with it. Let's consider a case of a chain running version 0.1.0, and wanting to upgrade to 0.1.1.
The first problem is reproducibility of the chain state. Most likely, there will be subtle (or not-so-subtle) differences in the state produced by old and new versions of the code. We need to ensure old versions of the code are never used for processing new blocks, and new versions of the code are never used for processing old blocks.
Let's step through the process.
- App team releases the first version of the chain, running version 0.1.0. This ships with a source code release at commit A. That commit is audited. Processors, listeners, and approvers build binaries (or get checksummed binaries that have been audited) and begin running the chain. The genesis block includes the information "running version 0.1.0."
- The app team continues working on new features and passes the code to auditors for review. After changes, we end up with a final commit B, defining code version 0.1.1.
- The app team creates a new commit based on the original commit A. This commit just includes a new config setting for "next code version is called 0.1.1 and is ready." Call this commit A1.
- The processor, approver, and listener teams deploy binaries built from commit B. These binaries will launch and observe the network, but will not do anything yet.
- Once all teams report the new clusters are ready, deploy binaries built from commit A1 to the current running clusters.
- Upon launch, the processor running commit A1 will produce a special transaction, a "propose upgrade" transaction. This will specify that the processor is ready to upgrade to version 0.1.1.
- Upon seeing that transaction, listeners and approvers will send their own "confirm upgrade" transactions, providing signatures that they are ready to upgrade and in agreement with the upgrade.
- When a quorum of listeners and approvers confirm the upgrade, the processor will produce a final "upgrade initiated" transaction. From that moment, none of the nodes running code version 0.1.0 will perform any actions, and the nodes running code version 0.1.1 (from commit B) will immediately take over.
- After confirming that the upgrade has gone through correctly, the old clusters can be spun down.
There are tweaks that could potentially be made to the plan above. For example, instead of running two different clusters, we could have a single cluster with a rollout strategy, and set up the new code versions to crash on the old code. As we develop the DevOps scripts we use, we will update docs for our most up-to-date recommendations.
One final question here is how the new nodes synchronize state with the old chain version. Firstly, we will need a guarantee that the data storage is backwards compatible. It must be possible for the new version of the code to load old state versions.
We'll also need fast sync. This is the ability to download complete state data from other nodes in the chain. This is how old and new nodes will be able to transfer data to each other. This introduces some trust assumptions, in particular trusting that the processor's most recently published block is valid. We could also use some shared storage of blocks between old and new versions of the nodes so that the only trust is between your old and new processes. And finally, we could have special "trust this block" messages sent over the network to facilitate fast sync without centralized trust.
We'll flesh out the sync decisions as we implement the mechanism.
Version checking
A cornerstone of the security and transparency of Kolme is reproducibility in block production. This means that, given the same input chain state and the same block information (transaction and data loads), Kolme must guarantee byte-identical results for the various states it produces. This is the most important reason for including framework and app state hashes in the serialized blocks.
New versions of an application will almost always introduce changes in this binary output, either through changes in app logic, updates to underlying libraries, or modifications to the data storage format. To handle this in a reproducible, transparent manner, we explicitly track version upgrades within the chain.
Versions within Kolme
Kolme uses a simple string to represent versions. The motivation for this is that we only care about versions matching or not matching, not whether a version is older or newer.
Versions appear in four different places:
- When creating a Kolme value, we specify the current version of the codebase. We call this the code_version.
- The framework state tracks the current version of the code used by the chain. This is the chain_version.
- The genesis information for a chain contains the initial value for the chain_version.
- The upgrade procedure (described below) includes admin messages that set the version, which will eventually be used to update the framework state's chain_version.
All operations that work on processing blocks (e.g., executing transactions, producing new blocks) check if the current chain version matches the running code version. If not, execution is blocked, since this may result in incorrect state representations. This necessitates the usage of fast sync, as described below.
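A minimal sketch of such a check, assuming hypothetical names (`ensure_versions_match`, `VersionMismatch`) rather than Kolme's real API:

```rust
/// Returned when the running code must not process blocks for this chain.
#[derive(Debug, PartialEq)]
pub struct VersionMismatch {
    pub code_version: String,
    pub chain_version: String,
}

/// Versions are opaque strings: only equality matters, never ordering.
pub fn ensure_versions_match(
    code_version: &str,
    chain_version: &str,
) -> Result<(), VersionMismatch> {
    if code_version == chain_version {
        Ok(())
    } else {
        Err(VersionMismatch {
            code_version: code_version.to_string(),
            chain_version: chain_version.to_string(),
        })
    }
}
```

Because the comparison is plain string equality, there is no notion of a version being "newer": a node on any non-matching version is blocked in the same way.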
The upgrade process
Upgrading is handled as an admin message, where the validator set must propose and vote on a migration to a new code version. This follows the same voting procedure as used for other admin messages like validator set changes, namely that once 2 out of 3 groups within the validator set agree, the change goes through.
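The "2 out of 3 groups" rule can be sketched as below. This assumes the three groups are the processor, the listener set, and the approver set, and that a set counts as agreeing once its own quorum of members has voted. The types `Group` and `ValidatorSet` and the helper `keys` are hypothetical.

```rust
use std::collections::HashSet;

/// A set of keys with its own quorum requirement.
pub struct Group {
    pub members: HashSet<String>,
    pub needed: usize,
}

impl Group {
    /// The group agrees once enough of its members have voted.
    fn approves(&self, votes: &HashSet<String>) -> bool {
        self.members.intersection(votes).count() >= self.needed
    }
}

pub struct ValidatorSet {
    pub processor: String,
    pub listeners: Group,
    pub approvers: Group,
}

impl ValidatorSet {
    /// An admin proposal passes once at least 2 of the 3 groups agree.
    pub fn proposal_passes(&self, votes: &HashSet<String>) -> bool {
        let groups = [
            votes.contains(&self.processor),
            self.listeners.approves(votes),
            self.approvers.approves(votes),
        ];
        groups.iter().filter(|agreed| **agreed).count() >= 2
    }
}

/// Small helper to build a key set from string literals.
pub fn keys(names: &[&str]) -> HashSet<String> {
    names.iter().map(|name| name.to_string()).collect()
}
```

Under this rule no single group, including the processor, can push through an upgrade on its own.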
Let's consider a situation where we have a chain that started at version v1 and is trying to upgrade to v2. The process works as follows:
- One of the validators (e.g., one of the listeners) proposes an upgrade to v2. This updates the framework state's proposals data structure. This is generally handled by the upgrader component, described below.
- Other validators detect and vote in favor of this proposal until a quorum is reached. (This is also handled by the upgrader component.) Once the quorum is reached, the processor running the v1 code will produce one final block that switches the framework state's code_version to v2. At this point, the v1 processor no longer produces any more blocks, since it's running the wrong code version.
- Nodes on the network running v2 are unable to execute any blocks that have occurred so far, since they are all using chain version v1. Instead, these nodes must use fast sync to transfer the entirety of the framework and app state directly to those nodes.
- Once validators transfer the framework and app state, they resume chain operation as usual.
Upgrader component
The upgrader component is a recommended component for all validators to run. It handles the logic of the upgrade procedure above, namely:
- Check if the desired and actual chain version differ
- Check if an existing proposal exists to move to the desired chain version
- Voting on the existing proposal if it exists
- Creating the proposal if it doesn't
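The decision logic in the list above can be sketched as a pure function. `UpgraderAction`, `Proposal`, and `next_action` are hypothetical names. For brevity the sketch does not track whether this validator has already voted on the proposal, which a real component would need to do.

```rust
#[derive(Debug, PartialEq)]
pub enum UpgraderAction {
    /// Chain is already on the desired version.
    Nothing,
    /// A matching proposal exists: vote for it.
    Vote { proposal_id: u64 },
    /// No matching proposal yet: create one.
    Propose { version: String },
}

pub struct Proposal {
    pub id: u64,
    pub target_version: String,
}

/// Decide what the upgrader should do given the current chain state.
pub fn next_action(
    chain_version: &str,
    desired_version: &str,
    proposals: &[Proposal],
) -> UpgraderAction {
    if chain_version == desired_version {
        return UpgraderAction::Nothing;
    }
    match proposals.iter().find(|p| p.target_version == desired_version) {
        Some(proposal) => UpgraderAction::Vote { proposal_id: proposal.id },
        None => UpgraderAction::Propose { version: desired_version.to_string() },
    }
}
```

Running the same logic on every validator means whichever node notices first creates the proposal and the rest simply vote on it.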
Applications should accept runtime parameters (e.g., environment variables or command line arguments) to indicate if a version upgrade is desired. Note that the upgrader component must be running on the old version of the code, e.g. the v1 processors, listeners, and approvers from the example above.
Once the upgrade process is complete, the old nodes should be taken down, as they will simply drain network bandwidth by performing state transfers.
Ensuring high availability
To keep high availability, we recommend the following deployment strategy:
- Publish a new version of the executable with the new code version.
- Launch a parallel set of all validator nodes running this new code version (v2).
- Modify the existing validator nodes to begin running the upgrader component, setting the desired version to v2.
- Wait for the chain to upgrade to v2 (should be very fast once the validators are reconfigured).
- Shut down the old v1 validators.
Node sync
Nodes within a Kolme chain communicate with each other using the Kolme gossip component. See that document for details of peer discovery and communication.
This page discusses how nodes synchronize with each other. This is the process of finding the latest block and populating the node's storage with the blocks and state hashes necessary to run the Kolme core.
There are three sync approaches. (At time of writing, only the third, slowest, is implemented.)
State transfer (fast sync)
With fast sync, a node requests both the most recent block itself, as well as all Merkle hash blobs necessary to load up the framework and app state for that block. This has the advantage of being not only fast, but being able to fast forward over an arbitrary number of blocks. It has two downsides:
- Bandwidth usage may be high, since it needs to transfer the entirety of the state, which could be large. And while the MerkleMap approach to data storage provides for lots of data sharing, that data sharing also increases the storage size for a single copy of the data.
- It requires full trust in the processor's version of the state.
Just blocks (medium sync)
In this approach, we transfer just the block data itself over the network. Each node is then responsible to execute the blocks one by one and populate their own data store, validating that the state hashes are identical to the claims of the block at each step.
This may be lighter on bandwidth usage, since it doesn't require transferring the full state. However, it does require transferring all missing blocks, which may be significant. It also improves on the trust model of fast sync, since the node verifies the results of each block execution.
Side note: we have plans to provide for optimized, bulk downloads of large numbers of blocks. We may do the same thing for fast sync with occasional compressed state files being made available.
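The replay step of the "just blocks" approach can be sketched as follows. This is a toy model: the state is a single integer, `apply` stands in for real transaction execution, and `DefaultHasher` stands in for the real state hash. All names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub struct Block {
    pub height: u64,
    /// Toy transaction: add this amount to the state.
    pub delta: i64,
    /// State hash the block claims to produce.
    pub state_hash: u64,
}

pub fn hash_state(state: i64) -> u64 {
    let mut hasher = DefaultHasher::new();
    state.hash(&mut hasher);
    hasher.finish()
}

/// Toy state transition standing in for executing the block's transaction.
fn apply(state: i64, block: &Block) -> i64 {
    state + block.delta
}

/// Execute blocks one by one, checking each claimed state hash.
/// Returns the final state, or the height of the first block that fails.
pub fn replay(mut state: i64, blocks: &[Block]) -> Result<i64, u64> {
    for block in blocks {
        let next = apply(state, block);
        if hash_state(next) != block.state_hash {
            return Err(block.height);
        }
        state = next;
    }
    Ok(state)
}
```

The node never accepts a state it did not compute itself, which is what distinguishes this approach from state transfer.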
Validated blocks (slow sync)
In addition to the steps taken in "just blocks," we also validate all data loads performed during execution. The impact of this varies by application. For an app that uses cryptographic data loads, verification may require no network traffic and be trivial to perform. For verification that requires an HTTP request for each data load, it could end up being significant.
At time of writing, this is the only sync mechanism implemented.
Key rotation
- Motivation
- Initiating key rotation
- Executing the change
- Transition period
- Force-replacing a processor
Motivation
There are three groups of specially recognized public keys within Kolme: the processor node, the listener set, and the approver set. Each set has its own quorum rules, requiring a certain number of members from the set to perform their operations. Since the goal of the processor is to allow fast, centralized block production, the processor has only one key and operates autonomously.
Key rotation recognizes the fact that, at some point in the future, these keys may need to be replaced. Our design must handle these cases:
- Normal key rotation for security or hardware migration for a single operator (processor, listener, or approver). This should not require any assistance from other operators.
- A non-responsive or misbehaving operator needs to be replaced. Misbehaving can either mean:
- The original operator is issuing incorrect data (e.g., a listener reporting on fund transfers that never happened).
- A key has been compromised and is now being abused by a third party attacker.
In any of these cases, we need to both initiate a key rotation, and then execute a key rotation.
Initiating key rotation
The use cases above can roughly be divided into "key replaces itself" and "others replace key." The former is a routine maintenance operation and does not require additional approval. The latter is a response to a security threat to the network, and requires quorum to initiate. Kolme provides two different routes for initiating key rotation.
Self replacement
In the self replacement case, we have a single message that says "replace me as the processor, listener, or approver." The message fails if the signing key is not currently a member of one of those sets. If the same key is used in multiple sets, each set would require a separate message to initiate the change.
No further action is needed to initiate key rotation. At this point, the chain can proceed with the execute case.
Change the set
Instead of replacing a single key, this message initiates a complete replacement of the current set of keys. The new set may contain any set of keys, including keys used in previous sets. This message includes:
- Processor key
- Listener keys and quorum requirement
- Approver keys and quorum requirement
Any member of the processor, listener, or approver sets may propose a set change. Each set change gets its own unique ID (potentially the block height it was issued at), allowing multiple proposals to exist simultaneously. This prevents a misbehaving set member from disrupting the voting process.
Any member of the current set can submit a message voting for the change. (Question: should we also support voting against?) Approval requires 2 out of 3 of the processor, listener, and approver groups to vote in favor of the change. For the listener and approver sets, a normal quorum of that set is needed for the group to count as voting in favor.
Once a change proposal receives enough votes, it is approved and can move on to execution. At that point, all previous proposals are canceled.
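The 2-out-of-3 voting rule above can be sketched as follows. This is an illustration under assumptions: keys are plain strings, votes are assumed to have already been signature-checked, and the `KeyGroup` and `ValidatorSet` types are hypothetical names, not Kolme's real types.

```rust
use std::collections::BTreeSet;

/// A set of keys plus the number of members needed for quorum.
pub struct KeyGroup {
    pub keys: BTreeSet<String>,
    pub needed: usize,
}

impl KeyGroup {
    pub fn new(keys: &[&str], needed: usize) -> Self {
        KeyGroup {
            keys: keys.iter().map(|k| k.to_string()).collect(),
            needed,
        }
    }

    /// True when enough members of this group are among the voters.
    pub fn approves(&self, voters: &BTreeSet<String>) -> bool {
        self.keys.intersection(voters).count() >= self.needed
    }
}

pub struct ValidatorSet {
    pub processor: String,
    pub listeners: KeyGroup,
    pub approvers: KeyGroup,
}

impl ValidatorSet {
    /// A set-change proposal passes when at least 2 of the 3 groups
    /// (processor, listeners, approvers) vote in favor.
    pub fn proposal_passes(&self, voters: &BTreeSet<String>) -> bool {
        let group_votes = [
            voters.contains(&self.processor),
            self.listeners.approves(voters),
            self.approvers.approves(voters),
        ];
        group_votes.iter().filter(|in_favor| **in_favor).count() >= 2
    }
}
```

With this rule, listeners and approvers together can approve a change without the processor, which is what allows a misbehaving processor to be voted out.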
Executing the change
Each accepted change is stored within the FrameworkState, in a MerkleMap with monotonically increasing keys. This sequence of changes includes the full signature history. The motivation for this is that, by just observing this history, you can prove the current set of keys. This allows for a secure fast sync, requiring only trust in the original set of signers.
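The idea of proving the current key set from the change history can be sketched as a replay starting from the original signers. This is a deliberately simplified model: the three groups are collapsed into a single `KeySet` with one quorum number, approvals are listed as key names rather than verified signatures, and all names here are hypothetical.

```rust
use std::collections::BTreeSet;

/// Build a set of key names from string slices.
pub fn names(keys: &[&str]) -> BTreeSet<String> {
    keys.iter().map(|k| k.to_string()).collect()
}

/// Simplified key set: one group of keys with one quorum requirement.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct KeySet {
    pub keys: BTreeSet<String>,
    pub needed: usize,
}

/// One accepted change, as it might be recorded in the history.
pub struct KeyChange {
    pub new_set: KeySet,
    /// Keys that approved the change (signatures assumed already verified).
    pub approved_by: BTreeSet<String>,
}

/// Replay the history from the genesis set. Each change must be approved
/// by a quorum of the set that was current *before* the change.
/// On failure, returns the index of the first invalid change.
pub fn current_key_set(genesis: &KeySet, history: &[KeyChange]) -> Result<KeySet, usize> {
    let mut current = genesis.clone();
    for (index, change) in history.iter().enumerate() {
        let approvals = current.keys.intersection(&change.approved_by).count();
        if approvals < current.needed {
            return Err(index);
        }
        current = change.new_set.clone();
    }
    Ok(current)
}
```

A node performing fast sync would only need to trust `genesis`; every later key set is justified by the approvals recorded in the history.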
Immediately upon executing the change, the FrameworkState is also updated with the modified key set. All listener and approver actions will require a quorum from the new set of keys. If the processor key changed, the next block will be signed by that new key.
Each block that executes a key set change will also emit an external chain action to be performed. This will update the contract with the new set of keys. Note that this requires all bridge contracts to track more than just the processor and approver keys used for normal execution: the contracts also need to be aware of listeners in order to validate a "change the set" action, which may rely upon listener votes to execute.
Transition period
The basic idea of this key rotation flow is:
- Perform actions on the Kolme chain
- Wait for finality on the Kolme chain (immediate for self-replace, or wait for sufficient approvals for changing the set)
- Kolme chain begins using the new set of validators
- New set of processor and approvers generate bridge actions to update bridge contracts
- Submitter takes the newly signed action and submits to the bridge contracts
The tricky bit here is the fact that the validators that signed off in steps (1) and (2) are the old validators, while the bridge action itself will be signed by the new validators. Therefore, the bridge contracts need to have special logic to handle this transition period, namely:
- Confirm that there are sufficient signatures from the old validators on the change itself
- Confirm that the new set of validators have signed the bridge action message itself
This leads to some code duplication: we need to reimplement the quorum rule checks in both the smart contracts and Kolme itself. However, this is an unavoidable duplication.
Force-replacing a processor
If the old processor is no longer behaving correctly, it won't be able to produce blocks to allow itself to be replaced. Handling this is not yet implemented, but https://github.com/fpco/kolme/issues/207 will cover that case. A theoretical approach:
- Add a new special "replace the processor" message
- It requires signatures from listeners and approvers to be included in that one message
- It has the special behavior that, unlike every other message in the system, it changes the expected processor immediately, allowing that block to be produced by the new processor instead of the old one.
Timestamp verification
Every transaction submitted to the Kolme network includes a timestamp of when--on the client machine--the transaction was generated. Similarly, every block contains a timestamp of the processor's machine time when producing the block.
Timestamps are generally considered "best effort" in Kolme, and should not be relied on for anything security-sensitive. Kolme does not take significant efforts to avoid clock skew among network participants. That said, we do implement the following basic verification checks:
- No nodes accept timestamps from the future. To account for clock skew, we define "future" as "more than 2 minutes ahead of the current machine clock time." Accepting here means accepting a transaction into a mempool, or accepting a block from the processor (checking the timestamps on both the block and its transaction).
- Block time must be monotonically increasing. Each subsequent block must take place after the preceding block.
- The difference between a block's timestamp and its transaction's timestamp must always be, at most, 2 minutes. This also means that all transactions in the mempool can be flushed after 2 minutes, and should be resubmitted.
NOTE: these checks have not been implemented at time of writing. This is currently a design document outlining plans.
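Since the checks are still at the design stage, the following is only a sketch of how the three rules might be expressed. The function names, the error type, and the choice of `SystemTime` are assumptions for illustration; the 2 minute tolerance comes from the rules above.

```rust
use std::time::{Duration, SystemTime};

/// Tolerance for clock skew: 2 minutes, per the rules above.
pub const MAX_SKEW: Duration = Duration::from_secs(120);

#[derive(Debug, PartialEq, Eq)]
pub enum TimestampError {
    FromFuture,
    NotMonotonic,
    TransactionTooFarFromBlock,
}

/// Absolute difference between two points in time.
fn abs_diff(a: SystemTime, b: SystemTime) -> Duration {
    match a.duration_since(b) {
        Ok(elapsed) => elapsed,
        Err(err) => err.duration(),
    }
}

/// Rule 1: reject timestamps more than MAX_SKEW ahead of the local clock.
/// This also applies to a transaction entering the mempool.
pub fn check_not_future(timestamp: SystemTime, now: SystemTime) -> Result<(), TimestampError> {
    if timestamp > now + MAX_SKEW {
        Err(TimestampError::FromFuture)
    } else {
        Ok(())
    }
}

/// Apply all three rules to a block and its single transaction.
pub fn check_block(
    previous_block: Option<SystemTime>,
    block: SystemTime,
    transaction: SystemTime,
    now: SystemTime,
) -> Result<(), TimestampError> {
    check_not_future(block, now)?;
    check_not_future(transaction, now)?;
    // Rule 2: block time must strictly increase.
    if let Some(previous) = previous_block {
        if block <= previous {
            return Err(TimestampError::NotMonotonic);
        }
    }
    // Rule 3: block and transaction timestamps differ by at most MAX_SKEW.
    if abs_diff(block, transaction) > MAX_SKEW {
        return Err(TimestampError::TransactionTooFarFromBlock);
    }
    Ok(())
}
```

Passing `now` as a parameter rather than reading the clock inside the function keeps the checks deterministic and easy to test.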
Relying on timestamp
While timestamps are in general best effort, if an application needs to rely on a timestamp it can do so. This follows the general trust model of Kolme: trusting the processor with significant autonomy. Consider, however, that a processor has the option of significantly changing timestamps, so that timestamps used for things like calculating payouts may be manipulated. This is probably fine for things like interest charges, where two minutes won't make much difference. If, however, this is used for something where a minute of difference is significant, it opens an attack vector for a processor to abuse timestamps.
Watchdogs
We haven't built out the watchdog component yet. Its idea would be to observe the network and, if it observes potentially abusive operations, raise an alert. For timestamp verification, one possibility would be to track the transaction timestamps in the blockchain and, if they regularly arrive out of order, consider the possibility that the processor is reordering transactions.
Note that even this idea isn't foolproof, since out-of-order timestamps could simply be a consequence of clock skew among clients. More real-world testing would be needed to ascertain whether this is a good idea.
Multisig accounts
Kolme supports two different account types: regular accounts controlled by external wallets and public keys, and multisig accounts. (NOTE at time of writing, multisig accounts have not yet been implemented.) Multisig accounts allow for a quorum of users to control actions from an account, a common desire for more secure management. The basic workflow of multisig accounts is:
- Create the account. This is done by using a normal account to send a "create multisig account" transaction. Any account can perform this action, and the new account will be created with the given set of keys and quorum rules, with no connection to the original account.
- Any member of the multisig set can propose a list of messages to be run by the multisig account. These messages will be assigned a multisig proposal ID and await voting.
- Other members can vote yes or no. If a quorum of yes votes can no longer be reached, the proposal is removed from the pending proposals. If a quorum of yes votes is achieved, the proposal is removed and the messages are executed in the same block in which the final yes vote occurred.
- Note that we need to consider how we associate log messages with each individual transaction. We may end up needing some kind of "hierarchical messages", TBD. We also need to handle the possibility of these messages failing, probably by putting an explicit "this proposal failed" message in the logs.
- Question: any reason to consider separating voting from execution?
- A special message can be used to change the voting set needed for a multisig. This must be voted on by the existing quorum.
- Question: do we want to generalize this to a "convert account" message, and allow converting multisigs to regular accounts and vice-versa?
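Since multisig accounts are not yet implemented, the following is only a sketch of the vote-counting rule from the workflow above: a proposal is approved once enough yes votes arrive, and rejected as soon as a quorum of yes votes becomes impossible. The `Proposal` type and its methods are hypothetical, and members are identified by plain strings.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq, Eq)]
pub enum ProposalStatus {
    Pending,
    Approved,
    Rejected,
}

pub struct Proposal {
    members: BTreeSet<String>,
    needed: usize,
    /// Latest vote per member: true for yes, false for no.
    votes: BTreeMap<String, bool>,
}

impl Proposal {
    pub fn new(members: &[&str], needed: usize) -> Self {
        Proposal {
            members: members.iter().map(|m| m.to_string()).collect(),
            needed,
            votes: BTreeMap::new(),
        }
    }

    /// Record a vote. Returns None if the voter is not a member.
    /// In this sketch a later vote from the same member replaces the earlier one.
    pub fn vote(&mut self, member: &str, yes: bool) -> Option<ProposalStatus> {
        if !self.members.contains(member) {
            return None;
        }
        self.votes.insert(member.to_string(), yes);
        Some(self.status())
    }

    pub fn status(&self) -> ProposalStatus {
        let yes = self.votes.values().filter(|vote| **vote).count();
        let no = self.votes.len() - yes;
        if yes >= self.needed {
            ProposalStatus::Approved
        } else if self.members.len() - no < self.needed {
            // Even if every remaining member voted yes, quorum is out of reach.
            ProposalStatus::Rejected
        } else {
            ProposalStatus::Pending
        }
    }
}
```

In a real implementation, reaching `Approved` would trigger execution of the proposal's messages in the same block, and both terminal states would remove the proposal from the pending set.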
Wallets and keys
Kolme keeps an internal concept of accounts. Accounts are able to receive and send funds and perform other actions. Each account is either a multisig account or a regular account, equivalent to Externally Owned Accounts (EOAs) from other chains.
Each regular account has 0 or more wallets and 0 or more public keys associated with it, and must at all times have at least 1 wallet or 1 public key. Public keys are the only authentication mechanism supported within Kolme, meaning every transaction you send to the chain must be signed with a public/private keypair. Wallets, on the other hand, represent a wallet on an external blockchain. Since many blockchains use the same wallet addresses (e.g., all EVM chains use identical representations), we internally track wallet addresses as simple strings, not tied to a specific chain.
Wallet addresses can only be used for controlling an account through the bridge contract. It's easiest to understand the workflow by following how a user will normally initiate and use an account.
- Most applications begin with a fund transfer to bridge funds into the Kolme app. Let's say that the user has a wallet address 0xDEADBEEF. They first initiate a transfer of 100 USDC into the bridge contract on an external chain. The message includes a public key, PUBKEY1.
- Listeners see this bridge event and submit it to the chain as a new deposit. When a quorum of listeners signs off, the processor accepts the event and executes it. During execution, it does the following:
  - Look up the 0xDEADBEEF wallet address. If it has an existing account ID associated with it, we use that ID. Otherwise, we add a new account entry with the next available account ID and associate 0xDEADBEEF with it.
  - Look up the PUBKEY1 public key. If it is currently unused, we associate that public key with our account.
  - Increase the balance for the account by 100 USDC.
- The user uses PUBKEY1's secret key to sign a message to interact with Kolme. The public key is looked up, we find the appropriate account, and all actions are taken on behalf of this account.
- A few days later, the user is on a new machine and no longer has the secret key for PUBKEY1. They only have the 0xDEADBEEF wallet. They need to add a new key to their account, so they send a new bridge contract message. This one includes no USDC, but specifies the public key PUBKEY2.
- The same listener/processor dance occurs as in step (2), and unless PUBKEY2 is already used by another account, we add it to our existing account.
- The user can now perform Kolme transactions on their new device. They can also choose to remove PUBKEY1 from their account, or even disassociate the 0xDEADBEEF wallet to allow it to be used with a new account.
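The account lookup performed when a bridge event is executed can be sketched as below. The `Accounts` type and its methods are hypothetical names for illustration, balances are tracked as a single integer rather than per asset, and listener quorum is assumed to have been checked before this code runs.

```rust
use std::collections::HashMap;

pub type AccountId = u64;

#[derive(Default)]
pub struct Accounts {
    next_id: AccountId,
    by_wallet: HashMap<String, AccountId>,
    by_pubkey: HashMap<String, AccountId>,
    balances: HashMap<AccountId, u64>,
}

impl Accounts {
    /// Execute a bridge event that has already reached listener quorum.
    pub fn handle_bridge_event(
        &mut self,
        wallet: &str,
        pubkey: Option<&str>,
        amount: u64,
    ) -> AccountId {
        // Reuse the account tied to this wallet, or allocate the next ID.
        let existing = self.by_wallet.get(wallet).copied();
        let id = match existing {
            Some(id) => id,
            None => {
                let id = self.next_id;
                self.next_id += 1;
                self.by_wallet.insert(wallet.to_string(), id);
                id
            }
        };
        // Associate the public key only if no account uses it yet.
        if let Some(pubkey) = pubkey {
            self.by_pubkey.entry(pubkey.to_string()).or_insert(id);
        }
        // Credit the deposit; an amount of 0 is a pure "add key" message.
        *self.balances.entry(id).or_insert(0) += amount;
        id
    }

    /// Find the account that a signing public key acts on behalf of.
    pub fn account_for_pubkey(&self, pubkey: &str) -> Option<AccountId> {
        self.by_pubkey.get(pubkey).copied()
    }

    pub fn balance(&self, id: AccountId) -> u64 {
        self.balances.get(&id).copied().unwrap_or(0)
    }
}
```

Following the walkthrough, a first event for 0xDEADBEEF with PUBKEY1 and 100 USDC creates the account, and a later zero-amount event with PUBKEY2 attaches the second key to the same account.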
Gossip
The gossip component is used in virtually all node software. It uses the libp2p library for establishing a peer-to-peer network. This network is used for handling all coordination among nodes, in particular:
- Peer discovery, using both mDNS (for local networks) and Kademlia (for globally distributed peers).
- GossipSub for gossiping notifications between nodes. This covers use cases like:
- Broadcasting a transaction to the processor for inclusion in a block.
- Processor providing newly produced blocks.
- Notification of failed transactions.
- Alert notifications about potential chain manipulation.
- A request/response system for more sophisticated communication. In particular, this is used for synchronizing nodes.
Storage
Understanding blocks
Each block in a Kolme chain is made up of essentially five parts:
- Metadata about the block: timestamp, signature, block height, previous block hash, etc.
- The transaction itself: a request from a client to perform some state transformation. This includes messages (individual actions to take, which may be standard Kolme messages or app-specific messages), a submission timestamp, and signature information. (Note: in Kolme, each block always contains exactly one transaction.)
- Any data loads required to reexecute the messages in the transaction.
- Logs generated when running messages.
- The new state of the blockchain. More on this below.
Our goal with Kolme is to maintain a lean chain of blocks that can be used to fully replay the chain history and validate that the state of the blockchain--as claimed by the processor that produces the blocks--is accurate.
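To make the five parts concrete, here is a rough sketch of how such a block could be laid out as Rust types. Every type and field name is an assumption for illustration and does not reflect Kolme's real definitions; in particular, the hash is shown as a bare 32-byte array without assuming a specific hash function.

```rust
/// 32-byte hash; the concrete hash function is not assumed here.
pub type Hash = [u8; 32];

/// An individual action: either a standard framework message or an app-specific one.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Message {
    Framework(String),
    App(String),
}

/// The single transaction carried by a block.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Transaction {
    pub messages: Vec<Message>,
    pub created_unix_secs: u64,
    pub signature: Vec<u8>,
}

/// External data fetched during execution, recorded so the block can be replayed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DataLoad {
    pub request: String,
    pub response: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Block {
    // 1. Metadata
    pub height: u64,
    pub timestamp_unix_secs: u64,
    pub parent_hash: Hash,
    pub processor_signature: Vec<u8>,
    // 2. Exactly one transaction per block
    pub transaction: Transaction,
    // 3. Data loads needed to re-execute the messages
    pub data_loads: Vec<DataLoad>,
    // 4. Logs produced while running the messages
    pub logs: Vec<String>,
    // 5. New state, stored as hashes rather than the data itself
    pub framework_state_hash: Hash,
    pub app_state_hash: Hash,
}

impl Block {
    /// Check that this block directly follows a parent with the given height and hash.
    pub fn extends(&self, parent_height: u64, parent_hash: &Hash) -> bool {
        self.height == parent_height + 1 && &self.parent_hash == parent_hash
    }
}
```

Keeping only state hashes in the block is what keeps the chain lean; the state itself lives in the storage backend, keyed by those hashes.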
What is state?
Now let's return to a question from above: what's the state of the blockchain? It comes down to two pieces of data:
- Framework state: information that Kolme itself maintains about your app chain, such as account balances, public key associations, and bridge contract addresses.
- App state: fully defined by the application. This can be any arbitrary data that an application decides to store.
There are some interesting properties about these pieces of data:
- They can be relatively large, easily in the hundreds of megabytes for active applications.
- The data will change on nearly every single block. For example, the framework state will change every block due to usage of nonces from accounts.
- Most of the data, however, will remain unchanged.
- We need to be able to cheaply clone this data in memory for executing transactions. We need to cheaply clone because we need to be able to maintain the old state, either for supporting concurrent queries or rolling back a failed transaction.
- Since the data is large, we don't want to store the data itself inside a block. Therefore, we store only a hash of the data in a block. As a result, we need to be able to cheaply hash these states.
The storage mechanism of Kolme is built around optimizing for these properties.
MerkleMap
The core data structure we leverage is a MerkleMap. This is a Rust data structure with a BTreeMap-like API. Internally, it is a base16 tree with aggressive caching, clone-on-write functionality, and various other optimizations. It won't be faster than a BTreeMap or HashMap for most operations. However, it provides an incredibly cheap clone (just an Arc clone) and does not require recomputing hashes for unchanged subtrees.
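The cheap clone and clone-on-write behavior can be illustrated with a toy structure built only from the standard library. This is not the merkle-map implementation: `SharedMap` is a hypothetical single-level map with 16 buckets, and hash caching is left out entirely. It only shows how `Arc` lets a snapshot share unchanged parts with the live copy.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

const BUCKETS: usize = 16;

/// Toy map: one level of 16 reference-counted buckets behind a shared root.
#[derive(Clone)]
pub struct SharedMap {
    root: Arc<Vec<Arc<BTreeMap<u64, String>>>>,
}

impl SharedMap {
    pub fn new() -> Self {
        let buckets = (0..BUCKETS).map(|_| Arc::new(BTreeMap::new())).collect();
        SharedMap { root: Arc::new(buckets) }
    }

    fn bucket_index(key: u64) -> usize {
        (key % BUCKETS as u64) as usize
    }

    /// Clone-on-write insert: only the root vector and the one touched
    /// bucket are copied, and only if another clone still shares them.
    pub fn insert(&mut self, key: u64, value: &str) {
        let buckets = Arc::make_mut(&mut self.root);
        let bucket = Arc::make_mut(&mut buckets[Self::bucket_index(key)]);
        bucket.insert(key, value.to_string());
    }

    pub fn get(&self, key: u64) -> Option<&str> {
        self.root[Self::bucket_index(key)].get(&key).map(|v| v.as_str())
    }

    /// True when both maps point at the very same bucket allocation for `key`.
    pub fn shares_bucket_with(&self, other: &SharedMap, key: u64) -> bool {
        let index = Self::bucket_index(key);
        Arc::ptr_eq(&self.root[index], &other.root[index])
    }
}
```

Cloning a `SharedMap` copies a single `Arc`, so keeping the previous state around for concurrent queries or rollback costs almost nothing; a later write copies only the path it touches.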
By using the merkle-map package for maintaining framework and app state, a Kolme application gets aggressive data sharing, further reducing the total storage size needed for holding onto state from multiple blocks, without requiring pruning of the data. MerkleMap data can also efficiently be transferred over a network, allowing for fast sync.
Pluggable storage
Kolme offers a pluggable storage backend mechanism. Each storage backend needs to hold onto essentially three pieces of data:
- A mapping between hashes and payloads. This is used by MerkleMap for storage of state data.
- The block history itself. This contains just the hashes referencing the state, not the state itself.
- Some way of efficiently determining what the latest block is. This may be a separate field, or could be derived from the block history itself (such as a MAX query on a SQL database).
Additionally, some storage backends provide a construction lock. This is used to allow multiple processors to run in parallel with only one producing a block at a time. This is part of Kolme's high availability mechanism.
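The three pieces of data a backend holds could be captured in a trait along these lines. This is a sketch with hypothetical names (`Store`, `StoredBlock`, `InMemoryStore`), not Kolme's actual storage interface, and it leaves the construction lock out.

```rust
use std::collections::{BTreeMap, HashMap};

pub type Hash = [u8; 32];

/// Block record as stored: it references state by hash only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StoredBlock {
    pub height: u64,
    pub framework_state: Hash,
    pub app_state: Hash,
}

pub trait Store {
    /// Hash-to-payload mapping, used for state data.
    fn put_blob(&mut self, hash: Hash, payload: Vec<u8>);
    fn get_blob(&self, hash: &Hash) -> Option<&[u8]>;
    /// Block history.
    fn add_block(&mut self, block: StoredBlock) -> Result<(), String>;
    fn get_block(&self, height: u64) -> Option<&StoredBlock>;
    /// Efficient lookup of the latest block height.
    fn latest_height(&self) -> Option<u64>;
}

/// In-memory backend, in the spirit of a testing-only store.
#[derive(Default)]
pub struct InMemoryStore {
    blobs: HashMap<Hash, Vec<u8>>,
    blocks: BTreeMap<u64, StoredBlock>,
}

impl Store for InMemoryStore {
    fn put_blob(&mut self, hash: Hash, payload: Vec<u8>) {
        self.blobs.insert(hash, payload);
    }

    fn get_blob(&self, hash: &Hash) -> Option<&[u8]> {
        self.blobs.get(hash).map(|payload| payload.as_slice())
    }

    fn add_block(&mut self, block: StoredBlock) -> Result<(), String> {
        if self.blocks.contains_key(&block.height) {
            return Err(format!("block {} already exists", block.height));
        }
        self.blocks.insert(block.height, block);
        Ok(())
    }

    fn get_block(&self, height: u64) -> Option<&StoredBlock> {
        self.blocks.get(&height)
    }

    fn latest_height(&self) -> Option<u64> {
        // The ordered map makes this the equivalent of a MAX query.
        self.blocks.keys().next_back().copied()
    }
}
```

Here the latest height is derived from the block history itself; a backend could just as well keep it in a separate field.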
The following storage backends are currently provided.
In memory
This is a simple storage mechanism intended only for testing. However, it could potentially be useful for ephemeral services. All data is kept in memory. There is a trivial construction lock provided to allow for better simulated testing.
Fjall
Uses the Fjall crate as a local filesystem key-value store. This backend does not provide any construction lock mechanism.
PostgreSQL
The PostgreSQL backend is primarily intended for high availability processors. It still uses Fjall for Merkle hash storage for efficiency, since in early testing, using a PostgreSQL table for storing hashes was too inefficient.
Side note: It may be worth revisiting this in the future, and at the very least have a background synchronization job between hashes in the local Fjall store and the PostgreSQL database. This would allow for faster launch of new nodes in a cluster without needing to synchronize data from other nodes on the network.
In addition to providing storage of the block data within a PostgreSQL table, this backend also provides a construction lock. It leverages advisory locks in PostgreSQL.