Learn best practices for debugging and error handling in an enterprise-grade blockchain application – IBM Developer



Blockchain is a shared, replicated, immutable ledger for recording transactions, tracking assets, and building trust. An asset can be tangible (for example, a house or a car) or intangible (for example, intellectual property or patents). Blockchain is built on properties such as consensus, provenance, immutability, and finality.

In a typical enterprise scenario, a transaction that involves multiple organizations is recorded differently by each business. If two organizations disagree on the state of a transaction, a dispute arises, which can often be costly and time-consuming to resolve. Blockchain introduces the following concepts:

  • Multiparty transactions: Signed by everyone involved in the transaction.
  • Shared ledgers: The same ledger is replicated in every organization in the network and kept synchronized using a process called consensus. Ledgers are immutable and final; after a multiparty transaction is written to the ledger, it cannot be reversed.

To get an overview of blockchain in more detail, check out the Get started with blockchain learning path. This blog post focuses on the different points of failure in a typical blockchain-based application, potential causes of failures, and recommended debugging and error handling techniques.

The importance of error handling

In cloud-based application deployments that involve multiple integration points, there is always a possibility of encountering transient failures. Planning for and handling these transient failures is essential to maintaining a resilient architecture.

Unhandled error conditions can lead to failures and system crashes, and they often leave the application in a vulnerable state. Good exception handling helps you anticipate errors or system crashes in advance and put in appropriate code to recover from them. It might not be possible to handle every exceptional case or unexpected condition, but a well-designed system ensures a graceful exit without causing major issues, inconsistencies, or security vulnerabilities in the system.

Useful blockchain terminology

  • Peer: Maintains the ledger and state, commits transactions, and can also endorse transactions by receiving a transaction proposal and responding by granting or denying endorsement (a peer must hold the smart contract to endorse).
  • Ordering node: Orders and packages transactions into blocks and then communicates these blocks to committing peers.
  • CA: Issues digital certificates to member organizations and their users.
  • Channels: Channels provide privacy between different ledgers. Ledgers exist in the scope of a channel.
  • Smart contract: Contains the business logic that governs how data is written to and read from the ledger.
  • Transaction: Any operation that modifies the ledger state, such as an asset exchange or a transfer, is recorded as a transaction.
  • Ledger: A ledger is maintained by each peer and includes the blockchain and the world state.
  • Identity: The resources that you can access in a network are determined by your identity, which is typically represented by an X.509 certificate issued by the CA.
  • Endorsement policy: Describes the conditions under which a transaction can be endorsed. A transaction can only be considered valid if it has been endorsed according to its policy.
  • Connection profile: Contains network information such as node-level connection information, TLS certs, and more.

The following image represents the steps involved in a regular blockchain transaction flow:

Image showing a regular blockchain transaction flow

  1. Client application submits a transaction proposal.
  2. Endorsers E0, E1, and E2 each execute the proposed transaction. None of these executions update the ledger. Each execution captures the set of read and written data (called an RW-set), which now flows in the fabric.
  3. Application receives responses. The RW-sets are signed by each endorser and also include each record's version number.
  4. Application submits proposal responses as a transaction for ordering.
  5. Orderer sends blocks to committing peers.
  6. Committing peers validate transactions. Validated transactions are applied to the world state and retained on the ledger.
  7. Application is notified when a block is committed to the ledger of a peer.

Potential errors in a blockchain network

Errors can arise at any point in the transaction and are caused by the underlying network, business logic, stale data, or network timeouts. You can resolve errors that are not caused by application input or incorrect usage by adding proper error handling and retry mechanisms in the application layer to build in application resiliency.

Errors typically fall into one of two categories:

  • Retryable: Errors from chaincode or the network should be propagated back to the application layer for error handling and a retry mechanism.
  • Nonretryable: Errors that are caused by incorrect usage should be handled so that the code path exits gracefully.
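The two categories can be expressed as a small classifier in the application layer. The following is a minimal Java sketch, not part of any Fabric SDK: `ErrorCategory` and `ErrorClassifier` are hypothetical names, and the mapping (timeouts and I/O errors are retryable, invalid arguments are nonretryable, anything unrecognized fails fast) is an assumption to adapt to the exceptions your client library actually throws.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Hypothetical categories for the application's error-handling layer.
enum ErrorCategory { RETRYABLE, NON_RETRYABLE }

final class ErrorClassifier {
    private ErrorClassifier() {}

    // Classifies a failure raised while submitting a transaction.
    // Assumption: timeouts and I/O errors are transient (network side), while
    // invalid arguments or unsupported operations reflect incorrect usage.
    static ErrorCategory classify(Throwable error) {
        // Walk the cause chain, because client libraries often wrap the root cause.
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof TimeoutException || t instanceof IOException) {
                return ErrorCategory.RETRYABLE;
            }
            if (t instanceof IllegalArgumentException
                    || t instanceof UnsupportedOperationException) {
                return ErrorCategory.NON_RETRYABLE;
            }
        }
        // Unrecognized errors fail fast instead of being retried blindly.
        return ErrorCategory.NON_RETRYABLE;
    }
}
```

Defaulting unknown errors to nonretryable avoids retry storms, at the cost of occasionally giving up on a transient failure that the classifier does not yet recognize.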

Network errors

A Hyperledger Fabric Node or Java client communicates with the Hyperledger Fabric network using gRPC. The gRPC technology handles transferring data reliably between the Fabric network and the Fabric client application. The application configures the gRPC settings based on how it is used.

Problem summary

gRPC request timeout while submitting the proposal

This timeout can happen at any stage of the blockchain transaction flow. In the previous image, it occurs during Steps 1-4, where the client communicates with the network, and is caused by network latency, unavailability of the chaincode container, or poor peer health.

Error message:

sendPeersProposal - Promise is rejected: Error: REQUEST_TIMEOUT
Peer{ id: 1 , name: peer1.org1.example.com, channelName: mychannel,
url:grpcs://192.168.1.1:7051, mspid: Org1MSP} failed because of
timeout(35000 milliseconds) expiration
java.util.concurrent.TimeoutException: null

Request timeout during the execution of a proposal

This error occurs when the time taken to execute the proposal exceeds the configured execution timeout, which defaults to 30 seconds. As a best practice, you should limit the computations or operations performed in a smart contract; if the application logic does not allow that, you can increase the execution timeout to mitigate the issue.

This error can also be thrown by the lifecycle system chaincode (LSCC) in Hyperledger Fabric during the startup of the chaincode container. The resolution is the same in this case.

Error message:

"peer1.org.com:7051" failed: message=failed to execute transaction 799eb959954a7f2f8f75dee735969a4ba374b4bc98b4bbacd2fc85fc57a860b9: error sending: timeout expired while executing transaction, stack=Error: failed to execute transaction 799eb959954a7f2f8f75dee735969a4ba374b4bc98b4bbacd2fc85fc57a860b9: error sending: timeout expired while executing transaction

  1. Set CORE_CHAINCODE_EXECUTETIMEOUT=<60s or higher> in the Fabric configuration to handle the request timeout during the execution of a proposal.
  2. Set the following gRPC keepalive settings on the Fabric client side to help mitigate gRPC timeouts, which typically happen because of network latency:

    "grpc.keepalive_time_ms": 120000,
    "grpc.http2.min_time_between_pings_ms": 120000,
    "grpc.keepalive_timeout_ms": 20000,
    

  3. Check the Fabric client timeout configuration and tune it based on the application processing logic and network feedback.
  4. Check the health of the peer and of the I/O operations on the peer; if the peer is not healthy, a restart might resolve the issue.
  5. Client-side retry handling: A timeout exception should be classified as a retryable error and handled by retry logic on the client side. Simple retry handling of this exception with exponential backoff can recover from failures caused by intermittent network issues.
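A minimal sketch of client-side retry with exponential backoff for timeouts, assuming the transaction submission is wrapped in a `java.util.concurrent.Callable` that throws `TimeoutException` when the request expires. `Retry.withBackoff`, the attempt limit, and the base delay are hypothetical and illustrative, not Fabric settings.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

final class Retry {
    private Retry() {}

    // Runs the action, retrying only on TimeoutException with exponential backoff.
    // maxAttempts and baseDelayMillis are illustrative values, not Fabric defaults.
    static <T> T withBackoff(Callable<T> action, int maxAttempts, long baseDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = baseDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (TimeoutException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up: surface the last timeout to the caller
                }
                // Add up to 50% random jitter so many clients don't retry in lockstep.
                long jitter = (long) (Math.random() * delay / 2);
                Thread.sleep(delay + jitter);
                delay *= 2; // exponential growth: base, 2x base, 4x base, ...
            }
        }
    }
}
```

For example, `Retry.withBackoff(() -> submit(proposal), 5, 500)`, where `submit` is a hypothetical wrapper around your SDK call, waits roughly 0.5 s, 1 s, 2 s, and 4 s (plus jitter) between its five attempts. Only timeouts are retried; any other exception propagates immediately.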

MVCC_READ_CONFLICT errors

Read-write sets are generated by the peer when a transaction is submitted to it. This read/write set is then used when the transaction is committed to the ledger. It contains the names of the variables that were read or written and their versions at the time they were read.

Problem summary

Upon receiving blocks from the ordering service, every peer in the network validates the signed endorsements (VSCC) and checks the version of every read key in the read-write sets against the world state.

If, between the creation of the read-write set and the commit, a different transaction was committed and changed the version of the key in the peer's current world state, the original transaction is rejected during commit because the version it read is no longer the current version. This error is typically seen during commit.

Error message:

Peer peer1.org.com:7051 has rejected transaction "c91172484bad08eaae2595522a0a8c0a30891b4a90110e0a4fc490c0aacdb399" with code "MVCC_READ_CONFLICT" , validationCode=11

  1. To handle this scenario at the design level, you can create data and transaction structures that avoid editing the same key concurrently. Take a look at the Hyperledger Fabric samples for an example of how to do this.

  2. The application needs to avoid key collisions as much as possible and might need retry logic on the client side. The retry logic should query the latest state of the key from the ledger and apply the required changes on that latest state.
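One way to avoid editing the same key concurrently is to store each update under its own key and compute the current value by aggregating those entries. The sketch below simulates this delta-key pattern with an in-memory `TreeMap` standing in for the world state; `DeltaLedger` and the `<variable>~<txId>` key layout are hypothetical, and this is not chaincode API code.

```java
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the world state (not chaincode API code).
// Each update is stored under its own delta key "<variable>~<txId>" instead of
// rewriting one shared key, so concurrent updates never modify the same key.
final class DeltaLedger {
    private final TreeMap<String, Long> worldState = new TreeMap<>();

    // Writes a new delta; no existing key is read or overwritten.
    void addDelta(String variable, String txId, long delta) {
        worldState.put(variable + "~" + txId, delta);
    }

    // Computes the current value by summing every delta under the "<variable>~" prefix.
    long currentValue(String variable) {
        String prefix = variable + "~";
        long total = 0;
        for (long delta : worldState.subMap(prefix, prefix + Character.MAX_VALUE).values()) {
            total += delta;
        }
        return total;
    }
}
```

In real chaincode the aggregation would be a range query over those keys. Reading that range inside the same transaction that writes a delta would again be subject to validation (PHANTOM_READ_CONFLICT), so the sketch assumes the aggregate is read in a separate query.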

Client-side retry handling

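if (validationCode == 11) { // MVCC_READ_CONFLICT
    // Query the ledger again for the latest state of the object (hypothetical helper)
    Asset a = queryLatestState(key);
    // Perform the writes on the latest state, then resubmit the transaction
    a.setAction(action);
    submitTransaction(a); // hypothetical helper
}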

In a typical blockchain transaction flow, applications register to be notified when a block is committed to the ledger of a peer. When a transaction fails because of this conflict, the notification typically has VALIDATION_CODE: 11, which indicates MVCC_READ_CONFLICT.

You can also treat VALIDATION_CODE: 12, which indicates PHANTOM_READ_CONFLICT, as part of the same category.

A sample blockchain event:

 [eventTransactionId: f948056aa59d42810d3318dc1df7643152e220d3bb75924d5228a5e9efb95017, ,status: failure,eventName: xxxx,actionType: xxxx,errorCode: VALIDATION_CODE: 11,errorMessage:null,validationCode:11,channel: xxxx,blockNum: 24712079]

The blockchain event includes the ID of the submitted transaction and details indicating whether it was committed on the blockchain. status: failure indicates that the transaction was marked invalid and its writes were not applied, and validationCode indicates the reason for the failure. In this example, VALIDATION_CODE: 11 indicates an MVCC_READ_CONFLICT. VALIDATION_CODE: 12, indicating PHANTOM_READ_CONFLICT, also falls into the same category.

Sample retry code:

int code = blockChainNotificationObject.getValidationCode();
if (code == 11 || code == 12) { // MVCC_READ_CONFLICT or PHANTOM_READ_CONFLICT
    // 1. Query the ledger to get the latest state of the object (hypothetical helper)
    Asset latest = queryLatestState(key);
    if (isAlreadyApplied(latest)) {
        // 2. Probably a parallel duplicate invocation: consider the event a success
        continueApplicationLogic();
    } else {
        // 3. Apply the required changes on the latest state and retry the transaction
        retryTransaction(applyChanges(latest));
    }
}
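The retry steps above can be exercised end to end with a small simulation. In the sketch below, `VersionedStore` is a hypothetical in-memory stand-in for MVCC validation (a commit is accepted only if the key's version is unchanged since it was read), and `MvccRetry.update` follows the three steps: query the latest state, treat an already-applied change as success, and otherwise reapply the change and retry. It does not call any Fabric SDK and assumes Java 16 or later for the record type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.UnaryOperator;

// Hypothetical in-memory stand-in for MVCC validation (not the Fabric SDK API):
// a commit is accepted only if the key's version is unchanged since it was read.
final class VersionedStore {
    static final int VALID = 0;
    static final int MVCC_READ_CONFLICT = 11;

    record Versioned(String value, long version) {}

    private final Map<String, Versioned> state = new HashMap<>();

    synchronized Versioned read(String key) {
        // Missing keys read as version 0 with no value.
        return state.getOrDefault(key, new Versioned(null, 0));
    }

    // Returns a validation code, like the one a commit notification reports.
    synchronized int commit(String key, long readVersion, String newValue) {
        if (read(key).version() != readVersion) {
            return MVCC_READ_CONFLICT;
        }
        state.put(key, new Versioned(newValue, readVersion + 1));
        return VALID;
    }
}

final class MvccRetry {
    private MvccRetry() {}

    // Applies `change` to the latest state until it commits or attempts run out.
    static boolean update(VersionedStore store, String key,
                          UnaryOperator<String> change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // 1. Query the latest state of the object.
            VersionedStore.Versioned latest = store.read(key);
            String updated = change.apply(latest.value());
            // 2. Already applied (probably a duplicate invocation): treat as success.
            if (Objects.equals(updated, latest.value())) {
                return true;
            }
            // 3. Apply the change on the latest state and try to commit it.
            if (store.commit(key, latest.version(), updated) == VersionedStore.VALID) {
                return true;
            }
            // MVCC_READ_CONFLICT: another transaction committed first; re-read and retry.
        }
        return false;
    }
}
```

The `change` function should be idempotent, such as "set status to SHIPPED" rather than "append an item", so that comparing its result with the latest state reliably detects a duplicate invocation.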

Peer lag errors

Network delays can often cause peers to fall out of sync, where one peer might still be catching up with the latest block. Any queries made on the lagging peer would then return a stale state that might no longer be valid for the application.

Problem summary

P1 and P2 are two peers, and K1 is a new key added to the ledger. Because of network latency, K1 is not yet added to P2's ledger. If the application queries for K1 on P2, it receives a null value.

It is important to program the application to identify and handle inconsistent data caused by peer lag. Applications should have a mechanism to listen to block addition events and confirm that the block has been added successfully. In the case of inconsistent behavior or null values returned while querying, you can implement retry logic to detect the state update. A retry with exponential backoff gives the peers time to overcome the network issues or delays and sync up on the state.
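A minimal sketch of a lag-tolerant query, assuming the query is wrapped in a `Supplier` that returns an empty `Optional` when the peer has no value for the key. `LagTolerantQuery.awaitValue` and its parameters are hypothetical; waiting for the block commit event is generally the more direct signal, with polling like this as a fallback.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class LagTolerantQuery {
    private LagTolerantQuery() {}

    // Re-runs `query` while it returns empty (for example, because a lagging peer
    // has not yet committed the block containing the key), doubling the wait each time.
    static <T> Optional<T> awaitValue(Supplier<Optional<T>> query, int maxAttempts,
                                      long baseDelayMillis) throws InterruptedException {
        long delay = baseDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = query.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delay); // give the peer time to catch up
                delay *= 2;
            }
        }
        // Still missing: the peer may still be behind, or the key may not exist.
        return Optional.empty();
    }
}
```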

Endorsement coverage failures

All transactions must be endorsed by the endorsing peers in the execution phase (second step) of a blockchain transaction flow. Endorsement can fail for several reasons, such as invalid endorser signatures or other technical causes. Most endorsement policy failures stem from misconfigurations or transient world state inconsistencies between the peers. The key-value store that maintains the world state is updated by each peer independently in the validation phase, so transient world state inconsistencies between the peers are possible. At the same time, the endorsing peers use the world state to generate read/write sets in the execution phase. Thus, world state inconsistencies lead to a read/write set mismatch between endorsement responses, causing an endorsement policy failure for the transaction.

  • In the case of endorsement policy failures caused by misconfigurations, you must validate the configurations and correct them.
  • Applications can have resiliency and retry logic in place and try fetching endorsements from every peer in an organization before giving up.
  • The application can change the endorsement policy based on its logic. To change the endorsement policy, you can set the Channel/Application/Endorsement policy in configtx.yaml to ANY Endorsement. Fabric uses this policy as the default endorsement policy for chaincode.
  • In the case of world state inconsistencies, client-side retry logic is recommended, as discussed in the MVCC_READ_CONFLICT errors section.
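The approach of trying every peer in the organization can be sketched as a simple fallback loop. `EndorsementFallback` is hypothetical and generic over the peer type; the `endorse` function stands in for whatever SDK call sends a proposal to a single peer and is assumed to throw an unchecked exception when endorsement fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class EndorsementFallback {
    private EndorsementFallback() {}

    // Tries each peer of an organization in turn and returns the first successful
    // endorsement. `endorse` is a hypothetical wrapper around the SDK call that sends
    // a proposal to one peer; it is assumed to throw a RuntimeException on failure.
    static <P, R> R endorseWithFallback(List<P> orgPeers, Function<P, R> endorse) {
        List<RuntimeException> failures = new ArrayList<>();
        for (P peer : orgPeers) {
            try {
                return endorse.apply(peer);
            } catch (RuntimeException e) {
                failures.add(e); // remember why this peer failed, then try the next one
            }
        }
        IllegalStateException giveUp =
                new IllegalStateException("No peer in the organization endorsed the proposal");
        failures.forEach(giveUp::addSuppressed); // keep every per-peer cause for debugging
        throw giveUp;
    }
}
```

Recording each peer's failure as a suppressed exception preserves debugging detail that would otherwise be lost when only the final error is reported.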

Chaincode errors

Chaincode-to-chaincode communication error

In some applications, the chaincode business logic requires communication with other chaincode to implement a rule or logic.

Problem summary

Chaincode-to-chaincode communication can fail for several reasons, such as the unavailability of a particular chaincode on a given channel or the dependent chaincode not being ready to accept requests. Some of these errors are nonretryable, such as INVALID CHAINCODE NAME, INVALID CHAINCODE VERSION, INVALID ARGUMENTS, and CHAINCODE_UNAVAILABLE. You should handle other chaincode-to-chaincode communication errors with retry logic on the client side.

Error message:

Error: INVOKE_CHAINCODE failed: transaction ID: f6ab6dbd747ddf25ebfe158eea5a9b0b7478d6a031b66a08a4f3c2f02fe2f7fe: cannot retrieve package for chaincode test/1.0, error open /var/hyperledger/production/chaincodes/test.1.0: no such file or directory

This message indicates that the chaincode container is unavailable. However, if you have confirmed that the chaincode is installed and instantiated on the peers and is only unavailable because of a dependent container startup delay, you can implement customized handling in the application resiliency layer to recover from the transient failure.

To handle chaincode-to-chaincode communication errors, the caller chaincode can propagate unique error codes, and the client application can be programmed to identify these codes and classify them as retryable errors. You can then retry with exponential backoff for recovery.
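The sketch below assumes a hypothetical convention in which the caller chaincode prefixes its error messages with a bracketed code, for example `[CC2CC_NOT_READY]`. The client extracts that code from the (possibly wrapped) error message and retries only codes on an agreed allow-list. The codes, the message format, and `ChaincodeErrorCodes` are all illustrative assumptions, not part of Fabric.

```java
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical convention: the caller chaincode prefixes its error messages with a
// code in square brackets, for example "[CC2CC_NOT_READY] dependent chaincode busy".
final class ChaincodeErrorCodes {
    private ChaincodeErrorCodes() {}

    private static final Pattern CODE = Pattern.compile("\\[([A-Z0-9_]+)\\]");

    // Codes the application agrees to retry; every other code is treated as
    // nonretryable (for example, an invalid chaincode name or invalid arguments).
    private static final Set<String> RETRYABLE = Set.of("CC2CC_NOT_READY", "CC2CC_TIMEOUT");

    // Finds the first bracketed code anywhere in the message, since client
    // libraries usually wrap the chaincode's message in extra context.
    static Optional<String> extractCode(String errorMessage) {
        if (errorMessage == null) {
            return Optional.empty();
        }
        Matcher m = CODE.matcher(errorMessage);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    static boolean isRetryable(String errorMessage) {
        return extractCode(errorMessage).map(RETRYABLE::contains).orElse(false);
    }
}
```

Messages that `isRetryable` accepts can then be retried with exponential backoff; messages without a recognized code fail fast.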

No LedgerContext found error

Some errors are not thrown during the VSCC validation phase but occur during the execution phase. These errors can have several causes, and they typically happen when an operation on the ledger takes longer than expected.

Error message:

ERRO 09f [ddc81d1b] Failed to handle PUT_STATE. error: no ledger context runtime.goexit /opt/go/src/runtime/asm_amd64.s:1333 PUT_STATE failed: transaction ID: ddc81d1bcb69eecd6c6bbcf85ba16b2168486d4b232ef3c03fe5bbc7bb2adea1 github.com/hyperledger/fabric/core/chaincode. runtime.goexit

These error conditions can be handled in chaincode by propagating a unique error code, and the client application can be programmed to identify that code and classify it as a retryable error. You can then retry with exponential backoff for recovery.

ValidationCode=17 and ValidationCode=18 errors

Some chaincode invocations can fail with validationCode=17 or validationCode=18. validationCode=17 indicates EXPIRED_CHAINCODE, and validationCode=18 indicates CHAINCODE_VERSION_CONFLICT.

Problem summary

The transaction was submitted to an older chaincode container that later expired because a new container became available; during the validation phase (VSCC), the transaction is rejected.

Error message:

Nov 22 16:32:52  peerxxxxxxx-r2yyyy  peer  2021-11-22 11:02:52.180 UTC [valimpl] preprocessProtoBlock -> WARN fdd59 Channel [XXXXX]: Block [22596691] Transaction index [3] TxId [2224af5907927c60fad8b39f82537620b89d438ee5c44229533a3779985b833f] marked as invalid by committer. Reason code [EXPIRED_CHAINCODE]

You can program the client to identify these errors and classify them as retryable, and then retry with exponential backoff for recovery.
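A client can turn the validation code from a commit notification into a retry decision with a small lookup. The sketch below covers only the codes discussed in this article (11, 12, 17, and 18, named as in Fabric's transaction validation codes); `ValidationCodes` is a hypothetical helper, and treating every other nonzero code as nonretryable is an assumption.

```java
import java.util.Map;

// Maps the validation codes discussed in this article to a retry decision.
// Treating every other nonzero code as nonretryable is an assumption to revisit.
final class ValidationCodes {
    private ValidationCodes() {}

    static final int VALID = 0;

    private static final Map<Integer, String> RETRYABLE = Map.of(
            11, "MVCC_READ_CONFLICT",
            12, "PHANTOM_READ_CONFLICT",
            17, "EXPIRED_CHAINCODE",
            18, "CHAINCODE_VERSION_CONFLICT");

    static boolean isRetryable(int validationCode) {
        return RETRYABLE.containsKey(validationCode);
    }

    // Human-readable label for logs and error reports.
    static String describe(int validationCode) {
        if (validationCode == VALID) {
            return "VALID";
        }
        return RETRYABLE.getOrDefault(validationCode, "NONRETRYABLE(" + validationCode + ")");
    }
}
```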

Summary

These are some of the major types of errors that you might run into in any blockchain-based application. By understanding these points of failure ahead of time, you can have the right configurations, error handling, and recovery strategies in place from the design phase through deployment. This is essential to building a resilient architecture.
