Pelion Device Management Client error recovery mechanism
Connectivity Error Handling
Device Management Client handles error recovery on behalf of applications, thereby providing a seamless connectivity experience and recovery from temporary network break issues or disruptions to Device Management services. Connectivity between a client and Device Management encompasses network connectivity, CoAP level connectivity and client-service level connectivity (including, for example, handling client certificate expiry or renewal).
The logic handling reconnection to Device Management:
- Establishes a secure network connection.
- Registers to Device Management.
- Resends CoAP messages. More information about resending is in the CoAP specification.
This section explains what kind of connectivity errors an application may receive, what they mean and how the Device Management Client handles them.
Note: Some errors may need to be handled by the user or the application.
Reconnection attempt intervals
Device Management Client tries to establish a new connection to the server at increasing intervals:
- The client picks a random initial reconnection time between 2 and 10 seconds (to prevent multiple clients from trying to connect to Device Management at the same time after a possible service break).
- It tries reconnection after this initial time.
- If the connection fails, the client returns an appropriate error to the application.
- The client continues retrying the connection to Device Management with an increased reconnection time. Every failed reconnection attempt doubles the reconnection time until it reaches one week. For example, if the client picks an initial reconnection time of 5 seconds, it tries to reconnect at 5, 10, 20, 40, 80... seconds up to one week.
- The reconnection time does not increase above one week; the client will attempt to reconnect once a week until it reconnects or the device stops operating.
Every successful reconnection resets the reconnection time; if there is another failure, the reconnection attempts will begin with the original reconnection time.
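The schedule described above can be sketched as follows. This is a minimal illustration, not the client's actual implementation: the constant and function names are hypothetical, and only the 2 to 10 second initial range, the doubling, and the one-week cap come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical constants taken from the description above.
constexpr std::uint64_t kMinInitialDelaySec = 2;
constexpr std::uint64_t kMaxInitialDelaySec = 10;
constexpr std::uint64_t kMaxDelaySec = 7ULL * 24 * 60 * 60; // one week in seconds

// Picks a random initial reconnection time in the inclusive range [2, 10] seconds.
std::uint64_t pick_initial_delay(std::mt19937& rng) {
    std::uniform_int_distribution<std::uint64_t> dist(kMinInitialDelaySec, kMaxInitialDelaySec);
    return dist(rng);
}

// Returns the reconnection time to use after a failed attempt:
// double the current value, but never exceed one week.
constexpr std::uint64_t next_delay(std::uint64_t current_sec) {
    return (current_sec * 2 > kMaxDelaySec) ? kMaxDelaySec : current_sec * 2;
}

// Builds the first `attempts` reconnection times of a schedule that starts at
// `initial_sec`. A successful reconnection would restart the schedule.
std::vector<std::uint64_t> reconnect_schedule(std::uint64_t initial_sec, std::size_t attempts) {
    assert(initial_sec >= kMinInitialDelaySec && initial_sec <= kMaxInitialDelaySec);
    std::vector<std::uint64_t> delays;
    std::uint64_t delay = initial_sec;
    for (std::size_t i = 0; i < attempts; ++i) {
        delays.push_back(delay);
        delay = next_delay(delay);
    }
    return delays;
}
```

With an initial time of 5 seconds, `reconnect_schedule(5, 5)` yields 5, 10, 20, 40 and 80 seconds, matching the example in the list.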
Error codes
Failed connection attempts can return different error codes to a client application. The following list explains the error codes and proposes possible fixes. The actual enumerations for these error codes are located in mbed-cloud-client/mbed-cloud-client/MbedCloudClient.h.
ConnectBootstrapFailed
- Bad request Account device quota reached
Device Management Client failed to bootstrap to Device Management and cannot retrieve credentials for the Device Management service. This normally happens when you are using a developer certificate and have already created 100 devices with that certificate, or when you have reached your account's device bootstrap limit.
To fix this issue, delete some devices through Device Management Portal. Device Management Client continues to retry the bootstrap, and will connect as soon as there is room for new devices.
- Bad request (no details)
A generic failure that could be the result of a missing certificate, lack of access rights, or failure to upload your CA certificate to the Device Management server.
Alternatively, your certificate may not be enabled (in other words, it might be blacklisted). Ask your administrator to enable it.
ConnectInvalidParameters
The application has entered one or more wrong parameters at registration. Normally, this error occurs when the application provides an invalid Device Management URL (the accepted CoAP format is coaps://<URL>:5684), device name or account ID (for example, a parameter longer than 64 characters).
ConnectNotRegistered
The application tried to call close() without the client being in the registered state.
ConnectTimeout
Device Management is not responding to the client's registration attempts. This normally happens when the client cannot finish a successful registration within three minutes and there are no network issues during that time.
ConnectNetworkError
There is a network level issue between the client and Device Management server causing a connection break. The client returns this error when trying to register, or if it loses connection while already registered. It falls back to the reconnection logic and attempts to recover from the lost connection by re-registering itself.
ConnectResponseParseFailed
The application received a malformed CoAP message from the server, which it failed to parse. This can happen if a third party server implementation has mismatching CoAP library implementations, and should not happen with Device Management services.
ConnectMemoryConnectFail
The client failed to store the Device Management device credentials it received during bootstrap. This can happen if the client cannot create a CoAP message due to low memory.
ConnectNotAllowed
The application tried to call an API that the client cannot handle at that stage. For example, the application tried to call keep_alive() before setup().
ConnectSecureConnectionFailed
There was a (D)TLS level failure during the registration phase. This can happen because of an expired device certificate, in which case the client falls back to the bootstrap phase to fetch updated certificates.
ConnectDnsResolvingFailed
The client cannot resolve the DNS query for the Device Management server URL addresses. It continues to retry until it resolves the DNS, then continues the connection process.
ConnectorFailedToStoreCredentials
The client cannot store the device credentials in the secure storage. Check the memory card (in Mbed OS). If it is corrupted, please format it.
ConnectorFailedToReadCredentials
The client cannot read the device credentials from the secure storage. Check the memory card (in Mbed OS). If it is corrupted, please format it.
ConnectorInvalidCredentials
The client failed to get the proper bootstrap credentials from the secure storage. Try to factory reset the secure storage, then try the operation again.
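As a rough illustration of how an application might act on these codes, the sketch below groups them by the kind of recovery the descriptions above suggest. The `ConnectError` enum is a hypothetical local mirror of the names in this list; the real enumeration and its values are in MbedCloudClient.h. The `Recovery` categories and the grouping are an interpretation of the text, not part of the client API.

```cpp
#include <cassert>

// Hypothetical local mirror of the error names listed above. The real
// enumeration (and its numeric values) is defined in MbedCloudClient.h.
enum class ConnectError {
    ConnectBootstrapFailed, ConnectInvalidParameters, ConnectNotRegistered,
    ConnectTimeout, ConnectNetworkError, ConnectResponseParseFailed,
    ConnectMemoryConnectFail, ConnectNotAllowed, ConnectSecureConnectionFailed,
    ConnectDnsResolvingFailed, ConnectorFailedToStoreCredentials,
    ConnectorFailedToReadCredentials, ConnectorInvalidCredentials
};

// Hypothetical categories summarizing the fixes proposed in the list.
enum class Recovery {
    ClientRecovers,   // the client retries or falls back on its own
    FixConfiguration, // account quota, certificate, URL or other parameters
    FixCallOrder,     // the application called an API in the wrong state
    CheckStorage,     // inspect, format or factory reset the secure storage
    Investigate       // no specific fix is given in the list
};

Recovery recovery_for(ConnectError error) {
    switch (error) {
        case ConnectError::ConnectNetworkError:
        case ConnectError::ConnectSecureConnectionFailed:
        case ConnectError::ConnectDnsResolvingFailed:
            return Recovery::ClientRecovers;
        case ConnectError::ConnectBootstrapFailed:
        case ConnectError::ConnectInvalidParameters:
            return Recovery::FixConfiguration;
        case ConnectError::ConnectNotRegistered:
        case ConnectError::ConnectNotAllowed:
            return Recovery::FixCallOrder;
        case ConnectError::ConnectorFailedToStoreCredentials:
        case ConnectError::ConnectorFailedToReadCredentials:
        case ConnectError::ConnectorInvalidCredentials:
            return Recovery::CheckStorage;
        case ConnectError::ConnectTimeout:
        case ConnectError::ConnectResponseParseFailed:
        case ConnectError::ConnectMemoryConnectFail:
            return Recovery::Investigate;
    }
    assert(false && "unhandled ConnectError value");
    return Recovery::Investigate;
}
```

An application could call a function like this from its error callback to decide whether to wait for the client to recover or to alert the user.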
The client returns an Invalid Parameter error
Sometimes, your client application might return MbedCloudClient::ConnectInvalidParameters while registering with Device Management.
In factory mode, this can happen because you're using the wrong URI format to access the bootstrap service or LwM2M server:
- For bootstrap, the URI format is: coaps://<mbed-bootstrap-server-url>:5684?aid=<your-account-id>.
- For LwM2M, the URI format is: coaps://<mbed-LWM2M-server-url>:5684?aid=<your-account-id>.
Reflash these values with the factory configurator client (FCC), and run your application again.
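Before reflashing, a quick sanity check of the URI string can catch the most common formatting mistakes. The helper below is a hypothetical sketch, not part of the client or FCC: it assumes the standard coaps:// scheme prefix and only checks the overall shape (scheme, non-empty host, port 5684, non-empty aid query parameter).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `uri` has the shape
// coaps://<host>:5684?aid=<account-id> with a non-empty host and account ID.
bool looks_like_server_uri(const std::string& uri) {
    const std::string scheme = "coaps://";
    const std::string port_and_query = ":5684?aid=";

    // The URI must start with the secure CoAP scheme.
    if (uri.compare(0, scheme.size(), scheme) != 0) {
        return false;
    }
    // The port and query marker must follow a non-empty host name.
    const std::size_t pos = uri.find(port_and_query, scheme.size());
    if (pos == std::string::npos || pos == scheme.size()) {
        return false;
    }
    // The account ID after "aid=" must not be empty.
    return pos + port_and_query.size() < uri.size();
}
```

For example, a string that uses backslashes after the scheme, or the plain coap scheme, is rejected by this check.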
The client prints RTX error (Mbed OS only)
If you see RTX error code 0x00000001 .. in your console, it means your application has run out of stack memory.
Device Management Client handles its asynchronous operation through a separate thread. That thread has been assigned 8 kB of its own stack space, but for some applications, this might not be enough. You can increase the stack size in your application's mbed_app.json file by changing the stack size value from 8192 to some higher value: "nanostack-hal.event_loop_thread_stack_size": <8192>,
Remember to check your hardware configuration; it must have enough memory to handle a bigger stack size.
If you compiled your application as a debug version, it will require more flash memory than a release version, typically 1.5 to 2 times more. For debugging purposes, you may need to select hardware that is less constrained than your normal deployment devices.
Request frequency issues
Device Management provides sufficient capacity to handle REST API requests. When the frequency of issued requests is above a threshold set to protect our system, our system returns an error with code 429. If you receive this error, please pause request execution for 60 seconds. You can then resume normal work.
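A caller of the REST API could apply the 60-second pause as sketched below. This is an illustration only: `send_request` and `sleep_fn` are hypothetical callables supplied by the application (the sleep is injected so the logic can be exercised without actually waiting), and `max_attempts` is an assumed safety limit that the text above does not mention.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

constexpr int kTooManyRequests = 429;               // status code named above
constexpr std::chrono::seconds kRateLimitPause{60}; // pause recommended above

// Returns how long to pause before the next request for a given status code.
constexpr std::chrono::seconds pause_for_status(int status) {
    return status == kTooManyRequests ? kRateLimitPause : std::chrono::seconds{0};
}

// Sends a request and, while the service answers 429, pauses for 60 seconds
// and tries again, up to `max_attempts` requests in total.
// Returns the status code of the last request.
int send_with_rate_limit_pause(const std::function<int()>& send_request,
                               const std::function<void(std::chrono::seconds)>& sleep_fn,
                               int max_attempts) {
    assert(max_attempts >= 1);
    int status = send_request();
    for (int attempt = 1; attempt < max_attempts && status == kTooManyRequests; ++attempt) {
        sleep_fn(pause_for_status(status));
        status = send_request();
    }
    return status;
}
```

In a real application, `sleep_fn` could wrap `std::this_thread::sleep_for`, or schedule the retry on an event queue instead of blocking.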
Firmware Update Error Handling
During the lifetime of the device, a number of errors relating to firmware update can occur:
- Errors internal to the Cloud client, which are printed only if the debugging log is turned on.
- Cloud client errors that are printed on the serial port even when debugging is turned off.
- Errors that are reported to the Cloud using the 5/0/5 UpdateResult LwM2M resource.
The following errors are reported to the Cloud.
WarningCertificateNotFound
UpdateResult: 6: Unsupported package type.
An update certificate is missing. The update certificate needs to be injected using the factory provisioning flow or, in case of developer mode, included in update_default_resources.c via manifest-tool.
ErrorWriteToStorage
UpdateResult: 2: Not enough storage space for the new firmware package.
Something went wrong when writing firmware to storage on the device, or there is not enough storage for the new firmware candidate.
ErrorInvalidHash
UpdateResult: 9: Unsupported protocol.
The hash of the downloaded firmware does not match the hash supplied in the manifest. It could be that the manifest was created with a wrong URL or a wrong firmware image. It is also possible that there was a network error or a storage error which corrupted the hash supplied in the manifest or the firmware candidate.
WarningIdentityNotFound
UpdateResult: 6: Unsupported package type.
The Device Identity, which consists of Device, Class and Vendor IDs, cannot be retrieved. The device identity files can be injected using the factory provisioning flow or, in case of developer mode, included in update_default_resources.c via manifest-tool.
WarningClassMismatch
UpdateResult: 6: Unsupported package type.
The Device Class does not match the one specified in the manifest, so the Update client rejects the firmware update. Refer to device identifiers documentation for more information.
WarningVendorMismatch
UpdateResult: 6: Unsupported package type.
The Device Vendor ID does not match the one specified in the manifest, so the Update client rejects the firmware update. Refer to device identifiers documentation for more information.
WarningDeviceMismatch
UpdateResult: 6: Unsupported package type.
The Device ID does not match the one specified in the manifest, so the Update client rejects the firmware update. Refer to device identifiers documentation for more information.
WarningCertificateInvalid
UpdateResult: 6: Unsupported package type.
The update certificate on the device is not valid. This may be because:
- The certificate has expired.
- The data isn't a certificate.
- The certificate has a signature mismatch.
- The certificate is incorrectly encoded, so it is not bare DER.
WarningSignatureInvalid
UpdateResult: 6: Unsupported package type.
The signature on the manifest is not valid.
WarningURINotFound
UpdateResult: 7: Invalid URI
The device cannot reach the firmware URI specified in the manifest.
WarningRollbackProtection
UpdateResult: 6: Unsupported package type.
The firmware candidate is of an older version (the firmware version is the timestamp when the manifest is created) than the current active image. The rollback protection feature ensures that the update is rejected.
WarningUnknown
UpdateResult: 6: Unsupported package type.
All other errors.
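The mapping listed in this section can be collected into a small lookup table, for example to decode an UpdateResult value in test tooling. The struct, table and function names below are hypothetical; the error names and UpdateResult values are copied from the list above, and the fallback to WarningUnknown follows its "all other errors" description.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical record tying an update error name to the UpdateResult value
// reported through the 5/0/5 resource, as listed above.
struct UpdateErrorInfo {
    const char* name;
    int update_result;
    const char* meaning;
};

constexpr UpdateErrorInfo kUpdateErrors[] = {
    {"WarningCertificateNotFound", 6, "Unsupported package type"},
    {"ErrorWriteToStorage",        2, "Not enough storage space for the new firmware package"},
    {"ErrorInvalidHash",           9, "Unsupported protocol"},
    {"WarningIdentityNotFound",    6, "Unsupported package type"},
    {"WarningClassMismatch",       6, "Unsupported package type"},
    {"WarningVendorMismatch",      6, "Unsupported package type"},
    {"WarningDeviceMismatch",      6, "Unsupported package type"},
    {"WarningCertificateInvalid",  6, "Unsupported package type"},
    {"WarningSignatureInvalid",    6, "Unsupported package type"},
    {"WarningURINotFound",         7, "Invalid URI"},
    {"WarningRollbackProtection",  6, "Unsupported package type"},
    {"WarningUnknown",             6, "Unsupported package type"},
};

// Returns the UpdateResult value for an error name. Names that are not in the
// table are treated as WarningUnknown, which covers all other errors.
int update_result_for(const char* name) {
    assert(name != nullptr);
    for (const UpdateErrorInfo& info : kUpdateErrors) {
        if (std::strcmp(info.name, name) == 0) {
            return info.update_result;
        }
    }
    return 6; // same value as WarningUnknown
}
```

Because most errors share UpdateResult 6, the value alone rarely identifies the cause; the device's serial or debug log is needed to tell them apart.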