We are currently experiencing issues with Printix Cloud. We are working to resolve the issues. We will post updates to this announcement when we have news. Select Follow if you wish to be notified about these updates.
During this period you may experience slow performance and may be unable to sign in or release documents via the Printix App. Direct print to printers on the local network is still possible.
Incident Report:
Printix Sev-1 Incident 17th April 2023
Overview
Following a capacity increase on Sunday that added nodes to the Kafka infrastructure behind the Printix service, clients reported service problems on Monday. The decision was taken to restart the Kafka cluster service to level out resource usage. The restart took 10 minutes, after which normal service operation resumed; it has remained normal since.
What Happened
3:30pm UTC 17th April:
A cloud help support case was received advising of slow print, sign-in, and ping failures for Printix. This was reflected in the message handling statistics within the Kafka service.
3:45pm UTC 17th April:
Following direct contact from the senior service manager, it was agreed to restart the Kafka cluster to rebalance resources across the many nodes.
4:00pm UTC 17th April:
After the restart, message processing statistics improved and tech support reported that service performance was returning to normal.
Resolution
The Kafka service restart redistributed event partitions more evenly among the old and the newly added Kafka nodes in the cluster.
Root Causes
The maintenance work at the weekend should have included a final restart to ensure that all new and old instances were carrying an equal share of the load. Under the low load at the time, the imbalance was not apparent to the engineers working on the service, and this step had not been identified as necessary when expanding capacity.
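The kind of imbalance described here can be illustrated with a short check. This is a minimal sketch, not the tooling used by the Printix team: the broker IDs, the topic name `events`, the partition counts, and the helper names `broker_load` and `imbalance_ratio` are all hypothetical, and the assignment data is made up rather than read from a live cluster.

```python
from collections import Counter


def broker_load(assignments, brokers):
    """Count partitions led by each broker, including brokers that lead none."""
    counts = Counter({broker: 0 for broker in brokers})
    counts.update(assignments.values())
    return dict(counts)


def imbalance_ratio(counts):
    """Return the busiest broker's load divided by the mean load (1.0 = even)."""
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean if mean else 0.0


# Hypothetical state after a scale-up: 12 partitions are still led by the
# three old brokers (1-3), while the newly added brokers (4-6) lead none.
assignments = {("events", p): (p % 3) + 1 for p in range(12)}
brokers = [1, 2, 3, 4, 5, 6]

counts = broker_load(assignments, brokers)
ratio = imbalance_ratio(counts)

print(counts)
print(f"imbalance ratio: {ratio:.2f}")
```

A ratio well above 1.0 right after adding nodes would suggest that the new nodes are not yet sharing the work, even if overall load is too low for the skew to show up as a performance problem.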
Impact
Slow performance caused incomplete tenant creations, impacting the customer experience for the tenant owners. An unknown but substantial number of existing customers were also unable to print or experienced severe printing delays.
Action Items
Documentation is being completed on scaling the cluster up and on the steps required after new nodes are brought up.