Our services are unavailable
Incident Report for SMSFactor
Postmortem

đŸŽó §ó ąó „ó źó §ó żÂ English version

Overview

The server that hosts our API stopped working due to a RAM issue. As a result, all services hosted on our front-end server were inaccessible (platforms, API, short links, HubSpot integration, documentation). All API requests made during the incident were lost. To resolve the issue, we had to restart the server.

Timeline (CET)

  • 14:53 - Start of the incident
  • 14:53 - Start of the investigation
  • 15:00 - Status page updated
  • 15:19 - Ticket opened with our hosting provider
  • 15:23 - Call to our hosting provider's hotline
  • 15:27 - Server problem identified
  • 15:31 - Server restarted
  • 15:31 - Server accessible again

Duration

  • Start: 14:53
  • End: 15:31
  • Downtime: Yes
  • Downtime duration: 38 min
  • Customer impact: Yes
  • Duration of customer impact: 38 min

Follow-up Action Items

We should set up an automatic failover system for our servers so that we can respond calmly during incidents like this one.
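An automatic failover system can take many forms. One common pattern is active/passive failover driven by health checks. The sketch below is a minimal illustration under stated assumptions, not a description of SMSFactor's infrastructure: the `/health` endpoint, the hostnames, and the `FailoverRouter` class are hypothetical, and the threshold of three consecutive failed checks is an arbitrary example value.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_health_check(base_url: str, timeout: float = 2.0) -> bool:
    """Probe a hypothetical /health endpoint; any error or non-200 reply counts as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, timeouts and refused connections all subclass OSError
        return False


class FailoverRouter:
    """Hypothetical active/passive router: send traffic to the primary until it fails
    `failure_threshold` consecutive checks, then switch to the standby."""

    def __init__(
        self,
        primary: str,
        standby: str,
        is_healthy: Callable[[str], bool] = http_health_check,
        failure_threshold: int = 3,
    ) -> None:
        self.primary = primary
        self.standby = standby
        self.is_healthy = is_healthy
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.active = primary

    def check(self) -> str:
        """Run one health-check cycle and return the server that should receive traffic."""
        if self.is_healthy(self.primary):
            # Primary is healthy: reset the counter and route (or fail back) to it.
            self.consecutive_failures = 0
            self.active = self.primary
        else:
            self.consecutive_failures += 1
            # Fail over only after several misses in a row (avoids flapping on one timeout),
            # and only if the standby itself passes its health check.
            if (
                self.consecutive_failures >= self.failure_threshold
                and self.is_healthy(self.standby)
            ):
                self.active = self.standby
        return self.active


# Usage with hypothetical hostnames; a scheduler would call check() every few seconds.
# router = FailoverRouter("https://api-primary.example", "https://api-standby.example")
# target = router.check()
```

Requiring several consecutive failures trades a few seconds of detection time for fewer false failovers. In a real deployment the switch would also have to be applied somewhere (a load balancer, DNS record, or virtual IP), which this sketch leaves out.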

After these repeated incidents, we will make improving our infrastructure a priority. Thank you very much for your patience and understanding. If you have any questions, please do not hesitate to send an email to support@smsfactor.com.

Posted Mar 14, 2024 - 16:19 CET

Resolved
This incident has been resolved.
Posted Mar 14, 2024 - 09:25 CET
Update
We are continuing to monitor for any further issues.
Posted Mar 13, 2024 - 15:35 CET
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Mar 13, 2024 - 15:35 CET
Investigating
We are currently investigating this issue.
Posted Mar 13, 2024 - 15:00 CET
This incident affected: API, Customers Portal, Webhooks, Operator Network, Reminder, Mail2SMS, and VLN.