Posted 10 months ago
by rousso

About the recent downtime

After bringing everything back up and going through our log files to see what happened, we conclude that this embarrassing downtime could have gone unnoticed by our users had we caught the problem sooner.

RSS Graffiti runs on multiple servers. The website and the front end of the application run independently of the processing engine. The problem started on one of our back-end servers while we were still up and working, but we failed to get the alert before calling it a day and going to bed. Then things gradually went wrong, to the point where the front-end web server became unresponsive. Later, the processing queue stopped functioning too, and the servers that were still up couldn’t do anything about it without human intervention.

Over the past few days we have been phasing out an old domain name we used when we started developing RSS Graffiti. Changes to the DNS and MX records, along with an apparent bug in Google Apps Gmail, caused the warning mail to appear to have been sent via our old domain name rather than from its actual origin, which in turn led us to ignore the warning.

Disasters usually come about through a number of small coincidences, and this one was no exception. Although the problem with the mail will most likely not happen again, we definitely need to find a way to wake up when an alert comes in. RSS Graffiti can sustain failures and recover from them without affecting the end user, but obviously this is not true if the problem remains unnoticed and unresolved for many hours.

Please accept our apologies for this incident. We assure you of our determination to prevent similar issues from happening in the future.