[IaaS] Compute Node Outage

Incident Report for anynines

Resolved

A replacement Cloud Foundry system hosted on AWS is now ready. The signup page can be found at https://paas.anynines.com.

To compensate for the extended unavailability, the new system will be free of charge for at least two months.

We are currently offering the following services: PostgreSQL, MongoDB and RabbitMQ.

A Redis service will follow shortly. Each service instance you create runs on its own dedicated instance, so unlike with the old services, you do not share your database node with anyone else. Only single-node instances are available at the time of writing, but we plan to offer clustered services soon.

A Swift service will also be available soon. Our Swift nodes were always separated from the rest of the OpenStack infrastructure and were never affected by the OpenStack meltdown.

We upgraded Cloud Foundry to version 230. When you redeploy your apps, keep in mind that you will probably have to change buildpacks to accommodate the new version.

We are sorry for the interruptions caused by the failure of our underlying infrastructure. We expect the AWS infrastructure to be more stable than the OpenStack infrastructure.

Outlook:

We plan to offer Swift and Redis as soon as possible. We will also add the anynines service jumper to the Cloud Foundry setup so that you can access your data directly via the cf CLI at any time.

We are evaluating a managed vSphere solution in our German data center so that we can offer you “Hosting in Germany” again.

In the medium term, we plan to improve the platform we have now built on Amazon and, if possible, move back to our German data center.

Billing for the new platform will remain suspended for at least two months to make up for the time anynines was unavailable. We will also test the system to ensure that it meets the high standards you expect.

I personally apologise again for the extended downtime.

Julian Fischer
CEO

Posted Apr 26, 2016 - 18:42 CEST

Update

The AWS installation of Cloud Foundry is progressing well. The Cloud Foundry runtime is already running; anynines-specific components such as the data services will be deployed next.
In the meantime, we are continuing our conversations with German infrastructure (IaaS) providers.

Posted Apr 22, 2016 - 10:00 CEST

Update

Our OpenStack infrastructure could not be restored, so we are evaluating all other possibilities.
We are creating a fallback Cloud Foundry installation on AWS, as this is the fastest route to an alternative solution.
In the meantime, we are evaluating other IaaS providers in Germany for hosting a new Cloud Foundry/anynines instance.
Please email support@anynines.com with details of the production service instances you use (organization name, space name, service instance name) so that we can provide you with database dumps in the meantime.

Posted Apr 19, 2016 - 10:43 CEST

Update

As the infrastructure errors cannot be resolved, we began deploying another Cloud Foundry installation to the European AWS region yesterday.
This process will need more time before we are able to migrate user data from the IaaS storage layers to the new installation.

Posted Apr 12, 2016 - 11:13 CEST

Update

We received analysis results from our data center support.
A central switch has suffered a hardware failure.
We won't be able to replace the component before tomorrow.
We don't expect any damage to the persistent data storage layers,
so your persisted data should be available again once the affected component has been replaced.
We apologize for the extended outage and will try to resolve the issues as soon as possible.

Posted Apr 10, 2016 - 22:25 CEST

Update

We are still investigating the issue on several of our compute nodes.
Our systems are experiencing kernel panics during the boot process.
Our infrastructure team has contacted our data center support for additional analysis.
We will keep you updated about the progress.

Posted Apr 10, 2016 - 21:11 CEST

Investigating

We are experiencing a major problem with our OpenStack compute nodes. The problem is caused by kernel panics when our virtual machines boot. We will keep you updated.

Posted Apr 10, 2016 - 20:24 CEST