
A retrospective on our move to NextDC M1

We successfully migrated our Melbourne POP from Equinix ME1 to NextDC M1, improving latency and peering for customers in Victoria and Tasmania. Most customers experienced only 10–15 minutes of downtime, and the move introduced direct low-latency transit via GSL along with better local peering at Edge IX and VIC IX.

Murry Mukhtarov, July 30, 2025

On 29th July 2025, we completed our migration from Equinix ME1 to our new Melbourne Point of Presence (POP) at NextDC M1. This was a major milestone for Neptune Internet, and while the migration was ultimately successful, we want to be transparent about what went well, what didn’t, and what we’re doing to improve.

This was a significant move involving not only relocating services but also rebuilding our backend systems and Kubernetes clusters that run critical services — including monitoring, alerting, our subscriber portal, website, and more.

In hindsight, we should have migrated our backend ahead of time. However, without a spare IP range to advertise backend systems separately from subscriber IPs, this wasn’t possible. This will be addressed for future projects to reduce the potential blast radius.

What Went Well

  • Successful migration for VIC & TAS subscribers – 97% of customers experienced just 10–15 minutes of downtime during the cutover.

  • Improved latency & peering – Subscribers now benefit from reduced latency and better local peering with Edge IX and VIC IX.

  • Direct low-latency transit – We now have a direct connection to our low-latency IP transit partner, Global Secure Layer (GSL).

You can check our public dashboard to see latency drops of 10–40 ms across our latency probes.

A latency graph showing before and after the cutover.

What Didn’t Go So Well

While the migration was completed, we encountered several challenges:

  • Static IP backfill issues – Subscribers with static IP settings were not backfilled correctly.
  • Frontend bug – Prevented those customers from re-applying static settings post-migration.
  • Stale AAAA DNS records in Cloudflare – Caused instability when loading our website and backend systems (see the DNS check sketch after this list).
  • RADIUS authentication errors – Affected a small subset of customers, delaying their migration by 40–60 minutes.
  • Atmosphere1 DNS rebuild – Live migration failed due to hypervisor version incompatibility, resulting in 1–2 hours of DNS query failures between 1 am and 3 am.
  • Persistent AAAA DNS issue – Took ~18 hours to fully resolve (fixed around 5 pm on 29th July).
  • Address search outage – Non-functional until ~5–6 pm as a side effect of the DNS issue.
  • Grafana misconfiguration – Historical data from Thanos/Prometheus didn’t display immediately.
  • BNG interface pattern change – Prevented certain metrics from appearing in the frontend.
  • SQS worker permissions – New workers lacked the required privileges to update the neso.au DNS zone.
  • Status page outage – An Uptime Robot outage coincided with the migration, making our public status page unavailable.
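
To catch stale records like these earlier next time, a post-cutover DNS check can compare what resolvers actually return against the addresses we expect after the move. Below is a minimal sketch of that idea in Python; the hostnames and IPv6 addresses are illustrative placeholders rather than our real records, and it relies only on the system resolver via the standard library.

    import socket

    # Hypothetical hostname -> expected AAAA addresses after the cutover.
    EXPECTED_AAAA = {
        "portal.example.net": {"2001:db8:100::10"},
        "www.example.net": {"2001:db8:100::20"},
    }

    def resolve_aaaa(host):
        """Return the set of IPv6 addresses the resolver currently hands out."""
        try:
            infos = socket.getaddrinfo(host, None, family=socket.AF_INET6)
        except socket.gaierror:
            return set()
        return {info[4][0] for info in infos}

    def check():
        ok = True
        for host, expected in EXPECTED_AAAA.items():
            actual = resolve_aaaa(host)
            stale = actual - expected
            if stale:
                print(f"{host}: unexpected AAAA records still visible: {sorted(stale)}")
                ok = False
            elif not actual:
                print(f"{host}: no AAAA records resolved")
                ok = False
            else:
                print(f"{host}: OK ({sorted(actual)})")
        return ok

    if __name__ == "__main__":
        raise SystemExit(0 if check() else 1)

Running a check like this from a few vantage points straight after a cutover, and again once resolver caches expire, makes a lingering stale record visible well before customers notice.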

What We’re Doing Next

We’ve taken a set of action items from these incidents and are already working on improvements:

  • Expanding the breadth of our monitoring to detect more failure conditions before they impact customers.
  • Implementing automated canary testing for the subscriber portal and key backend systems (a sketch of the idea follows this list).
  • Establishing a dedicated IP range for backend services to allow separate migrations without impacting subscribers.
  • Strengthening DNS migration procedures to avoid stale records and incompatibilities in future.
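
On the canary testing point above: the idea is a small probe that exercises the portal and key endpoints the same way a subscriber would, and fails loudly when anything is down or slow. The sketch below shows the shape of such a probe in Python; the URLs and thresholds are illustrative assumptions, not our production configuration.

    import sys
    import time
    import urllib.request

    # Hypothetical canary targets; real targets would cover the portal,
    # website and key backend endpoints.
    TARGETS = [
        "https://portal.example.net/healthz",
        "https://www.example.net/",
    ]
    TIMEOUT_S = 5
    MAX_LATENCY_S = 2.0

    def probe(url):
        """Fetch one URL and report whether it responded quickly with HTTP 200."""
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=TIMEOUT_S) as resp:
                elapsed = time.monotonic() - start
                if resp.status != 200:
                    print(f"{url}: unexpected status {resp.status}")
                    return False
                if elapsed > MAX_LATENCY_S:
                    print(f"{url}: slow response ({elapsed:.2f}s)")
                    return False
                print(f"{url}: OK ({elapsed:.2f}s)")
                return True
        except Exception as exc:
            print(f"{url}: probe failed: {exc}")
            return False

    if __name__ == "__main__":
        results = [probe(url) for url in TARGETS]
        sys.exit(0 if all(results) else 1)

Run on a schedule from outside our own network, a probe like this should surface problems such as the address search outage much sooner than customer reports would.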
