When the Signal Fades

This post was originally published in the JHU Global mHealth Initiative’s Digital Roundup, Vol. 4, June 2015.

The phones were out again. No voice. The only things that could get through were SMS messages and queued outbound emails, whenever a mobile data connection happened to grace us with its presence. “Do you have a signal?” we asked each other every 10 minutes. Finally, at 5:00 p.m. we got a connection. The emails streamed in, as did messages from our communication apps. We furiously replied, “We’re safe” to everyone who had messaged. Then the thought occurred: if this is happening to me, what’s going on with my systems?

I had deployed two systems in Nepal as a global health IT project manager. We made assumptions that undermined the availability of these systems when they were most needed, just after the April 25th earthquake. The telecommunications infrastructure came back quickly after the initial event, but demand overwhelmed it, denying access to many. I simply hadn’t planned for something of this magnitude. Major projects often have contingency plans, but small-scale implementers and organizations can easily overlook them. This article shares some of the lessons I learned in the immediate aftermath of the quake so that you can start planning for disaster before it arrives.

The electricity cut out when the ground started shaking. Immediately, everything switched to batteries or generators, and the countdown to their failure began. Everyone rushed outside to open areas where things couldn’t fall on them: any field, parking lot, traffic circle, even the middle of the street. Then we started calling relatives to see if they were safe. With so many people packed into such a small area all trying to call at once, it was hard for any of us to connect.

mHealth systems rely on a backbone of power and network connectivity. Most often, information flows from a device to a central server over that backbone. In this situation, the backbone failed, and every individual had to scramble to find an alternative.

Our phones connect to centralized towers that each cover a geographic area, and each tower can support only a limited number of active connections at a time. This is why network operators station mobile towers at major events: the extra units add capacity when masses of people gather in a small space. After the quake, that per-tower limit meant coverage was patchy; some areas had no mobile data problems while others were completely saturated. Phones with larger or more powerful antennas could also get better reception. Once we figured this out, we started walking around Kathmandu to find an area with access.

Landlines, such as ADSL or ISDN, run from central offices out through a series of signal-strengthening repeaters until they reach your building. Each of these repeaters requires power, so a failure at a single node, such as a generator running out of diesel fuel, can take down everything downstream. We had access to an ADSL line, but during this period it only worked when the government-supplied electricity was on. Clearly, the internet service provider’s normal backup mechanisms had failed.

As an individual, this meant that normal services on my smartphone didn’t work. I often got “network busy” messages when trying to make calls, SMS went through with a 10- to 30-minute delay, and my outgoing emails eventually got out because my Gmail app constantly retried sending them over mobile data. Incoming emails arrived only about once per day, because downloading them to the phone requires a steady data connection. Viber, WhatsApp, and Voxer messages came in whenever we had a moment of mobile data connectivity. My maps app didn’t work because I hadn’t saved Kathmandu for offline use. Fortunately, a friend had suggested OSMAnd a few months earlier, which I downloaded for offline access to all of Nepal’s OpenStreetMap data. This saved me: I was separated from my family and didn’t know the shortest walking route to reach them, kilometers away.

As an IT project manager, I watched the lack of a digital contingency plan cause our deployed systems to fail. The systems were deployed in the cloud with appropriate security and nightly backups, and each phone could store data offline until it was synced. What we hadn’t considered was a total loss of access to those cloud services, which were hosted in foreign countries. Of course, we had the paper forms needed to complete mission-critical tasks, but switching back from digital to paper brought both a learning curve and supply problems. Access was restored three days later, but the buildings weren’t safe, and only a few laptops were available to resume normal work and perform retrospective data entry.
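For readers who want a concrete picture of that offline, store-and-sync pattern, here is a minimal sketch in Python. To be clear, this is not our actual implementation; the server URL, local table, and record shape are placeholders invented for illustration.

```python
# Minimal sketch of a store-and-forward pattern: records are written to
# local storage first and pushed to the central server only when
# connectivity allows. The URL and schema below are placeholders.
import json
import sqlite3
import urllib.request


class OfflineQueue:
    def __init__(self, db_path="outbox.db", server_url="https://example.org/api/records"):
        self.server_url = server_url
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox "
            "(id INTEGER PRIMARY KEY, payload TEXT, synced INTEGER DEFAULT 0)"
        )

    def save(self, record: dict) -> None:
        """Always write locally first; never depend on the network being up."""
        self.db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(record),))
        self.db.commit()

    def sync(self) -> int:
        """Try to push unsynced records; stop quietly if the backbone is down."""
        pushed = 0
        rows = self.db.execute("SELECT id, payload FROM outbox WHERE synced = 0").fetchall()
        for row_id, payload in rows:
            req = urllib.request.Request(
                self.server_url,
                data=payload.encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            try:
                urllib.request.urlopen(req, timeout=10)
            except OSError:
                break  # no connectivity; leave the remaining records queued
            self.db.execute("UPDATE outbox SET synced = 1 WHERE id = ?", (row_id,))
            self.db.commit()
            pushed += 1
        return pushed
```

The important property is that saving data never depends on the backbone: the sync step can fail over and over and simply be retried whenever a connection appears.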

So, what can we do to be prepared in the future? First, project managers can think through their mission-critical assets and data needs, then identify and test alternate ways to run those workflows without a reliable network and power backbone. For example, a local backup of a cloud system may be appropriate, or alternative data entry methods, via voice or SMS, may be warranted. Simulation is key here, and I recommend practicing what needs to happen when normal operations aren’t available.
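To make the “local backup of a cloud system” idea concrete, here is a hedged sketch of a script that could run on a laptop or local server on a schedule. The export URL, API token, and file layout are hypothetical placeholders, not features of any particular platform; adapt them to whatever export mechanism your cloud system actually provides.

```python
# Hedged sketch: pull a periodic export from a cloud service to local
# storage, so work can continue from the local copy if international
# connectivity drops. Endpoint, token, and schedule are hypothetical.
import datetime
import pathlib
import urllib.request

EXPORT_URL = "https://cloud.example.org/api/export"  # placeholder, not a real endpoint
API_TOKEN = "REPLACE_ME"                             # placeholder credential
BACKUP_DIR = pathlib.Path("local_backups")


def pull_local_backup() -> pathlib.Path:
    """Download a full export and keep a timestamped copy on local disk."""
    BACKUP_DIR.mkdir(exist_ok=True)
    req = urllib.request.Request(EXPORT_URL, headers={"Authorization": f"Bearer {API_TOKEN}"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = resp.read()
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    target = BACKUP_DIR / f"export-{stamp}.json"
    target.write_bytes(data)
    return target


if __name__ == "__main__":
    # Run from cron or another scheduler so a recent copy is always on hand.
    print(pull_local_backup())
```

Pair something like this with an occasional restore drill, so the local copy is a workflow you have actually practiced, not just a file sitting on disk.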


Contact me if you'd like to talk about this post.
