
COVID-19 Makes Mobile Operators, AI and Analytics as Critical as Hand Sanitizer (Silvia Veronese)

As the COVID-19 pandemic spread across the world, the first thing we all did, after rushing to the store in search of masks and isopropyl alcohol, was to scour the Internet for information. And as the pandemic continued to grow, employees everywhere started receiving mandates to work from home, students switched to virtual learning and, in a matter of days, almost every community started to shelter in place or quarantine voluntarily, following the simplest and most primordial rule: isolation.

Can Mobile Network Operators Sustain the Changes in Traffic?

As work moved from office connections on the backbone of the Internet to the last mile at home, network traffic demand suddenly shifted in pattern and surged. It was natural to wonder whether the network could handle the load. The short answer was yes.

“Many are wondering if the Internet can handle the strain of rapid traffic growth and increased latency,” David Belson, Senior Director of Internet research and analysis at The Internet Society, wrote in his blog post. “Will it cause a catastrophic failure of the Internet? The answer was: not likely.” 

Operators like Comcast, AT&T and Charter Communications, which own a portion of the backbones of the US network, have all recorded shifts in traffic patterns. “Our network is built to sustain maximum capacity during peak usage which is typically in the evenings, so a surge during the day would be well within our capabilities to manage,” according to an executive from Charter Communications I spoke to recently. The company completed its network upgrade more than a year ago.


What Is the Role of Operators in Predicting the Virus Spread? 

We have all checked the iconic red-dotted map of the Johns Hopkins Coronavirus Resource Center, often many times a day. It is a real-time report of past data, collected by region, by nation, and in many dimensions. Most of what we see on any given day is based on statistical reporting of key indicators: number of deaths, number of positive cases, recoveries, and the rate of spread.

But given the complexity and the lack of information, very little of that data has predictive power. Biological threats, technically speaking, behave very differently from other diffusion systems, which are governed by other mechanisms.

Everybody is asking the same question. When will this be over? In March, most US states declared a 2-week quarantine and social distancing mandate – 2 weeks later, it was extended to a month with still no end in sight. Basic statistics show us the past, but not the future.

Telecommunications and network data in conjunction with other human statistics hold great promise in the prediction of the spread. With an average of 1 smartphone for every 2 people on the planet, the digital footprint left by phone calls, online services, and video applications can be gathered, analyzed, and used to monitor, inform and help prevent epidemics. “Data is the new oil — it’s valuable, but if unrefined it cannot really be used. Data needs to be broken down, analyzed for it to have value,” as Mathematician Clive Humby stated. 

What’s the Critical Network Information That’s Needed for an AI and Analytics Model?

The importance of building an AI and Analytics-based model to monitor and predict the virus spread is well recognized, but an accurate description of how humans move across boundaries is not free of challenges. In less developed countries, the lack of reliable data sources makes mobility-centric models hard to design. 

In more developed countries, privacy laws and government regulations raise ethical and data protection concerns. Models are also very dependent on local settings and configurations and hence cannot be generalized from one community to another.

Last but not least are the potential lack of collaboration from mobile network operators and the massive amount of data to analyze. It is estimated that in the US there are 6 billion calls a day, with an average of 250 calls a month per user. Add the average 30GB of data used per user per month, and a complete set of call and data records for all citizens can exceed hundreds of petabytes a day in a Big Data lake.
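To get a feel for the "hundreds of petabytes a day" order of magnitude, here is a back-of-the-envelope sketch. It uses the 30GB per user per month figure from the text; the subscriber count of 300 million is a hypothetical round number chosen purely for illustration, and the calculation measures raw traffic volume, which is only a rough proxy for the size of the stored records.

```python
# Back-of-the-envelope estimate of daily mobile data volume (decimal units).
GB = 10**9
PB = 10**15


def daily_volume_pb(gb_per_user_per_month, subscribers, days_per_month=30):
    """Average daily traffic in petabytes for the given monthly usage."""
    bytes_per_month = gb_per_user_per_month * GB * subscribers
    return bytes_per_month / days_per_month / PB


# 30 GB/user/month comes from the text.
# ASSUMPTION: 300 million subscribers is an illustrative round figure.
estimate = daily_volume_pb(gb_per_user_per_month=30, subscribers=300_000_000)
print(f"{estimate:.0f} PB/day")  # prints "300 PB/day"
```

Under these assumptions, the volume works out to a few hundred petabytes a day, which is consistent with the scale described above.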

And even if we aren’t all scientists, we have all observed how much the initial conditions of a model and the accuracy of the parameters can influence the downstream analysis. On one hand, we lack data, and on the other hand, we have too much. There is also a lack of historical data that can be used to train the model. 

What Can Operators Do Now to Build an AI-Based Analytics Model to Understand the Spread Locally and Worldwide?

1. Human Mobility Data

Call Detail Records

Call Detail Records (CDRs) provide per-call information about activities on a mobile communication network. CDRs are recorded for every call, are captured per user, and can provide either single or aggregated views. They typically consist of identity tags, time information, call quality information and the approximate location of the communication, since the call from the mobile phone has to reach a radio tower for transmission. Although antennas alone do not allow us to identify the exact location of the user, several triangulation techniques allow better resolution.

We know that a typical user in the US will spend approximately 65 minutes a day on phone calls and make on average 15-20 calls a day. The spatial distribution of the cell towers, which largely determines the accuracy of the location data, varies according to geography and traffic.

Cell towers are typically spaced 1-2 miles apart in rural areas and 1000 to 2000 feet apart in densely populated areas. By extracting only key parameters from CDRs, and not the content of the calls, a user’s digital path through the network can be reconstructed. It could be argued that the resolution of the data is quite low (in the US the FCC mandates that 95% of the phones resolve within 900 feet to be able to intervene for emergency reasons). What is really needed is a resolution on the order of a few hundred feet.

The input data set for the CDR analysis would include:

  • International mobile equipment identity (IMEI) of both calling and called party, international mobile subscriber identity (IMSI) of both calling and called party (both irreversibly encrypted for privacy)
  • Time stamp of call-start and call-end
  • Base station calling party identity and coordinates
  • Mobile phone number of calling party (irreversibly pseudonymized using a hash function)
  • Activity type: voice, SMS, data, 3G, 4G, LTE, etc. 
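As a rough illustration of how a digital path might be rebuilt from the fields listed above, here is a minimal sketch. The record layout, all field names and the sample values are hypothetical and not taken from any operator's schema; identifiers are assumed to be pseudonymized already.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby


@dataclass(frozen=True)
class CallDetailRecord:
    # Hypothetical layout: identifiers are assumed to be pseudonymized
    # before the record reaches the analyst.
    caller_id: str
    callee_id: str
    start: datetime
    end: datetime
    tower_id: str
    tower_lat: float
    tower_lon: float
    activity: str  # e.g. "voice", "sms", "data"


def reconstruct_path(records, subscriber_id):
    """Time-ordered (first_seen, tower_id) hops for one calling party.

    Consecutive records on the same tower are collapsed into one hop.
    """
    own = sorted(
        (r for r in records if r.caller_id == subscriber_id),
        key=lambda r: r.start,
    )
    return [
        (next(group).start, tower_id)
        for tower_id, group in groupby(own, key=lambda r: r.tower_id)
    ]


def _cdr(caller, hour, tower):
    # Helper that builds an invented sample record for the demo below.
    start = datetime(2020, 4, 1, hour, 0)
    end = start + timedelta(minutes=4)
    return CallDetailRecord(caller, "b7", start, end, tower, 0.0, 0.0, "voice")


records = [
    _cdr("a1", 18, "T1"),
    _cdr("a1", 9, "T1"),
    _cdr("a1", 10, "T1"),
    _cdr("a1", 13, "T2"),
    _cdr("z9", 11, "T3"),
]
path = reconstruct_path(records, "a1")
print([tower for _, tower in path])  # prints ['T1', 'T2', 'T1']
```

Note that the sketch never touches call content: the path comes entirely from timestamps and tower identities, which is the point made above about extracting only key parameters.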

Location Services

Location information, an optional feature on virtually every smartphone, is one of the prime sources of data any company can get on a subscriber, with the goal of personalizing each subscriber’s news, list of local restaurants, etc.

Location services are the reason why map applications, like Waze, know where you are and can predict the time it will take to go from your office to your home. It is a tradeoff of convenience versus privacy. A subscriber cannot have both in full, but knowing how location services work and when to activate them will help with privacy concerns. When enabled by the user, location services deliver the most accurate location information by using Wi-Fi, GPS, Bluetooth, and cellular networks to determine the user’s location.

The power of location services in the context of COVID-19 has recently been demonstrated by Google in a mobility index report (available for 131 countries) that shows how visits to certain categories of places, such as retail stores, parks, and transportation hubs, have declined since the beginning of the pandemic. The report does not show data for individual subscribers, and data is collected only from subscribers who have opted in to location services.
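A mobility index of this kind can be thought of as a percentage change against a pre-pandemic baseline. The sketch below is one plausible way to compute such a figure from aggregated visit counts. The choice of a median baseline, the function name and all sample numbers are assumptions made for illustration, not a description of Google's actual pipeline.

```python
from statistics import median


def percent_change_from_baseline(baseline_visits, current_visits):
    """Percent change of current visits relative to the median baseline.

    baseline_visits: aggregated visit counts from a reference period.
    current_visits: aggregated visit count for the day being reported.
    """
    baseline = median(baseline_visits)
    return 100.0 * (current_visits - baseline) / baseline


# Invented aggregate counts for one place category in one region.
retail_baseline = [100, 120, 110]
change = percent_change_from_baseline(retail_baseline, current_visits=55)
print(f"{change:+.0f}%")  # prints "-50%"
```

Because both inputs are aggregated counts, no individual subscriber's movements appear anywhere in the calculation.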

Dynamic Population Estimation Using Census Data

US Census commuting data can offer a rich data set for a Machine Learning model. The US Census Bureau collects information on the mobility and commuting patterns of individuals and makes it available to the public in an anonymized and summarized form. Its surveys ask respondents about their primary workplace location.

When information about workers’ residence and workplace locations is coupled, a commuting flow can be generated. The origin-destination flow format defines the connection between communities, including the interchange of people in addition to goods and services. Commuting flows can also help shape the contours of metropolitan and micropolitan statistical areas.
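Coupling residence and workplace into an origin-destination flow amounts to counting pairs. Here is a minimal sketch, assuming the input is a list of (residence, workplace) pairs; the area names and the helper functions are hypothetical.

```python
from collections import Counter


def commuting_flows(workers):
    """Count commuters for each (residence, workplace) pair."""
    return Counter((home, work) for home, work in workers)


def net_inflow(flows, area):
    """Commuters entering an area for work minus residents leaving it."""
    inbound = sum(
        n for (home, work), n in flows.items() if work == area and home != area
    )
    outbound = sum(
        n for (home, work), n in flows.items() if home == area and work != area
    )
    return inbound - outbound


# Invented sample: each tuple is one worker's (residence, workplace).
workers = [("A", "B"), ("A", "B"), ("A", "A"), ("B", "A"), ("C", "B")]
flows = commuting_flows(workers)
print(flows[("A", "B")])     # prints 2
print(net_inflow(flows, "B"))  # prints 2
```

The resulting counts form the edges of a graph between communities, which is the kind of connection structure a spread model could take as input.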

These publicly available data sets can then be browsed and analyzed to see where the people who live in a particular county go to work.

QR Codes and Other Self-Reported Applications

In addition to passively collected data from the network, a few countries have been experimenting with applications that use QR codes to track health status and movement.

Though the QR code was invented in Japan, it was China that, in the early stages of the COVID-19 outbreak, launched a mobile application enabling people to check whether they are at risk of catching the disease. Individuals are asked to complete a questionnaire with their details, and when they are done they receive a color-based QR code that indicates their health status. In exchange, they receive health information and quarantine suggestions. Three colors are used:

  • Green for general public
  • Yellow for people who have traveled from affected countries
  • Red for positive patients

The QR code can then be scanned for planning and warning information. Data from the application can be used in conjunction with other network statistics for the analysis, and the application can then detect if people are coming in close contact with positive carriers.

The New York Times analyzed one application created in China and found that it sends a person’s location, city and state, and the user’s unique ID. While it’s common for tech companies in China to share the data with other entities, this direct method sets a new precedent and it may not be something that other countries can or want to implement soon. Still, this technology is in its infancy and so far there are reports of several false positives, so it’s hard to rely on the application completely.

In more recent projects, QR code application data is merged with community vulnerability maps (typically free maps that show areas at risk) and with public tools that identify populations that are likely to experience severe consequences from the virus.

On April 1st, Stanford University launched a National Daily Health Survey with the goal of learning and predicting which geographical areas will be most impacted, based on information volunteered daily by users.

The tech giants Google and Apple Inc., in an unprecedented partnership, have unveiled a plan to add technology to their smartphones that will alert users if they have come in contact with a person with COVID-19. As with location services, the subscriber will have to opt in, but the technology has the potential to monitor about a third of the world’s population.

Technology and network providers like Reliance Jio, the world’s second-largest single-country operator with 350+ million subscribers in India and with the world’s largest mobile data network, can play a key role. They own the telecom network, they manufacture phones, and they offer an extensive digital platform of applications from banking to healthcare to e-commerce.


2. Addressing Data Privacy Issues

There is clearly a big concern around data privacy and compromising people’s identity and rights, especially in countries where civil laws are not strong enough. However, one has to evaluate the risks and benefits of providing such data. The goal should always remain to provide directional and aggregated data that benefit each individual and the community as a whole while respecting the country’s laws.

Network operators are very aware of privacy issues. Subscriber information can be protected by replacing unique identification numbers with the output of a cryptographic hash algorithm, which generates new, random-looking numbers so that the original identities of subscribers cannot be rebuilt from the released data.
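The replacement of identifiers described above can be sketched as follows. The text only mentions a cryptographic hash; this example swaps in a keyed hash (HMAC-SHA256) as an assumption, because identifiers such as phone numbers come from a small enough space that an unkeyed hash could be reversed by trying every candidate. The function name and the sample identifier are hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Map an identifier to a fixed-length token using HMAC-SHA256.

    The same identifier and key always give the same token, so records
    can still be linked, but the token cannot be recomputed or checked
    against candidate identifiers without the key.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# ASSUMPTION: the key is generated once, kept secret, and never released
# together with the data.
secret_key = secrets.token_bytes(32)

token = pseudonymize("310150123456789", secret_key)  # invented IMSI-like value
print(len(token))  # prints 64
```

Who holds the key matters as much as the algorithm: anyone with both the key and a list of candidate identifiers could rebuild the mapping.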

Besides CDRs, access to other data in most circumstances also requires anonymization to preserve privacy. Because of this, anonymization and encryption software needs to be developed independently of the operator. It can then be provided to operators and to local authorities who, with very limited training, can clean the data.

From the perspective of the operator, any release of data involves commitment and risks, including damage to the brand among customers who worry about social control and loss of privacy. Regardless, operators need to work together and decide, on the basis of corporate social responsibility, whether releasing data is warranted during a humanitarian crisis such as this one.

Mobile network operators are in a unique position. By using a strong AI and Analytics-based model to analyze their vast amounts of network and user data, they can play a critical role in monitoring and helping to predict the spread of COVID-19 and potentially future biological threats.



Silvia Veronese

Posted by: Silvia Veronese

Dr. Silvia Veronese is the SVP Analytics Research and Customer Success at Guavus (a Thales company). Dr. Silvia Veronese is responsible for the analytics research and successful adoption of Guavus technologies – focusing on information analytics based on large-scale machine learning and advancing the use of data mining to help businesses strengthen IT operations, engineer new products, optimize marketing and deliver real-time personalized content – driving innovation and transformation in the largest big data environments in the world. Silvia has extensive experience in big data, analytics, algorithmic modeling, security, information management and governance in both enterprise and service provider markets. Prior to Guavus, Silvia held various executive positions at Hewlett Packard and HPE. She was also a co-founder in several startups in data analytics, traffic monitoring, and high-frequency trading. And at IBM, she worked on the team that designed the first parallel computer. Silvia spent the first half of her career in academia as a professor of mathematics at the University of Utah doing research in high-performance computing and non-linear dynamical systems. She has published several articles in the area of mathematical modeling, biomedical engineering, network, and traffic flow analysis. She holds a doctorate in mathematics from the University of Pavia, Italy, an MBA in Management of Technologies from the University of Utah and has completed her post-graduate studies at M.I.T.
