Data: An Achilles’ heel in the grid?

What would happen if artificial intelligence systems controlling electricity distribution made decisions based on manipulated data? For that matter, what if humans did the same? Michael Shalyt considers the implications.

Google’s artificial intelligence subsidiary DeepMind announced recently that it’s in preliminary talks with National Grid, the company that owns and operates much of the UK’s energy infrastructure.

The idea is that National Grid would leverage DeepMind’s technology to balance energy supply and demand nationwide, much as DeepMind has already done for Google’s mega data centres.

The stated goal is to cut national energy consumption by some 10 per cent without investing in new infrastructure – just by managing distribution more intelligently.

This is an outstanding idea, backed up by DeepMind’s outstanding technology. However, given the recent rise in cyberattacks on critical infrastructure – including a December 2016 attack on a transmission substation near Kiev, Ukraine, that left tens of thousands without power – security-minded energy stakeholders need to ask some tough questions about the data on which both AI systems and humans base their mission-critical decisions.

AI: garbage in, garbage out?

The Microsoft Tay Twitterbot meltdown is still fresh in everyone’s mind: last year Microsoft launched an artificial intelligence bot named Tay that was aimed at 18- to 24-year-olds and was designed to improve the firm’s understanding of conversational language among young people online. But within hours of it going live, Twitter users took advantage of flaws in Tay’s algorithm that meant the AI chatbot responded to certain questions with racist answers.

Now the idea of AI directly managing critical infrastructure understandably raises eyebrows. The power of AI is driven to a large extent by its ability to learn from and adapt to its surroundings – based on both past examples and present states. This means that despite amazing advancements in recent years, AI to a large extent still adheres to the age-old computing axiom “garbage in, garbage out.” In other words, the end product can only be as good as the input.

Now, when the input to an AI-driven Twitterbot is racist or misogynistic babble, you get a harmless yet sociopathic Twitterbot. However, what happens if the “garbage in” is crucial capacity information regarding electricity transmission lines, the production capability of a power plant, or the internal temperature of a turbine? What happens, in other words, when “garbage in” doesn’t result in digital embarrassment, but rather widespread havoc in a country’s infrastructure or even loss of life?

Essentially, the Tay debacle and the possibility of AI-driven decision-making in the National Grid bring what many consider the Achilles’ heel of data-driven decision-making to the forefront of public debate. The question that needs to be asked is jarringly fundamental: can we rely on the integrity of data on which critical infrastructure decisions are based?

Easily manipulated, hard to trace

Decision-making – by computers or humans – is always based on data. And both humans and computers can make poor decisions based on poor data. Bad data exists, and must be dealt with. But what does it look like when bad data is purposely and maliciously created?

Let’s consider a simple illustration of how data can be manipulated in the power grid, and a possible outcome.

If you’re a state-sponsored hacker targeting another state’s infrastructure, you’ve got two overriding concerns: 1) doing whatever you’ve set out to do, and 2) not getting caught. For the sake of argument, let’s say that what you want to do is cause an explosion at a power plant.

Without going into specifics, we all understand that your average power plant is highly secure. It’s no simple matter to get in physically, nor is it simple to breach the network. However, as security professionals, our operating assumption is that every network is breachable given sufficient time and resources.

Once our hacker is in the network, there are a multitude of ways they could go about causing an explosion. But to avoid getting caught, they’re going to look for a way that’s as forensically untraceable as possible. They’re not going to wrest control from the operators, cut power to the SIEM systems, or flood the executive offices with sewage. Moves like these are far too overt. What they might want to do, by way of example, is find and tamper with the data stream from a simple wired or IoT temperature or vibration sensor deep inside one of the turbines.

It doesn’t take much imagination to figure out what could happen if the ICS or SCADA system controlling the turbine thinks that the internal temperature or vibration level is significantly lower than it actually is. Decision-making in the control room – human or computer – would continue as normal. Yet the manipulated data could lead to disastrous results, and it would be hard to determine ex post facto whether the disaster had been due to attack, human error or malfunction.
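To make the point concrete, here is a minimal sketch of why ordinary protection logic is blind to this kind of tampering. Everything in it is an assumption for illustration: the trip limit, the `should_trip` and `spoof` helpers, and the temperature values are hypothetical and do not describe any real ICS or SCADA product.

```python
# Hypothetical over-temperature trip limit, in degrees Celsius (illustrative only).
TRIP_LIMIT_C = 600.0


def should_trip(reported_temp_c: float, limit_c: float = TRIP_LIMIT_C) -> bool:
    """Protection logic: it can only act on the value it is told."""
    return reported_temp_c >= limit_c


def spoof(true_temp_c: float, cap_c: float = 550.0) -> float:
    """Hypothetical attacker: clamp the reported value below the trip limit."""
    return min(true_temp_c, cap_c)


# The turbine is genuinely overheating in the last two samples.
true_temps_c = [500.0, 560.0, 620.0, 700.0]

honest_trips = [should_trip(t) for t in true_temps_c]
spoofed_trips = [should_trip(spoof(t)) for t in true_temps_c]

print(honest_trips)   # trips once the real temperature crosses the limit
print(spoofed_trips)  # never trips: the logic sees a healthy turbine
```

The protection logic itself is never touched. It behaves exactly as designed, which is why the tampering leaves so little to find afterwards.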

Multiply the sensors in this single turbine by an entire national power grid, and you can understand the potential scope of the data forgery conundrum. Every single element in this incredibly complex and dynamic system could be at risk. When managing a critical infrastructure system as sensitive as the power grid without any way to validate data health, how can we trust anyone, let alone an AI system, to make the right decisions? And if it were an AI system, what would stop a sophisticated hacker from ‘teaching’ it to create blackouts – much as Tay the sociopathic Twitterbot was taught – based on fake data?

Intelligent data validation

Whether or not AI technology is ultimately integrated in the management of the power grid, the risks of data forgery demand a vigorous reassessment of data health, validation and integrity.

High-level hackers understand that current safety standards and sensor fault detection mechanisms cannot detect forged sensor data – in power plants, critical infrastructure facilities, and even industrial control systems. They know that it’s fairly simple to mislead control systems, mask the actual state of physical systems and leave the control room operationally blind – all by falsifying sensor data.

With or without AI in the decision-making loop, bogus data needs to be exposed in order to avoid not just physical damage but actual danger to the public at large. The paradigm shift required to do this is surprisingly easy to implement organizationally, and the technology to do so is already on the market.

Essentially, we’re talking about a data polygraph. Like the law enforcement version, this data polygraph enables operational teams to discover the truth – to verify the integrity of the data they receive. Using advanced algorithms, these systems can identify and track the unique signal ‘fingerprints’ of each sensor.

These fingerprints exist, and are manifested within the exact fluctuations of reported signals, the physical ‘noise’, and the unique system behavior within and between modes of operation. Once a system fingerprint baseline exists for each sensor, deviations can be characterized and investigated, and the ‘truth’ of a given dataset can be reasonably determined.
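As a rough illustration of the baseline-and-deviation idea, the sketch below treats the size of a sensor's sample-to-sample jitter as a crude 'fingerprint' and flags data whose jitter departs from the baseline. This is a deliberately simplified stand-in, not a description of any commercial product: the feature choice, the function names, the threshold `k` and the simulated readings are all assumptions made for the example.

```python
import random
import statistics


def noise_level(window):
    """Crude fingerprint: standard deviation of sample-to-sample differences."""
    diffs = [b - a for a, b in zip(window, window[1:])]
    return statistics.pstdev(diffs)


def build_baseline(good_windows):
    """Summarise the fingerprint over windows assumed to be genuine."""
    levels = [noise_level(w) for w in good_windows]
    return statistics.mean(levels), statistics.pstdev(levels)


def is_suspicious(window, baseline, k=6.0):
    """Flag a window whose noise level sits more than k spreads from baseline."""
    mean, spread = baseline
    return abs(noise_level(window) - mean) > k * spread


# Simulated genuine sensor: a steady reading plus physical noise (assumed values).
rng = random.Random(0)


def genuine_window(n=200, level=520.0, sigma=0.8):
    return [level + rng.gauss(0.0, sigma) for _ in range(n)]


baseline = build_baseline([genuine_window() for _ in range(30)])

fresh = genuine_window()   # new data from the real sensor
forged = [520.0] * 200     # attacker reports a plausible but perfectly flat value

print(is_suspicious(fresh, baseline))   # genuine data stays inside the baseline
print(is_suspicious(forged, baseline))  # forged data lacks the sensor's noise
```

The forged value is entirely plausible on its face, so a range or threshold check would accept it. Only the missing noise gives it away. A real system would need far richer features, including behaviour within and between operating modes, to catch an attacker who also imitates the noise.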

Whether or not we turn over control of critical infrastructure to AI systems, as energy stakeholders we need to first and foremost ensure that we have a rock-solid basis for decision-making as a whole.

Michael Shalyt is Chief Executive of Aperio Systems, a cybersecurity company specializing in security and resilience for industrial control systems.
