Narrating data

Elaine Maslin

October 2, 2014

Software that can turn your oilfield data into readable reports is coming to an oilfield near you. Elaine Maslin found out how this technology could help create articulate oilfields.

Left: Dr. Robert Dale
Right: Professor Ehud Reiter

When people talk about visualizing data, they usually mean how data is displayed visually: on screens, in infographics, and perhaps in 3D.

The idea is to help an engineer see the data more clearly and quickly, in order to carry out analysis, or make decisions.

As more and more data is generated from the oilfield, from electric subsea Xmas trees, pipeline or mooring integrity monitoring systems, rotating equipment monitoring, environmental data, downhole pressure and temperature gauges, hydrocarbons streams, and so on, the need to not only gather, but also collate, analyze and make decisions based on the data also increases.

Data mining companies are already helping to analyze this data, looking for trends. But what if software could be used to collate, analyze, and then also present, in seconds, reports in narrative format, tailored to a specific audience, based on the data and analysis (work that would take a human hours)?

Technology to do this has been developed over three decades and is now being used by an operator in the US Gulf of Mexico. Its origins are in natural language generation (NLG), a subfield of artificial intelligence. Unlike natural language understanding (NLU), which takes language and turns it into data, NLG takes data and turns it into language. NLU, as a research area, started in the 1960s; NLG developed in the 1980s. Professor Ehud Reiter and Dr Robert Dale have been involved from the start, when both were PhD students researching the field at their respective universities, Harvard and Edinburgh, in the 1980s, before joining forces in the 1990s.

“That is when we started looking at how to take machines and produce language. There was very little interest in the problem at that point,” says Dale, now chief strategy scientist and chief technology officer at Arria. At the time, an early NLG engine was developed to create weather reports based on meteorological data collected by students.

In 2008-9, Data2Text, a University of Aberdeen spin-out company led by Reiter, was launched. In 2012, Arria bought 20% of the firm, before taking it over completely in late 2013. Now, the Arria NLG engine is used to write 5000 weather reports a day across the UK for the Met Office, where previously the company only created 60.

“The fundamental goal of the technology is to take data and turn it into text, or voice,” Dale says. “It involves a two-step process. First, the data, such as raw sensor data, is turned into information (through reasoning), and then the information is turned into written text or narrative (communication). In the first step, the engine does analysis to identify patterns and trends and turn that into information. For example, if a piece of equipment stops working, it will look at why that is happening and what other machines are around that, to determine what is happening. The information is then turned into text to tell a story.” Both the reasoning and the communication steps require knowledge “as a fuel” to enable the engine to interpret and present the data and information. “What significance is a particular sensor sparking a certain alert going to have, and at the same time as another sensor going off? This is the kind of knowledge, gained from subject matter experts, that the software embodies.”
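The two steps Dale describes can be illustrated with a minimal Python sketch. This is not Arria's engine: the `Finding` structure, the `reason` and `communicate` functions, the trend rule, and the sample readings are all assumptions made for illustration. The point is only the separation between reasoning (data to information) and communication (information to text).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Finding:
    """A piece of information derived from raw readings (hypothetical structure)."""
    sensor: str
    trend: str   # "rising", "falling", or "steady"
    start: float
    end: float


def reason(sensor: str, readings: list[float], tolerance: float = 1.0) -> Finding:
    """Step 1 (data -> information): classify the overall trend of a series.

    Assumes at least two readings. The rule (compare the mean of the second
    half with the mean of the first half) is a stand-in for real analysis.
    """
    half = len(readings) // 2
    change = mean(readings[half:]) - mean(readings[:half])
    if change > tolerance:
        trend = "rising"
    elif change < -tolerance:
        trend = "falling"
    else:
        trend = "steady"
    return Finding(sensor, trend, readings[0], readings[-1])


def communicate(finding: Finding, unit: str) -> str:
    """Step 2 (information -> text): express the finding as a sentence."""
    if finding.trend == "steady":
        return f"{finding.sensor} held steady at around {finding.end:g} {unit}."
    verb = "rose" if finding.trend == "rising" else "fell"
    return (f"{finding.sensor} {verb} from {finding.start:g} {unit} "
            f"to {finding.end:g} {unit}.")


# Invented sample data for a single sensor.
readings = [71.0, 72.5, 74.0, 79.5, 83.0, 88.5]
print(communicate(reason("Bearing temperature", readings), "deg C"))
# prints: Bearing temperature rose from 71 deg C to 88.5 deg C.
```

In a real system, the domain knowledge Dale calls the “fuel” would replace the hard-coded tolerance and wording used here.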

The Arria “engine.” Image from Arria.

For the oil and gas industry, the firm has started out providing its technology for discrete equipment areas, specifically, an exception-based alert system on rotating equipment on a platform in the Gulf of Mexico. When an alert indicates a temperature or movement threshold has been breached, the NLG system kicks into action. It has 77.6 million sensor points that could be relevant, which it assesses, analyzes and then feeds into a 500-word report, describing what is happening, and why it has come to this summary, all in 60-90 seconds. “Normally, that could take the relevant expert 2-3 hours,” Dale says.

The processing power is based on a standard Intel desktop computer. The engine knows how to analyze the relevant data, including associated machinery, and how to understand what information is important and reportable. It knows how to put together a story to explain the data, emphasizing what is important. It knows how to package up information into sentences of the right size, and it knows the rules of grammar and the right terms to use.

Further applications are planned in the Gulf of Mexico context and ultimately Arria sees a scenario when Arria NLG would be used not just on particular pieces of equipment, but across platforms as a whole, enabling any level of report to be produced, from specific equipment analysis, to a performance summary for the entire platform, each written for a specific audience, at the touch of a button.

“Anywhere where there is a lot of data and people are struggling to deal with that data is where this technology could be useful,” Dale says. “At the moment we are doing some work looking at electrical submersible pumps, and drilling reports is another area people seem interested in. We are starting with components, but you could imagine how you could aggregate that information, then look at chains of equipment and then the entire platform, correlating and integrating that information for a complete report of the system, creating an articulate oil and gas field.”

While it might sound relatively simple, the research to get the engine to where it is has taken years, drawing on technologies developed in artificial intelligence, data analytics, and natural language processing, and has involved a number of blind alleys. For example, in early explorations of the technology, it was thought that a template could be used, into which the data would be inserted to create the written report. But when not all the expected data was available, the report would be left with gaps. The commercial version has systems that detect what information is available, and which of it is most relevant, and then produce the report, organizing the presentation of the material appropriately.
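The difference between the two approaches can be shown with a small Python sketch. The template, the fact list, and the priority scheme are hypothetical, chosen only to contrast a fixed template that leaves gaps with a generator that first checks what data is available and then selects what to say.

```python
TEMPLATE = ("Pressure is {pressure} bar, temperature is {temperature} deg C, "
            "flow rate is {flow} m3/h.")


def fill_template(data: dict[str, float]) -> str:
    """Fixed-template approach: missing values leave visible gaps in the report."""
    return TEMPLATE.format(
        pressure=data.get("pressure", "___"),
        temperature=data.get("temperature", "___"),
        flow=data.get("flow", "___"),
    )


# (label, key, unit, priority): an assumed ranking, lower number = more important.
FACTS = [
    ("pressure", "pressure", "bar", 1),
    ("temperature", "temperature", "deg C", 2),
    ("flow rate", "flow", "m3/h", 3),
]


def select_and_report(data: dict[str, float], max_facts: int = 3) -> str:
    """Use only the facts that are present, most important first."""
    available = [(priority, label, data[key], unit)
                 for label, key, unit, priority in FACTS
                 if data.get(key) is not None]
    available.sort(key=lambda fact: fact[0])
    chosen = available[:max_facts]
    if not chosen:
        return "No readings are available."
    clauses = [f"{label} is {value:g} {unit}" for _, label, value, unit in chosen]
    if len(clauses) == 1:
        sentence = clauses[0]
    else:
        sentence = ", ".join(clauses[:-1]) + " and " + clauses[-1]
    return sentence[0].upper() + sentence[1:] + "."


data = {"pressure": 212.0, "flow": 35.5}  # temperature reading is missing
print(fill_template(data))
# prints: Pressure is 212.0 bar, temperature is ___ deg C, flow rate is 35.5 m3/h.
print(select_and_report(data))
# prints: Pressure is 212 bar and flow rate is 35.5 m3/h.
```

Lowering `max_facts` mimics the relevance filtering mentioned above: with `max_facts=1`, only the highest-priority available fact is reported.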

“The holy grail of this space is being able to use machine learning to automatically learn how to tell a story based on data,” says Dale. That would mean using data, reports, and statistical techniques to find correlations between stored data and textual content. But that is 10-15 years away, he says. An element of machine learning is used by Arria, but the technology is based on telling the system how to interpret the data it is given, to turn it into information and then from information into text.

Gaining and incorporating the knowledge from the subject matter experts also sounds like a lengthy process, but, Dale says, using corpus analysis, a type of linguistics methodology, existing human-authored reports can be scanned and “reverse engineered” to aid the process. In fact, this can reveal tacit knowledge that subject matter experts do not think to mention, perhaps because they think it is “obvious,” making it a valuable part of the process. The application already has general knowledge about language embedded; it just requires adding any specific terminology or linguistic conventions pertinent to the application.

So what are the safeguards? Dale is keen to point out that the human is still a crucial element in such a system when it comes to mission-critical applications. The report makes a recommendation about what action should be taken. The human still needs to ensure the right action is taken. “If it is a mission critical situation it is important to have a human in the loop,” he says. “The reports are produced for the human to decide what to do.”

“Another question that comes up is ‘isn’t it better to have just graphs and charts?’ To some extent it is horses for courses, but graphs can become unwieldy, and someone not used to those graphs and charts will just see a collection of graphs and charts,” he says.

While the technology has been in development for 20-30 years, it is only now, as data is becoming ever vaster, that the NLG engine will come into its own, Dale suggests. “At the time (we produced the first weather forecasting engine) there simply wasn’t a lot of data around and it wasn’t economically viable to automate report production. Fast forward and now the situation has completely changed and the technology has commercial benefit. The amount of data you have to deal with has a bearing on what is achievable. More and more data, at ever-finer granularity, is emerging every day. The more finely grained data gets, the more operations have to be performed to get from data to information. And there is no sign of that changing. The challenge for us is to scale our techniques to deal with this.”
