Imagine a library where the books are written in a language so old, few alive can speak it; and the pages crumble to dust when turned. In data terms, this is the state of the UK oil industry’s 50+ years of exploration archives. Elaine Maslin reports on one project aiming to fix it.
The UK North Sea industry formed the Common Data Access (CDA) organization over 20 years ago to host subsurface data from the basin and share it within the industry.
Over the years, CDA has amassed an enormous amount of documentation, some of it dating back 50 years, and describing more than 11,000 wells and 2000 seismic surveys. The archives hold some 12 million well logs and half a million different reports.
But, as the industry looked for ways to unlock more value from the North Sea, a mature basin with untapped potential, some looked to CDA’s data for inspiration, particularly a category of it known as unstructured data: information not stored in a way that makes it easy for people to read, or for computers to analyze and use.
While filtering through it manually would perhaps take longer than the oil industry has left on this planet, it was thought that a growing fleet of data science firms, armed with the latest software tools, might have more luck.
Some 3.6 terabytes of data was offered to data science firms worldwide under the Unstructured Data Challenge, to see what they could do with it. Nine companies participated, and some of the results were presented in Aberdeen in late 2016.
Three things were clear. First, much of this data would be more useful had it been submitted in a more structured and uniform way in the first place: each operator has its own well report style, terminology, and report structure, and some documents were handwritten, or scanned as images rather than as text. Second, there are plenty of tools available to deal with these issues, and for the issues that cannot yet be dealt with, new tools will come. Third, overcoming the first two could unlock a lot of potential.
Indeed, one firm goes so far as to suggest that mining the data you already have could prove more fruitful than drilling a new well.
By looking at relationships in the data, useful information can be extracted, says Ed Evans, ex-BG Group and Halliburton and co-founder (2004) and managing director of New Digital Business (NDB).
For a Norwegian client, the firm examined available well data for oil and gas show information adjacent to currently producing fields. NDB ran a project linking shows to GIS data, partnering with Geofabryka, a firm which crunches spatial, or GIS, data. The results, achieved by screening 50 wells per day, were similar to those of the company’s manual search, which had taken two months.
For the Unstructured Data Challenge, NDB ran a similar project, looking at 100 wells across the Mid North Sea High. Instead of a manual process, which could see only five wells a day assessed, NDB’s system screened 50 wells a day. It plotted non-productive time on a map and assessed the data to see if the formation or drilling operations were the cause. “Thirty percent of drilling costs are non-productive time. That’s valuable knowledge,” Evans says.
It sounds easy, but the devil is in the detail. Dave Camden, who founded Flare Solutions in 1998, looked for formation analogs in the data. The key, he says, is organizing data so that it’s easier to search.
Camden used open source software and his own tools to process the text in CDA’s documents, looking at the frequency with which different terms were found together: “you can know a word by the company it keeps,” he says.
The job isn’t made easy by each oil and gas company using different names and acronyms to describe the same things. Nonetheless, Camden analyzed the text of 25,000 reports to build a set of language fingerprints for geological concepts that could be used to work out how similar one formation is to another, filtering on lithology, formation age, and other geological terms.
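As a rough illustration of the co-occurrence idea, the sketch below builds tiny term-frequency “fingerprints” from report text and compares them with cosine similarity. The term list and report snippets are invented for this example; Flare’s actual vocabulary, tooling, and similarity measure are not public.

```python
import re
from collections import Counter
from math import sqrt

# Toy sketch of "knowing a word by the company it keeps": build a
# term-frequency fingerprint per report, then compare fingerprints.
# The term list and report texts below are invented for illustration.
GEO_TERMS = {"sandstone", "shale", "jurassic", "triassic", "porosity"}

def fingerprint(report_text):
    """Count how often known geological terms appear in a report."""
    tokens = re.findall(r"[a-z]+", report_text.lower())
    return Counter(t for t in tokens if t in GEO_TERMS)

def cosine(a, b):
    """Cosine similarity of two fingerprints (0 = unrelated, 1 = same mix)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

fp1 = fingerprint("Jurassic sandstone, good porosity, minor shale interbeds of sandstone")
fp2 = fingerprint("Sandstone reservoir of Jurassic age, porosity fair")
fp3 = fingerprint("Triassic shale, tight, no shows")

# Formations described in similar terms score higher than unrelated ones.
print(cosine(fp1, fp2) > cosine(fp1, fp3))  # True
```

A production system would need a far larger vocabulary, synonym mapping across operators’ differing terminologies, and weighting such as TF-IDF, but the principle is the same.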
There’s more work to be done, he says, and future steps include moving his system from a traditional database into a graph engine – better to explore the potential of machine learning and natural language processing tools for use in the subsurface, and in other document-rich parts of the industry.
Going to the movies
Hampton Data Services (HDS) also had issues with the data, and took a different approach. Simon Fisher, Data Management Application Product Manager at HDS, says document titles and sub-titles often don’t truly indicate what is in the data, as documents are copied so much. HDS focused on well logs, curves, and the other images in the documentation, working with Zorroa, a Californian firm, which usually does work in the film industry.
Images were classified using neural network systems – a variety of deep learning – helping businesses understand what valuable geological data may be hidden within the body of a standard report. “This was reasonably successful but the next step is putting in bigger data sets as training exercises,” Fisher says. The more images in the system, the better it can identify them.
Meanwhile, Colin Dawson, Regional Manager, Europe & Africa at Independent Data Services (IDS), put two Robert Gordon University students to work using open source tools to look at information relating to stuck pipe, shallow gas and formations associated with drill bit wear, combining data from CDA and the Norwegian Petroleum Directorate for a full North Sea view.
To mine the data, the students used the open source Elastic stack: Elasticsearch for text search and analytics, Logstash to add structure to the data, and Kibana for visualization. Where the data wasn’t machine readable, such as a scanned image in a PDF, Tesseract, an open source OCR tool, was used.
The students found 778 well documents mentioning stuck pipe, which were then geo-referenced, and displayed on a map to give a fresh insight into the knowledge available on drilling hazards for use in well planning.
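The search-and-map step can be sketched in a few lines. Here, document texts and well coordinates are invented placeholders; the real project indexed the documents in Elasticsearch and displayed the hits in Kibana.

```python
import re

# Minimal sketch of the students' pipeline: flag documents that mention a
# drilling hazard, then join the hits to well coordinates for mapping.
# Document texts and lat/lon values below are invented for illustration.
DOCUMENTS = {
    "well_16-2a.txt": "Drilling resumed after stuck pipe incident at 3,200 m.",
    "well_22-1b.txt": "Shallow gas observed; one stuck-pipe event reported.",
    "well_30-4c.txt": "Routine section, bit wear noted in the Rotliegend.",
}
WELL_LOCATIONS = {  # hypothetical coordinates per well document
    "well_16-2a.txt": (58.4, 1.6),
    "well_22-1b.txt": (57.9, 0.2),
    "well_30-4c.txt": (56.7, 2.1),
}

# Match spelling variants: "stuck pipe", "stuck-pipe", "stuckpipe".
HAZARD = re.compile(r"stuck[\s-]?pipe", re.IGNORECASE)

hits = [(doc, WELL_LOCATIONS[doc]) for doc, text in DOCUMENTS.items()
        if HAZARD.search(text)]

for doc, (lat, lon) in hits:
    print(f"{doc}: stuck pipe mentioned, plot at ({lat}, {lon})")
```

In the actual project, the OCR step ran first so that scanned reports became searchable text before this kind of query could find them.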
AGR carried out a similar project using its own iQx software to tackle the CDA data, looking specifically at final well reports, many of which were handwritten with no consistent structure. This meant it was important to distinguish between what in the report was of value and what was added just to comply with the regulator, says AGR’s Håkon Snøtun.
He also stressed that context is important in analyzing data, including knowing what you don’t know, as well as what you do. Håkon referred to a World War II project that looked to have fighter planes reinforced based on data about where the planes that made it back had most often been hit, until someone pointed out that the bit to strengthen was actually where they had no data – from the planes that didn’t return.
Supercomputer firm Cray focused on finding information about the Palaeozoic and non-productive time. The firm concentrated on establishing an analytics pipeline – that combination of the right people and right tools – suited to working with subsurface data, and then applying that pipeline to the entire CDA data set. Their initial results, presented for analysis using Jupyter notebooks, highlighted the difficulties in extracting information from old oil and gas documents in non-standard formats, but set the scene for their next analytical step: applying their graph engine to delve into the detailed relationships within the data.
As the CDA data comes from many different companies, the first step in using it, says Paul Coles of Schlumberger, is to harmonize it – addressing gaps, and applying consistent names, units, and labels so that it can be worked with as a single data asset. He applied machine learning techniques to classify well curves, apply the right units of measure, and join curves together to build a single geological model out of an organized, but highly unstructured raw data set. Coles supplemented his model with geological tops from the Oil and Gas Authority and stratigraphic information from cuttings reports.
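That harmonization step can be pictured as mapping each operator’s curve mnemonics and units onto one canonical convention. The alias table and conversion factors below are illustrative assumptions for a sketch; Schlumberger’s actual system classified curves with machine learning rather than a fixed lookup table.

```python
# Hedged sketch of curve harmonization: translate operator-specific
# mnemonics to canonical names and convert values to consistent units.
# The tables below are invented examples, not Schlumberger's mappings.

CURVE_ALIASES = {
    "GR": "gamma_ray", "GAM": "gamma_ray", "GAMMA_RAY": "gamma_ray",
    "DT": "sonic", "DTC": "sonic",
    "RHOB": "density", "DEN": "density",
}

UNIT_CONVERSIONS = {  # unit -> (canonical unit, multiplier)
    "us/ft": ("us/m", 3.28084),   # sonic slowness per ft -> per m
    "g/cc": ("kg/m3", 1000.0),    # density g/cc -> kg/m3
    "gapi": ("gapi", 1.0),        # gamma ray API units, kept as-is
}

def harmonize(mnemonic, unit, values):
    """Return (canonical_name, canonical_unit, converted_values),
    or None if the mnemonic is unknown and needs manual review."""
    name = CURVE_ALIASES.get(mnemonic.upper())
    if name is None:
        return None
    new_unit, factor = UNIT_CONVERSIONS.get(unit.lower(), (unit, 1.0))
    return name, new_unit, [v * factor for v in values]

print(harmonize("DEN", "g/cc", [2.3, 2.45]))
```

Joining curves from different wells into one model then becomes a matter of grouping by the canonical name rather than by each operator’s label.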
“Adding existing, structured data helps the process and complements what is already there,” he says. Then that’s the hard work done. “[There’s] no shortage of tools you can use for analysis,” he says.
Schlumberger tested the approach on 46 wells on the Piper field, generating an automated petrophysical model for the field in just six hours – a job that would take a petrophysicist six weeks to complete manually, Coles says. A cloud platform was used to handle the data, as it can scale from a few wells to all of the UK Continental Shelf (UKCS). The idea is to create an evergreen model, maturing through machine learning and automated interpretation, he says.
Maria Mackey, energy sector lead, EMEA-APAC, at Cray, says over half of the UKCS documents provided were stored as images and had to be processed to extract the text – a potentially long-winded and expensive process.
“The analysis part is the least time consuming part of the exercise,” she says. “It’s the gather, understand, parse, OCR, clean and organize that takes the longest.”
Documents that can’t yet be analyzed by computers have “locked in potential,” one delegate said. “The biggest question is, have we got the time and money to invest in fixing this? It all comes down to ROI (return on investment).”
However, Camden points out that the machines will quickly catch up. If a human is able to look at a blurry scanned image today and interpret it, machines will soon be able to do this, too.
Indeed, according to one delegate, “data science is the new oil.” If one approach doesn’t work, firms shouldn’t give up, she said. “There’s a lot more potential.”
Malcolm Fleming, CEO of CDA, said: “The challenge is, in the future, how we use these tools to improve subsea and subsurface data. What we do in the data area directly impacts the business. We can reduce costs, add value, improve quality and automate.”
But, one of the challenges around data science is less about technology or even the data itself. It’s a corporate challenge: because information management isn’t always directly linked to the business it serves, its value isn’t often appreciated. This could be changing.