AWS re:Invent 2020: Data transformation for reservoir characterization
Justo Sanchez and Anand Maogi. Justo brings over 28 years of IT experience, seven of them with ConocoPhillips, to his role as a data analytics and systems specialist. He's a computer engineer by degree and balances his time in front of a screen with his hobby of painting, where his preferred medium is oil on canvas. Anand is a consultant solutions architect at ConocoPhillips, where he has spent 17 of his 25 years of IT experience. He's also proud to lead the Asian American employee resource group at the company. Anand uses his computer science degree and his experience at ConocoPhillips to mentor his daughter's AWS DeepRacer club at school. Now let's dive into our topic today. The team's going to give you some background and then take you through their journey of addressing an important need at ConocoPhillips. Justo, would you kick us off by telling us a bit about ConocoPhillips and the specific challenge that we're going to address today?

Good day, everyone. Thank you, Sarah, for your introduction. We're here today to tell a story about our current digital transformation. Our company is beginning to leverage the power of the cloud more and more, but as many of you can appreciate, it is not a simple transition. We're going to talk about one of our most recent examples: where we started, what we built, and, more importantly, how we were able to accomplish it. So let's start by talking about who we are. ConocoPhillips is a large oil and gas company, and it will soon become the largest independent exploration and production company in the world. While based in Houston, we have operations and offices all over the globe.
Our talk today involves our Canadian business unit. The Canadian Montney asset is an unconventional gas play located in northeastern British Columbia, in western Canada, and includes about 300,000 net acres of land. Our Montney formation has three to five different horizons that we can target and develop. As you can imagine, there's a whole pile that can be developed, and we're in the early days. The important point to take from this slide is that this asset is in appraisal. Therefore, we must gather lots of data to find out exactly what is there, so we can best develop this area. Based on how many of those layers we target and how close together we put our wells, we will drill anywhere between 800 and 3,000 wells, but the initial 50 wells will provide guidance for the rest. The amount of capital that we have to put into the ground will obviously differ depending on whether we drill 800 or 3,000 wells, and the variance is measured in billions of dollars, so it's something we really have to get right. We're in the early days of the appraisal, and using sensors, we have to try to figure out what we have down there and the best way to develop the entire play. We recently finished our first big appraisal pad, where 14 horizontal wells were drilled. We gathered millions of dollars' worth of data to try to understand what we actually have under the ground and how to develop it.
The problem is that the data comes from seven different vendors: more than four terabytes of data across more than 8,000 files. A lot of them are in CSV format, but there are many other formats. These files come to us with multiple location formats, time zones, time and depth resolutions, etc., and that makes them very difficult to work with. When our geoscientists first started trying to interpret this data, it was taking over eight hours just to pull together and prepare the data to begin interpreting one stage, and there are over 300 stages to interpret and reinterpret. Because of this, the analysis and visualization of each data type was completed in isolation. Data observations were done manually, outside of any tool. Plots were scaled and matched per stage directly in PowerPoint. Hence, the team was unable to do any data analytics or produce any learnings, due to the complexity of the data. As you can tell, this was not a viable strategy, so the team reached out to our internal analytics and IT teams for help. On the other hand, this is where the Canadian business unit was at: we had newly announced that we were a cloud-first company, and we had recently set up our very first AWS cloud environment, or data lake, but we had limited in-house cloud skills and experience. Basically, we could imagine the solution that our development team needed, but we had no ability to deliver it, and we were not happy about it.
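Mixed time zones and resolutions have to be removed before readings from different vendors can be compared. The talk doesn't describe the vendors' actual file formats. The following minimal standard-library Python sketch therefore rests on hypothetical assumptions: each vendor uses one known timestamp format and one fixed UTC offset. The `VENDOR_CONVENTIONS` table is invented for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-vendor conventions: (timestamp format, fixed UTC offset).
# These are illustrative assumptions, not the real vendor formats.
VENDOR_CONVENTIONS = {
    "vendor_a": ("%Y-%m-%d %H:%M:%S", timedelta(hours=-7)),  # local time, UTC-7
    "vendor_b": ("%d/%m/%Y %H:%M:%S", timedelta(hours=0)),   # already UTC, day-first
}


def to_utc(raw: str, vendor: str) -> datetime:
    """Parse a vendor-local timestamp string and return an aware UTC datetime."""
    fmt, offset = VENDOR_CONVENTIONS[vendor]
    local = datetime.strptime(raw, fmt).replace(tzinfo=timezone(offset))
    return local.astimezone(timezone.utc)


def floor_to_second(ts: datetime) -> datetime:
    """Drop sub-second precision so higher-resolution feeds share a 1-second grid."""
    return ts.replace(microsecond=0)


if __name__ == "__main__":
    a = to_utc("2020-03-01 05:00:00", "vendor_a")
    b = to_utc("01/03/2020 12:00:00", "vendor_b")
    # Both describe the same instant once normalized to UTC.
    print(a.isoformat(), b.isoformat(), a == b)
```

Fixed offsets are a simplification. A feed recorded in a zone that observes daylight saving would need a named zone instead, for example via `zoneinfo.ZoneInfo` (Python 3.9+).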
Excellent, thanks, Justo, for giving us some context about ConocoPhillips and setting the stage for your challenge. Now, Anand, if we could move over to you, would you talk us through what you built to address this challenge?

Thank you, Sarah, it's a great question. As we heard from Justo, there are multiple disparate data sources and formats. We knew we wanted a solution that could handle massive amounts of data with a simple architecture, built from as many serverless services as possible, which provide elasticity and minimal maintenance for our infrastructure and our IT support teams. We didn't use any complex services in our architecture; it is simple yet very elegant. We chose Amazon S3 as our storage layer to accommodate the various stages of our data, and AWS Glue for data ingestion, transformation, and curation. Finally, we used Amazon Athena as the database that lets our BI tools consume the data for analytics. One item I will point out is AWS Lake Formation, which provides robust data lake security. It allowed us to meet our business requirement of table-level access, and it greatly simplified rolling this solution out to our BU. We have also used other AWS services, such as AWS Step Functions, AWS Lambda, Amazon SNS, and Amazon CloudWatch, in support of solution governance. I think it came out very well. As you can see, these are the various services we used.
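The talk names AWS Step Functions and Amazon SNS among the supporting services but doesn't show how they were wired together. One plausible arrangement, offered purely as a hypothetical sketch, is a state machine that runs two AWS Glue jobs in sequence and publishes to an SNS topic if either fails. It is written as an Amazon States Language definition built in Python. The job names, account ID, and topic ARN are placeholders, not ConocoPhillips' configuration.

```python
import json

# Placeholders only: hypothetical Glue job names and SNS topic ARN.
STAGE_JOB = "hypothetical-align-and-enrich"
CURATE_JOB = "hypothetical-aggregate-and-partition"
ALERT_TOPIC = "arn:aws:sns:us-east-1:111122223333:hypothetical-pipeline-alerts"

# Route any error from a Glue task to the notification state.
_CATCH_ALL = [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}]


def glue_task(job_name, next_state):
    """A Task state that starts a Glue job and waits (.sync) for the run to finish."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::glue:startJobRun.sync",
        "Parameters": {"JobName": job_name},
        "Catch": _CATCH_ALL,
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


definition = {
    "Comment": "Sketch: staging -> curated pipeline with failure notification",
    "StartAt": "AlignAndEnrich",
    "States": {
        "AlignAndEnrich": glue_task(STAGE_JOB, "AggregateAndPartition"),
        "AggregateAndPartition": glue_task(CURATE_JOB, None),
        "NotifyFailure": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            # With the default Catch ResultPath, the input here is {"Error": ..., "Cause": ...}.
            "Parameters": {"TopicArn": ALERT_TOPIC, "Message.$": "$.Cause"},
            "Next": "PipelineFailed",
        },
        "PipelineFailed": {"Type": "Fail"},
    },
}

if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
```

The `.sync` suffix makes Step Functions wait for each Glue job run to complete before moving on. Sending failures to SNS is one way to get the kind of notification mentioned in the talk, but it is an assumption here.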
But we still had to figure out how we were going to wrangle these massive amounts of data to make the solution useful to the business. Let's look at that.

- Step one: we ingest the data from various source systems into our S3 staging bucket.
- Steps two and three: this is where Glue came in handy. We aligned the data's spatial references, formats, and resolutions. We also added context, such as which part of the well is being fracked at any given time. Finally, we stored this transformed data in our curated bucket.
- Step four: probably the most important thing we did to optimize Athena query performance was to leverage Athena's data partitioning features.

Remember, this is one-second data from dozens of sensors, 24/7, for six months. That amount of data would crush any BI tool. So we aggregated the data, letting geoscientists efficiently navigate to specific events; then, once they start interpretation, they can drill into the more granular data. We have also enabled performance monitoring metrics, which let us tune the whole workflow and support operations and notifications. Finally, we have made the solution's result summaries and details available via Athena for our existing enterprise BI tools to consume for analytics. So remember how this used to take over eight hours? Now they can view the relevant data in just seconds.
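The enrich, aggregate, and partition steps can be illustrated with a small standard-library Python sketch. It is not the actual Glue code, which the talk doesn't show. It makes several hypothetical assumptions:

- A frac stage is identified by its UTC start time.
- One-second readings are summarized per stage and per minute.
- Each summary would be written under a Hive-style `key=value` S3 prefix keyed by well, stage, and UTC day, a layout Athena can use for partition pruning.

The stage schedule, bucket, table, and well names are all invented.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean

# Hypothetical stage schedule for one well: (stage number, UTC start), sorted by start.
# A reading belongs to the latest stage that started at or before it; stage end
# times are ignored in this sketch for brevity.
STAGES = [
    (1, datetime(2020, 3, 1, 12, 0, tzinfo=timezone.utc)),
    (2, datetime(2020, 3, 1, 14, 0, tzinfo=timezone.utc)),
]
_STARTS = [start for _, start in STAGES]


def stage_for(ts):
    """Return the frac stage active at UTC timestamp ts, or None before the first stage."""
    i = bisect.bisect_right(_STARTS, ts) - 1
    return STAGES[i][0] if i >= 0 else None


def minute_summary(readings):
    """Collapse 1-second (timestamp, value) readings into per-(stage, minute) statistics."""
    buckets = defaultdict(list)
    for ts, value in readings:
        stage = stage_for(ts)
        if stage is None:
            continue  # before any stage: dropped in this sketch
        buckets[(stage, ts.replace(second=0, microsecond=0))].append(value)
    return {
        key: {"n": len(vals), "min": min(vals), "max": max(vals), "mean": mean(vals)}
        for key, vals in sorted(buckets.items())
    }


def partition_prefix(bucket, table, well, stage, ts):
    """Hive-style key=value S3 prefix (well / stage / UTC day) that Athena can prune on."""
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"s3://{bucket}/{table}/well={well}/stage={stage}/dt={day}/"


if __name__ == "__main__":
    start = STAGES[0][1]
    readings = [(start + timedelta(seconds=s), float(s)) for s in range(120)]
    for (stage, minute), stats in minute_summary(readings).items():
        print(partition_prefix("curated-bucket", "pressure_1min", "W01", stage, minute), stats)
```

For scale: six months (about 182 days) at one reading per second is 182 × 86,400 = 15,724,800 rows per sensor channel. A per-minute summary cuts that by a factor of 60, and a query filtered on well and stage only scans the matching prefixes.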
So we collaborated with the business customers to get their feedback during this 10-week sprint project, and we were able to optimize the partition scheme based on their data analysis and consumption patterns. Now we have a solution that helps the business, but what about the IT teams, which are ultimately responsible for managing it? Let's look at that. On your right here is the architecture for CI/CD, that is, continuous integration and continuous delivery. It is a very useful principle and mechanism for developing and deploying code. We decided to leverage AWS's CI/CD services, along with AWS CloudFormation for infrastructure as code. Without this, development and deployment could take weeks; with these features, we can now deploy in days or even hours. That drastically improves our ability to enhance the solution and our IT support teams' ability to maintain it. It also lets us replicate and deploy the solution to various business cloud environments, delivering operational excellence with faster and better software release processes. Here are a few more considerations from when we developed this CI/CD mechanism. We employed cross-account cloud deployer roles, which gave developers the freedom, in the development account, to build their resources and deploy as they wish, through the approval processes. We also identified a common set of tags that must be applied to every resource we deploy.
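The talk doesn't name the required tags. As a hypothetical sketch, a check like the one below could run as a pipeline step before deployment and fail the build when any resource lacks a required tag. It is plain Python, not an AWS API. The tag keys in `REQUIRED_TAGS` and the input shape (resource name mapped to its tag dictionary) are assumptions for illustration, not ConocoPhillips' actual tagging standard.

```python
import sys

# Hypothetical required tag keys; the real standard is not described in the talk.
REQUIRED_TAGS = {"CostCenter", "BusinessUnit", "Project", "Owner"}


def missing_tags(resource_tags):
    """Return required tag keys that are absent or blank on one resource, sorted."""
    present = {k for k, v in resource_tags.items() if v and v.strip()}
    return sorted(REQUIRED_TAGS - present)


def check_resources(resources):
    """Map resource name -> missing tag keys, for non-compliant resources only."""
    report = {}
    for name, tags in resources.items():
        gaps = missing_tags(tags)
        if gaps:
            report[name] = gaps
    return report


if __name__ == "__main__":
    # Hypothetical resources as they might be read from a template or an inventory.
    resources = {
        "curated_bucket": {"CostCenter": "1234", "BusinessUnit": "Canada",
                           "Project": "reservoir-characterization", "Owner": "data-team"},
        "glue_job": {"CostCenter": "1234"},
    }
    problems = check_resources(resources)
    for name, gaps in problems.items():
        print(f"{name}: missing {', '.join(gaps)}")
    sys.exit(1 if problems else 0)  # a non-zero exit would fail the pipeline step
```

Consistent tags are what make per-resource cost reports, and later any tag-based security rules, possible. Enforcing the tags at deploy time is cheaper than cleaning them up afterward.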
These tags are useful for cost analysis of each resource. In the long run, if we need a security mechanism or security architecture, we'll be able to base it on the tag values. We also built CloudFormation templates from this project to deploy the solution repository and the CI/CD pipelines. That lets us quickly improve this workflow and repurpose this solution for other business needs: with the templates, we can make simple changes and reuse the mechanisms and the code behind them for other projects. Basically, this architecture will allow us to scale.

Thanks, Anand, a very helpful description of what you built. Justo, could you help us understand your approach to achieving what you built? Who was involved? How did you structure the development, and where are you today?

We started by hosting a workshop that included ConocoPhillips business experts, IT members, and the advanced analytics and AWS teams. We defined the problem and discussed current processes, the various data sources, and the quality of the data, to help us decide which existing AWS services to consider for the solution architecture design. We agreed upon the scope of the solution and came up with an initial reference architecture: basically, a first best guess at what the architecture would look like, something to build upon.
We then engaged in a 10-week intensive agile sprint project. It used a remote collaboration approach with AWS Professional Services, involving people from different locations in the U.S. and Canada. We agreed to aim for a proof-of-concept, ready-to-use product in a test environment. We used agile methodology with the Scrum framework. We reviewed periodically with our Analytics Innovation Center of Excellence and our cloud advanced teams to arrive at the final architecture that you saw before. And remember, we didn't ask AWS Professional Services just to build a product; we wanted to get involved and quickly build our own skills. Everyone on our team, including data architects, developers, and data integrators, participated in both the planning meetings and the daily scrums, which included knowledge transfer. Following this method, within a 10-week period we arrived at a minimum viable product in test, with way more skills, competencies, comfort, and confidence than when we started. Now let's take a look at what we built. All right, as this is not a geoscience class: what you see here is an auto-Picasso generator. Just kidding. This is an actual report created by a geoscientist using our tool, and it shows up in seconds. It integrates acoustic, temperature, strain, and pressure data from different vendors, time formats, etc., everything we have mentioned before, in just one report. This now allows our geoscientists to build their models and understand the direction of the fracs, how far they went from the wellbore, and how effective the penetration was over time.
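The report described above combines acoustic, temperature, strain, and pressure channels from different vendors. Once every feed has been normalized to a shared one-second UTC timestamp, combining them amounts to an outer join on time. The sketch below shows that idea in plain Python with hypothetical channel names and values. At the data volumes in this talk, the same join would more realistically run in a distributed engine such as the Spark runtime behind AWS Glue.

```python
from datetime import datetime, timedelta, timezone


def merge_on_time(**streams):
    """Outer-join named {utc_timestamp: value} series into one row per timestamp.

    A channel with no reading at a timestamp gets None, so gaps stay visible
    instead of being silently interpolated.
    """
    all_ts = sorted(set().union(*(s.keys() for s in streams.values())))
    names = sorted(streams)
    return [{"ts": ts, **{name: streams[name].get(ts) for name in names}} for ts in all_ts]


if __name__ == "__main__":
    t0 = datetime(2020, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
    t1 = t0 + timedelta(seconds=1)
    # Hypothetical channels, one reading per second after normalization.
    for row in merge_on_time(pressure={t0: 5000.0, t1: 5010.0}, temperature={t1: 85.2}):
        print(row)
```

Keeping missing readings as `None` rather than filling them is a deliberate choice in this sketch. It leaves decisions about gaps to a later, explicit QA/QC step.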
This is not just raw data, but QA/QC'd data and calculations based on it, so we can make interpretations like this. We have all these huge amounts of data pulled together, stored, and managed on a performant platform. We also have visualizations that load in seconds, created with our existing BI tools. So we're really happy now.

Great to hear about the results and progress; very impressive. Now, Anand, since you've laid the foundation with these improvements, where do you go from here?

Sarah, thank you for the question. It's important to understand where we were and where we are going next; this is not a one-time project. We want to take this project's outcomes and deploy them into production. AWS Professional Services helped us deploy into a test environment. Now we are working with our Canadian and Houston teams to establish support for this solution, and we are in the process of deploying it into production. Next, we are also beginning to share our knowledge. This is a solution for the unconventional asset in Canada, but we also have operations in several other countries and within the U.S. So we are going to share these learnings. We may also revisit some of the solutions we have already deployed and consider whether this solution architecture could be leveraged for further business problems. We have also developed best practices for CI/CD and for the foundational architecture.
We are in the process of implementing some of the outcomes of this project in our other business units as well as in enterprise solutions. What we built here was for this specific purpose, but we know other solutions can leverage this architecture. Basically, this tool allows our geoscientists to visualize the interpretations, and going beyond what they have done so far, we may be able to automate the interpretation itself, because now we have the data that we need. So they can spend more time on analysis rather than on data preparation. We have also enabled a mechanism to gather drilling and completions data in real time into our cloud platform, enabling interpretations and improving the performance of frac operations. This architecture also provides machine learning capabilities for future anomaly detection and correlation analysis. We have been discussing the solution and architecture and how successfully we were able to implement it, but I would like to make sure that everyone in the audience takes a few important points from our story. There are three: big data, we built it fast, and we have a bright future here. As we were discussing earlier, the amount of data is massive. Gone are the long days of data preparation.
The geoscientists were spending a lot of time on data preparation. Now the data is prepared automatically on our cloud platform, with the rest of the architecture in place, and they can spend more time on interpretation and even on automating some of that work. Also, we not only built a solution; we also built the tools and skills to sustain it, and that will help us build other solutions. Almost everything we built here can be shared and adopted globally. Now we can spend more time on experimentation. We can do a lot of data science work. We can fail fast. We can try things we didn't have the opportunity to try earlier. We can take the data and turn it into very useful information for analytics. Beyond that, I don't think any of this would have been possible without the vision of our executive leadership team and their declaration that we are a cloud-first company. Thank you very much. We appreciate AWS's support throughout this project. Thank you.

Thank you, Anand. It sounds like there are exciting next steps ahead. I'm looking forward to seeing you all continue this development. Justo and Anand, thank you for sharing your story of data transformation for reservoir characterization at ConocoPhillips. I hope others can learn from your experience and apply it to their own operations, and thank you all for listening to this AWS energy re:Invent session.
Please be sure to check out our re.