Now, the museum is turning its attention to the future, unveiling its new gardens in July as a centrepiece of the Urban Nature Project. The project is the NHM’s response to a growing need to monitor and record changes to UK urban nature and support its recovery in the face of challenges such as pollution and land development.
The NHM intends for the five-acre site surrounding its South Kensington home to become one of the most intensively studied urban nature sites in the world. To support this, the museum is working with Amazon Web Services (AWS), which is providing the back-end technology.
The NHM and AWS have built a new Data Ecosystem cloud platform, which will be used to collect and share biodiversity data from various sources. The gardens will also host a network of 25 scientific sensors, which will gather environmental and acoustic data from around the site, including pond sounds, bird calls and traffic noise.
The biological recording will cover various research streams, including visual wildlife observations, extracting DNA from soil and pond samples, and audio recordings.
The AWS technologies are essentially bringing all those different data types into one place. The NHM is using Amazon DocumentDB and Amazon S3 for data storage, plus the AWS Glue serverless data integration service for ingesting data into the central databases.
Each data type has its own microservice, so there is one for environmental DNA, one for audio and one for visual observations. Each of those microservices has its own underlying DocumentDB database and S3 storage.
“Glue is the AWS product for moving data. Every time you want to create a new set of data and move it from A to B, that requires something to be developed to move it,” said NHM’s data ecosystem product manager, Jason Hale. “If we wanted to combine environmental DNA with acoustic data, we can use a Glue job to read those two separate data sources and bring them into the same place.”
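The kind of combination Hale describes can be sketched in a few lines of plain Python. This is not the Glue API or the NHM's actual job; it only illustrates the underlying idea of reading two separate sources and merging them on shared keys. The record fields (`site`, `date`, `species`) and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are assumptions,
# not the NHM's actual schema.
edna_records = [
    {"site": "pond-1", "date": "2024-07-20", "species": "Rana temporaria"},
    {"site": "pond-1", "date": "2024-07-20", "species": "Lissotriton vulgaris"},
]
acoustic_records = [
    {"site": "pond-1", "date": "2024-07-20", "species": "Rana temporaria"},
    {"site": "meadow-2", "date": "2024-07-20", "species": "Turdus merula"},
]


def combine_sources(edna, acoustic):
    """Merge two sources into one table keyed by (site, date, species),
    recording which survey methods detected each species."""
    combined = defaultdict(set)
    for method, records in (("edna", edna), ("acoustic", acoustic)):
        for rec in records:
            key = (rec["site"], rec["date"], rec["species"])
            combined[key].add(method)
    # Sort for a stable, reproducible output order.
    return [
        {"site": site, "date": date, "species": species, "methods": sorted(methods)}
        for (site, date, species), methods in sorted(combined.items())
    ]


combined_rows = combine_sources(edna_records, acoustic_records)
for row in combined_rows:
    print(row)
```

In this toy data, a species detected by both methods at the same site and date ends up in a single row listing both methods, which is the sort of cross-checking a combined dataset makes possible.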
Researchers access the data through a front end that sits on top of the microservices and the Glue layer.
The museum will not be relying solely on new data for this project. The NHM has been monitoring wildlife in the garden since 1994 and has recorded more than 50,000 visual observations of wildlife, where a species has been seen and accurately identified. That data is ready to import from the existing iRecord software platform straight into the Data Ecosystem. The museum can continue to use this biological monitoring software to record future observations, which will be drawn straight through to the Data Ecosystem.
A move for change
The project is intended to address real-world challenges and solutions, rather than to serve as an academic piece of work.
“We know there’s an ever-increasing, pressing need to understand the nature around us, and how and why it’s changing. We know that we need to use different methods to understand which species we have in this garden, with 3,500 species just outside our door,” said John Tweddle, head of the Angela Marmont Centre for UK Biodiversity at NHM.
“The visual observations, the environmental DNA [eDNA], the acoustic biology, the environmental data we can gather, which will go into the Data Ecosystem, they all need to come together to build that holistic picture of what’s there and how it’s changing.”
Once the NHM has gathered this information, it can start to explore the why – if things are improving, is the way the landscape is managed or the city is designed helping that improvement? If things are declining, what can be done to slow that decline and hopefully reverse it?
Getting to this stage requires a huge amount of data from different sources, but without shared tools to aggregate the data and produce biodiversity metrics from across disparate data types, it is difficult to get useful insights.
“Until this partnership, the biodiversity sector would have different people specialising in different things. Our data would sit in different places, maybe on different cloud servers, maybe on our laptops. It wouldn’t be combined. It would be in formats that we can interpret ourselves as individuals, but it’d make it harder to share,” Tweddle added.
The Data Ecosystem makes it possible to collate and analyse all that data from those different sources to understand what actions to take.
“That’s where the real benefits are from this. We can use this to develop easy ways to capture data, share it and interpret it that many other people can use,” Tweddle continued.
This could include people with their own land holdings, businesses, conservation charities, local community groups or park managers.
“We are working with them to look at how they can collect this data, share it with our system, and how we can help them interpret it. There’s so much potential, but you have to have that technical infrastructure with the expertise to put it together underneath it,” said Tweddle.
Now the gardens are open, the NHM is ready to throw huge volumes of data at the system and check everything holds. The next step is to start looking at putting an interpretive layer on top of the system to enable users to put their data in and have some interpretation come back out.
The NHM is using the Amazon SageMaker machine learning platform to enable that functionality, with the data product microservice acting as the back end and feeding the data through.
“We’re building a system where SageMaker can connect to that data product service. That enables us to have a physical separation between the raw data that gets collected, so the initial data that gets brought in, versus the data that we want users to use for their research,” said Hale. “We have that process that takes that raw data into the data products and that can then be accessed by SageMaker.”
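The separation Hale describes, between raw data as collected and curated data products for research, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the record types, the field names and the 0.8 confidence threshold are hypothetical, and the NHM's actual processing is not described in this detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawDetection:
    """A record as collected (hypothetical fields)."""
    sensor_id: str
    species: str
    confidence: float


@dataclass(frozen=True)
class DetectionProduct:
    """A curated record intended for research use (hypothetical fields)."""
    species: str
    detections: int


def build_data_product(raw, min_confidence=0.8):
    """Derive the research-facing product from raw records.

    The raw records are only read, never modified. The 0.8 threshold
    is an illustrative assumption.
    """
    counts = {}
    for rec in raw:
        if rec.confidence >= min_confidence:
            counts[rec.species] = counts.get(rec.species, 0) + 1
    return [DetectionProduct(species, n) for species, n in sorted(counts.items())]


# The raw store stays untouched; analysis tools would read only the product store.
raw_store = (
    RawDetection("sensor-01", "Turdus merula", 0.95),
    RawDetection("sensor-02", "Turdus merula", 0.91),
    RawDetection("sensor-02", "Erithacus rubecula", 0.42),
)
product_store = build_data_product(raw_store)
print(product_store)
```

Keeping the two stores apart means the curation rules can change and the products can be rebuilt without ever touching what was originally collected.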
The NHM is expecting to generate reams of data from its new system, about 20 terabytes in the first year, the majority coming from audio recordings. The 25 sensors throughout the museum gardens will be recording audio on a continuous basis and writing it to the Data Ecosystem.
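To give a sense of scale, the figures above can be turned into a rough per-sensor estimate. This is back-of-envelope arithmetic, not an NHM figure: it assumes decimal terabytes, an even split across the 25 sensors, and that the full 20 terabytes comes from the sensors, which makes it an upper bound given that audio accounts for only the majority of the data.

```python
# Rough upper-bound estimate under the assumptions stated above.
TOTAL_BYTES_PER_YEAR = 20 * 10**12  # 20 TB, decimal units
SENSORS = 25
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 86_400

per_sensor_per_year = TOTAL_BYTES_PER_YEAR / SENSORS
per_sensor_per_day = per_sensor_per_year / DAYS_PER_YEAR
per_sensor_per_second = per_sensor_per_day / SECONDS_PER_DAY

print(f"Per sensor per year:   {per_sensor_per_year / 1e12:.1f} TB")
print(f"Per sensor per day:    {per_sensor_per_day / 1e9:.2f} GB")
print(f"Per sensor per second: {per_sensor_per_second / 1e3:.1f} kB")
```

Under these assumptions, each sensor would account for about 0.8 terabytes a year, or a little over 2 gigabytes a day.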
The system is currently restricted to internal researchers within the NHM datacentre, but in the longer term the NHM plans to share data with the wider biodiversity sector. This sector has gone from being data poor (and lacking enough information about nature to analyse events and take action) to being data flooded, but this brings its own challenges, according to Tweddle.
“We’re at the point now – with acoustic sensors, environmental DNA, visual recording – that we’re close to being data flooded. Then it’s the big questions around how do we handle, condense and combine that data, and what does and doesn’t work when you try to combine data types. This infrastructure with the Data Ecosystem gives us the opportunity to really delve into that,” he added.
The NHM hopes the technology available as part of the project will inspire more people to connect with nature. The UK has around 70,000 to 90,000 volunteers who visually observe wildlife. In terms of acoustics and eDNA, however, Hale estimates there are probably fewer than 500 people who are active researchers.
“Once these technologies become more accessible, you can imagine the amount of data that’s going to be generated. You can really picture the potential for impact with a community group being able to record audio on their phone and upload it to a system that tells them immediately what bird was in the recording. Or if you’ve got a school taking a pond sample and they can detect the biodiversity on their site.”
The ultimate goal of the Urban Nature Project is to “give people across the UK the motivation and tools to safeguard nature in towns and cities”. As the UK is one of the most nature-depleted countries, hopefully the combination of the NHM’s wealth of data and scientific expertise along with AWS’s technology will see a turnaround in restoring our wild spaces.