Making global satellite imagery cloud-free
Published Jan 25, 2022 Updated Feb 04, 2022
Our technical team has created a beautiful new cloudless image of the world. This is the story about how and why we did it.
Getting a clear view
Have you ever wondered why images of the world rarely look like this on online mapping websites?
It would be pretty hard to find a location that didn’t have cloud cover, right? In reality, almost 75% of the Earth is covered by clouds at any given time, making the image a very accurate depiction. However, unless you are studying the weather, you just want to look at the land surface, so how do we clear away the clouds to see the land below?
Perhaps you are asking yourself, “It can’t be that hard; just choose the images that don’t contain any clouds.” However, the process isn’t as easy as it first seems!
If you want to choose an image that doesn’t contain any clouds for a location like the Bernese Alps in Switzerland because you love the mountains, just like we do at MapTiler, you will have to go through dozens of images to find one. There are places on Earth that are almost constantly covered by clouds, so it simply isn’t possible to do this manually.
A problem like this requires an automatic solution, and our technicians went down this route to create the Satellite data you see in our services. Again though, it is not quite so simple, and to ensure you get a good result instead of a bad or average one, you need some key ingredients in your recipe:
- An excellent data source
- A refined and honed algorithm
- Huge computational power
Without this last ingredient, you can run into some very tricky time issues when dealing with huge global datasets. If you only had access to a desktop PC, it would have to run for 4512 days, which is almost 12.5 years; for storage, you would need 360 computers with 500 GB SSDs. If you used HDDs, the processing time would increase to 18.5 years. All this means that by the time you have finished creating the map, it will be nearly 19 years out of date and not much use to anyone!
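The figures above can be checked with a few lines of arithmetic. This is only a sanity check of the numbers quoted in the text; the 365.25-day year and decimal terabytes are assumptions, and the variable names are purely illustrative.

```python
# Sanity check of the desktop estimates quoted above.
DAYS_PER_YEAR = 365.25  # assumption: average length of a year

desktop_days = 4512
desktop_years = desktop_days / DAYS_PER_YEAR

# 360 machines, each with a 500 GB SSD
total_storage_gb = 360 * 500
total_storage_tb = total_storage_gb / 1000  # assumption: decimal units (1 TB = 1000 GB)

print(f"Processing time: {desktop_years:.1f} years")
print(f"Total storage: {total_storage_tb:.0f} TB")
```

Under these assumptions the run time works out to roughly 12.4 years and the storage to 180 TB.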
As this is such a difficult undertaking, why bother when others have already made their own cloudless layers? Let’s go back to the point about the quality of your results; look at the image below from Google Maps. This is a positive result in the sense that it is cloudless, but it is not what we at MapTiler would call a good result. The colors are poor and there are clear boundaries where images from different times of year, even different years, have been stitched together.
The patchwork effect often seen on Google Maps compared to the natural colors achieved by MapTiler.
At MapTiler we wanted to do better and set a goal to bring the most beautiful cloudless satellite map of the entire world to our customers with the help of our cutting-edge technology in a reasonable amount of time: just 1 year (1800% faster than it would take on a desktop!). On top of this progress, we aimed to make the map available to anybody for just a few dollars per month; more about that later.
Satellite imagery from Sentinel-2
To create a global satellite map, our approach was:
- Look for the best data economically available, for free if possible.
- Find data with good spatial and temporal resolutions.
  - Spatial resolution for the right amount of detail.
  - Temporal resolution to get as many images of the same place as possible in a short amount of time. The shorter the revisit time of the satellites, the better your chance of finding images that do not contain clouds.
- Find data with good spatial coverage, ones that cover the entire globe.
This is where data from the Sentinel-2 mission from the European Space Agency’s (ESA) Copernicus project fits in perfectly. The revisit time is only 5 days, so every 5 days you get a new image for the same spot on Earth.
The resolution is also pretty cool, up to 10 m/px, and the data are in JPEG 2000 format.
Sentinel-2 provides plenty of data bands thanks to its MultiSpectral Instrument (MSI). The satellites acquire data in 13 spectral bands (from the visible to the short-wave infrared):
| Band | Central wavelength (µm) | Resolution (m) |
|---|---|---|
| Band 1 - Coastal aerosol | 0.443 | 60 |
| Band 2 - Blue | 0.490 | 10 |
| Band 3 - Green | 0.560 | 10 |
| Band 4 - Red | 0.665 | 10 |
| Band 5 - Vegetation Red Edge | 0.705 | 20 |
| Band 6 - Vegetation Red Edge | 0.740 | 20 |
| Band 7 - Vegetation Red Edge | 0.783 | 20 |
| Band 8 - NIR | 0.842 | 10 |
| Band 8A - Vegetation Red Edge | 0.865 | 20 |
| Band 9 - Water vapor | 0.945 | 60 |
| Band 10 - SWIR - Cirrus | 1.375 | 60 |
| Band 11 - SWIR | 1.610 | 20 |
| Band 12 - SWIR | 2.190 | 20 |
Finally, the coverage is excellent: all continental land surfaces, islands greater than 100 km², and coastal waters up to at least 20 km from the shore.
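To make the band table easier to work with, here is a minimal sketch that stores it as a Python dictionary and picks out the bands available at a given resolution. The short keys (`B01`, `B8A`, ...) follow the usual Sentinel-2 band naming; the `bands_at_resolution` helper is hypothetical and only for illustration.

```python
# Sentinel-2 MSI bands from the table above:
# band id -> (central wavelength in micrometres, resolution in metres)
BANDS = {
    "B01": (0.443, 60),
    "B02": (0.490, 10),
    "B03": (0.560, 10),
    "B04": (0.665, 10),
    "B05": (0.705, 20),
    "B06": (0.740, 20),
    "B07": (0.783, 20),
    "B08": (0.842, 10),
    "B8A": (0.865, 20),
    "B09": (0.945, 60),
    "B10": (1.375, 60),
    "B11": (1.610, 20),
    "B12": (2.190, 20),
}


def bands_at_resolution(resolution_m):
    """Return the ids of all bands delivered at the given resolution (hypothetical helper)."""
    return [band for band, (_, res) in BANDS.items() if res == resolution_m]


print(bands_at_resolution(10))  # the blue, green, red and NIR bands
```

The four 10 m bands are the natural candidates for a high-detail visual map.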
The Sentinel-2 mission details
The Sentinel-2 satellites were sent up as part of the Copernicus Project, with the goal of acquiring high-resolution (both temporal and spatial) satellite images of the global surface, to help with monitoring of land-use change, landcover changes, agriculture, forests, and water changes. The mission provides data for all land surfaces, large islands, inland, and coastal waters.
The mission consists of 2 spacecraft. Sentinel-2A was launched on 23 June 2015 and passes over the same spot every 10 days. Sentinel-2B followed on 7 March 2017 with the same 10-day repeat cycle. In combination, the two satellites provide a 5-day revisit time. The nominal mission time is 7 years for each satellite.
Sentinel-2 data portals
The European Commission has funded the deployment of five cloud-based platforms to distribute the data produced by the satellites. These platforms are known as the DIAS, or Data and Information Access Services.
The five DIAS online platforms allow users to discover, manipulate, process and download Copernicus data and information. All DIAS platforms provide access to Copernicus Sentinel data, as well as to the information products from the six operational services.
For browsing the catalog, we had the best experience with Sobloo, but we did not have a good experience accessing the data for processing from these DIAS services.
We used the Sentinel-Hub, which stores all the Sentinel-2 L2A data in an AWS S3 bucket. L2A means the data are atmospherically corrected by ESA. Thanks to the Sentinel-Hub Python library and well-documented API, we could start processing the data quickly.
Removing the clouds: MapTiler does it differently
To remove clouds from a location, you need to start with a time series of images of that location.
Once you have a time series, there are a couple of approaches to removing the clouds from the image. The most popular one used by other companies is based on pixel compositing. In this method, you select a specific pixel from a series of pixels that come from different images, based on some statistic, like taking the first-quartile pixel from the set of pixels. This method has the benefit of giving you a real pixel, but there are drawbacks.
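As a rough illustration of pixel compositing, the sketch below picks, for every pixel position, the observed value closest to the first quartile of its time series. It works on single-band images stored as nested lists, uses a nearest-rank quartile (one of several possible conventions), and the function names are hypothetical; real pipelines would work on multi-band rasters.

```python
import math


def first_quartile_pixel(values):
    """Pick an actually observed value at the first quartile of a pixel's time series."""
    ordered = sorted(values)
    # Nearest-rank position of the 25th percentile (assumption: one of several conventions)
    index = max(0, math.ceil(0.25 * len(ordered)) - 1)
    return ordered[index]


def composite(stack):
    """Build one image from a stack of same-sized single-band images (lists of rows)."""
    height, width = len(stack[0]), len(stack[0][0])
    return [
        [first_quartile_pixel([image[y][x] for image in stack]) for x in range(width)]
        for y in range(height)
    ]


# Clouds are bright, so a low quartile tends to skip cloudy observations.
series = [0.12, 0.85, 0.10, 0.90, 0.11, 0.13, 0.80, 0.12]
print(first_quartile_pixel(series))
```

Because the result is always one of the input values, every output pixel is a real observation, which is the benefit mentioned above.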
We didn’t use this approach at MapTiler because it didn’t work well in different locations; it created visual artifacts, e.g., you could have groups of black pixels visible on a glacier.
We devised a new compositing algorithm based on aggregating the pixel value from a set of values, rather than just choosing one; this brought a much more natural-looking output without these pixel artifacts.
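The article does not describe the aggregation itself, so the following is only a sketch of the general idea and not MapTiler's actual algorithm. It assumes a trimmed mean: the darkest and brightest observations of a pixel are discarded and the rest are averaged, so the output is a blend of values rather than a single selected one. The function name and the `trim` parameter are hypothetical.

```python
from statistics import fmean


def aggregate_pixel(values, trim=0.25):
    """Average a pixel's time series after dropping a share of values at both ends.

    Assumption: a trimmed mean stands in for the undisclosed aggregation.
    """
    ordered = sorted(values)
    k = int(len(ordered) * trim)  # number of values dropped at each end
    kept = ordered[k:len(ordered) - k] or ordered  # fall back if everything was trimmed
    return fmean(kept)


# One dark outlier (shadow) and one bright outlier (cloud) among stable observations
series = [2, 10, 11, 11, 12, 12, 13, 90]
print(aggregate_pixel(series))
```

Unlike selecting a single quartile value, neighbouring pixels here tend to receive similar blended values, which is one plausible reason an aggregate produces fewer isolated artifacts.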
Scaling up the area
Once we had developed and tested our new strategy and had one beautiful-looking cloudless location, we scaled up the process for the entire globe.
However, the magic is not just in the compositing algorithm itself but also in the preselection algorithm, which selects the best images to be processed by the compositing algorithm. This part is crucial, especially if you want to process the whole world; the more image files you process, the more resources will be needed, leading to a more expensive solution.
To reduce the number of input files, we created a time window algorithm that selected the best set of months from the year with the least clouds in it. It was a 4-month time slot, with the starting month varying based on geographical location.
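A time window selection of this kind could be sketched as follows. The code assumes that an average cloud cover value per calendar month is already known for a location (the input data and the function name are hypothetical) and slides a 4-month window around the year, wrapping from December to January, to find the clearest stretch.

```python
def best_time_window(monthly_cloud_cover, window=4):
    """Return (start_month, mean_cover) of the least cloudy window; months are 1-based.

    monthly_cloud_cover: 12 values, January first (assumed input, e.g. percentages).
    """
    if len(monthly_cloud_cover) != 12:
        raise ValueError("expected one value per calendar month")
    best_start, best_mean = None, float("inf")
    for start in range(12):
        # Wrap around the end of the year so a window can span December to January
        months = [monthly_cloud_cover[(start + i) % 12] for i in range(window)]
        mean = sum(months) / window
        if mean < best_mean:
            best_start, best_mean = start, mean
    return best_start + 1, best_mean


# Hypothetical location with a clear summer in the northern hemisphere
north = [80, 70, 70, 60, 50, 30, 20, 20, 40, 60, 70, 80]
# Hypothetical location whose clearest season spans the turn of the year
south = [20, 20, 40, 60, 70, 80, 80, 70, 70, 60, 50, 30]
print(best_time_window(north))
print(best_time_window(south))
```

For the two made-up locations, the window starts in June and in December respectively, which illustrates why the starting month has to vary with geography.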
With the time window in place, many images could still be filtered out based on their quality. We used the SCL (scene classification) layer from the Sentinel-2 data product to create a quality mask to filter out the bad images from the time window. In this layer, each pixel holds a class value from 0 to 11, for example vegetation, water, cloud shadow, or cloud.
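A quality mask of this kind could be sketched as follows. The class labels follow the standard Sentinel-2 L2A scene classification legend (wording may differ slightly between processing baselines). Which classes count as "bad" and the 80% threshold are assumptions for illustration, not the values used in the project, and the function names are hypothetical.

```python
# Sentinel-2 L2A scene classification (SCL) values
SCL_CLASSES = {
    0: "No data",
    1: "Saturated or defective",
    2: "Dark area pixels",
    3: "Cloud shadows",
    4: "Vegetation",
    5: "Not vegetated",
    6: "Water",
    7: "Unclassified",
    8: "Cloud medium probability",
    9: "Cloud high probability",
    10: "Thin cirrus",
    11: "Snow or ice",
}

# Assumption: classes treated as unusable for a cloudless composite
BAD_CLASSES = {0, 1, 3, 8, 9, 10}


def usable_fraction(scl):
    """Share of pixels in an SCL raster (list of rows) that are not in BAD_CLASSES."""
    pixels = [value for row in scl for value in row]
    good = sum(1 for value in pixels if value not in BAD_CLASSES)
    return good / len(pixels)


def filter_images(scl_by_image, min_usable=0.8):
    """Keep the names of images whose usable share reaches the (assumed) threshold."""
    return [name for name, scl in scl_by_image.items() if usable_fraction(scl) >= min_usable]


images = {
    "clear": [[4, 4], [5, 6]],
    "cloudy": [[9, 9], [8, 4]],
}
print(filter_images(images))
```

In this toy example only the image named `clear` passes the filter, because three of the four pixels in `cloudy` are classified as cloud.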
With the quality mask and time window, we could ensure that only the best images went into the final compositing algorithm. However, with over 237 trillion pixels to process, it is still a huge task to undertake, so we turned to MapTiler Cluster to automate it.
MapTiler Cluster is a cluster computing solution that consists of a master server and worker instances and runs on any cloud computing platform like GCP, AWS, or Azure. The master server divides bigger geographical locations into smaller geographical job tasks, which are then processed by individual workers. Thanks to such a solution you are able to decrease the duration of a project from years to just a couple of weeks.
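The idea of a master splitting a large area into smaller jobs can be illustrated with a short sketch. This is not MapTiler Cluster's API or scheduler; the tile size, the round-robin assignment, and both function names are assumptions made for the example.

```python
def split_into_jobs(west, south, east, north, step):
    """Split a bounding box (degrees) into step x step jobs; edge jobs may be smaller."""
    jobs = []
    y = south
    while y < north:
        x = west
        while x < east:
            jobs.append((x, y, min(x + step, east), min(y + step, north)))
            x += step
        y += step
    return jobs


def assign_round_robin(jobs, workers):
    """Hand out jobs to workers in turn (assumption: the simplest possible scheduling)."""
    queues = {worker: [] for worker in workers}
    for i, job in enumerate(jobs):
        queues[workers[i % len(workers)]].append(job)
    return queues


# The whole globe in 30-degree jobs, shared between three hypothetical workers
jobs = split_into_jobs(-180, -90, 180, 90, 30)
queues = assign_round_robin(jobs, ["worker-1", "worker-2", "worker-3"])
print(len(jobs), [len(q) for q in queues.values()])
```

Because each job covers an independent area, the workers can run in parallel, which is what turns years of sequential processing into weeks.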
The result: A beautiful satellite imagery layer
The last thing that we needed to do was launch MapTiler Cluster and wait for a couple of weeks. Then we could enjoy the most beautiful-looking world satellite map in its stunning natural colors.
Find out more about satellite and aerial imagery on our website:
If you are a fan of numbers and stats, here are the highlights from our project:
| Property | Value |
|---|---|
| Bands | R, G, B, NIR |
| Projection | WGS 84 Web Mercator |
| Date of input data | 2021 |
| Output format | WebP or 16-bit TIFF |
| Total size of the layer | 500 GB |
Do you want to carry out research like this?
MapTiler is always looking for new talent, either to work on cutting-edge research like this or other areas of computer science and GIS. If you want a role working on big data analysis, cloud-optimized processing, or a wide range of geo-technologies, we may be the company you are looking for. Have a look at our jobs page to see who we are recruiting right now, or just send a C.V. and covering letter to [email protected] to see if we can make you part of the team!