The combination of satellite imagery and artificial intelligence (AI) is a powerful tool for data journalism investigations.
But what kinds of stories can be told? And how do you get started?
With the aim of paving the way for journalists who want to use satellite imagery and AI in their work, in this two-part article I share what I learned after talking to experts and doing my own experiments.
In this first part, you'll find suggestions on how to get started and avoid the most common pitfalls. If you are looking for story ideas, see the second part.
Working with satellite imagery and AI models takes time and patience. There is no general rule: you have to find the right model for each case, in a process of trial and error, while crunching large amounts of data.
That is why the advice of Anatoly Bondarenko, data editor of Texty, is crucial: "Find a task or story where you will not be sorry to put many resources in."
The combination of satellite imagery and AI is useful for providing general context where no data exists, discovering patterns over large territories or over time, counting scattered objects or finding needles in a haystack.
"AI makes sense when you want to do things at a larger scale or repeatedly. Either because you're working on such a big area that it's impractical to do it as a human or because you want to do it again and again over time," says Edward Boyda from Earthrise, who worked on the Amazon Mining Watch project.
Before you go any further, ask yourself: is this the best tool to tell the story?
Perhaps there is an easier or more efficient way to get the information you are looking for. There may already be a dataset available, or you may be able to get it through a public information request. Or you may manually count objects in a satellite image, without the need for AI.
If you decide to go ahead, the next step is to familiarise yourself with your region of interest (ROI) and the basic tools you will use.
One of the advantages of satellite data is that it allows you to investigate from afar: you can observe any corner of the globe from your computer. But you should still familiarise yourself with the ROI in order to interpret your findings correctly.
If you can visit the site, so much the better; if not, gather as much information as you can about it. Find out what the problems or hot topics are, get non-satellite images, contact local people or experts that work there. This will help you to minimise errors during the analysis.
Edward says that satellite imagery is useful for investigating areas where it is "hard for journalists to operate", but that it should always be used "in a combination with traditional on the ground reporting", as satellites are "useful for illustrating or quantifying, but don't bring that human story element in".
As for the tools, take some time to experiment and choose the ones you are most comfortable with or can best adapt to your case. Start with the familiar: if you already know R or Python, for example, look for libraries such as RStoolbox (R) or EarthPy (Python).
You don't always need to write code: there is free software with AI algorithms for classifying images or detecting objects. Two very useful programs are QGIS (with the Semi-Automatic Classification Plugin, SCP, and the dzetsaka plugin) and SNAP (ideal for European Space Agency satellite data).
For an initial analysis, you don't need anything too fancy:
Most journalistic pieces that use AI and satellite imagery are collaborative projects and rely on a data expert.
You can look for help in specialised forums or contact other journalists who have used such techniques. But also look beyond journalism: Earth Observation (EO) and remote sensing are used in many fields such as agriculture, ecology, biology and disaster management.
If you seek advice before starting the technical work, you will avoid some of the dead ends that others have already gone down.
It is not just about solving technical problems. An NGO working on environmental issues may be able to supply high-resolution images, give you hard-to-get information about your ROI or put you in touch with local experts.
"This knowledge is not something that is intrinsic in the news industry, so news organisations need to go where they can find it, looking at civil society organisations or startups focused on using the satellite imagery," says Mathias Felipe de Lima Santos, project manager of data-driven projects at InfoAmazonia and researcher in the Digital Media and Observatory at the Federal University of São Paulo (Unifesp). This way they will be able to "produce investigations that go beyond their own limits".
Always start small. Even if your ultimate ROI is a much larger area, start by limiting your analysis to an area that is representative of the phenomenon you want to investigate, but as small as possible.
Using AI on a single satellite image takes a fair amount of intellectual effort, computing power and processing time. As you add more images, the difficulty grows quickly: there is more information to process and new problems arise, such as how to stitch the images together, handle the overlaps and compare images from different dates.
Do all the tests you need in a small area and only when you are satisfied with the results extend the model to your entire ROI.
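To test on a small area first, you can cut a window out of a larger image. In the minimal sketch below, one image band is represented as a plain Python list of rows, and the 500 x 500 "scene" is synthetic. With real data, you would read the window with a raster library (rasterio, for example, supports windowed reading). The function name `clip_window` is hypothetical.

```python
def clip_window(raster, row, col, height, width):
    """Return the height x width sub-grid of `raster` starting at (row, col).

    `raster` is a list of equal-length rows standing in for one band of a
    satellite image. The window is clamped to the raster's edges, so a window
    that runs off the image simply comes back smaller (or empty).
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    n_rows = len(raster)
    n_cols = len(raster[0]) if n_rows else 0
    r0, c0 = max(0, row), max(0, col)
    # max(...) guards against negative end indices, which Python would
    # otherwise interpret as "counting from the end".
    r1 = max(r0, min(n_rows, row + height))
    c1 = max(c0, min(n_cols, col + width))
    return [line[c0:c1] for line in raster[r0:r1]]


# Hypothetical 500 x 500 "scene" whose values encode their own position.
scene = [[r * 1000 + c for c in range(500)] for r in range(500)]
test_area = clip_window(scene, row=200, col=300, height=50, width=50)
share = (len(test_area) * len(test_area[0])) / (len(scene) * len(scene[0]))
print(f"Test area holds {share:.0%} of the scene's pixels")  # prints "Test area holds 1% of the scene's pixels"
```

Any test or model run on the 50 x 50 window processes one hundredth of the pixels, so you can iterate many times before running the model on the whole ROI.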
You may be using a lot of information from different sources. Raw satellite images often have complicated file names: combinations of letters and numbers that reference the coordinates, date and type of processing used.
It is good to be consistent: using a spreadsheet as a master file can help you to organise all this data. As when writing code, it is better to use extra-explicit nomenclature rather than generic names that will make it difficult to remember what they refer to.
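Because those file names encode metadata, you can build the master spreadsheet automatically rather than typing it by hand. The sketch below parses Sentinel-2 product names into the rows of a CSV file. It assumes ESA's "compact" naming convention for Sentinel-2 products (mission, processing level, sensing time, processing baseline, relative orbit, tile and a product discriminator), illustrated by the example name in the code. Check the pattern against ESA's documentation and your own files before relying on it. The function names and column choices are hypothetical.

```python
import csv
import io
import re

# Assumed pattern for Sentinel-2 "compact" product names, for example
# S2A_MSIL1C_20170105T013442_N0204_R031_T53NMJ_20170105T014202.SAFE
S2_NAME = re.compile(
    r"^(?P<mission>S2[A-Z])_"
    r"MSI(?P<level>L1C|L2A)_"
    r"(?P<sensing>\d{8}T\d{6})_"
    r"N(?P<baseline>\d{4})_"
    r"R(?P<orbit>\d{3})_"
    r"T(?P<tile>\d{2}[A-Z]{3})_"
    r"(?P<discriminator>\d{8}T\d{6})"
    r"(?:\.SAFE)?$"
)

FIELDS = ["filename", "mission", "level", "date", "sensing",
          "baseline", "orbit", "tile", "discriminator"]


def parse_s2_name(name):
    """Split one product name into explicit, spreadsheet-friendly fields."""
    match = S2_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised Sentinel-2 name: {name}")
    row = match.groupdict()
    s = row["sensing"]
    row["date"] = f"{s[:4]}-{s[4:6]}-{s[6:8]}"  # human-readable sensing date
    row["filename"] = name
    return row


def to_master_csv(names):
    """Return CSV text with one row per product name."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for name in names:
        writer.writerow(parse_s2_name(name))
    return buffer.getvalue()


print(to_master_csv([
    "S2A_MSIL1C_20170105T013442_N0204_R031_T53NMJ_20170105T014202.SAFE",
]))
```

Raising an error on names that don't match is deliberate: a file that doesn't fit the pattern is exactly the kind of thing you want to notice before it silently ends up in your analysis.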
AI comes in different flavours but you will probably use one (or a combination) of the following tools:
If you plan to use a supervised model for object detection, find out if there are already image datasets available to train it. Creating datasets is a time-consuming task.
Try different models, and different parameters, to see which one suits you best. Some of the best known are Random Forest, Support Vector Machine (SVM) and k-Nearest Neighbours (k-NN). Deep learning (DL) models, which use neural networks with many layers, tend to give better results but are much more complex to build.
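To make "supervised classification" concrete, here is a from-scratch sketch of k-nearest neighbours, the simplest of the three models to write by hand. Each pixel is labelled by a majority vote among the k training pixels closest to it in spectral space. The (red, near-infrared) reflectance values and class names are invented for illustration, and `knn_predict` and `classify_image` are hypothetical names.

```python
import math
from collections import Counter


def knn_predict(train, pixel, k=3):
    """Label one pixel by majority vote among its k nearest training pixels.

    `train` is a list of (band_values, label) pairs and `pixel` a tuple of
    band values; distance is Euclidean across the bands.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], pixel))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify_image(train, image, k=3):
    """Apply knn_predict to every pixel of an image given as rows of tuples."""
    return [[knn_predict(train, px, k) for px in row] for row in image]


# Hypothetical (red, near-infrared) reflectance samples labelled by hand.
train = [
    ((0.03, 0.02), "water"), ((0.04, 0.03), "water"), ((0.05, 0.02), "water"),
    ((0.04, 0.45), "vegetation"), ((0.05, 0.50), "vegetation"), ((0.06, 0.40), "vegetation"),
    ((0.25, 0.30), "bare soil"), ((0.30, 0.33), "bare soil"), ((0.28, 0.28), "bare soil"),
]
image = [[(0.05, 0.47), (0.04, 0.03)], [(0.27, 0.31), (0.05, 0.44)]]
print(classify_image(train, image))
# [['vegetation', 'water'], ['bare soil', 'vegetation']]
```

In practice, you would rather use a library: scikit-learn, for example, provides `KNeighborsClassifier`, `RandomForestClassifier` and `SVC` with the same `fit`/`predict` interface, which makes it easy to compare models and parameters on the same training data.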
If you have to process a lot of data, consider cloud computing. There are free tools such as Google Colab or Google Earth Engine (GEE) that can crunch huge amounts of data and get results in a short time. GEE also provides easy access to datasets from Landsat, Sentinel, and MODIS missions.
Amazon Mining Watch, for example, uses Descartes Labs, a paid cloud computing service, to process large amounts of satellite data.
"It's built around in a way that you can break the images into tiles for parallel processing and that can be ingested directly by the model. So, we don't have to think about that part of the pipeline," says Edward.
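If your platform doesn't handle tiling for you, you can compute the tiles yourself. The sketch below generates overlapping square windows that cover an image. Because the windows are independent, each one can be sent to a separate process or machine. The overlap means that an object straddling a tile edge appears whole in at least one tile, provided it is no larger than the overlap, at the cost of possible duplicate detections that must be merged afterwards. The tile size of 256 pixels and overlap of 32 pixels are arbitrary assumptions, not values used by Amazon Mining Watch, and `tile_windows` is a hypothetical name.

```python
def _tile_starts(length, tile, overlap):
    """Start offsets along one axis so that consecutive tiles overlap."""
    step = tile - overlap
    # Starts stop before length - overlap: the last tile then reaches the far
    # edge, and no extra tile lies entirely inside its neighbour.
    return range(0, max(length - overlap, 1), step)


def tile_windows(n_rows, n_cols, tile=256, overlap=32):
    """Yield (row, col, height, width) windows covering an n_rows x n_cols
    image with square tiles that overlap their neighbours by `overlap` px.

    Edge tiles are cropped to the image, so they may be smaller than `tile`.
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    for row in _tile_starts(n_rows, tile, overlap):
        for col in _tile_starts(n_cols, tile, overlap):
            yield row, col, min(tile, n_rows - row), min(tile, n_cols - col)


windows = list(tile_windows(600, 600))
print(len(windows))  # 9
print(windows[0], windows[-1])  # (0, 0, 256, 256) (448, 448, 152, 152)
```

Each window can then be passed to a worker, for example with Python's `concurrent.futures`, and the per-tile results stitched back together using the window offsets.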
If you work on your personal computer, it's essential to have a solid-state drive (SSD), at least 8 GB (ideally 16 GB) of RAM and plenty of free storage: a single raw satellite image often takes up several gigabytes. A powerful processor, a dedicated graphics card and a good-sized monitor will also make your work easier.
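The storage you need can be estimated in advance: uncompressed size is width × height × number of bands × bytes per value. The example scene below is an assumption for illustration: 10,980 × 10,980 pixels, which is the size of a Sentinel-2 tile at 10 m resolution, with 13 bands resampled to that grid and stored as 16-bit values. Real products are compressed and store some bands at coarser resolutions, so actual files will differ. `raw_size_gb` is a hypothetical helper.

```python
def raw_size_gb(width, height, bands, bytes_per_sample=2):
    """Uncompressed size, in gigabytes (10**9 bytes), of an image with
    `bands` bands of width x height pixels at `bytes_per_sample` each
    (2 bytes = 16-bit values)."""
    return width * height * bands * bytes_per_sample / 1e9


# Hypothetical scene: 10,980 x 10,980 pixels, 13 bands on one grid, 16-bit.
one_scene = raw_size_gb(10_980, 10_980, 13)
print(f"one scene: {one_scene:.2f} GB")                          # one scene: 3.13 GB
print(f"24 scenes (two years, monthly): {24 * one_scene:.1f} GB")  # 24 scenes (two years, monthly): 75.2 GB
```

The second line shows why time series fill a disk quickly: two years of monthly images of a single tile already approach 80 GB before any intermediate results are saved.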
Scepticism is a powerful weapon in journalism. Treat AI predictions like any other data: don't take them for granted. Like all technology, AI does not work magic or replace traditional journalism methods. Rather, it complements them.
"It's always this golden rule: 80% of the work was data. Also, a correct interpretation of data, because we had to find images in big resolution, but no less important was to find some first examples of the specific patterns of mining," says Anatoly about Texty's investigation into illegal amber mining in Ukraine.
A good part of your job will be to clean up false positives or understand why there are so many false negatives to improve your model. If you are working on a story about change, for example, also pay attention to what didn't change and ask yourself why.
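Counting false positives and false negatives is simple arithmetic once you have a set of locations checked by hand. This minimal sketch assumes that detections and verified objects are recorded as grid-cell IDs. That is a simplification: object-detection work often matches predicted and real boxes by how much they overlap (intersection over union) instead. The cell IDs and `detection_scores` are hypothetical.

```python
def detection_scores(predicted, truth):
    """Compare the cells where a model detected an object with the cells
    verified by hand. Both arguments are sets of cell IDs."""
    tp = len(predicted & truth)   # true positives: detections that are real
    fp = len(predicted - truth)   # false positives: detections to clean up
    fn = len(truth - predicted)   # false negatives: real objects the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Hypothetical grid-cell IDs for a small test area.
predicted = {"A1", "A2", "B3", "C4", "D5"}
verified = {"A1", "B3", "C4", "E6"}
print(detection_scores(predicted, verified))
# {'tp': 3, 'fp': 2, 'fn': 1, 'precision': 0.6, 'recall': 0.75}
```

Precision tells you how much of what the model found is real; recall tells you how much of what is real the model found. A story that counts objects needs both.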
It is important to get some ground truth to check the results of your model: field data obtained directly in the area you are analysing. Is what looks like a crop field in the satellite image really a crop field?
That doesn't mean you have to travel: you can contact local experts to help you evaluate your findings.
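Once you have ground-truth points, whether from a field visit or from local experts, a confusion matrix shows where the model is right and which classes it mixes up. In the sketch below, the ten checked points and three land-cover classes are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(pairs, classes):
    """Rows are ground-truth classes, columns are predicted classes.

    `pairs` is an iterable of (ground_truth, predicted) labels, one per
    field-checked location.
    """
    counts = Counter(pairs)
    return [[counts[(truth, pred)] for pred in classes] for truth in classes]


def overall_accuracy(matrix):
    """Share of checked locations where prediction and ground truth agree."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical field checks: (what was on the ground, what the model said).
checks = [
    ("crop", "crop"), ("crop", "crop"), ("crop", "pasture"),
    ("pasture", "pasture"), ("pasture", "crop"), ("pasture", "pasture"),
    ("forest", "forest"), ("forest", "forest"), ("forest", "forest"),
    ("forest", "pasture"),
]
classes = ["crop", "pasture", "forest"]
matrix = confusion_matrix(checks, classes)
print(matrix)                    # [[2, 1, 0], [1, 2, 0], [0, 1, 3]]
print(overall_accuracy(matrix))  # 0.7
```

The off-diagonal cells are the errors. Here, every mistake involves pasture, which suggests where to look for better training examples.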
When you publish, share the methodology you used, either in the piece itself or on an external site such as GitHub. It's good transparency practice, lets readers understand your analysis process and helps others interested in working on the same topic.
But also think about the audience, who don't necessarily know how AI works, and explain its scope and limitations.
"Journalists need to be transparent and explain not only about the metadata, but also explain what AI is, what computer vision is, and how we approach that. And make clear the limitations [of the model]," says Mathias.
Always make explicit the gaps in your research, the things that are unverified and the levels of uncertainty in your model. Putting on record what you couldn't solve doesn't diminish your work; on the contrary, it makes it more reliable.
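One way to put a number on uncertainty is to report an accuracy estimate with a confidence interval rather than as a single figure. The sketch below uses the Wilson score interval, one standard choice for a proportion such as "share of field-checked points the model got right". This is a general statistical technique chosen for illustration, not one prescribed by the experts quoted here, and `wilson_interval` is a hypothetical name.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion such as accuracy.

    z = 1.96 gives an approximate 95% interval.
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


for correct, total in [(7, 10), (70, 100)]:
    low, high = wilson_interval(correct, total)
    print(f"{correct}/{total} correct: {low:.0%} to {high:.0%}")
```

With only 10 checks, an observed accuracy of 70% is consistent with anything from about 40% to 89%. With 100 checks at the same rate, the range narrows to about 60% to 78%. Stating that range tells readers how far they can rely on the figure.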