Image by Maria Fleischmann / World Bank
  • Blog
  • 1 December 2023

Is climate finance wrongly reported by over a billion dollars per year?

We used AI to assess how the World Bank and the UK are recording climate funds. Our model flags one in five of the Bank's climate-labelled projects as suspicious and warranting further investigation, compared with about one in 40 of the UK's.

As part of a programme of work on climate finance reporting practices, Development Initiatives (DI) is exploring the extent to which providers are misreporting their figures. Such misreporting clouds attempts to understand exactly what is being spent in the name of climate finance. As a result, accountability is low and there is no way of knowing how public finance investments could be improved. As one approach, DI is trialling AI methods to probe the consistency of reporting from two major providers: the UK Foreign, Commonwealth & Development Office (FCDO) and the World Bank. Initial results cast doubt on billions of dollars of the World Bank’s reported climate finance.

For the international community to tackle climate change successfully, wealthy countries need to deliver on their climate finance promises. Accurate and consistent reporting is needed so that progress against these promises can be measured. However, there are currently no rules on what can be reported as climate finance, so providers take multiple approaches that produce wildly different numbers. With no monitoring framework in place, reporting may not even be consistent across projects from the same contributor. Ahead of COP28, we conducted a pilot study: we trained and tested a machine-learning model to analyse reporting consistency, using projects labelled as climate finance by the World Bank and the FCDO as our test cases. Our initial results suggest that over a billion dollars of World Bank funds may be misreported every year, and they highlight the need for more robust reporting systems to improve data and accountability.

DI’s previous work shows that, because of the many inconsistencies in how climate finance is reported, it is almost impossible to tell from providers’ own reporting which projects have a genuine climate focus and which have been labelled that way for political expediency. To really understand the extent to which projects address climate change, each project needs careful evaluation. But with hundreds of thousands of projects claiming to contribute to climate goals, assessing each one manually is impractical. Instead, we used a natural language processing (NLP) model to see what light it could shed on reporting practices by assessing whether individual contributors report consistently, i.e. how consistently they apply their own climate finance codes to their projects. This exploratory approach can help us identify which projects are worth investigating in more detail, and which providers appear to be following a more coherent framework for designating projects as climate finance.

How does the process work?

The model we developed allows us to identify patterns in contributors’ reporting: words and phrases in project descriptions that are strongly associated with the contributor labelling a project as climate finance. The model can then be used to identify projects that don’t conform to these patterns: for example, projects labelled as climate finance that contain no words or phrases signalling why they should be, and that therefore have a dubious connection to climate goals. This gives us an indication of:

  • How many projects are incorrectly labelled as targeting climate change
  • The US$ value of these misreported projects
  • The number (and US$ total) of projects that do target climate change but aren’t reported as such.
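At their core, these three indicators come from comparing two labels for each project: the provider’s own climate label and the model’s prediction from the project text. Here is a minimal sketch, assuming a prediction and a US$ value are already available for every project. The `Project` fields, the `summarise_discrepancies` helper and the example projects are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Project:
    title: str
    labelled_climate: bool   # the provider's own climate-finance label
    predicted_climate: bool  # the model's prediction from the project text
    value_usd: float


def summarise_discrepancies(projects):
    """Count and total the projects where the provider's label and the model disagree."""
    # Labelled as climate by the provider, but the text does not look climate-related.
    over = [p for p in projects if p.labelled_climate and not p.predicted_climate]
    # Looks climate-related to the model, but not labelled as climate by the provider.
    under = [p for p in projects if p.predicted_climate and not p.labelled_climate]
    return {
        "possibly_mislabelled": len(over),
        "possibly_mislabelled_usd": sum(p.value_usd for p in over),
        "possibly_unlabelled": len(under),
        "possibly_unlabelled_usd": sum(p.value_usd for p in under),
    }


if __name__ == "__main__":
    # Hypothetical projects and values, for illustration only.
    projects = [
        Project("Solar mini-grids", True, True, 40e6),
        Project("Road rehabilitation", True, False, 120e6),
        Project("Disaster risk management", False, True, 60e6),
        Project("Public schools support", True, False, 30e6),
        Project("Tax advisory service", False, False, 5e6),
    ]
    print(summarise_discrepancies(projects))
```

The first two counts correspond to the first two bullets, and the last two correspond to the third.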

You can read the technical specifics of the approach we applied at the end of this blog.

What proportion of projects might be incorrectly labelled?

The World Bank

Of the 1,900 World Bank projects with climate-change sector codes, the model identified 380 (or 20%) as being potentially incorrectly labelled. Together, these projects represent spending of US$5 billion between 2018 and 2022. This is more than the total GDP of climate-vulnerable Fiji, a small island developing state. Examples of World Bank projects that were identified by the model as being only tangentially related to climate change include: National and Regional Roads Rehabilitation, Support to Uruguayan Public Schools Project and Gambia Electricity Support Project (oil-fired electric power plants).

Likewise, the model identified 298 projects that the World Bank has not labelled as related to climate change but that may deserve that label, including: Myanmar Southeast Asia Disaster Risk Management Project, Niger Disaster Risk Management and Urban Development Project, and Pacific Islands Regional Oceanscape Program Forum Fisheries Agency.

The FCDO

Of the 1,939 FCDO activities tagged as International Climate Finance (ICF), the model identified 47 (or 2.4%) as being potentially incorrectly labelled. Together, these projects have associated spending of US$295.5 million over the last five years. Examples of projects the model identified as inconsistent with how FCDO applies its own climate codes include: Support to World Food Programme for Trunk Road Improvement and Maintenance, Tax Advisory Service for Investment components and a Human Rights Monitoring Project.
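The headline ratios can be checked against the counts reported above. This is a quick arithmetic sketch. Treating both totals as spread evenly over five years is a simplifying assumption: the World Bank figure covers 2018 to 2022 and the FCDO figure covers "the last five years". `flag_summary` is a hypothetical helper.

```python
def flag_summary(labelled: int, flagged: int, flagged_usd: float, years: int):
    """Return the share of labelled projects flagged, the '1 in N' equivalent, and average US$ per year."""
    share = flagged / labelled
    return share, labelled / flagged, flagged_usd / years


# Figures as reported in this blog: (climate-labelled, flagged, flagged US$, years covered).
PROVIDERS = {
    "World Bank": (1_900, 380, 5.0e9, 5),
    "FCDO": (1_939, 47, 295.5e6, 5),
}

if __name__ == "__main__":
    for name, figures in PROVIDERS.items():
        share, one_in, per_year = flag_summary(*figures)
        print(f"{name}: {share:.1%} flagged (about 1 in {one_in:.0f}), "
              f"roughly US${per_year / 1e6:,.0f} million per year")
```

Run as-is, this reports 20.0% (about 1 in 5) and roughly US$1,000 million per year for the World Bank. For FCDO it reports 2.4% (about 1 in 41) and roughly US$59 million per year.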

Likewise, the model identified 50 projects that FCDO had not labelled as ICF but that may deserve that label based on the project text. Some examples of these include: PROVIA (Programme of Research on Climate Change Vulnerability, Impacts and Adaptation), World Bank: Global Environment Facility (GEF)1 Small Island Developing States Funding Window, and Carbon Trust Ag - INNOVATION Centres and AMCs.

This model cannot perfectly identify mislabelled projects. For example, one of the World Bank projects that the algorithm suggests should not be marked as climate finance is a solar-power project in Morocco that few would find controversial. Generally, if a provider has only a few projects of a given type (such as solar power), the model is less likely to recognise them as climate-related, because it has few examples from which to learn the pattern. Nevertheless, this method can still indicate which projects are worth investigating further. The model’s accuracy for each climate finance provider can also be informative. If a provider labels projects as climate-focused more or less at random, the model will struggle to identify any pattern in the data and its accuracy will be low. Conversely, if a provider labels projects according to a coherent framework, the model should be able to learn that framework and achieve high accuracy.
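A small simulation can illustrate the link between labelling coherence and model accuracy. This sketch uses a simple multinomial naive Bayes classifier in place of BERT, and it relies on invented word lists, so it is a toy demonstration rather than DI’s method. The same classifier is trained twice: once on labels that follow each project’s content, and once on coin-flip labels. Each version is then scored on a 33% hold-out. All function names and word lists are hypothetical.

```python
import math
import random
from collections import Counter

# Invented word pools; real project text is far messier.
CLIMATE_WORDS = ["renewable", "drought", "emissions", "watershed", "biomass", "mangrove"]
OTHER_WORDS = ["school", "tax", "road", "election", "clinic", "audit"]
SHARED_WORDS = ["project", "support", "programme", "national", "capacity", "services"]


def make_doc(rng, is_climate):
    """Build a toy project description: three topic words plus five generic words."""
    topic = CLIMATE_WORDS if is_climate else OTHER_WORDS
    return [rng.choice(topic) for _ in range(3)] + [rng.choice(SHARED_WORDS) for _ in range(5)]


def train_naive_bayes(docs, labels):
    """Fit a multinomial naive Bayes classifier with add-one smoothing; return a predict function."""
    counts = {True: Counter(), False: Counter()}
    for doc, label in zip(docs, labels):
        counts[label].update(doc)
    priors = Counter(labels)
    vocab_size = len(set(counts[True]) | set(counts[False]))
    totals = {label: sum(counts[label].values()) for label in (True, False)}

    def log_score(doc, label):
        score = math.log(priors[label] / len(labels))
        for word in doc:
            score += math.log((counts[label][word] + 1) / (totals[label] + vocab_size))
        return score

    def predict(doc):
        return log_score(doc, True) > log_score(doc, False)

    return predict


def coherent_labeller(rng, is_climate):
    """Provider labels a project as climate finance exactly when its content is climate-related."""
    return is_climate


def random_labeller(rng, is_climate):
    """Provider labels projects by coin flip, ignoring their content."""
    return rng.random() < 0.5


def held_out_accuracy(labeller, n=900, eval_fraction=0.33, seed=0):
    """Train on part of a balanced toy dataset and report accuracy on the held-out part."""
    rng = random.Random(seed)
    truth = [i % 2 == 0 for i in range(n)]  # half of the projects genuinely target climate
    docs = [make_doc(rng, t) for t in truth]
    labels = [labeller(rng, t) for t in truth]
    n_train = n - round(n * eval_fraction)
    predict = train_naive_bayes(docs[:n_train], labels[:n_train])
    held_out = list(zip(docs[n_train:], labels[n_train:]))
    return sum(predict(doc) == label for doc, label in held_out) / len(held_out)


if __name__ == "__main__":
    print(f"coherent labelling: {held_out_accuracy(coherent_labeller):.1%}")
    print(f"random labelling:   {held_out_accuracy(random_labeller):.1%}")
```

On this easy toy data, coherent labels should let the classifier score close to 100%. With random labels it should hover around 50%, which is the chance rate for two balanced classes.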

How might this research be used to improve reporting?

NLP models may be useful as diagnostic tools for identifying inconsistencies in how content is coded. Our trial suggests that the FCDO is reasonably consistent in how it labels climate finance: given the language used in project descriptions, our model identified very few questionable projects. By contrast, the model raised concerns about roughly one-fifth of World Bank climate finance projects, totalling over US$5 billion in the last five years. Although some of the projects flagged by the model may well turn out to include climate change components on closer inspection, this approach could be developed into a tool to support and guide manual review of how accurately projects are labelled.

Our next steps

This study is part of a larger body of work that DI is undertaking to analyse contributor reporting, identify best practices and develop recommendations that can inform a more robust reporting framework for future targets. To hear more about our work, join our mailing list, follow us on Twitter or LinkedIn, or get directly in touch with questions or comments about the approach set out in this blog.

The methodology used in our pilot study

Although it is difficult to use NLP to code development activities from scratch, language models are adept at learning the general patterns in how contributors apply their internal coding schemes. The low accuracy that has long frustrated efforts to code project text automatically may therefore not be due entirely to model shortcomings, but also to inconsistencies in the way contributors have labelled their projects.

To carry out this study, we trained a machine-learning model to classify project text using NLP. The base language model we used is open source and was trained only on copyright-free online text and public-domain books. This allowed us to make good use of the technology while minimising many of the risks discussed in our recent blog, Artificial intelligence for public good.

Using NLP to classify development activities automatically is nothing new. Almost a decade ago, Alex was experimenting with NLP to assign sector and purpose codes to project titles and descriptions as part of AidData’s granular recoding scheme. Although the techniques have evolved and improved over time, automatic coding of project text continues to be plagued by low accuracy. Ultimately, the level of detail in project text varies greatly, and because most coding tasks are subjective, fully automated approaches tend to fall flat.

To study the internal validity of contributor-applied climate finance labels, we developed a document classification model based on the Bidirectional Encoder Representations from Transformers (BERT) language model. BERT is a large language model (LLM), though it has not been fine-tuned to answer queries in the way OpenAI’s ChatGPT has. It was first released by Google in 2018 and was pre-trained on English Wikipedia and a large corpus of books. For more information on what BERT is and how it works, see BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. We applied this model to project-level text data downloaded from IATI for two contributors: FCDO and the World Bank. FCDO activities were identified as targeting climate when they used a tag with the code “ICF”, short for “International Climate Finance”. The World Bank instead uses its own sector vocabulary to denote the extent to which projects count as climate finance.
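To illustrate the FCDO labelling rule described above, the following sketch reads IATI activity XML with Python’s standard library. It joins each activity’s title and description into one text field and marks the activity as climate when any `<tag>` element has the code "ICF". The sample XML, its identifiers and the `vocabulary="99"` attribute are invented for illustration, and the helper names are hypothetical. This is not DI’s pipeline, and the World Bank’s sector-vocabulary rule would need a separate check that is not shown here.

```python
import xml.etree.ElementTree as ET

# A hypothetical, heavily trimmed IATI activity file. The vocabulary value on <tag>
# is an assumption; check the IATI TagVocabulary codelist for real data.
SAMPLE = """<iati-activities version="2.03">
  <iati-activity>
    <iati-identifier>GB-EXAMPLE-1</iati-identifier>
    <title><narrative>Example adaptation programme</narrative></title>
    <description type="1"><narrative>Supports drought-resilient agriculture.</narrative></description>
    <tag vocabulary="99" code="ICF"><narrative>International Climate Finance</narrative></tag>
  </iati-activity>
  <iati-activity>
    <iati-identifier>GB-EXAMPLE-2</iati-identifier>
    <title><narrative>Example tax advisory service</narrative></title>
    <description type="1"><narrative>Advises on investment tax policy.</narrative></description>
  </iati-activity>
</iati-activities>"""


def activity_text(activity):
    """Join the title and description narratives into one string for the classifier."""
    parts = [n.text or "" for n in activity.findall("title/narrative")]
    parts += [n.text or "" for n in activity.findall("description/narrative")]
    return " ".join(p.strip() for p in parts if p.strip())


def is_icf(activity):
    """True if any <tag> element on the activity carries the code "ICF"."""
    return any(tag.get("code") == "ICF" for tag in activity.findall("tag"))


def load_activities(xml_text):
    """Return (identifier, text, is_climate) for every activity in the file."""
    root = ET.fromstring(xml_text)
    return [
        (a.findtext("iati-identifier"), activity_text(a), is_icf(a))
        for a in root.findall("iati-activity")
    ]


if __name__ == "__main__":
    for row in load_activities(SAMPLE):
        print(row)
```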

For FCDO, 1,939 activities were identified as targeting climate out of a total of 24,338 activities available in IATI, and a random sample of 1,939 non-climate activities was drawn to form a balanced training set. For the World Bank, 1,900 activities were identified as targeting climate out of a total of 4,196 activities available in IATI, and a random sample of 1,900 non-climate activities was similarly selected. In both cases, 33% of each dataset was reserved for evaluation.
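The balancing and hold-out steps above can be sketched as follows. The sketch assumes that activities are held in plain Python lists and uses a fixed random seed for reproducibility. `balanced_split`, the seed and the placeholder identifiers are illustrative assumptions rather than DI’s actual code.

```python
import random


def balanced_split(climate, non_climate, eval_fraction=0.33, seed=42):
    """Downsample non-climate activities to match the climate count, then hold out a share for evaluation."""
    if len(non_climate) < len(climate):
        raise ValueError("not enough non-climate activities to balance the dataset")
    rng = random.Random(seed)
    sampled = rng.sample(non_climate, len(climate))
    # Label 1 = provider tagged the activity as climate, 0 = not tagged.
    data = [(a, 1) for a in climate] + [(a, 0) for a in sampled]
    rng.shuffle(data)
    n_eval = round(len(data) * eval_fraction)
    return data[n_eval:], data[:n_eval]  # (training set, evaluation set)


if __name__ == "__main__":
    # Placeholder identifiers sized to the World Bank counts reported above.
    climate = [f"climate-{i}" for i in range(1_900)]
    other = [f"other-{i}" for i in range(4_196 - 1_900)]
    train, evaluation = balanced_split(climate, other)
    print(len(train), len(evaluation))
```

With the World Bank counts (1,900 climate and 2,296 non-climate activities), this produces 3,800 activities, of which 1,254 are held out. Downsampling the larger class means that a classifier which always predicts "non-climate" scores only 50%, so the reported accuracy is easier to interpret.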

Each model was evaluated by extracting the entire corpus vocabulary and finding which words yielded the highest activations for each label (climate and non-climate). Once this evaluation showed that the models had correctly learned climate concepts, internal validity was assessed by running the text of the entire training set through the trained model and identifying any discrepancies between the contributor-assigned code and the model’s predictions.
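One plausible reading of this vocabulary check is to pass each word to the classifier on its own and rank words by the predicted probability of the climate label. The sketch below takes that approach, with a toy keyword-weighting predictor standing in for the fine-tuned BERT model. The text above describes inspecting activations, which may differ from this probability-based ranking. `most_indicative_words`, `toy_predict_climate` and `TOY_WEIGHTS` are hypothetical.

```python
import math
from typing import Callable, Iterable, List


def most_indicative_words(
    vocabulary: Iterable[str],
    predict_climate: Callable[[str], float],
    k: int = 20,
) -> List[str]:
    """Feed each vocabulary word to the classifier on its own and return the k highest-scoring words."""
    scored = sorted(((predict_climate(word), word) for word in set(vocabulary)), reverse=True)
    return [word for _, word in scored[:k]]


# Hypothetical stand-in for a fine-tuned classifier: fixed keyword weights
# passed through a logistic function to give a climate probability.
TOY_WEIGHTS = {"renewable": 3.0, "drought": 2.5, "biomass": 2.0, "school": -2.0, "tax": -2.5}


def toy_predict_climate(text: str) -> float:
    z = sum(TOY_WEIGHTS.get(word, 0.0) for word in text.lower().split())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    vocab = ["tax", "renewable", "school", "drought", "road", "biomass"]
    print(most_indicative_words(vocab, toy_predict_climate, k=3))
    # → ['renewable', 'drought', 'biomass']
```

Any callable that maps text to a climate probability can be passed as `predict_climate`, so the same ranking step would work with a wrapper around a real fine-tuned model.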

The model trained on the balanced dataset of FCDO activities achieved 93.1% accuracy after 22 minutes of training on a T4 GPU. The evaluation of the model showed a good understanding of related concepts, revealing these 20 words as those most related to activities targeting climate out of a total vocabulary of 4,852 words: ecosystem, biodiversity, catastrophic, environmentally, renewable, drought, upland, basin, ecological, pollution, climatic, crop, pipeline, sustainable, withstand, biomass, electrification, basins, climate, peat.

The model trained on the balanced dataset of World Bank activities achieved 83.9% accuracy after 1 hour and 12 minutes of training on a T4 GPU. Training took longer because, on average, each World Bank activity had about 2.5 times as much text as each FCDO activity. Despite its lower accuracy, the model still picked up on some of the keywords we would expect to see for climate finance. Out of a total vocabulary of 9,998 words, the 20 most indicative were: emissions, renewable, waste, sustainable, grassland, natural, covenant, biomass, sustainability, grasses, watershed, wetland, wetlands, livestock, savanna, plants, reservoirs, peat, mangrove and fishery.