Open Up Guide: Using Open Data to Combat Corruption

Section 3: Making use of open data

Open data can be an anti-corruption resource for governments, civil society, journalists and the private sector. The appropriate strategies for data use vary between sectors, and with the stage of the anti-corruption cycle being addressed.

From data gathering, to data use

Unlocking resources for anti-corruption action

Many of the pioneers of data-driven anti-corruption work have not been able to draw upon proactively published open datasets. Instead, they have had to gather the data they need through Freedom of Information requests, scraping data from inaccessible websites, and, in some cases, working with leaked datasets. Where data has been available, it has often been low quality, requiring substantial investments of time and effort before it can be used - and limiting the extent to which tools from one country can be used in another.

In this section we detail a number of cases where different stakeholders are working with the datasets described in the last section - either directly from open data, or using data they have manually gathered. The more governments move towards proactive publication, the more use-cases like these can spread, and effort can go into data use, rather than data gathering.

Over the coming year we hope to revise this section with additional cases that demonstrate direct use of open data - as governments deliver on their commitments to provide structured open data for anti-corruption.

Case study: The Panama Papers

The Panama Papers are an unprecedented leak of 11.5 million files from the database of the world’s fourth biggest offshore law firm, Mossack Fonseca. The leaked files reveal information on more than 214,000 offshore companies, connected to people in 200 countries and territories. The data includes emails, financial information, and corporate records that in some cases link world leaders and other prominent figures to illicit activity.

The International Consortium of Investigative Journalists worked with the leaked documents, and imported structured data extracted from them into a graph database, providing this to a network of hundreds of investigative journalists. This made it possible to find leads in the dataset, and to follow up potential stories. The investigations and stories from analysis of this data have led to multiple resignations and prosecutions.

Although the Panama Papers dataset itself was not open data published by a government, the investigations that followed demonstrated the investigative potential of corporate ownership data - and the value of having linked and structured data, as opposed to just documents.

Crucially, open data did play an important role in the follow-up Panama Papers news stories. OpenCorporates, who host open data on millions of companies and shareholders worldwide, reported a substantial spike in searches from countries where political leaders were implicated in offshore company scandals - revealing citizen interest in finding out more about their politicians’ business dealings.

Prevention

Prevention involves actions, mechanisms and tools that reduce corruption risks, or increase the costs of corruption in ways that deter corrupt activity. Open data can be used in prevention across a range of sectors.

In the private sector

In the financial sector, governments have increasingly introduced Know Your Customer (KYC) regulations that require banks and other financial institutions to conduct due diligence checks when taking on a new client, or processing funds. Firms entering into new deals may also wish to carry out due diligence on potential business partners. These checks often involve:

  • Identifying the owners and beneficial owners of a company client;
  • Checking clients and their owners against a list of politically exposed persons or public officials;
  • Checking clients and their owners against court records.

At present, these checks are often carried out using ‘black box’ private due diligence services. These services are often expensive (banks may pay-per-search), and often rely on a limited range of sources, such as media coverage, to flag up potential client risks. If the media have not reported on a given corruption case in the past, the due diligence databases might have a blind spot.

But as the source data for these checks becomes available as open data, and if government and private firms demand better quality due diligence, there is scope for innovation in how these processes take place. However, it is important to note that many regulations and business processes for due diligence rely on documentary evidence. An entry in a dataset may need to be backed up by other sources of evidence before a bank or financial institution will make a due diligence decision based on it.

Case study: Open Ownership

The OpenOwnership.org project is working to build a global database of beneficial ownership data, drawing on existing published data, and self-submitted information from companies and beneficial owners.

Beneficial ownership refers to “the natural person or persons who ultimately owns or controls a customer and/or a natural person on whose behalf a transaction is being conducted. It also includes those persons who exercise ultimate effective control over a legal person or arrangement.”

Knowing the beneficial owner(s) of an asset is vital to be able to truly follow the money, and see through layers of shell companies and complex ownership structures. Some jurisdictions are now introducing registers of beneficial ownership, requiring companies and land registrations to provide details of their ultimate beneficial owners.
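To illustrate what "seeing through layers of shell companies" can mean in practice, the following is a minimal sketch that walks a chain of ownership records until it reaches natural persons. The record layout, the company and person names, and the `beneficial_owners` function are all hypothetical, invented for this example; they do not reflect the Beneficial Ownership Data Standard or any real register, and real data would also need to handle control that is not expressed as shareholding.

```python
from collections import defaultdict

# Hypothetical ownership records: (owner, owned company, share as a fraction).
# All names and figures are invented for illustration.
OWNERSHIP = [
    ("Holding A Ltd", "Target Co", 0.60),
    ("Person X", "Target Co", 0.40),
    ("Shell B Inc", "Holding A Ltd", 1.00),
    ("Person Y", "Shell B Inc", 0.50),
    ("Person Z", "Shell B Inc", 0.50),
]
# Entities known to be companies; anything else is treated as a natural person.
COMPANIES = {"Target Co", "Holding A Ltd", "Shell B Inc"}


def beneficial_owners(company, records, companies):
    """Return {natural person: effective share} for `company`.

    The effective share is the product of the shares along each ownership
    chain, summed over all chains that end in the same person.
    """
    owners_of = defaultdict(list)
    for owner, owned, share in records:
        owners_of[owned].append((owner, share))

    totals = defaultdict(float)
    # Each stack entry: (entity, share accumulated so far, entities on this chain).
    stack = [(company, 1.0, frozenset([company]))]
    while stack:
        entity, weight, path = stack.pop()
        for owner, share in owners_of[entity]:
            if owner in path:
                continue  # circular holding: do not follow it again
            if owner in companies:
                stack.append((owner, weight * share, path | {owner}))
            else:
                totals[owner] += weight * share
    return dict(totals)


if __name__ == "__main__":
    for person, share in sorted(beneficial_owners("Target Co", OWNERSHIP, COMPANIES).items()):
        print(f"{person}: {share:.0%}")
```

In this invented example, Person Y and Person Z never appear on the records of Target Co itself; they only surface once the two intermediate companies are followed.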

By combining data from different national registers, using a common Beneficial Ownership Data Standard (currently under development), and allowing self-submission of data, OpenOwnership is aiming to provide a ready-to-use source of information for due diligence.

The project is being developed as a multi-stakeholder partnership involving Transparency International, One, the Open Contracting Partnership, the World Wide Web Foundation, Global Witness and The B Team.

In the public sector

When interest and asset disclosures are filed on paper, it can be easy for politicians and officials to leave out certain disclosures, and simply hope these will not be detected. But when disclosures are published as structured data, it becomes easier to cross-reference between the information provided in a disclosure, and the information held in other sources, such as the company register.

This increases the complexity and costs of hiding information, and creates pressure for more accurate disclosures.
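As a rough sketch of the cross-referencing described above, the example below compares the companies each official has declared against directorships listed in a company register extract, and reports any directorship missing from the declaration. The data, field layout and function names are hypothetical. Matching on names alone is an assumption made to keep the example short; in practice unique identifiers for people and companies would be far more reliable, and a mismatch is only a prompt for follow-up, not proof of an omission.

```python
# Hypothetical declared interests, keyed by official. Invented for illustration.
DECLARED = {
    "Official A": {"Alpha Holdings"},
    "Official B": {"Beta Trading", "Gamma Farms"},
}

# Hypothetical company register extract: (company name, director name).
REGISTER = [
    ("Alpha Holdings", "Official A"),
    ("Delta  Consulting", "official a"),
    ("Beta Trading", "Official B"),
    ("Epsilon Motors", "Someone Else"),
]


def normalise(name):
    """Lower-case and collapse whitespace so trivial differences still match."""
    return " ".join(name.lower().split())


def undeclared_directorships(declared, register):
    """Return {official: [companies in the register but not in their declaration]}."""
    declared_norm = {
        normalise(official): (official, {normalise(c) for c in companies})
        for official, companies in declared.items()
    }
    gaps = {}
    for company, director in register:
        match = declared_norm.get(normalise(director))
        if match is None:
            continue  # this director has not filed a declaration we hold
        official, companies = match
        if normalise(company) not in companies:
            gaps.setdefault(official, []).append(" ".join(company.split()))
    return gaps


if __name__ == "__main__":
    print(undeclared_directorships(DECLARED, REGISTER))
```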

Case study: 3x3

Proactive publication of structured data about interests and assets is relatively rare. More often, declarations are made on paper forms, or hosted on the scattered websites of each institution. To respond to this challenge in Mexico, Transparencia Mexicana and partners launched the ‘3 of 3’ campaign calling on politicians and public officials to publish three key declarations using structured templates and covering their:

  • Statement of assets;
  • Interest declarations; and
  • Tax returns

At the moment, officials need to provide this information directly to the 3 of 3 campaign, who then re-publish it in semi-structured forms. However, the information is captured using Excel templates, offering opportunities for further analysis and cross-linking of declarations.

Case study: ProZorro

Ukraine’s public procurement system was once notorious for corruption and inefficiency. Since launching ProZorro, the country’s open source, open data e-procurement system, the government has saved 14% on its planned spending (more than 300 million Euros) and seen a 50% increase in companies bidding for contracts - helping build business and citizen trust in the government process.

Detection

With thousands of procurement processes taking place every month, and hundreds of spending transactions by governments every day, it is effectively impossible to audit every one of them manually for signs of corruption. But with structured open datasets, large-scale analysis can be carried out on a rolling basis.

A common approach is ‘Red Flag Analysis’. Here, a set of indicators is designed that can be assessed using either a single dataset (e.g. procurement data) or a collection of joined-up datasets (e.g. company registers, asset registers and spending data). Software is then created or configured to read through incoming data and analyse activities against the red flags. When a certain threshold is hit, users of the system are notified by alerts, or through a dashboard, that there are cases in need of deeper investigation. Red flags don’t prove corruption is taking place - but they highlight areas which may, statistically, be subject to higher corruption risk. This can help in targeting scarce investigatory and enforcement resources.
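The red flag approach described above can be sketched in a few lines. This is an illustration only: the three indicators, their cut-off values, the alert threshold and the record fields are assumptions chosen for the example. They are not taken from the Open Contracting Partnership's framework, and the field names are not OCDS field names.

```python
from datetime import date

# Hypothetical procurement records. Invented for illustration.
PROCESSES = [
    {"id": "P1", "bidders": 1, "tender_start": date(2016, 3, 1),
     "tender_end": date(2016, 3, 5), "estimated_value": 100_000, "award_value": 140_000},
    {"id": "P2", "bidders": 5, "tender_start": date(2016, 3, 1),
     "tender_end": date(2016, 3, 31), "estimated_value": 100_000, "award_value": 98_000},
    {"id": "P3", "bidders": 1, "tender_start": date(2016, 3, 1),
     "tender_end": date(2016, 3, 31), "estimated_value": 50_000, "award_value": 50_000},
]


def single_bidder(process):
    return process["bidders"] == 1


def short_tender_period(process, min_days=10):
    # min_days is an assumed cut-off, not a recommended value.
    return (process["tender_end"] - process["tender_start"]).days < min_days


def award_over_estimate(process, ratio=1.25):
    # ratio is an assumed cut-off, not a recommended value.
    return process["award_value"] > ratio * process["estimated_value"]


RED_FLAGS = {
    "single_bidder": single_bidder,
    "short_tender_period": short_tender_period,
    "award_over_estimate": award_over_estimate,
}


def raised_flags(process):
    """Return the names of all red flags raised by one procurement process."""
    return [name for name, check in RED_FLAGS.items() if check(process)]


def alerts(processes, threshold=2):
    """Return {process id: flags} for processes raising at least `threshold` flags."""
    result = {}
    for process in processes:
        flags = raised_flags(process)
        if len(flags) >= threshold:
            result[process["id"]] = flags
    return result


if __name__ == "__main__":
    for process_id, flags in alerts(PROCESSES).items():
        print(process_id, "needs review:", ", ".join(flags))
```

Keeping each indicator as a small function over a common record shape is one way to let the same checks run over data from different publishers, provided that data follows a shared standard.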

The Open Contracting Partnership has been leading work to develop a common framework of red flags, and to assess which fields from the Open Contracting Data Standard (OCDS) are required in order to be able to detect certain corruption risks.

Case study: Open Contracting Red Flags Framework

In “Red Flags for integrity: Giving the green light to open data solutions”, the Open Contracting Partnership have identified a range of metrics that can be calculated from Open Contracting Data Standard (OCDS) data on public procurement processes in order to surface corruption risks. The study identifies “a set of over 150 suspicious behavior indicators, or “red flags” [that] occur at all points along the entire chain of public procurement - from planning to tender to award to the contract, itself, to implementation - and not just during the award phase, which tends to be the main focus in many procurement processes.”

By building on standardised open data, tools built around these metrics can be more easily applied to datasets from different countries.

Investigation

There are a number of advantages to the use of open data as part of anti-corruption investigations, whether carried out by journalists, or by auditors and law enforcement. Open datasets are available across national borders: meaning that citizens, journalists or officials in one country can draw upon data from another easily - and without having to go through various administrative processes to access information. This may assist investigators working in risky contexts, allowing investigations to proceed without political interference, or placing a spotlight on the investigator. It can also support easier investigation of cross-national corruption networks. For example, the coordinators of Sinar Project in Malaysia report using the UK Beneficial Ownership register to investigate the foreign company holdings of politically exposed persons from Malaysia.

Case study: Building investigatory tools

We have already mentioned the work of the International Consortium of Investigative Journalists (ICIJ) in building datasets around leaked data to support investigations.

In Slovakia, the Fair Play Alliance has created Datanest - a platform that compiles information on government spending from a variety of sources. This information, covering government expenditure, including on subsidies, government contracts, and elections, can be queried by investigative journalists, analysts, watchdog organizations, and ordinary citizens to explore potential stories, or search for evidence to back up an investigation.

Similar projects exist in Latin America, where the PODER network have built the ‘QuiénEsQuién.Wiki’ platform, which combines procurement information and company ownership information to support journalistic investigations.

Case study: Uncovering the Illegal Jade Trade

In October 2015, Global Witness published a story reporting on Myanmar’s multi-billion dollar jade industry. The report focussed attention on the powerful military, government and narcotics actors benefiting from Myanmar’s jade wealth, and the way in which they are using a web of anonymous companies to hide their gains at the expense of the rest of the population. The report was covered in the Wall Street Journal, Guardian, BBC, New York Times, Reuters, Wired, ABC, AP, and AFP.

The investigators from Global Witness relied heavily on company data from the Directorate of Investment and Company Administration (DICA), made accessible as open data through OpenCorporates.com. This data included key fields, such as company names, directors, and unique identifying numbers for those directors. Although some source data was removed from the DICA register during the period of the Global Witness investigation, it remained accessible in the open data copy, enabling researchers to continue following up leads - combining digital analysis techniques with conventional journalistic interview practices.

Enforcement

There are three main use-cases for open data when it comes to enforcement. Firstly, trend analysis with open data can be used to target scarce enforcement resources: highlighting emerging issues in need of exemplar cases, or surfacing areas where there is a good possibility of successful prosecutions. Secondly, open data of all forms can form part of the evidence in a case that prosecutes or sanctions corrupt activity. Thirdly, open data on courts, enforcement and sanction processes can be used to scrutinise the effectiveness of the enforcement system itself, and to highlight areas in need of systemic improvement.

Box 7: The importance of evidence

Ahead of the UK Anti-Corruption Summit in May 2016, we held a workshop with investigative journalists and law enforcement practitioners to explore the potential uses of open data. The workshop highlighted that:

  • Open data can help civil society and media to put corruption issues on the law enforcement agenda, particularly when law enforcement has limited time and resources;
  • In general, law enforcement agencies working on corruption have more referrals than they can process. Whilst some agencies might look at bulk data analysis to identify crimes (e.g. cyber crime, child exploitation), this is presently not as common in fraud and anti-corruption work;
  • Law enforcement are governed by very strict rules framing how evidence can be gathered and used, and requiring that there is a replicable process, carried out by experts, between any original source of evidence, and any analysis that might be presented to a court;
  • Defence lawyers won’t challenge the evidence - they will challenge the process of evidence collection and processing.

This raises some key challenges for open data use that ultimately aims to secure convictions or sanctions through legal process. Although the use of open data is generally exploratory, it is important in working with open datasets to track their provenance and address the ‘four Cs of evidence’:

  • Context. Where has the data come from? How was it acquired? Law enforcement can’t use any data that was illegally obtained.
  • Corroboration. Data on its own is not enough. Behind every data point is a human, and it is important to prove what the human actions were.
  • Currency. How up to date is the information? For example, there are different filing times across company registries, and information can be out of date within a day.
  • Completeness. Has the data been changed in any way? If so, the defence will want to be able to follow the same trail and mirror it. How can we be sure that this is a complete picture that someone else can replicate?

Source: Open Data, Investigations and Law Enforcement Workshop, April 2016, London.
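One modest way to support the Context, Currency and Completeness questions above is to record where and when a dataset was obtained, together with a cryptographic hash of the exact bytes retrieved, so that later changes can be detected. The sketch below assumes this approach; the record fields, function names and example URL are hypothetical, and a record like this does not by itself satisfy any legal standard of evidence or address Corroboration.

```python
import hashlib
from datetime import datetime, timezone


def provenance_record(data, source_url, method, retrieved_at=None):
    """Describe a dataset snapshot: where it came from, how, when, and its hash.

    `data` is the raw bytes exactly as retrieved, before any cleaning.
    """
    if retrieved_at is None:
        retrieved_at = datetime.now(timezone.utc)
    return {
        "source_url": source_url,            # Context: where the data came from
        "acquisition_method": method,        # Context: how it was acquired
        "retrieved_at": retrieved_at.isoformat(),  # Currency: when it was retrieved
        "sha256": hashlib.sha256(data).hexdigest(),  # Completeness: fingerprint of the bytes
        "size_bytes": len(data),
    }


def is_unchanged(data, record):
    """Check that `data` is byte-for-byte what was originally recorded."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]


if __name__ == "__main__":
    # Invented sample content and a placeholder URL, for illustration only.
    snapshot = b"company_number,name\n001,Example Mining Ltd\n"
    record = provenance_record(snapshot, "https://example.org/register.csv", "bulk download")
    print(record)
    print("unchanged:", is_unchanged(snapshot, record))
```

Keeping the untouched original alongside such a record, and applying any cleaning steps through scripts rather than by hand, makes it easier for someone else to follow the same trail from source to analysis.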
