Section 2: Anti-Corruption Open Data
A solid anti-corruption data infrastructure needs a range of different datasets, published to a high quality and in ways that allow connections to be made across them. Adherence to open data standards helps ensure that a larger number of users can benefit from the available data.
Overview: Data against corruption networks
Core data for setting an anti-corruption data infrastructure
Acknowledging the way corruption works, we have identified datasets relating to each of the core elements of a corruption network: a group of individuals and organizations, organized through a series of agreements and schemes (in some cases violating laws and government procedures) to extract rents from the public or obtain an undue benefit for private gain (see table 4).
These datasets form a basic core that countries should strive to make available and interoperable. The approach is of course a general one as what data is available, and in what format, will vary from country to country, and case to case. Nor is this list definitive: there are many other datasets that can be relevant to specific anti-corruption efforts. However, together these datasets form the basis of a solid anti-corruption data infrastructure.
Table 4. Classifying anti-corruption data
|Core element of a corruption network||Description of the related data to the core element||Examples of datasets|
|Individuals and organisations||Refers to any dataset containing records and information on entities (individuals or organisations) that can be potentially involved in a corruption scheme. Datasets under this category should provide information about the nature and characteristics of any entity, as well as its connections with others.||
|Public-related resources||Refers to any dataset containing records and information on the resources which belong to governments or are intended for public purposes and that could be involved in a corruption scheme. Datasets under this category should provide information about the status and transactions related to those resources.||
|Regulations, rules and government procedures||Refers to any dataset containing records and information on the channels used, avoided or violated to commit an act of corruption by a corruption network. Datasets under this category should provide information about the procedures, events and legal acts potentially linked to corruption schemes.||
|Rent extractions||Refers to any dataset containing records and information on the destination of public resources that were potentially extracted as a result of a corruption scheme. Datasets under this category should provide information about the income sources of members of a corruption network and the ownership of the assets they hold.||
This data takes many different forms. Data may be drawn from public registers created to serve broad public functions, or developed with specific transparency and anti-corruption goals in mind. It may be transactional data generated during the daily operations of government, and released in as close to real-time as possible. Or it may be drawn from public disclosures mandated by law or policy.
Governments manage many different registers: from company registers, and land ownership registers, to lists of registered lobbyists, or lists of public servants.
The UK Government Digital Service (GDS) describe a register as “...an authoritative list of information you can trust”. This is an ideal. Every effort should be made to ensure government registers are authoritative. GDS have developed principles for public registers, and an open source software stack that provides open APIs for access to ‘living registers’.
However, government registers are sometimes not kept up to date, or are maintained in non-interoperable and error-prone ways. This can lead third parties to maintain their own open data registers by aggregating government-provided data and checking its quality.
Box 5. Registers: Every Politician
EveryPolitician.org is an independently maintained dataset with the goal of providing “data about every national legislature in the world, freely available for you to use”. It uses the Popolo standard to manage data, and is populated through a mix of ‘screen-scraping’ official resources and crowdsourcing information.
If governments provide official registers of political figures, then the EveryPolitician Bot can more easily keep the platform up to date.
Every day hundreds of land deals take place; thousands of government tenders are issued, and contracts signed; and millions of payments may be made to and from government.
Hidden within these transactions may be red-flags for corruption, or information that, when linked with information from a register, could show illicit benefits received by a government official.
Transaction data can be made available in real-time through APIs, or provided periodically in bulk downloads. Timeliness and disaggregation can be an important factor in the use of transactional data, but care must also be taken to respect privacy.
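As a minimal sketch of the linking idea described above, the following Python example joins a handful of invented payment records to an equally invented register of officials' declared company interests. All names, identifiers and field names (such as `supplier_id` and `company_id`) are hypothetical and chosen only for illustration. A match is a prompt for further review, not evidence of wrongdoing.

```python
# Hypothetical register: companies in which public officials have declared an interest.
register = [
    {"official": "Official A", "agency": "Ministry of Works", "company_id": "XX-0001"},
    {"official": "Official B", "agency": "Ministry of Health", "company_id": "XX-0002"},
]

# Hypothetical transactional data: payments made by government agencies.
payments = [
    {"payment_id": "P1", "agency": "Ministry of Works", "supplier_id": "XX-0001", "amount": 50000},
    {"payment_id": "P2", "agency": "Ministry of Works", "supplier_id": "XX-0003", "amount": 12000},
    {"payment_id": "P3", "agency": "Ministry of Health", "supplier_id": "XX-0001", "amount": 8000},
]


def flag_payments(payments, register):
    """Return payments whose supplier is linked to an official of the paying agency."""
    # Index the register by (agency, company identifier) for direct lookups.
    interests = {}
    for entry in register:
        key = (entry["agency"], entry["company_id"])
        interests.setdefault(key, []).append(entry["official"])

    flagged = []
    for payment in payments:
        key = (payment["agency"], payment["supplier_id"])
        if key in interests:
            flagged.append({**payment, "linked_officials": interests[key]})
    return flagged


red_flags = flag_payments(payments, register)
for flag in red_flags:
    print(flag["payment_id"], flag["linked_officials"])
```

The join only works because both datasets use the same identifier for the company and the same name for the agency. Without shared identifiers, the same check would require error-prone matching on free-text names.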
Box 6. Transactional data: Brazil's transparency portal
Brazil’s Transparency Portal provides detailed data on five key categories of transaction:
- Direct spending by federal government agencies through contracts and tender processes;
- All financial transfers to states, municipalities and the federal district;
- Financial transfers to social program beneficiaries;
- Administrative spending, including staff salaries, staff travel expenses and per diems and office expenditures; and
- Information on all government official credit card spending.
Some transactional information is updated on a nightly basis. The portal has over 900,000 unique visitors per month.
Transparency policies often create an obligation on public bodies, public figures or private entities to disclose information: for example, to publish a record of meetings between lobbyists and officials, or to post voting records publicly. Sometimes this information is recorded in registers, but often the obligation is worded so that bodies post their own disclosures on local notice boards, websites or in gazettes.
Frequently such disclosures are made in non-standard formats, such as word-processed documents, making it difficult to join this information up with other datasets. If standard formats were used, and the data were more easily discoverable, the anti-corruption value of these disclosures could be increased.
Summary: Priority Datasets
This table includes 30 priority datasets that can be used to fight corruption, together with the key attributes needed so that they can be linked to each other. To address corruption networks it is particularly important that connections can be established and followed across datasets, national borders and different sectors. The table includes the following for each dataset:
- What type of database it is,
- What type of information it holds,
- What stage of the corruption cycle it’s useful for,
- What other datasets are relevant,
- Links to examples of the data set,
- Potential standards to help develop the dataset, and
- What key attributes are needed to link the datasets together.
Foundations of a solid anti-corruption data infrastructure
Joining up data and standards for anti-corruption
A solid anti-corruption data infrastructure can only be built if the different datasets within it can be connected to each other. The more connections that can be made, the more ways there are to combine data for an anti-corruption intervention. Based on the priority datasets for building an anti-corruption data infrastructure (see table 4), a series of core data elements have been identified and matched to available open data standards.
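To illustrate how shared attributes determine which datasets can be connected, here is a small Python sketch. The dataset names and their key attributes are hypothetical and are not taken from the priority datasets table. The sketch simply lists every pair of datasets that has at least one attribute in common.

```python
from itertools import combinations

# Hypothetical datasets and the key attributes each one exposes.
datasets = {
    "company_register": {"company_id", "director_id"},
    "procurement_contracts": {"contract_id", "company_id", "agency_id"},
    "asset_declarations": {"official_id", "company_id"},
    "public_officials": {"official_id", "agency_id"},
    "land_register": {"parcel_id"},
}

# A pair of datasets can be linked if they share at least one key attribute.
links = {}
for (name_a, keys_a), (name_b, keys_b) in combinations(datasets.items(), 2):
    shared = keys_a & keys_b
    if shared:
        links[(name_a, name_b)] = shared

for pair, shared in links.items():
    print(pair, sorted(shared))
```

In this invented example the land register shares no attribute with the other datasets, so it cannot be joined to them until a common identifier (for instance, the owner's `company_id`) is added.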
A data standard specifies the format that a given data field must follow to ensure its interoperability with other datasets. Using open data standards makes it possible to generate unique identifiers for individuals and organizations, to set specific parameters for recording events or transactions, and to collect and organize data that meet minimum quality requirements. Moreover, adherence to open data standards helps ensure that a larger number of users can benefit from the available data.
It is desirable that governments and civil society jointly review the availability of such data and agree on a roadmap for disclosing it as open data. At the same time, it will be important to review how the data is structured and to assess whether it is worth adjusting it to match open data standards that allow it to be linked with other data.
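As a rough sketch of what "minimum quality requirements" can mean in practice, the Python example below checks records against a few rules of the kind a standard might define. The field names, the identifier pattern and the rules themselves are assumptions made for illustration; they do not reproduce any of the standards discussed in this section.

```python
import re
from datetime import date

# Assumed, illustrative rules; a real standard defines its own fields and formats.
REQUIRED_FIELDS = ("organisation_id", "name", "registration_date")
ID_PATTERN = re.compile(r"^[A-Z]{2}-[0-9]{4}$")  # hypothetical identifier format


def validate(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing field: {field}")
    org_id = record.get("organisation_id")
    if org_id and not ID_PATTERN.match(org_id):
        problems.append("identifier does not match the agreed format")
    reg_date = record.get("registration_date")
    if reg_date:
        try:
            date.fromisoformat(reg_date)  # expects YYYY-MM-DD
        except ValueError:
            problems.append("registration_date is not an ISO 8601 date")
    return problems


good = {"organisation_id": "XX-0001", "name": "Example Ltd", "registration_date": "2015-03-01"}
bad = {"organisation_id": "0001", "name": "", "registration_date": "01/03/2015"}

print(validate(good))
print(validate(bad))
```

Checks like these are cheap to run at the point of publication, and they make it far more likely that identifiers and dates from one publisher can be matched against those from another.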
Table 5. Summary of priority data standards for building an anti-corruption data infrastructure
|Name of data standard||Description||Sponsor|
|Open Contracting Data Standard||Data guidance for disclosing public procurement data in open formats, covering contracting processes from the planning to the implementation stage. Extensions for other types of contracting, such as public-private partnerships and concessions, are under development. More information: http://standard.open-contracting.org/||Open Contracting Partnership (CSO)|
|Fiscal Data Package||Schema for publishing and consuming quantitative fiscal data, especially data generated during the planning and execution of budgets. It supports data on both expenditures and revenues. More information: http://specs.frictionlessdata.io/fiscal-data-package/||Open Knowledge International (CSO)|
|Popolo||Popolo is an initiative on open government data specifications. Its goal is to "define data interchange formats and data models so that organizations can spend less time transforming and modeling data and more time applying it to the problems they face". It allows standardization of data related to people, organizations, motions and voting, events and speeches, among others. More information: http://www.popoloproject.com/||
|Global Beneficial Ownership Register||Open schema (under development) for collecting and publishing beneficial ownership data globally. It will enable users to record, in a standardized way, data about the ultimate beneficiary or owner of a certain good (such as land) or of an organization or entity (such as a company) across different countries. More information: http://openownership.org/get-involved/||Open Ownership (Global Coalition)|
|Open Corporates Schema||Schema for publishing and consuming data on companies worldwide, including data on jurisdiction, incorporation date, shareholders and subsidiaries. It recently incorporated beneficial ownership data released by the UK Government. More information: https://github.com/openc/openc-schema||Open Corporates (Private firm)|
Box 7. The G20 Open Data Portals: enablers of Anti-Corruption Data?
The G20 has recently pushed the open data agenda globally. Because the G20 accounts for 85% of the gross world product (GWP), 80% of world trade and two-thirds of the world population, actions implemented by its members can trigger trends across the world. With this in mind, the open data portals of the G20 countries were reviewed to understand how easily anti-corruption related datasets can be identified.
To start, only 16 out of 20 members have an open data portal. China, South Africa, South Korea and Turkey have not yet launched a portal where open government datasets can be accessed and downloaded. In total, these open data portals contain 593,220 datasets. The three countries with the most datasets available are Canada (41.3%), the United States (33.7%) and the United Kingdom (4.4%).
Based on this sample, a series of corruption-related words (in each portal’s official language) were looked up through the portals' own search engines. For example, searches for the words “Corruption” and “Anti-corruption” returned a total of only 114 and 311 datasets respectively. This means that only around 0.05% of the available datasets are directly classified as a resource that could be used for anti-corruption purposes. Saudi Arabia, Mexico, Germany, Brazil and Argentina returned no results for either query.
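The shares behind these figures can be checked with a few lines of Python, using only the numbers quoted above. The two searches are kept separate because the text does not say whether their result sets overlap; the 0.05% figure appears to correspond to the larger of the two counts.

```python
# Figures quoted in the text above.
total_datasets = 593_220
hits = {"Corruption": 114, "Anti-corruption": 311}

# Share of all datasets returned by each search term, as a percentage.
shares = {term: 100 * count / total_datasets for term, count in hits.items()}
for term, share in shares.items():
    print(f"{term}: {share:.3f}% of all datasets")
```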
Although these results are not conclusive regarding the existence of anti-corruption data, they do show that better categorizations or search mechanisms are needed to access such data. In fact, the number of fixed data categories per portal ranges from 9 to 33, making it difficult to find data on similar issues across countries. In addition, 50% of the open data portals reviewed (Australia, Argentina, Brazil, France, Germany, Indonesia, Japan and the USA) let users tag datasets freely, which makes it possible to search for information outside the standard categories. Regardless of the approach chosen by each country, it is clear that there is a great opportunity for G20 governments to make their open data portals enablers of anti-corruption strategies.