Open Data

Alternative Data Category Description

Open data is alternative data that can be freely accessed, used, modified, and shared by anyone. The open data movement can be traced back to the concept of open access to scientific data and the formation of the World Data Center system in 1955. The Human Genome Project is one of the most famous examples of open data in action, as it was built upon the Bermuda Principles, which state that all human genomic sequence information “should be freely available and in the public domain in order to encourage research and development and to maximize its benefit to society.”

Following the rise of the open data movement, large volumes of open government data were made available through a variety of portals and repositories. Governments and international organizations expect that opening access to their data will foster transparency, accountability, and value creation for several groups:

  • Citizens – open data provides immediate access to the information which belongs to them, reinforcing the transparency vision. For example, open data can be used to track the spending of public funds, monitor the performance of government agencies, and discover patterns of corruption.
  • Governmental institutions – the use of open data can help them become more transparent, efficient, and effective, also reinforcing their public service role.
  • Business sector – may reuse open data to create applications, platforms, or other alternative data products. For example, open data has been used to improve weather forecasting, create better transportation apps, and improve the efficiency of energy production.
  • Other sectors, such as journalism, university research, and non-profit organizations. For example, open datasets can be used by NGOs to better execute development targets.

According to McKinsey, economies could see GDP gains of between 1 and 5 percent by 2030 if they adopt open data for finance.

However, there are several challenges associated with the open data movement. When data is made openly available, it can be difficult to ensure its quality and accuracy or that it has been collected in a consistent and unbiased manner. Creating the infrastructure to support the storage and sharing of open data presents another challenge.

Subcategory - Open Source Data

Datasets are made available under a permissive license, such as a Creative Commons license, that allows anyone to use, republish, and modify the data without having to ask for permission. This encourages collaboration, experimentation, and innovation by making data widely accessible. Examples include Common Crawl, a repository of web-crawled data, and ImageNet, a large-scale image dataset for training machine learning models.

It is important to note that data vendors from other alternative data categories also actively use open data sources to improve their product offerings. Economic trade data vendors compile monthly import and export figures from customs offices and statistical research organizations. Patent data vendors aggregate data from authorities across the world to get a perspective on innovation activities.

Unstructured open-text data is another popular source for vendors developing their data products. The SEC’s EDGAR (Electronic Data Gathering, Analysis, and Retrieval) dataset, for example, is used by multiple data vendors. NLP-focused alternative data vendors apply natural language processing techniques to parse SEC filings, extract relevant passages that discuss company events, and compute sentiment scores.
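To make the filing-parsing idea concrete, here is a minimal sketch of the two steps described above: pulling out sentences that mention a company event, then scoring them. The keyword and word lists are invented for illustration, and the lexicon score is a deliberately simple stand-in for the sentiment models a vendor would actually use.

```python
import re

# Toy word lists, invented for illustration only.
POSITIVE = {"growth", "improved", "strong", "gain"}
NEGATIVE = {"loss", "impairment", "decline", "litigation"}
EVENT_KEYWORDS = ("acquisition", "restructuring", "litigation")


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def event_passages(text: str, keywords=EVENT_KEYWORDS) -> list[str]:
    """Keep only sentences that mention at least one event keyword."""
    return [
        s for s in split_sentences(text)
        if any(k in s.lower() for k in keywords)
    ]


def sentiment_score(sentence: str) -> float:
    """Return (positive - negative) / (positive + negative), or 0.0 if no hits."""
    words = re.findall(r"[a-z]+", sentence.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


# Hypothetical filing excerpt, not taken from a real document.
SAMPLE = (
    "The company completed the acquisition of Example Corp. "
    "Revenue growth was strong in the quarter. "
    "Ongoing litigation may result in a material loss."
)

if __name__ == "__main__":
    for passage in event_passages(SAMPLE):
        print(round(sentiment_score(passage), 2), passage)
```

A production pipeline would replace the naive sentence splitter and the word lists with a proper tokenizer and a trained model, but the extract-then-score shape stays the same.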

The use of linked data and the semantic web, which allows data to be linked and shared across different databases and platforms, is becoming more widespread. This makes it easier for organizations to discover, share, and use open data and facilitates the creation of new applications and services. Data.world, for example, has over 125K open datasets contributed by thousands of users and organizations across the world.

Most public and open data sources do not come mapped to a ticker. For alternative data to be used for financial services or investment purposes, it has to be mapped to financial identifiers at some stage. Various financial data APIs available online can be used for this entity mapping: vendors and other users of open data can use open APIs for symbology capture and apply them to messy datasets. Both Bloomberg’s OpenFIGI API and Refinitiv’s OpenPermID API can be used to build or enhance internal securities masters and to match names to financial IDs.
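As a rough illustration of what matching messy names to financial IDs involves, the sketch below normalizes company names and looks them up in a small securities master, falling back to fuzzy matching. The master table, the suffix list, and the 0.85 cutoff are all assumptions chosen for the example, not part of any vendor's method.

```python
import difflib
import re

# Hypothetical securities master: normalized name -> ticker.
SECURITIES_MASTER = {
    "apple": "AAPL",
    "microsoft": "MSFT",
    "international business machines": "IBM",
}

# Common legal suffixes stripped before matching (illustrative, not exhaustive).
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "plc", "llc"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def match_ticker(raw_name: str, master=SECURITIES_MASTER, cutoff: float = 0.85):
    """Return the ticker for raw_name, or None if no confident match exists."""
    key = normalize(raw_name)
    if key in master:
        return master[key]
    # Fall back to the closest normalized name above the similarity cutoff.
    close = difflib.get_close_matches(key, list(master), n=1, cutoff=cutoff)
    return master[close[0]] if close else None


if __name__ == "__main__":
    for raw in ["Apple Inc.", "MICROSOFT CORPORATION", "Microsft Corp", "Unknown Holdings"]:
        print(raw, "->", match_ticker(raw))
```

Returning `None` below the cutoff, instead of the best guess, keeps uncertain matches out of the mapped dataset so they can be reviewed manually.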

Bloomberg’s OpenFIGI API maps third-party identifiers to FIGIs (Financial Instrument Global Identifiers). It supports both mapping and search, is free to use, and is open to the public; it is available through the OpenFIGI website. Refinitiv’s OpenPermID API gives the market access to Refinitiv’s Permanent Identifiers together with the associated entity masters and metadata. These identifiers are also open, permanent, and universal, but use of the API is capped at 5,000 requests per day. The API supports record matching, intelligent tagging, and entity data download, and also gives access to a wider developer community. It is available through Refinitiv’s PermID website.
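A minimal sketch of a ticker-to-FIGI mapping call is shown below. The endpoint URL, the `idType`/`idValue`/`exchCode` job fields, the `X-OPENFIGI-APIKEY` header, and the `data`/`figi` response fields reflect the public OpenFIGI v3 documentation as understood at the time of writing; treat them as assumptions and check the current documentation before relying on them.

```python
import json
import urllib.request

# Assumed endpoint for the OpenFIGI v3 mapping API; verify before use.
OPENFIGI_URL = "https://api.openfigi.com/v3/mapping"


def build_mapping_jobs(tickers, exch_code="US"):
    """Build one mapping job per ticker for the request body."""
    return [
        {"idType": "TICKER", "idValue": t, "exchCode": exch_code}
        for t in tickers
    ]


def request_mapping(jobs, api_key=None):
    """POST the jobs to OpenFIGI and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # An API key is optional and, per the docs, raises the rate limit.
        headers["X-OPENFIGI-APIKEY"] = api_key
    req = urllib.request.Request(
        OPENFIGI_URL,
        data=json.dumps(jobs).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def first_figi_per_job(jobs, results):
    """Pair each job with the first FIGI returned for it, or None if unmatched."""
    mapping = {}
    for job, result in zip(jobs, results):
        matches = result.get("data") or []
        mapping[job["idValue"]] = matches[0]["figi"] if matches else None
    return mapping


if __name__ == "__main__":
    jobs = build_mapping_jobs(["IBM", "AAPL"])
    print(first_figi_per_job(jobs, request_mapping(jobs)))
```

A single ticker can map to several instruments, so taking the first match is a simplification; a real securities master would filter on fields such as security type or exchange.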

Subcategory - Public Sector Data

This subcategory of alternative data is collected and maintained by various government agencies or international organizations and made publicly available for anyone to access, use, and republish without restrictions. Examples include data on government spending, crime statistics, weather forecasts, transportation schedules, and health statistics.

The EU’s Open Data Directive, adopted in 2019, is an important piece of legislation in this regard as it aims to make data produced by public sector bodies more easily accessible and reusable for commercial and non-commercial purposes. One of the key aspects of the directive is the requirement for public sector bodies to make their data available in machine-readable formats. This means that the data must be structured in a way that can be easily processed by computers, making it easier for businesses and individuals to use and reuse the data.
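To illustrate why machine-readable formats matter, the sketch below aggregates a small CSV file with a few lines of standard-library code. The spending records and column names are invented for the example and do not come from any real portal.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spending records, invented for illustration.
CSV_TEXT = """agency,year,amount_eur
Transport,2022,1200000
Health,2022,3400000
Transport,2023,1500000
"""


def total_by_agency(csv_text: str) -> dict[str, int]:
    """Sum the amount_eur column per agency."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["agency"]] += int(row["amount_eur"])
    return dict(totals)


if __name__ == "__main__":
    print(total_by_agency(CSV_TEXT))
```

The same figures published as a scanned PDF table would need manual transcription or error-prone extraction before any such aggregation could run.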

Data.gov is an online portal created by the US government that provides access to a wide range of open datasets from federal, state, and local government agencies. Data.gov.uk is a similar portal created by the UK government, and the EU has its own portal with over 1.5 million public sector datasets available. The open data portals of the World Bank and the OECD provide access to development indicators such as GDP, population, education, unemployment rates, and health.
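As an example of programmatic access to such indicators, here is a minimal sketch against the World Bank's public indicators API. The URL pattern, the `NY.GDP.MKTP.CD` code (GDP in current US dollars), and the two-element response layout follow the World Bank API documentation as understood at the time of writing; the helper names are hypothetical, and the parser assumes annual data.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for version 2 of the World Bank API; verify before use.
BASE_URL = "https://api.worldbank.org/v2"


def indicator_url(country: str, indicator: str, start_year: int, end_year: int) -> str:
    """Build the request URL for one country and one indicator."""
    query = urllib.parse.urlencode(
        {"format": "json", "date": f"{start_year}:{end_year}", "per_page": 1000},
        safe=":",
    )
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{query}"


def parse_series(payload) -> dict[int, float]:
    """Turn a [metadata, records] payload into {year: value}, skipping nulls."""
    if len(payload) < 2 or payload[1] is None:
        return {}
    return {
        int(rec["date"]): rec["value"]
        for rec in payload[1]
        if rec["value"] is not None
    }


def fetch_series(country: str, indicator: str, start_year: int, end_year: int):
    """Download and parse one annual indicator series."""
    url = indicator_url(country, indicator, start_year, end_year)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_series(json.load(resp))


if __name__ == "__main__":
    print(fetch_series("US", "NY.GDP.MKTP.CD", 2015, 2020))
```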

Data Structure

  • Most data vendors don’t offer the data mapped to a ticker or company. Data vendors that use or productize a derived or separate dataset typically map the data to the ticker.
  • History can vary. Datasets typically offer at least five years of history, but it can be much longer.
  • Delivery frequency ranges from daily or weekly to monthly, depending on the underlying data source. Some open datasets are updated in real time, such as weather forecasts or transportation schedules, while others may only be updated periodically, such as census data or financial reports.
  • The data is typically delivered via platforms, API, or FTP.

Compliance Considerations

When data is made openly available, there is a risk that personal information may be exposed. Organizations must ensure that the data they release does not contain any sensitive information about individuals and that any personally identifiable information is removed or anonymized before being made publicly available.
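One possible way to handle the removal step is sketched below: direct identifiers are dropped and a record ID is replaced with a keyed hash. The field names and the secret key handling are hypothetical. Keyed hashing is pseudonymization, not full anonymization, and the remaining fields may still allow re-identification when combined, so this is only a starting point.

```python
import hashlib
import hmac

# Hypothetical names of fields that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a value with a keyed SHA-256 digest (stable for a given key)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub_record(record: dict, secret_key: bytes, id_field: str = "citizen_id") -> dict:
    """Drop direct identifiers and pseudonymize the record ID before release."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if id_field in cleaned:
        cleaned[id_field] = pseudonymize(str(cleaned[id_field]), secret_key)
    return cleaned


if __name__ == "__main__":
    # Invented example record; the key would normally come from a secret store.
    record = {
        "citizen_id": 123,
        "name": "Jane Doe",
        "email": "jane@example.org",
        "region": "North",
        "benefit_amount": 250,
    }
    print(scrub_record(record, secret_key=b"replace-with-a-real-secret"))
```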

Organizations must also ensure they have the legal right to make the data publicly available and that they are not in violation of any intellectual property rights or other legal agreements. They should also ensure that they are in compliance with all applicable laws and regulations regarding the collection, use, and sharing of data.
