Alternative Data Category Description
Both fundamental and systematic asset managers employ event-driven strategies trying to predict the outcome of calendar events or the occurrence of significant events. Typical event-driven strategies include merger arbitrage, activism strategies, distressed investing, and special situations.
Broadly speaking, there are three high-potential use cases for event data: standalone alpha-generating strategies based on guidance events, buyback events, or a combination of the two; an overlay on traditional or alternative data alpha-generating strategies, where the captured events provide current, historical, or forward-looking context; and risk management.
Event detection has become increasingly important across a range of application fields in the current age of overabundant information and massive production of web data. Natural Language Processing and Big Data analysis are two of the study areas that have been established to approach the issue from various angles, with the aim of delivering beneficial resources to support decision-making.
The goal of event detection extends beyond organizing content and providing analytics: it also serves as the foundation for additional algorithmic processing, such as the creation of automated trading strategies. Vendors also often use sentiment analysis to complement their data analytics.
Subcategory - Event Risk Data
Event risk refers to any unforeseen occurrence that can result in losses, or gains, for investors in a company or investment. Unexpected buybacks or corporate reorganizations may positively or negatively impact a stock price. Gap risk events, where large swings in market prices occur, could result in a change in portfolio value. The event risks that asset managers monitor most closely are earnings releases and corporate meetings or presentations, where positive or negative information could impact the stock price. Most important are the earnings calendar and the potential meet/beat/miss scenarios for revenue and earnings per share. These alternative datasets are important for potential alpha generation and for backtesting stock price movements.
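The meet/beat/miss scenarios mentioned above can be illustrated with a minimal sketch. The `EarningsEvent` record, the tickers, and the `tolerance` band are all hypothetical choices made for illustration; vendors and desks define "meeting" consensus in different ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EarningsEvent:
    ticker: str           # hypothetical identifier
    consensus_eps: float  # analyst consensus estimate before the release
    reported_eps: float   # figure reported by the company


def classify_surprise(event: EarningsEvent, tolerance: float = 0.01) -> str:
    """Label an earnings release as 'beat', 'meet', or 'miss'.

    `tolerance` is an assumed absolute EPS band within which the
    reported figure is treated as meeting consensus.
    """
    diff = event.reported_eps - event.consensus_eps
    if diff > tolerance:
        return "beat"
    if diff < -tolerance:
        return "miss"
    return "meet"


# Made-up example events, not real company data.
events = [
    EarningsEvent("AAA", consensus_eps=1.20, reported_eps=1.35),
    EarningsEvent("BBB", consensus_eps=0.80, reported_eps=0.80),
    EarningsEvent("CCC", consensus_eps=2.10, reported_eps=1.95),
]
labels = {e.ticker: classify_surprise(e) for e in events}
print(labels)
```

Labels of this kind could then be joined to subsequent price moves in a backtest; the same pattern would apply to revenue surprises.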
A possible restructuring due to mergers, acquisitions, or leveraged buyouts can also come into play. These types of events may force a business to take on new or extra debt, possibly at higher interest rates, which it might find difficult to repay. Companies must also consider situations in which essential products are recalled, allegations or investigations are brought forward, or macro conditions change and suddenly increase the price of key inputs.
ESG-related risk is another category that is gaining attention due to the growth of sustainable investing. All companies face ESG-related risks, and some of these risks have the potential to become material and cause financial and/or reputational damage if they are not addressed promptly and effectively.
Subcategory - News NLP Sentiment Data
News items, globally and regionally, are an authoritative and noise-free source where qualitative information can be conveniently obtained. Signals or sentiment created from news can be used to monitor companies on events like store closings, strikes, and environmental issues, which are often published first in very specific and local news sources.
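As a rough illustration of how news items might be turned into monitoring signals for events such as store closings, strikes, and environmental issues, the sketch below tags headlines by keyword. The keyword lists and headlines are hypothetical, and plain substring matching is a deliberately naive stand-in for the trained NLP models a vendor would more plausibly use.

```python
# Hypothetical keyword lists per event type (illustrative only).
EVENT_KEYWORDS = {
    "store_closing": ("store closing", "store closure", "close stores", "closing stores"),
    "strike": ("strike", "walkout", "work stoppage"),
    "environmental": ("spill", "emissions", "contamination", "pollution"),
}


def tag_events(headline: str) -> list[str]:
    """Return every event type whose keywords appear in the headline.

    Substring matching is crude: "strike" would also match
    "strikes a deal", so real systems need disambiguation.
    """
    text = headline.lower()
    return [
        event
        for event, keywords in EVENT_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


# Made-up headlines, not taken from any real news source.
headlines = [
    "Retailer X to close stores in three states",
    "Union workers begin strike at Plant Y",
    "Regulator probes chemical spill near Factory Z",
    "Company W names new CFO",
]
tagged = {h: tag_events(h) for h in headlines}
for headline, events in tagged.items():
    print(events, "-", headline)
```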
Heston and Sinha (2016) used a dataset of more than 900,000 news articles to test whether it was possible to predict stock returns. They found that positive news was quickly incorporated by the market, but negative news had a delayed reaction of up to one month (which is consistent with short sale constraints).
Event-driven strategies like special situations strategies cover a variety of events that may not be formally announced. Examples include spinoffs, asset sales, product launches, or sanctions. Identifying these opportunities ahead of the market has the potential to deliver impressive results.
Hackbarth and Morellec (2006) analyzed stock returns in mergers and acquisitions using a sample of 1,086 takeovers of public US firms in the 1985-2002 period. The researchers demonstrated that news of a merger positively impacted the target company’s stock price while having little impact on the acquiring firm’s stock.
In recent years, ESG-related news items have become influential across the three pillars and across various ESG frameworks. For example, a company may self-report positive metrics for pollution or employee work practices, but news items may pick up on stories at the country or industry level that show less positive real-world actions. NLP can be used to check what a company is actually doing rather than what management says it is doing.
Subcategory - NLP Transcripts Data
Event detection based on financial reports, conference call transcripts, press releases, or even TV releases can point to a company’s direction, executives’ behaviors, and decisions, providing additional insight about the entity or the resonance of the events.
Performing NLP on earnings call transcripts can help asset managers identify related topics and tone changes in the transcripts and gain transparency into a call and its sentiment. Price et al. (2012) showed that earnings call sentiment or tone can be predictive of return and trading volume abnormalities in the U.S. market.
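A tone change between two calls can be sketched with a simple word-count measure. This is only an illustrative approach under stated assumptions: the `POSITIVE` and `NEGATIVE` word sets are tiny hypothetical lexicons, the transcript snippets are invented, and this is not the methodology of Price et al. (2012) or of any particular vendor.

```python
import re

# Tiny hypothetical lexicons; production work would rely on a curated
# finance-specific dictionary or a trained model.
POSITIVE = {"growth", "strong", "improved", "record", "confident"}
NEGATIVE = {"decline", "weak", "uncertain", "headwinds", "loss"}


def net_tone(text: str) -> float:
    """Return (positive - negative) / (positive + negative) word counts.

    The score lies in [-1, 1]; 0.0 is returned when no lexicon word appears.
    """
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


# Invented snippets standing in for two consecutive earnings calls.
previous_call = "We delivered record revenue and strong growth, and we remain confident."
current_call = "Demand was weak and we face headwinds, although margins improved."

tone_change = net_tone(current_call) - net_tone(previous_call)
print(f"previous={net_tone(previous_call):+.2f} "
      f"current={net_tone(current_call):+.2f} change={tone_change:+.2f}")
```

A negative `tone_change` flags a call whose language turned less favorable than the previous one, which is the kind of shift an analyst might then review manually.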
Clara Vega, Assistant Director at the Federal Reserve Board of Governors, researched announcements and how they affect asset prices, finding that the effect of public surprises was mostly associated with the arrival rate of noise and informed traders. Demers and Vega (2008) demonstrated that optimism in earnings announcements was positively related to abnormal returns. On the other hand, less resolute language in earnings announcements was associated with stock valuation uncertainty.
Data Structure
- Most data vendors offer data that is mapped to tickers and is point-in-time (PIT).
- Since news and transcript data is usually sourced from public sources or from third parties, the history available for this category of alternative data is generally over 10 years.
- NLP news sentiment datasets are usually updated and delivered at a daily or real-time frequency.
- The most common delivery methods are API and platform.
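To make the ticker-mapped, point-in-time (PIT) idea concrete, the sketch below looks up the latest sentiment score that was already known at a given moment, so that a backtest never uses information published later. The `PIT_DATA` layout, ticker, timestamps, and scores are all hypothetical; actual vendor schemas vary.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical PIT records per ticker, sorted by the time each score
# became available: (available_at, sentiment_score).
PIT_DATA = {
    "AAA": [
        (datetime(2023, 1, 3, 9, 0), 0.10),
        (datetime(2023, 1, 4, 16, 30), -0.25),
        (datetime(2023, 1, 6, 8, 15), 0.40),
    ],
}


def value_as_of(ticker: str, as_of: datetime):
    """Return the most recent score available at `as_of`, or None.

    Only records with available_at <= as_of are considered, which is
    what prevents look-ahead bias in a backtest.
    """
    rows = PIT_DATA.get(ticker, [])
    timestamps = [available_at for available_at, _ in rows]
    idx = bisect_right(timestamps, as_of)
    return rows[idx - 1][1] if idx else None


print(value_as_of("AAA", datetime(2023, 1, 5)))  # latest score known on 5 Jan
print(value_as_of("AAA", datetime(2023, 1, 1)))  # nothing published yet
```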
Compliance Considerations
Alternative data in this primary category is usually collected from public sources, so the common compliance issues relating to web scraping apply, e.g., differing laws across jurisdictions and copyrighted materials. Issues like personally identifiable information (PII) and material non-public information (MNPI) usually don’t apply here.
Users of the data need to ascertain whether the alternative data provider collected the data legally and complied with all the conditions of the company’s and/or social network’s website. The principal concern is that the data must not be obtained from behind a paywall or login. This is particularly relevant for the News NLP category, since many financial newspapers, such as the Wall Street Journal, sit behind paywalls.
The hiQ Labs versus LinkedIn case was a ground-breaking court case on web-scraped data, specifically employment data scraped from LinkedIn. This case spent many years in the US court system. It originally looked like hiQ had won the case and that scraping data was thus deemed legal. However, scraping data is only legal if conducted on a public-facing website with no login used to access the data. The profiles on LinkedIn are public and therefore can be scraped. It turned out that, after many years in the court system, hiQ admitted to scraping behind a login and lost the case.
The ruling leaves open a broad interpretation of scraping in general, and vendors must adhere to company website T&Cs, restrictions on the use of rotating IP addresses, and robots.txt restrictions. Specifically for hiQ, the ruling said that if a vendor has its own corporate profile on LinkedIn, then it needs to comply with the T&Cs of LinkedIn, which do not allow scraping. It remains to be seen how the restrictions from the ruling will play out beyond hiQ.
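One of these checks, honoring robots.txt restrictions, can be sketched with Python's standard `urllib.robotparser` module. The robots.txt content, the URLs, and the `example-research-bot` user agent below are hypothetical, and the file is parsed from a string so that the sketch runs offline. Passing this check says nothing about a site's T&Cs or about legality, both of which need separate review.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download the
# site's own file (for example via set_url() followed by read()).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(url: str, user_agent: str = "example-research-bot") -> bool:
    """Return True if the parsed robots.txt rules permit fetching `url`."""
    return parser.can_fetch(user_agent, url)


print(may_fetch("https://example.com/news/article-1"))
print(may_fetch("https://example.com/private/profile"))
```

A data buyer could ask a vendor to show that a comparable check, together with T&C review, is part of its collection pipeline when assessing data provenance.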
The broad hiQ ruling may lead to more cease-and-desist letters and make compliance and data provenance more difficult for any data derived from web scraping. Like everything in the alternative data ecosystem, data provenance is of the utmost importance.