Geospatial Data

Alternative Data Category Description

This type of alternative data refers to any data that can be used to determine, with precision, the location of an object (building, car, ship, etc.) or person anywhere in the world. It is obtained from electronic devices such as mobile phones, tablets, smartwatches, and connected vehicles. Geospatial data derived from mobile devices can yield timely information on visitation trends. Common industry applications include amusement parks, retailers, restaurants, malls, hotels, travel, transportation, and REITs. In addition to showing levels of foot traffic, this data can be used to identify the impact of promotions and weather events, and it may reveal cross-brand loyalty and regional idiosyncrasies. Geolocation data providers receive location data from mobile app owners, Bluetooth beacons, and sensors.

This type of alternative data can be extremely granular, down to latitude and longitude coordinates, and can include the dwell time of the person or object at a location. For example, at a store such as Starbucks it may be possible to distinguish between employees and consumers. Well-established providers of geospatial data anonymize the data with a hashed user ID to comply with privacy regulations.
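A hashed user ID of the kind described above can be sketched with a keyed hash (HMAC-SHA256) from Python's standard library. This is a minimal illustration, not any provider's actual scheme: the key name SECRET_KEY and the function hash_user_id are hypothetical. A keyed hash is used here rather than a plain SHA-256 so that someone without the key cannot re-identify devices simply by hashing a list of known device IDs.

```python
import hashlib
import hmac

# Hypothetical secret key held only by the data provider. In practice it would
# live in a secrets manager and never be shipped alongside the data.
SECRET_KEY = b"replace-with-a-long-random-secret"


def hash_user_id(raw_device_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 hash (HMAC) of a raw device or advertising ID.

    The same device always maps to the same hashed ID, so visits can still be
    linked over time without exposing the raw identifier.
    """
    return hmac.new(key, raw_device_id.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    a = hash_user_id("device-123")
    b = hash_user_id("device-123")
    c = hash_user_id("device-456")
    print(a == b)  # True: same device, same hashed ID
    print(a == c)  # False: different devices, different hashed IDs
    print(len(a))  # 64 hex characters
```

Because the hash is stable, row-level records from different days can still be joined on the hashed ID; rotating the key would break that linkage, which is a trade-off a provider would have to choose deliberately.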

Geospatial alternative data providers work with a panel. They might obtain location data from users of certain apps, which means they only see location data from users within their panel, but there are techniques to normalize the data across a population. One method is to overlay the location data with census data. This is possible when demographic data on panel users is also provided: if the provider sees a certain cohort of panel users at a store in a certain zip code, it can overlay that observation with census numbers to estimate how many people from that cohort visit the store.
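The census overlay can be sketched as simple cohort weighting: each panel member in a cohort stands in for (census population ÷ panel members) people from that cohort. This is a minimal sketch under assumptions not stated in the text: the provider knows each panel member's cohort and home zip code, visits are counted per panel member, and the cohort labels and figures are invented. The names CohortStats and estimate_store_visitors are hypothetical, and real providers may use more elaborate weighting.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CohortStats:
    panel_users: int        # panel members in this cohort living in the zip code
    census_population: int  # census count for the same cohort and zip code
    observed_visits: int    # panel members from this cohort seen at the store


def estimate_store_visitors(cohorts: Dict[str, CohortStats]) -> Dict[str, float]:
    """Scale each cohort's observed panel visits up to its census population.

    Cohorts with no panel members are skipped, because there is nothing to
    extrapolate from.
    """
    estimates = {}
    for name, stats in cohorts.items():
        if stats.panel_users == 0:
            continue
        weight = stats.census_population / stats.panel_users
        estimates[name] = stats.observed_visits * weight
    return estimates


if __name__ == "__main__":
    # Made-up numbers for two age cohorts in one zip code.
    cohorts = {
        "18-34": CohortStats(panel_users=500, census_population=20_000, observed_visits=25),
        "35-54": CohortStats(panel_users=250, census_population=15_000, observed_visits=10),
    }
    estimates = estimate_store_visitors(cohorts)
    print(estimates)                # {'18-34': 1000.0, '35-54': 600.0}
    print(sum(estimates.values()))  # 1600.0
```

The weights differ by cohort (40 and 60 here), which is the point of the overlay: a cohort that is under-represented in the panel gets scaled up more.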

This data can also be rolled up to a macro level, for example, footfall at a company's locations across a zip code, state, or country. Aggregation helps further alleviate privacy concerns while still showing the ebbs and flows of consumers at a company's stores, restaurants, hotels, or buildings.
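A regional roll-up like this can be sketched as counting distinct hashed devices per ticker and region. The Visit fields, the choice to count distinct devices rather than total visits, and the min_devices suppression threshold are all assumptions for illustration; suppressing very small cells is one common way to avoid publishing counts that could single people out. SBUX is used only as an example ticker.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Visit(NamedTuple):
    hashed_user_id: str
    ticker: str
    zip_code: str
    state: str
    country: str


LEVELS = ("zip_code", "state", "country")


def footfall_by_region(
    visits: Iterable[Visit], level: str, min_devices: int = 1
) -> Dict[Tuple[str, str], int]:
    """Count distinct devices per (ticker, region) at the chosen level.

    Cells with fewer than min_devices distinct devices are suppressed.
    """
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}, got {level!r}")
    devices = defaultdict(set)
    for v in visits:
        devices[(v.ticker, getattr(v, level))].add(v.hashed_user_id)
    return {key: len(ids) for key, ids in devices.items() if len(ids) >= min_devices}


if __name__ == "__main__":
    visits = [
        Visit("u1", "SBUX", "10001", "NY", "US"),
        Visit("u2", "SBUX", "10001", "NY", "US"),
        Visit("u1", "SBUX", "10001", "NY", "US"),  # repeat visit, same device
        Visit("u3", "SBUX", "11201", "NY", "US"),
        Visit("u4", "SBUX", "94105", "CA", "US"),
    ]
    print(footfall_by_region(visits, "state"))                 # {('SBUX', 'NY'): 3, ('SBUX', 'CA'): 1}
    print(footfall_by_region(visits, "state", min_devices=2))  # {('SBUX', 'NY'): 3}
    print(footfall_by_region(visits, "country"))               # {('SBUX', 'US'): 4}
```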

Other use cases include tracking the number of visitors to POIs (Points of Interest) and stores over time, and tracking migration patterns, e.g., people moving out of urban areas as remote working options increase. An interesting use case is monitoring activity and productivity levels at factories or warehouses by tracking when employees are at these POIs and whether employees are working on holidays, for example, tracking which days and holidays Tesla employees are working at Tesla factories.
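Holiday activity at a site can be sketched by comparing the number of distinct devices seen on each holiday with a normal-weekday baseline. This assumes, beyond the text, that pings have already been attributed to the factory POI and reduced to distinct hashed device IDs per day, that the site normally operates Monday to Friday, and that the user supplies the holiday calendar. The function name holiday_activity and the sample data are hypothetical.

```python
from datetime import date
from statistics import median
from typing import Dict, Set


def holiday_activity(
    daily_devices: Dict[date, Set[str]], holidays: Set[date]
) -> Dict[date, float]:
    """Ratio of distinct devices on each holiday to a normal-weekday baseline.

    The baseline is the median distinct-device count over non-holiday weekdays
    (Monday to Friday) in the same window. A ratio near 1.0 suggests a normal
    working day; a ratio near 0.0 suggests the site was largely idle.
    """
    workdays = [d for d in daily_devices if d not in holidays and d.weekday() < 5]
    if not workdays:
        raise ValueError("need at least one non-holiday weekday for a baseline")
    baseline = median(len(daily_devices[d]) for d in workdays)
    if baseline == 0:
        raise ValueError("baseline is zero; no devices seen on regular weekdays")
    return {d: len(daily_devices[d]) / baseline for d in sorted(holidays) if d in daily_devices}


if __name__ == "__main__":
    # Hypothetical hashed device IDs observed inside a factory's geofence.
    daily = {
        date(2023, 7, 3): {f"u{i}" for i in range(10)},  # Monday
        date(2023, 7, 4): {f"u{i}" for i in range(4)},   # Tuesday, US Independence Day
        date(2023, 7, 5): {f"u{i}" for i in range(12)},  # Wednesday
    }
    print(holiday_activity(daily, {date(2023, 7, 4)}))  # ratio of 4 to a baseline of 11
```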

There are numerous challenges in working with geospatial data. There are privacy concerns around the data and regulatory requirements such as GDPR (General Data Protection Regulation), and users now have to opt in to provide location data, which has led to panel losses. Churn is another issue: if users within a panel uninstall apps, they disappear from the panel, so it can be a challenge to keep a panel consistent over time without a huge drop-off or a rapid increase in panel size. This can be partly mitigated by obtaining geospatial data from apps that users rarely uninstall, such as weather apps rather than dating or gaming apps, and by obtaining geospatial data from multiple sources. Ensuring that the data is accurate can also be a challenge. Providers that use polygons can provide more accurate data and determine whether a user has entered a POI, and whether they are a potential consumer, an employee, or a resident who lives above or below the POI.
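Two simple checks for the panel-consistency problem can be sketched as follows: flagging periods where the panel size jumps or drops abruptly, and expressing visits per 1,000 panel members so that panel churn is not mistaken for a change in foot traffic. The 10% threshold, the per-1,000 normalization, and the function names are illustrative assumptions, not an industry standard.

```python
from typing import List, Sequence


def flag_panel_shifts(panel_sizes: Sequence[int], max_change: float = 0.10) -> List[int]:
    """Return indices of periods whose panel size moved by more than
    max_change (as a fraction) relative to the previous period."""
    flagged = []
    for i in range(1, len(panel_sizes)):
        prev, curr = panel_sizes[i - 1], panel_sizes[i]
        if prev == 0 or abs(curr - prev) / prev > max_change:
            flagged.append(i)
    return flagged


def visits_per_thousand(visits: Sequence[int], panel_sizes: Sequence[int]) -> List[float]:
    """Normalize raw panel visit counts by panel size (visits per 1,000 members)."""
    if len(visits) != len(panel_sizes):
        raise ValueError("visits and panel_sizes must have the same length")
    return [1000 * v / p for v, p in zip(visits, panel_sizes)]


if __name__ == "__main__":
    # Made-up weekly panel sizes: a sharp drop at index 3, a rebound at index 5.
    panel = [100_000, 101_500, 99_800, 80_000, 81_000, 95_000]
    visits = [2_000, 2_030, 1_996, 1_600, 1_620, 1_900]
    print(flag_panel_shifts(panel))            # [3, 5]
    print(visits_per_thousand(visits, panel))  # [20.0, 20.0, 20.0, 20.0, 20.0, 20.0]
```

In this made-up series the raw visit counts fall by 20% at index 3, but the normalized rate is flat, so the drop is a panel effect rather than a real change in foot traffic.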

The top geospatial alternative data providers also draw polygons (map the outlines) around points of interest. This increases accuracy and helps determine whether a person entered a point of interest or merely passed by. Insight can be further increased by combining multiple sources of geospatial data, such as apps, IP addresses, GPS, cell towers, and Wi-Fi networks.
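Polygon-based attribution can be sketched with the standard ray-casting point-in-polygon test plus a minimum-dwell rule to separate visitors from passers-by. This is not any vendor's method: treating longitude and latitude as flat x/y coordinates is an assumption that is reasonable only for building-sized polygons, the 5-minute threshold is hypothetical, and the store footprint coordinates are invented.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(lon: float, lat: float, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: cast a ray east from the point and count how many
    polygon edges it crosses; an odd count means the point is inside."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Only edges that straddle the point's latitude can cross the ray.
        if (yi > lat) != (yj > lat):
            # Longitude at which this edge crosses the point's latitude.
            x_cross = xj + (lat - yj) * (xi - xj) / (yi - yj)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def is_visit(
    pings: List[Tuple[datetime, float, float]],
    polygon: Sequence[Point],
    min_dwell: timedelta = timedelta(minutes=5),
) -> bool:
    """Count a device as a visitor if its pings fall inside the polygon over a
    span of at least min_dwell. pings are time-sorted (timestamp, lon, lat)
    tuples for one device; exits and re-entries in between are ignored."""
    inside_times = [t for t, lon, lat in pings if point_in_polygon(lon, lat, polygon)]
    if not inside_times:
        return False
    return inside_times[-1] - inside_times[0] >= min_dwell


if __name__ == "__main__":
    # Hypothetical store footprint, drawn as a small rectangle.
    store = [(-73.9860, 40.7480), (-73.9850, 40.7480), (-73.9850, 40.7490), (-73.9860, 40.7490)]
    t0 = datetime(2024, 3, 4, 12, 0)
    shopper = [(t0, -73.9855, 40.7485), (t0 + timedelta(minutes=12), -73.9856, 40.7486)]
    passer_by = [(t0, -73.9845, 40.7485), (t0 + timedelta(minutes=1), -73.9855, 40.7485),
                 (t0 + timedelta(minutes=2), -73.9865, 40.7485)]
    print(is_visit(shopper, store))    # True
    print(is_visit(passer_by, store))  # False
```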

  • Apps – A popular method for collecting geospatial data is through apps and Software Development Kits (SDKs), which give app developers tools that help build and operate an app. Using SDKs means developers do not have to build features from scratch. Companies provide SDKs to app developers for free and, in return, can collect information such as geospatial data and lat-long coordinates. SDKs can also help apps communicate with third parties through APIs. Common apps that use these SDKs include weather, dating, social media, and gaming apps.
  • GPS – GPS is extremely common and is used by apps on mobile devices and in connected vehicles. A GPS receiver calculates its lat-long coordinates by trilateration from timing signals broadcast by satellites; assisted GPS (A-GPS) can also use cell-network data to obtain a fix faster. Accuracy is lower than that of app data.
  • Cell towers – Cell towers can locate devices in an area by triangulating a call or data signal between multiple towers. This happens when someone places a call or uses cellular data. Accuracy is lower than that of app data.
  • The internet – Geospatial data can be obtained when someone uses the internet. A person's IP address can be mapped to an approximate location for their device, and if a person connects via a Wi-Fi network, their location may be revealed by the network's SSID.
  • Transaction data – Transaction data can reveal a person's location when they pay for items at the POS (Point of Sale). Consumer transaction data, specifically credit/debit card data, generally provides the address of the POS when the sale is made at a physical store.
  • Geotagging – Geotagging occurs when social media networks allow users to tag themselves or their photos at physical locations, or to check in at those locations.
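The geometry behind locating a device from several towers can be sketched as 2D trilateration from three known anchor positions and measured distances. This is a deliberate simplification: it assumes a flat plane measured in metres, exact distances, and exactly three anchors. Real GPS works in 3D and needs at least four satellites so it can also solve for the receiver's clock error, and cell-network positioning works with noisy measurements, typically via least squares. The function name trilaterate is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y) coordinates in metres


def trilaterate(anchors: Sequence[Point], distances: Sequence[float]) -> Point:
    """Estimate a 2D position from three anchors (e.g. towers) and the distance
    to each.

    Each anchor defines a circle (x - xi)^2 + (y - yi)^2 = ri^2. Subtracting the
    first circle's equation from the other two cancels the x^2 and y^2 terms,
    leaving two linear equations, which are solved with Cramer's rule.
    """
    if len(anchors) != 3 or len(distances) != 3:
        raise ValueError("this sketch expects exactly three anchors and distances")
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    a, b = 2 * (x2 - x1), 2 * (y2 - y1)
    c = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    d, e = 2 * (x3 - x1), 2 * (y3 - y1)
    f = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a * e - b * d
    if math.isclose(det, 0.0, abs_tol=1e-12):
        raise ValueError("anchors are collinear; position is ambiguous")
    return ((c * e - b * f) / det, (a * f - c * d) / det)


if __name__ == "__main__":
    towers = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
    device = (300.0, 400.0)
    dists = [math.dist(device, t) for t in towers]  # math.dist needs Python 3.8+
    x, y = trilaterate(towers, dists)
    print(round(x, 6), round(y, 6))  # 300.0 400.0
```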

The cost of a geospatial dataset varies widely and, in most cases, depends on the amount of data a buyer requires and how granular it needs to be. The most expensive and robust geospatial data is at the row level, tagged to tickers, with 5+ years of history. It includes granular metrics such as lat-long, address, dwell time, demographics, timestamp, ticker symbol, altitude, other stores passed on the route, the journey from home area to location, activity at a location, and more, from a large, consistent panel that is representative of the population, and it is available daily. This type of alternative data often commands a premium price of over $100k per year. Higher-level geospatial data is also available and is cheaper to acquire, but it might only show a footfall count of users at a company, delivered weekly or monthly, for example, 10,000 visitors at Starbucks stores in NYC in the week of November 7th-13th. Acquiring a list of POIs can be extremely cost-effective: if your use case is to track store locations and their openings and closings, this data can be obtained for hundreds of dollars. Most geospatial data providers offer access via a 12-month subscription and will slice the data to your needs, but some also offer bespoke project work, which can be useful and cost-effective for once-off projects, especially where an abundance of historical data is not required. Typically, these once-off projects cost in the region of $25-35k.

Subcategory - Geolocation Data

As discussed, geolocation data tracks anonymized users and objects at specific locations. It can be very granular, with row-level details on anonymized users, including lat-long coordinates, store location, company, ticker, timestamp, dwell time, altitude, other stores passed on their route, the journey from home area to location, activity at a location, demographics, and more. The data can also be aggregated and rolled up to provide footfall numbers at companies' stores, and it can be obtained daily, weekly, monthly, or quarterly.
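A row-level record and a weekly roll-up to footfall can be sketched as below. The field names, types, and units (altitude in metres) are assumptions modelled loosely on the list above; demographics and route fields are omitted for brevity. The roll-up counts rows (visits), not distinct devices, and buckets them by ISO calendar week; VisitRecord and weekly_footfall are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class VisitRecord:
    """One hypothetical row-level geolocation record (a subset of the fields above)."""
    hashed_user_id: str
    company: str
    ticker: str
    store_id: str
    lat: float
    lon: float
    timestamp: datetime
    dwell_time: timedelta
    altitude_m: Optional[float] = None


def weekly_footfall(records: Iterable[VisitRecord]) -> Counter:
    """Roll row-level visits up to visit counts per (ticker, ISO year, ISO week)."""
    counts: Counter = Counter()
    for r in records:
        iso = r.timestamp.isocalendar()
        counts[(r.ticker, iso[0], iso[1])] += 1
    return counts


if __name__ == "__main__":
    rows = [
        VisitRecord("a1", "Starbucks", "SBUX", "store-1", 40.7485, -73.9855,
                    datetime(2024, 3, 4, 8, 15), timedelta(minutes=9)),
        VisitRecord("b2", "Starbucks", "SBUX", "store-2", 40.7306, -73.9866,
                    datetime(2024, 3, 10, 14, 0), timedelta(minutes=22)),
        VisitRecord("a1", "Starbucks", "SBUX", "store-1", 40.7485, -73.9855,
                    datetime(2024, 3, 11, 8, 10), timedelta(minutes=7)),
    ]
    for key, n in sorted(weekly_footfall(rows).items()):
        print(key, n)
    # ('SBUX', 2024, 10) 2
    # ('SBUX', 2024, 11) 1
```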

Subcategory - Point of Interest (POI) Data

POI data tracks the physical locations of points of interest. It can be used to track store locations, and a common use case is tracking the number of store openings and closings.
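Counting openings and closings from POI data can be sketched as a set difference between two snapshots. This assumes, beyond the text, that the vendor supplies POI identifiers that stay stable between snapshots; if IDs change, records would first need matching on address or coordinates. The brand names, POI IDs, and the function store_changes are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def store_changes(
    previous: Dict[str, str], current: Dict[str, str]
) -> Tuple[Counter, Counter, Dict[str, int]]:
    """Compare two POI snapshots mapping poi_id -> brand.

    Returns (openings per brand, closings per brand, net change per brand).
    """
    opened_ids = current.keys() - previous.keys()  # present now, absent before
    closed_ids = previous.keys() - current.keys()  # present before, absent now
    opened = Counter(current[i] for i in opened_ids)
    closed = Counter(previous[i] for i in closed_ids)
    net = {brand: opened[brand] - closed[brand] for brand in opened.keys() | closed.keys()}
    return opened, closed, net


if __name__ == "__main__":
    january = {"poi-1": "BrandA", "poi-2": "BrandA", "poi-3": "BrandB"}
    february = {"poi-1": "BrandA", "poi-3": "BrandB", "poi-4": "BrandB", "poi-5": "BrandA"}
    opened, closed, net = store_changes(january, february)
    print(sorted(net.items()))  # [('BrandA', 0), ('BrandB', 1)]
```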

Data Structure

  • Geospatial data can be delivered in several formats, and even directly to your preferred cloud environment. Some alternative data providers deliver CSV files via SFTP and offer a dashboard view through their platform. If you need the data faster, some offer APIs. Some data vendors also have a data platform for visualized delivery of curated output.
  • The extent of ticker mapping varies widely across data vendors; some of the larger vendors map their data to tickers for ticker-level insight.
  • History varies but is usually 5-10 years.
  • Delivery is typically daily at T+1 or T+2, though data is sometimes rolled up to weekly or monthly.
  • Data is delivered via API, FTP, S3, or platform/UI.

Compliance Considerations

PII (Personally Identifiable Information) is the most important consideration for geolocation data, as the data could be used to track a person's activities. MNPI (Material Non-Public Information) is not that high on the list of concerns, but data provenance is, as always, important. As most geospatial data comes from apps on cell phones, the risk that an individual's activity could be monitored is quite high. This concern became elevated in the summer of 2022 after the US Supreme Court ruling that overturned Roe v. Wade, when several geospatial alternative data providers faced possible legal concerns over the potential use of their data by US states with restrictive abortion laws. Data vendors across the geolocation market removed “sensitive locations” from their data. All data vendors endeavor to anonymize their data, but a high level of diligence needs to be applied to the data, particularly at the time of ingestion and ETL.
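An ingestion-time check for sensitive locations can be sketched as dropping any ping within a buffer radius of a list of sensitive sites, using the haversine great-circle distance. Vendors have not published a single method, so this is a hypothetical filter: the 200 m radius, the site list, and the function names are assumptions, and a production pipeline would more likely use site polygons and a spatial index.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

LatLon = Tuple[float, float]


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def drop_sensitive(
    pings: Sequence[LatLon], sensitive_sites: Sequence[LatLon], radius_m: float = 200.0
) -> List[LatLon]:
    """Keep only pings farther than radius_m from every sensitive site."""
    return [p for p in pings if all(haversine_m(p, s) > radius_m for s in sensitive_sites)]


if __name__ == "__main__":
    clinic = (40.7400, -73.9900)  # hypothetical sensitive location
    pings = [(40.7401, -73.9901), (40.7500, -73.9900)]
    print(drop_sensitive(pings, [clinic]))  # [(40.75, -73.99)]
```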

There are some further key considerations regarding geolocation data. The first is the move by both Google and Apple to enforce greater privacy controls and consumer opt-in. Informed consent is critical in both cases, but a user of the data should also understand the exact terms and conditions that Apple and Google offer: how restrictive is the approval the end consumer gives for the use of their data? Users of the data need to be extremely comfortable with the compliance issues around the extent of the data opted into. Another critical data provenance issue came to light in late 2022: some third-party geolocation alternative data brokers were reselling old data and re-timestamping it as current. Geolocation data vendors call this “replaying” old data, and it appears to have been quite prevalent in the industry. This practice makes it that much harder to confirm data provenance and to be assured that data is genuine rather than manufactured.
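One possible diligence check for replayed data can be sketched as fingerprinting each ping on everything except its calendar date and measuring how much of a new delivery already appears in historical deliveries. This is a hypothetical heuristic, not a known industry method: it assumes a replay shifts timestamps by whole days while leaving coordinates, time of day, and dwell time unchanged, and that the buyer retains earlier deliveries. A broker that adds noise would evade it, and a high overlap is a signal for follow-up with the vendor rather than proof.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple, Tuple


class Ping(NamedTuple):
    lat: float
    lon: float
    timestamp: datetime
    dwell: timedelta


def fingerprint(p: Ping) -> Tuple:
    """Everything about a ping except its calendar date.

    If old data is replayed by shifting whole days, coordinates, time of day,
    and dwell time survive unchanged, so these fingerprints still match.
    """
    t = p.timestamp.time()
    return (round(p.lat, 5), round(p.lon, 5), t.hour, t.minute, t.second,
            int(p.dwell.total_seconds()))


def replay_overlap(new_batch: Iterable[Ping], history: Iterable[Ping]) -> float:
    """Fraction of pings in new_batch whose fingerprint already appears in history."""
    seen = {fingerprint(p) for p in history}
    new = [fingerprint(p) for p in new_batch]
    if not new:
        return 0.0
    return sum(fp in seen for fp in new) / len(new)


if __name__ == "__main__":
    old = [
        Ping(40.74850, -73.98550, datetime(2021, 5, 3, 8, 15, 7), timedelta(minutes=9)),
        Ping(40.73061, -73.98660, datetime(2021, 5, 3, 17, 42, 51), timedelta(minutes=23)),
    ]
    # The same pings re-timestamped one year later, as a replaying broker might do.
    replayed = [p._replace(timestamp=p.timestamp + timedelta(days=365)) for p in old]
    fresh = [Ping(40.75120, -73.97710, datetime(2022, 5, 3, 12, 5, 30), timedelta(minutes=4))]
    print(replay_overlap(replayed, old))  # 1.0
    print(replay_overlap(fresh, old))     # 0.0
```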
