How do you get ahead of the game by using external data?

Benjamin Naderi
May 4, 2022

The COVID-19 pandemic changed consumer behaviour and market demand so sharply that many AI and machine learning models became obsolete. As a result, some companies accelerated their efforts to use external data rather than relying only on internal operational data.

Many companies already make good use of their internal operational data, but relatively few have discovered the value of tapping into third-party or public data sources. Leading companies know you cannot build high-quality models from internal data alone. While operational data is great for painting a picture of what you did last month, external sources can give you insights into what you can do tomorrow.

Photo by Viktor Krč on Unsplash

Which companies are ahead of the game?

Half of the top 100 hedge funds use external data. They use consumer spending and lifestyle data to rate their portfolios’ potential growth. Investors collect job postings, employee-turnover data (from professional networking websites), and patent filings to predict the financial performance of a given company.

For example, Orbital Insight collects anonymised satellite images of parking lots across the US and provides data about where and when people are shopping. Orbital Insight serves industries such as Financial Services, Consumer Goods, Supply Chain, and Energy.

Benson and Magee from MIT showed that the recency and importance of patent filings in a technological domain can forecast how fast that technology will grow. For example, Dyson patented their high-end supersonic hair dryer in 2013. The hair dryer came to market in 2016. An insight like this could be key information for a good investment.

Jørn Lyseggen highlights in his book ‘Outside Insight: Navigating a World Drowning in Data’ that a glass window manufacturer uses geographical data of reported crimes to improve their demand forecast for each region.

Kabbage is a fintech company that funds small businesses through an automated lending platform. Brown from MIT has explained that Kabbage is using data from social media, sales and shipping history to determine the creditworthiness of small businesses.

Real-estate companies are predicting the potential future value of certain properties based on foot traffic data, restaurant ratings and entertainment activities in proximity to their target area.

Aaser and McElhaney from McKinsey have illustrated the use cases of tapping into external sources in the following overview.

Source: Aaser and McElhaney

How can you start using external data?

For a research and development approach or an experimental project, a small team can start with the idea of using some external data and building a minimum viable product (MVP).

For a scalable project and a more sustainable approach, you can follow these steps from McKinsey to create a centralised pipeline.

Step 1 — Define Use Cases

Identify solid use cases that can take advantage of external data. A team consisting of business stakeholders and data strategists can draft use cases that offer uplift or value.

Step 2 — Define Roles

Establish roles to support the efforts — including data scouts and data reviewers.

Standard roles:

  • Purchasing experts negotiate contracts with data providers
  • Architects and DevOps engineers develop platforms to integrate and manage access
  • Data engineers ingest and prepare data
  • Data scientists and analysts build models and apply data to use cases, measuring the value of models and analyses

New key roles:

Two new roles are essential in an effective data project with external sources. Depending on the size of the project, these roles can be combined with standard roles.

  • Data reviewers evaluate how to use external data and assess the risks (for example, GDPR-related risks)
  • Data scouts/strategists work with the business to map relevant external sources

Step 3 — Know your sources

What are your data options?

Public data: Government agencies publish high-level data for free. For example, Statistics Netherlands (cbs.nl) provides public data in 22 different categories ranging from hotel guest trends per city to drone flight path data. The U.S. Government’s open data (data.gov) has more than 200,000 datasets from a variety of government agencies.

Data marketplaces: External Data Platforms are new players in the analytics industry that provide data and signals to their customers. Gartner has been reviewing and ranking these vendors since 2021.

Step 4 — Evaluate Data Value

In most cases, you should be able to evaluate the data before purchasing it. You can use manual or automatic methods to understand its potential value. Data reviewers play a key role in connecting business needs to the value of the data before data scientists and analysts start consuming it.
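One simple automatic check is to see whether a candidate external signal lowers forecast error on data the model has not seen. The sketch below is only an illustration under stated assumptions: the numbers are made up (reported burglaries per period as the external signal and window orders as demand, loosely inspired by the glass manufacturer example above), and the function names `fit_line`, `mae` and `evaluate_uplift` are hypothetical helpers, not part of any vendor's tooling or of McKinsey's approach. The baseline uses internal history only (the average of past demand), and the challenger is a one-variable linear fit on the external signal.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def evaluate_uplift(external, demand, n_train):
    """Compare a history-only baseline with a model using the external signal.

    The first n_train periods are used for fitting, the rest for evaluation.
    Returns (baseline_mae, external_mae, relative_uplift).
    """
    train_x, test_x = external[:n_train], external[n_train:]
    train_y, test_y = demand[:n_train], demand[n_train:]

    # Baseline: internal data only, predict the historical average demand.
    baseline_pred = [mean(train_y)] * len(test_y)

    # Challenger: linear model driven by the external signal.
    slope, intercept = fit_line(train_x, train_y)
    external_pred = [slope * x + intercept for x in test_x]

    baseline_mae = mae(test_y, baseline_pred)
    external_mae = mae(test_y, external_pred)
    return baseline_mae, external_mae, 1 - external_mae / baseline_mae


# Hypothetical sample data, invented for illustration only.
reported_burglaries = [10, 20, 30, 40, 50, 60, 70, 80]
window_orders = [27, 43, 66, 84, 107, 123, 146, 164]

baseline_mae, external_mae, uplift = evaluate_uplift(
    reported_burglaries, window_orders, n_train=6
)
print(f"Baseline MAE: {baseline_mae:.2f}")
print(f"With external signal MAE: {external_mae:.2f}")
print(f"Relative error reduction: {uplift:.0%}")
```

In a real evaluation you would compare against your existing production model rather than a simple average, use far more history, and have data reviewers confirm that the signal is allowed to be used for this purpose before any purchase.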

Conclusion

The COVID-19 crisis accelerated the trend of tapping into external data sources. Businesses realised that their predictive models were not performing well on internal data alone. A successful leap towards new sources requires a centralised strategy. A proper strategy combines external data with internal data to produce intelligence that is relevant to your use cases.

Are you using external data in your projects? Do you know any other success stories?

Benjamin Naderi

Analytics Expert, Writes about Data and Machine Learning