CornerstoneAI: Automated data cleaning & no-code data science

CornerstoneAI unlocks the value of healthcare data by abstracting away the arduous task of data cleaning, making data scientists 10x more productive and enabling more companies to make data-driven decisions.

Healthy Ventures led a $5m seed round to secure 10% ownership and a voting board director seat.

Data Isn’t Valuable If You Can’t Use It 

The healthcare industry talks a lot about its data, and for good reason: organizations like large pharmaceutical companies have petabytes of data, and each petabyte is akin to 500 billion pages of standard text. Pfizer, for example, has over 10 petabytes of data.   

The dirty secret is that most data is useless. To use that data (whether to improve drug design, run faster clinical trials, or even help patients get the financial assistance needed to afford their drugs), companies must be able to make sure the data is correct, well organized, non-duplicated, and formatted. In short, the data must be “cleaned” before it can be used.
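To make "cleaned" concrete, here is a minimal sketch in Python of the four checks named above (correct, well organized, non-duplicated, formatted). The record fields `patient_id` and `visit_date`, the accepted date layouts, and the function name are all hypothetical and chosen only for illustration; this is not Cornerstone's method.

```python
from datetime import datetime

# Hypothetical date layouts; a real dataset would define its own.
DATE_LAYOUTS = ("%Y-%m-%d", "%m/%d/%Y")

def clean_records(records):
    """Return (cleaned, rejected) lists for hypothetical visit records."""
    seen = set()
    cleaned, rejected = [], []
    for rec in records:
        # Formatted: trim whitespace and standardize the identifier's case.
        patient_id = rec["patient_id"].strip().upper()

        # Correct: the date must parse under one of the known layouts.
        visit_date = None
        for layout in DATE_LAYOUTS:
            try:
                visit_date = datetime.strptime(rec["visit_date"].strip(), layout).date()
                break
            except ValueError:
                continue
        if visit_date is None:
            rejected.append(rec)  # set aside for review rather than silently lost
            continue

        # Non-duplicated: keep only the first row per (patient, date) pair.
        key = (patient_id, visit_date)
        if key in seen:
            continue
        seen.add(key)

        # Well organized: emit one consistent shape with an ISO 8601 date.
        cleaned.append({"patient_id": patient_id, "visit_date": visit_date.isoformat()})
    return cleaned, rejected

raw = [
    {"patient_id": " p001 ", "visit_date": "2021-03-04"},
    {"patient_id": "P001", "visit_date": "03/04/2021"},  # same visit, different format
    {"patient_id": "P002", "visit_date": "not a date"},  # cannot be validated
]
cleaned, rejected = clean_records(raw)
print(cleaned)
print(rejected)
```

Even in this toy case, two of the three rows need attention before the data can be analyzed, which is the kind of work the next paragraph describes.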

This is anathema to data scientists, who anecdotally estimate that they spend 90% or more of their time cleaning their data, and 10% or less creating insights.

The dirty data problem over-indexes in healthcare 

Cleaning data is expensive in any industry. Data scientists often cost $500k per year, and they’re hard to find. There is a critical shortage of data scientists: in the US, two-thirds of all positions are unfilled. Without enough data scientists, companies are unable to marshal their data (at any cost). They’re flying partially blind.

This blindness is particularly acute in healthcare because the sector is going through an explosion in the types and volumes of available data. Healthcare data already accounts for 30% of the data generated worldwide today, and is expected to grow at a 36% CAGR through 2025, outpacing other industries like manufacturing, financial services, and media.

All data is NOT the same: cleaning healthcare data requires a healthcare company

In healthcare, unlike other industries, the outlying datapoints are often the most important – why did these few patients have severe adverse events? Why are only these few patients responding? Why do patients in this region seem to not have access to this drug?  

Data cleaning solutions in other industries (e.g., IBM, Osmos, Trifacta) do not work in healthcare because they discard all these outlier data points – they mistake the signal for noise. Instead, healthcare has historically relied on people to manually check data in clinical trials, and data scientists paired with clinicians to do further manual adjudication.
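The difference between discarding outliers and keeping them as potential signal can be sketched in a few lines of Python. The example below uses a generic statistical technique (a modified z-score based on the median absolute deviation); it is an illustrative assumption, not a description of how Cornerstone or any of the vendors named above work, and the function name, threshold, and sample values are hypothetical.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Label each value as an outlier or not, without dropping any rows."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median([abs(v - med) for v in values])
    flagged = []
    for v in values:
        # Modified z-score; 0.6745 scales the MAD to be comparable to a
        # standard deviation for normally distributed data.
        score = 0.0 if mad == 0 else 0.6745 * (v - med) / mad
        flagged.append({"value": v, "is_outlier": abs(score) > threshold})
    return flagged

# Hypothetical lab measurements; one patient responds very differently.
readings = [5.1, 4.9, 5.0, 5.2, 4.8, 12.0]
result = flag_outliers(readings)
outliers = [row["value"] for row in result if row["is_outlier"]]
print(outliers)
```

A pipeline that filters on `is_outlier` would throw away the one reading a clinician most wants to see. Keeping the row and attaching a flag leaves that decision to a later step.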

By productizing and abstracting away the task of data cleaning, Cornerstone frees up data scientists’ time, making them 10x more productive. This eases the labor shortage and equips companies with better data-driven decision-making power.

Cornerstone’s Solution: Making Data Insights Easy  

Data cleaning is Cornerstone’s foothold into its ultimate goal: insight. Cornerstone will be known for universal data science algorithms productized via no-code / low-code data science products.

Data cleaning serves as a smart and necessary beachhead because 1) it is necessary for creating insights, and 2) it anchors Cornerstone in its clients’ data science pipelines (without needing data rights).

To get to insights, Cornerstone has prioritized its product pipeline around 2 principles:  

  1. Creating customer stickiness via its flagship data quality product, which makes it easier for customers to combine and interrogate their data by standardizing their data sets around Cornerstone’s schema. Partnerships with business intelligence companies will further embed the product.
  2. Maintaining current workflows with no-code data science-as-a-service products. These will allow companies to perform more sophisticated data science tasks without needing a data scientist, right in their current workflows. Such insights may include:
    • Clinical Validity: Answering the question: is this datapoint a relevant clinical finding?
    • Classification: Very similar to synthetic data generation (creating look-alikes). Use a small group of known data to project an outcome on unknown data.
    • Comparability Across Experiments: Ensure the comparability of experimental data in a rigorous and reproducible way acceptable for regulatory submissions (e.g., FDA).

As Cornerstone allows its customers to do more with their own data, it will work its way up the stack, coming closer to end-users.

Who are the customers?

Cornerstone’s customers fall into two categories: 

  1. Those that have a single large dataset that grows over time. Examples are: healthcare software companies, medical device companies, and pharmaceutical companies. 
  2. Those that have many discrete data sets and are continuously generating more. Healthcare services companies, such as contract research organizations (CROs), are examples of companies with this need.

Scaling with SaaS 

Cornerstone’s product is scalable software that can be used across multiple use cases in life sciences. Customers choose between using: 

  1. A lightweight product hosted in Cornerstone’s cloud.
  2. An “on-prem” product hosted at the client site or in the client cloud.

In both cases, customers can choose between using Cornerstone’s UI or integrating Cornerstone’s APIs into existing data workflows. Neither product requires a human in the loop, and both are scalable to large and varied datasets.

The competition isn’t built for scaling in healthcare

Tuva Health is the only dedicated healthcare data cleaning company. It is open-source and is written on top of dbt (referenced above). While Tuva Health begins with data cleaning, its goal is to be a data exchange, not a data science company. It uses a rules-based approach.

Healthcare needs a self-learning approach, not a rules-based one. Because the results from the data are context dependent, it is effectively impossible to build databases large enough to support a rules-based approach without clinicians manually tagging the data, which is prohibitively expensive and time consuming. Cornerstone takes this self-learning approach.

The founding team experienced the problem first-hand and built a company to solve it.

Michael and Andrew founded the company in mid-2021 and immediately started working with two companies that acted as development partners. They were ready to commercially launch the Workbench product in Q1 2022. By Q1 2022, the company had already signed four commercial contracts.
