Alternative medical data, AI, and new bundlings/unbundlings in healthcare

Reza Khorshidi
Feb 12, 2020

In January 2017, in advance of Johnson & Johnson’s announcement of its $30 billion acquisition of Actelion — a Switzerland-based pharmaceuticals company — the Johnson & Johnson jet stayed parked near Actelion for five days. To some investors, this flight’s information could have been a leading indicator for the performance of Johnson & Johnson’s — or even other pharmaceutical companies’ — stocks. That’s why companies such as Quandl try to turn corporate aviation data (like the Johnson & Johnson one I mentioned) into insights for financial services professionals. Of course, the story is not limited to flight data; for years, participants in the financial markets such as investment banks, exchanges, and hedge funds have been using various sources of data to build a competitive edge in quantitative trading and investment. This led to a booming industry in investments and asset management, called “alternative data” (or “alt data”, in short).

(Right) The exponential growth in the alt data market, both in money spent and in the number of vendors, makes it a key topic of interest for many investment and asset management groups. (Left) The main data modalities that fall into this category are highly diverse, ranging from social media to ocean vessel tracking.

While finding similarities between investment banking/asset management and healthcare is not common (and at times viewed by some as taboo), I think a similar trend is happening in healthcare, which I refer to as Alternative Medical (or Health) Data (AMD or AHD). The field of medicine has long relied on a range of symptoms and tests (e.g., blood chemistry, medical imaging, and more recently, genetics) to open a window into an individual’s health and indicate their risks. Yet a growing number of research studies show that new modalities of data, ones that were not thought of as medical before, can generate reliable medical evidence. For instance, daily step count and the number of push-ups a person can perform have been shown to be significantly associated with all-cause and cardiovascular mortality, respectively.

Kaplan-Meier survival curves (showing percentage of people still alive [y axis] as time goes on [x axis] after a baseline observation) for the cumulative risk of cardiovascular disease outcome in 5 different “push-up categories” (shown with different colours and labeled by the number of push-ups performed during baseline examination). The gap between the lowest and highest push-up categories (the ‘<=10’ and ‘>=41’ lines) is clearly visible, and hence can be seen as a predictive signal.
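For readers who have not met Kaplan-Meier curves before, here is a minimal sketch of how such a curve is estimated. It is my own illustration rather than the code behind the figure: the function name `kaplan_meier` and the toy follow-up data are made up, and the push-up study's data are not reproduced here.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Estimate an event-free (survival) curve.

    durations: follow-up time for each person
    events:    1 if the outcome was observed at that time,
               0 if the person was censored (left the study event-free)
    Returns a list of (time, survival_probability) at each event time.
    """
    observed = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)   # everyone leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk people who got through time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical toy cohort: five people, follow-up measured in years.
toy_durations = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]

for year, prob in kaplan_meier(toy_durations, toy_events):
    print(f"year {year}: {prob:.0%} event-free")
```

On this toy cohort the curve steps down to 80%, 60% and 30% at years 1, 2 and 3. Plotting one such curve per push-up category and comparing them is, in essence, what the figure does.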

While some AMD require one to collect data actively (such as entering the number of push-ups into an app) or passively (wearing a watch that records step count, for example), there are other examples that require minimal to no explicit data collection effort or contact from the individual. For instance, in a 2019 study by a group at Imperial, researchers successfully mapped pictures of one’s neighborhood (similar to the street imagery provided by Google Maps) to their overall health and wellbeing scores; the predicted and actual scores for many social, environmental and health variables were nearly 80% correlated. In another 2019 study, researchers showed how a simple video recording technology can result in “high-throughput, contact-free detection of atrial fibrillation” by analysing videos of people’s faces. And another recent paper used “unobtrusive monitoring of behaviour and movement patterns to detect clinical depression severity level via smartphone”.

Alternative medical data (AMD) are digital signatures of health that can provide an alternative to the standard/traditional health data (or predict them, or give them new applications) in tasks such as phenotyping patients, indicating their risks, and predicting their medical outcomes.

A large family of such seamless/unobtrusive AMD is emerging around voice: there is now an Alexa skill that can detect cardiac arrest from your voice with nearly 100% accuracy; or, read this paper from NeurIPS 2019 on how GANs have been used to predict what one looks like, based solely on their voice. And of course, by now you probably know the story of how deep learning models have successfully mapped images of one’s retina (which can now be taken by an iPhone equipped with a simple additional sensor) to the deterioration of diabetes, as well as to blood pressure, smoking status and various other cardiovascular risk factors; the same technology has shown some promising results for early indication of Alzheimer’s disease. And lastly, who knew a visit to a tourist attraction could function as a breast cancer screening? Read this story. I’m sure there are plenty more such research findings (e.g., have you heard about the Human Screenome Project?), but reviewing them all goes beyond the scope of this post.

By feeding simple time-frequency features extracted from voice to fairly standard classifiers such as SVM and Random Forest, researchers have trained a strikingly accurate model for distinguishing agonal breathing from sleeping sounds (TNR = 99.5% at TPR = 97.2%).
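In case the TPR/TNR shorthand in the caption is unfamiliar, the sketch below shows how the two rates are computed from a classifier's predictions. The function name `tpr_tnr` and the tiny label lists are hypothetical, chosen only for illustration; they are not the study's data.

```python
def tpr_tnr(y_true, y_pred):
    """Return (true positive rate, true negative rate).

    Labels are 1 for the positive class (say, agonal breathing)
    and 0 for the negative class (say, ordinary sleeping sounds).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # TPR (sensitivity): share of real positives that were caught.
    # TNR (specificity): share of real negatives that were left alone.
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels for eight audio clips.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

tpr, tnr = tpr_tnr(y_true, y_pred)
print(f"TPR = {tpr:.1%}, TNR = {tnr:.1%}")
```

A high TNR matters a great deal for a device that listens all night, since negatives (ordinary sleep sounds) vastly outnumber positives and every false alarm has a cost.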

Of course, the magic in AMD usually comes from the seamless collection of such data at a higher frequency and at scale (i.e., large N), plus the ability of machine learning (including deep learning) algorithms to learn the reliable predictive patterns hidden in such data. I believe that this combination has the potential to redraw some of the lines in the medical ecosystem that we know today. In an interview with HBR, Marc Andreessen said: “In business, there are two ways you make money: you either bundle or unbundle” (similar to a quote by Jim Barksdale). While extremely simple, this statement captures the journey that many tech innovations took on their way to becoming the commercial successes we are familiar with: the bundling and unbundling of messengers in social media companies, and the new bundlings and unbundlings of credit cards, payments and other services in banking today (see this talk, for more on this in finance).

In my view, the collection of multi-modal data at scale (e.g., through our smartphones, wearables and various environmental sensors) has the ability to unbundle major parts of medical ecosystems (particularly in primary care, where continuous screening, risk assessment and provision of personalised lifestyle and medical advice play a key role). In the longer run, this can lead to the rise of new players in the ecosystem who can bundle some of these new advantages into new care pathways and offerings for patients; they can position themselves as the optimal owners of some of the health risks and care pathways (through their efficient operations, superior insights, deeper specialisation, and unique offerings). This effect will be even more significant given that we will soon have a generation of “digital native” patients.

In summary, I think:

  • In the short to medium term, the effects from AMD+AI will be more in the form of “better, cheaper, faster”. That is, making the collection of data, assessment of risk, and so on, better, cheaper and faster.
  • In the longer term, such early results and POCs are likely to be paired with a reimagining of care pathways and lead to the growth of AI-first (and in large part, AMD-powered) solutions and business models in health (e.g., an AI-first virtual hypertension management or diabetes care service; or, an AI-first life and health insurer only carrying chronic cardiovascular risks).
  • The effect is likely to be bigger in areas such as cardiovascular medicine and mental health, where the need for continuous multi-modal data collection is higher, and early alt data POCs have shown great success.
  • With many measurements (or variables) coming from AMD, researchers walk in a risky territory of “search for true signals vs. P-hacking” (i.e., finding spurious correlations that can exist by chance alone).
  • While AMD+AI is likely to do a great job in predicting risks, its usability for treatment planning will be a bigger challenge (compared to techniques such as MRI, for instance, which can help identify where an intervention should be targeted).
  • And of course, this will lead to many ethical and social questions about how various forms of AMD should be collected, owned, handled, and used.
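
To make the P-hacking point in the list above a bit more concrete, here is a small sketch of how quickly chance findings pile up as the number of tested variables grows. It rests on a simplifying assumption that I am adding for illustration: the tests are independent and none of the variables has a real effect. The function names are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests
    independent tests when no variable has a true effect."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the overall
    false-positive chance at or below alpha (Bonferroni correction)."""
    return alpha / n_tests


for n in (1, 10, 100):
    print(
        f"{n:>3} variables: "
        f"P(at least one spurious hit) = {familywise_error_rate(n):.1%}, "
        f"Bonferroni threshold = {bonferroni_threshold(n):.4f}"
    )
```

With a hundred candidate signals tested at the usual 0.05 level, at least one "significant" association is almost guaranteed by chance alone under these assumptions, which is why corrections of this kind (or held-out validation) matter for AMD research.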

I will write some follow-up articles regarding the bullet points listed above. In the meantime, I would love to hear your thoughts on alt medical data modalities, developments, or examples that I missed, additional challenges you see in these areas, and of course, any critique you have about the ideas mentioned in this article.

Disclaimer: This is a personal opinion and does not necessarily reflect the viewpoints of the organisations that I am affiliated with.



Reza Khorshidi

Chief Scientist at AIG, and PI at University of Oxford’s Deep Medicine Program; interested in Machine Learning in Biomedicine and FinTech