Adam Tilton

The Software Infrastructure Problem In the Wearables Market (and a Solution): Part I

Wearable and consumer health devices are useful because of what we can learn from them, and how we can use what we learn to make more intentional decisions. As I have been speaking to founders in the industry and exploring the roadblocks they encounter while bringing these devices to market, I’ve discovered that a common challenge during both development and production is the software infrastructure. By that I mean the software that supports building data-enabled experiences people find useful, not only in the first week of wearing their device but months, and years, down the line.

In this article, I attempt to offer a solution to that problem. I share an overview of the wearables technology stack, explain why too much focus on metrics like data volume and model accuracy leads to software infrastructure issues and to optimizing for the wrong outcomes, and argue that focusing on development cycle time and information loss enables the most economical iterations and helps ensure we learn as much as possible from these devices.

The Wearables and Consumer Health Tech Market in 2022

Before we get into the wearables technology stack, let’s first look at the three types of wearable and consumer health device companies driving the market:

  1. Device manufacturers deliver a connected device with a dedicated app. Examples are Oura, Whoop, and Withings.

  2. App companies deliver a consumer experience on top of existing wearable devices, like Future, Cadoo, and Basis.

  3. Enterprises use wearable devices for purposes beyond the consumer experience. At the time of writing, 1,608 clinical trials mention a “wearable,” and many companies, like Empatica, Athelas, and HumanFirst, use wearables to study or treat patient populations.

This wearables and consumer health tech market is different from the previous era (think: Fitbit, Jawbone) in a few ways. First, circa 2014, everything was a pedometer, like Fitbit, or some version of a smartwatch with notifications, like Pebble. Today, as this recent Economist article reports, “a rapidly growing array of electronically enhanced straps, patches and other ‘wearables’ can record over 7,500 physiological and behavioral variables.” I expect that number to keep increasing significantly in the coming years. In addition to the inability to track much data, there was also no meaningful differentiation amongst competitors during the previous era. Apple then consolidated demand with the Apple Watch, announced in 2014, and won the wrist (Apple’s annual revenue for wearables, home, and accessories exceeds $32bn).

Second, a host of companies that do not sell their own wearable are now building on top of the wearable stack. This wasn’t possible with the previous generation of devices, both because there’s only so much one can do with a step count (such as competitions amongst friends à la Stridekick), and because the device manufacturers weren’t interested in enabling an app community. Only Apple and the App Store were able to do this successfully, and now there are great success stories like Future (which maybe I should have seen coming, but did not). Similar to the growth in the number of devices, I think we’ll see even larger growth in the applications built on top of other companies’ wearable devices.

Third, beyond the value to consumers, there’s increasing interest in wearable data for enterprise use cases. A number of pharmaceutical companies are using wearable data to study the impact of new interventions on metrics that wearables can measure, and insurance companies are thinking about how wearables can play a part in better serving their community of customers. There are also many companies using the metrics one wearable can measure in the development of another solution, e.g. using Oura sleep measurements to demonstrate the effectiveness of a sleep intervention, where the user of the intervention doesn’t necessarily need to wear an Oura (it’s only required to help develop the intervention).

The innovation and development in the wearable and consumer health tech market are encouraging. Although what each of these companies hopes to achieve varies, the tools and technologies required are similar across the stack. In other words, each company needs the same set of software and infrastructure to build data-enabled experiences. This means the software infrastructure problems many of these companies are experiencing are driven by the same root causes.

Software Infrastructure for Wearables and Consumer Health Tech Devices

To understand the software infrastructure problem, we need to understand the technology stack for biometric sensing devices. Here, in broad terms, is how it works:

  • Sensor data is saved on the device by logging it to flash.

  • Logged data is transferred to a gateway device, typically over Bluetooth, and typically the gateway is a mobile device.

  • The gateway device reformats the logged data into a structured data format (e.g. JSON) and transmits the object to the cloud.

  • The received object is stored in an object data store, e.g. AWS S3 or a Google Cloud Storage bucket.

  • The object file is extracted and transformed using some form of a directed acyclic graph of processing steps, e.g. with Athena or Dataflow (AWS Glue crawlers or a NoSQL store like DynamoDB are sometimes used here instead).

  • Transformed data is loaded into multiple datastores. For example, the structured metadata is loaded into a relational database or warehouse (e.g., Aurora or BigQuery) while the time series data is loaded into a columnar format or store (e.g., Parquet or Bigtable).

  • Access controls for privacy and security are applied on top of the data stores to protect from malicious activity, constrain access, or protect against data breaches.

  • Operational utilities, including data labeling tools (e.g., Heartex or Scale AI), are used to enrich the data with critical metadata.

  • Data exploration utilities like Colab or SageMaker are needed for prototyping with the data on scalable cloud computing.

  • Data catalogs and model tracking utilities like MLflow or a feature store are used to keep track of data products for future use.
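To make the first few steps of the stack above concrete, here is a minimal sketch of the gateway and transform stages using only the Python standard library. The binary record layout, field names, and function names are all assumptions made up for illustration; a real device defines its own log format, and a real pipeline would write to an object store and databases rather than return Python objects.

```python
import json
import struct

# Hypothetical on-device log record (an assumption, not a real device format):
# uint32 timestamp (s), uint16 heart rate (bpm), int16 skin temperature
# (hundredths of a degree C), little-endian, 8 bytes per record.
RECORD = struct.Struct("<IHh")


def decode_log(blob: bytes) -> list[dict]:
    """Gateway step: turn the raw flash log into structured samples."""
    samples = []
    for ts, hr, temp in RECORD.iter_unpack(blob):
        samples.append({"ts": ts, "hr_bpm": hr, "temp_c": temp / 100})
    return samples


def to_upload_object(device_id: str, firmware: str, blob: bytes) -> str:
    """Gateway step: wrap samples and device metadata in one JSON object."""
    return json.dumps(
        {"device_id": device_id, "firmware": firmware, "samples": decode_log(blob)}
    )


def transform(obj: str) -> tuple[dict, dict]:
    """Cloud step: split one object into a metadata row and columnar series."""
    doc = json.loads(obj)
    samples = doc["samples"]
    # Structured metadata, destined for a relational store.
    metadata = {
        "device_id": doc["device_id"],
        "firmware": doc["firmware"],
        "n_samples": len(samples),
    }
    # Time series pivoted to one list per column, destined for a columnar store.
    columns = {key: [s[key] for s in samples] for key in ("ts", "hr_bpm", "temp_c")}
    return metadata, columns


if __name__ == "__main__":
    # Two synthetic records standing in for a log read off the device.
    blob = RECORD.pack(1_650_000_000, 62, 3312) + RECORD.pack(1_650_000_001, 63, 3315)
    metadata, columns = transform(to_upload_object("dev-001", "1.2.0", blob))
    print(metadata)
    print(columns)
```

Note that the firmware version travels with the samples in this sketch. Keeping that kind of context attached to the data is what later lets a team tell which recordings came from which hardware or configuration.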

Two great resources for diving deeper into modern data infrastructure are here and here, although neither includes the specific treatment of wearable (or even IoT) solutions. It always seems to surprise teams working with wearable data how much backend software infrastructure needs to be built to operationalize the data.

The Problem: Optimizing for the Wrong Outcomes

The development of a new wearable device can be (roughly) separated into 1) the physical device construction and 2) the algorithms and analysis of the sensor data. I’ll discuss the physical constraints in a bit of detail in an upcoming post, but looking solely at the software, I’ve found a common issue: too much focus is put on data volume and model accuracy.

Data Volume is the total amount of data collected, usually from targeted experiments to support algorithm development.

Model Accuracy is the performance of a specific machine learning model on a specific data corpus, as summarized by, e.g., F1 score, confusion matrix, sensitivity, specificity, etc.
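For readers less familiar with these terms, here is a small sketch of how the accuracy metrics named above are derived from a binary confusion matrix. The function name and the toy labels are illustrative assumptions, not taken from any particular product or library.

```python
def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and common metrics for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "confusion": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
        # Sensitivity (recall): share of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Specificity: share of actual negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # F1: harmonic mean of precision and recall.
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


if __name__ == "__main__":
    # Toy labels, e.g. "event detected" (1) versus "no event" (0).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    print(binary_metrics(y_true, y_pred))
```

Every one of these numbers is tied to the specific corpus it was computed on, which is why a high score by itself says little about how the model behaves on data collected later or from different people.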

These two metrics are important, but they are not the correct high-level metrics. Here’s why: optimizing for data volume puts too much attention on the front half of the pipeline and can result in collecting tons of data that is useless in the end, either because it’s the wrong data or because it was collected with the wrong system parameters. For example, the data may have been collected using a version of the hardware that has a different sensor than the final product, or with a sensor configuration that changes how the data is captured. You can tell this has happened when you hear a team say, “We have all this data and we can’t use it because of XYZ.”

On the flip side, optimizing for model accuracy puts too much focus on the back half of the pipeline and often results in complicated models being over-optimized on the wrong data. One common example with PPG sensors is building a model that doesn’t take into consideration the variety of skin tones in the real world. You can tell this has happened when a team deploys a model that doesn’t work as well in the field as it does on the training data set, or that doesn’t scale well as the population variability increases.
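One way to spot the second failure mode before deployment is to report accuracy per subgroup rather than as a single number. The sketch below assumes each evaluation record carries a group label; the labels "A" and "B" and the data are synthetic placeholders, chosen only to show how one aggregate score can hide a weak subgroup.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return (overall accuracy, accuracy per group).

    Each record is a (group, y_true, y_pred) tuple. The group is whatever
    population attribute matters for the sensor; here it is a placeholder.
    """
    if not records:
        raise ValueError("need at least one record")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        hits[group] += int(y_true == y_pred)
    overall = sum(hits.values()) / sum(totals.values())
    per_group = {group: hits[group] / totals[group] for group in totals}
    return overall, per_group


if __name__ == "__main__":
    # Synthetic data: group A dominates the corpus and masks group B's errors.
    records = (
        [("A", 1, 1)] * 9
        + [("A", 0, 1)] * 1
        + [("B", 1, 1)] * 1
        + [("B", 1, 0)] * 1
    )
    overall, per_group = accuracy_by_group(records)
    print(f"overall={overall:.2f}", per_group)
```

In this toy case the overall accuracy is about 0.83, while group B sits at 0.5 on only two samples. Both the gap and the small sample count are signals that the corpus does not yet represent the population.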

Although these metrics are important to the overall product, attempting to optimize them too early often leads to delaying investment in other areas. For example, if your primary metric is accuracy, you might be tempted to spend extra time fine-tuning model performance instead of hardening the data pipeline that feeds model training, or hiring another machine learning engineer before you hire another backend infrastructure engineer.

In my experience, focusing on data volume and model accuracy leads to optimizing for the wrong outcomes, and usually to cutting corners on software infrastructure, which is where the problem lies.

The Solution: Information Loss and Cycle Time

I believe better metrics for the end-to-end infrastructure stack are cycle time and information loss because they limit the risk of being wrong. The primary objective is to extract whatever can be extracted from the data. Implicit in this objective is that we don’t know what we’re looking for, but we believe something is there to be found. It makes sense, then, to focus on building systems that increase the number of hypotheses that can be tested.

Information Loss is the result of inexact approximation or discarding data, e.g. calculating and storing features on the data instead of the raw data.

Cycle Time is the amount of time to go from discovering a new hypothesis to testing it with new or existing data, measured in weeks, sprints, etc.

Minimizing the system's information loss means you can always provide the best possible answer, given the available information, to whatever query is being asked. This is important when a new product feature is introduced after the data collection has taken place and requires a different bit of information than any previous product feature. For example, even though the expected set of product features requires a known set of data features, it’s prudent to collect the raw data in any case rather than only the expected set of features. Minimizing information loss will often require developing technologies that are not needed to optimize accuracy or data volume.
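A toy illustration of information loss, under assumed numbers: suppose the pipeline stores only a per-window mean heart rate because that is all the first product feature needs. Two very different raw windows can collapse to the same stored value, so a later hypothesis about variability cannot be tested from the stored feature alone. The window values below are synthetic.

```python
from statistics import mean, pstdev

# Two synthetic windows of heart-rate samples (bpm).
window_a = [60, 60, 60, 60]  # perfectly steady
window_b = [50, 70, 50, 70]  # strongly fluctuating


def stored_feature(raw):
    """The only value kept if the pipeline discards raw data: the mean."""
    return mean(raw)


if __name__ == "__main__":
    # The stored feature cannot tell the two windows apart.
    print(stored_feature(window_a), stored_feature(window_b))
    # A later hypothesis about variability needs the raw samples.
    print(pstdev(window_a), pstdev(window_b))
```

With the raw windows retained, both the mean and any feature invented later can be recomputed. With only the mean retained, the two windows are indistinguishable forever.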

Minimizing the team’s cycle time pays back enormous dividends over time, but requires investment upfront. Since it’s easier to solve one problem at a time, this is usually best done by focusing on a benign feature to deliver first. For example, select a model from open literature that is known to work well enough (but maybe not well enough to meet the product's specs), and build out the tools needed to deliver that model. In this way, the focus is on the problem of building the infrastructure and not the problem of training a model. When the infrastructure is in place, add the complexity of training additional models, and iterate concurrently on both infrastructure and model development. Minimizing cycle time might not maximize model accuracy or total data volume collected, and may actually negatively impact these metrics in the short term.

Challenges to Minimizing Information Loss and Cycle Time

In my experience, the biggest challenges to minimizing information loss and cycle time are due to the physical constraints of the device, the custom nature of the hardware, and the plethora of workflow tools required to work with the data. In a few follow-up posts, I’ll discuss each of these in more detail and outline a few approaches I’ve seen work well to overcome these obstacles. Stay tuned!

Acknowledgements: Special thanks to my friends Karthik Bhaskara, Abhijeet Patra, and Brinnae Bent for their feedback on this post!