Published on December 22, 2020
Orthogonal was excited that
Bernhard Kappe, our CEO and Founder, spoke at Virtual Engineering Week on Friday, December 4th, 2020 on the topic of AI & Machine Learning in Medical Devices: It’s Getting Better All the Time. This talk was hosted by Informa and their Medical Device + Diagnostic Industry (MD+DI) media property. (Shout outs to Laurie Lehmann and Naomi Price for setting this up and guiding us through the process.)
If you missed the talk and want to check it out, or you did catch the talk and want more or want to share it, you have a few options:
I’m Bernhard Kappe, CEO of Orthogonal, a developer of Software as a Medical Device and Connected Devices. AI can be used in many ways, from improving manufacturing to predicting when your hardware needs maintenance or the best route for a UPS driver to deliver packages. These are all valuable, but I’m going to talk about using AI as a part of your medical device because that’s where I think the greatest value is. My background is in Mathematics and Artificial Intelligence. I’ve been in software development for about 25 years, worked at Microsoft on Business Intelligence, then in financial services, and about 12 years ago got into MedTech and eventually into Software as a Medical Device. When I got into this space, I was intrigued by the gap between best practices outside of medical devices and what was being practiced in the space. And since then I’ve worked on bringing in agile and other fast feedback techniques and using them to optimize for both speed and quality.
My company, Orthogonal, helps medical device and pharma companies build Software as a Medical Device (SaMD) and Connected Device Solutions. We have deep expertise in technologies like mobile, cloud, AI algorithms, and device connectivity. We’ve worked on everything from insulin pumps and CGMs to active Implantables for neurostimulation, and with non-medical device companies like Google and Bose. We’ve developed efficient processes within our ISO 13485:2016 quality management system designed to accelerate our clients’ pipeline of successful SaMD products.
Let me give you a bit of background on how AI became so important in our world. Nine years ago, Marc Andreessen, founder of Netscape and partner in the VC firm Andreessen Horowitz, wrote a now-famous article in the Wall Street Journal. His observation was that “software is eating the world, in all sectors. In the future, every company will become a software company.” How has that statement aged?
Pretty well, I think. If you look at the world in 2020, this is continuing to happen. Companies like Apple, Amazon, Google, and Microsoft are each worth well over a trillion dollars. Software keeps becoming more important, and is disrupting more industries, as we continue to become more connected.
Some of this is because the hardware that supports all of this software keeps following Moore’s law – doubling in capability for the same cost every 18 months. Just take a look at the original iPhone from 13 years ago compared to today. Now think of what that will look like in another 5 years. This has happened in so many areas – connectivity, storage, processing power – that what you can do with software just continues to accelerate.
30 billion IoT devices. 4.4 billion internet users, 3.5 billion smartphone users. That’s a lot of data. In fact, it’s 40 trillion gigabytes, or 40 zettabytes, and more and more of it is on the cloud. It’s increasing by 15% to 20% per year. Software is creating a lot of that data and consuming that data. And AI is a big way that data is being consumed. AI can find connections in that data, draw inferences from it, create algorithms, and continue to improve them, beyond what people can do on their own. As a consequence, software is eating the world even faster.
Let me give you an illustration of this. A recent story in the Wall Street Journal revisits the Apple AirPods Pro a year after their initial review. This is not a review of the new AirPods Pro; this is a new review of the same AirPods Pro, one year later. What’s different? Better software. Apple kept collecting data and feedback, and continued to improve the software, to the point where the Journal did a whole new review of the product. A lot of that was using AI.
Or take Tesla – one of the reasons that Tesla is valued as much as the three largest carmakers combined is because they work more like a tech company than a car maker. There’s hardware here, but it’s modular, and software plays a much bigger role than in traditional vehicles. Data is continuously collected, algorithms are working – and there are updates that improve the performance on a frequent basis.
These devices get better in large part because of AI. Real-world data comes in from usage at scale, from other data about users, their environment, the world they live in, and what they do. AI can discover new patterns and correlations that humans can’t, turn those into algorithms much faster than humans can, and continue to improve them while humans are sleeping. These same patterns are happening in medical devices…
Let me give you an example. This is from our client Quidel, who makes many point-of-care diagnostics, including some for COVID-19. They announced this at their investor day on November 1st. In this new system, rather than a large standalone machine, there’s a small diagnostic device able to read a standard Sofia Fluorescent Immunoassay Cartridge. Images of the cassette are captured on the Sniffles device and transmitted via Bluetooth to a mobile device. The result is then interpreted using a proprietary AI software model that is downloaded to the mobile device.
The system has a number of benefits: First is Affordability – The manufacturing costs are less than 20% of the costs for their current Sofia 2 platform. Second is Mobility and Connectivity, Cellular connectivity and cloud integration enables testing in new markets beyond the traditional point of care. Third is rapid manufacturing – The boxes are much simpler, so manufacturing at scale is much easier. And fourth is ongoing improvement – This AI algorithm can be improved over time through continuous offline learning.
PhysIQ is another customer of ours, and they have a number of clearances for AI-based personalized cardiac anomaly detection for at-risk patients. Their approach is based on detecting changes to baseline heart functioning, data they get from off-the-shelf wearable vital signs monitors. The idea originally came from their previous company, SmartSignal – which had initially put sensors on aircraft engines, did monitoring to establish a baseline, and then could predict engine failure based on differentials to that baseline. They sold the company to GE Digital in 2011, and the technology is now monitoring more than 16,000 assets worth $37 Billion across many industries. They applied the same approach to heart monitoring, originally by constructing algorithms in a similar way, but as they went on and got more data, they switched to using neural networks to generate the algorithms for detecting changes. They found that this is just a much faster way of generating new algorithms.
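As a toy illustration of that baseline-differential idea – not PhysIQ’s actual method, and with the threshold and function names invented for the sketch – a personalized anomaly detector can be as simple as a per-patient z-score against an established baseline:

```python
import numpy as np

def fit_baseline(heart_rate_samples):
    """Learn a per-patient baseline (mean and spread) from an
    initial monitoring window of vital-sign readings."""
    arr = np.asarray(heart_rate_samples, dtype=float)
    return arr.mean(), arr.std()

def anomaly_score(reading, baseline_mean, baseline_std):
    """Score a new reading by its deviation from the personal
    baseline, in units of baseline standard deviations."""
    return abs(reading - baseline_mean) / baseline_std

# Establish a baseline from a week of resting heart-rate data...
mean, std = fit_baseline([62, 64, 61, 63, 65, 62, 63])
# ...then flag readings that drift far from that patient's own normal.
is_anomalous = anomaly_score(80, mean, std) > 3.0  # True for this patient
```

The point of the sketch is the personalization: the same reading of 80 bpm might be perfectly normal against a different patient’s baseline.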
The FDA cleared their first AI-generated algorithm in 2014, and since then the number of cleared and approved algorithms has been increasing at an accelerating pace – about 50% per year. This infographic from the Medical Futurist shows some of these approvals and clearances, broken down by area, like radiology, cardiology, psychiatry, and endocrinology, with new areas appearing every year. Most of these algorithms are locked algorithms – they were generated by AI but don’t change in production. This makes it easier for regulators like the FDA to evaluate them – they’ve been evaluating algorithms for a very long time. If you notice, radiology has the most clearances, and there are good reasons for this: it’s all about data analysis, and there’s a lot of data that has been analyzed. And the reference standard that algorithms are evaluated against is two radiologists looking at an image, which is part of every workflow. So in that case, it becomes theoretically possible to do continuous updates to the algorithm in production.
That’s a new problem set for the FDA, and while they haven’t issued guidance on it, they did conduct a workshop on it and issued a discussion paper on changes to AI and Machine Learning based Software as a Medical Device. This is a pretty good paper in terms of laying out processes that should be in place and fits in with both the SaMD Precertification program and their guidance on when to do regulatory filings for software updates.
I wanted to share some things that we’ve learned along the way that you should consider as part of your AI journey. First, think about where AI might add value: any time you can potentially discover something that you couldn’t before – trends, correlations. For example, FICO credit scores are strongly correlated with therapy adherence and outcomes. Or any time you can optimize an algorithm with more data, like Quidel, Apple, or Tesla – often combining these two. There are lots of examples out there already, and you should review them and think about how they might apply to you, both inside medical devices and outside, as in the example of SmartSignal and PhysIQ.
Second, you need to understand how complex your problem is, both in terms of the domain and inputs. The more complex these things are, the more data you need to train and to verify your models. For example, in Quidel Sniffles, there’s a standard assay, one camera with fixed distance and lighting conditions. Now compare that to using a smartphone camera, to take pictures of skin to identify skin cancer lesions. You would need a lot more data. I mean, a lot more data. We’ve done both, and it’s exponentially more, even if you don’t take all the different smartphone models into account. If you need more precision, you’ll need even more data.
Third, think about the availability of a reference standard. Whenever you’re testing an algorithm, you need to think about what you’re comparing it to – the reference standard. As I mentioned before, in radiology the reference standard is two radiologists interpreting the data, which is already part of the workflow. So the cost of validating an algorithm – and a new version of an algorithm – is pretty low. That’s one reason there is so much AI activity in radiology. If you look at glucose monitoring by contrast, one of the things you need to do to assess the performance of a CGM system is to generate a MARD score – mean absolute relative difference. MARD is the average of the absolute error between all CGM values and matched reference values, which are obtained from blood tests. Getting those blood tests has a cost and is not part of everyday practice – so you need to pay for them when you update your algorithm. That makes it much more expensive to evaluate and validate new algorithms.
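Since MARD has a simple definition – the average of the absolute relative differences between matched CGM and reference readings, usually reported as a percentage – it is easy to sketch. This is a minimal illustration of the formula, not a clinically validated implementation:

```python
def mard(cgm_values, reference_values):
    """Mean Absolute Relative Difference (MARD): the average of
    |CGM - reference| / reference over matched pairs, as a percent."""
    rel_errors = [abs(c - r) / r
                  for c, r in zip(cgm_values, reference_values)]
    return 100.0 * sum(rel_errors) / len(rel_errors)

# CGM readings matched against paired blood-glucose reference tests (mg/dL)
score = mard([100, 110, 95], [105, 100, 100])  # about 6.6%
```

The expensive part is not this arithmetic – it is obtaining the matched reference values from blood draws every time you want to validate a new algorithm version.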
Another thing to consider is data availability and quality. This includes your own data: how are you making sure that it’s good data, that you can trace where it came from, isolate bad data, make sure that training data is separate from verification data, and so on. Then there’s third-party data – and there’s potentially a lot of it – large data sets that may be available from registries, and non-medical data sources like weather, pollution, or FICO scores. How do you deal with potential issues in that data? Evaluating tradeoffs can be tricky. There have been a lot of studies around bias in data – both in collection and interpretation. Just because something works for one population does not mean it translates to another.
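One concrete way to keep training data separate from verification data is to split by patient rather than by individual record, so that no patient contributes to both sets. This is a minimal sketch with an invented record schema (a `patient_id` key per record), not any particular product’s pipeline:

```python
import random

def split_by_patient(records, holdout_fraction=0.2, seed=0):
    """Split records into training and verification sets by patient ID,
    so no patient's data leaks from training into verification."""
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(patient_ids)
    n_holdout = max(1, int(len(patient_ids) * holdout_fraction))
    holdout = set(patient_ids[:n_holdout])
    train = [r for r in records if r["patient_id"] not in holdout]
    verify = [r for r in records if r["patient_id"] in holdout]
    return train, verify

# Ten patients, one record each: the two sets never share a patient.
demo = [{"patient_id": f"p{i}", "value": float(i)} for i in range(10)]
train, verify = split_by_patient(demo)
```

Splitting at the patient level matters because multiple records from the same patient are correlated; a random split over rows would quietly leak patients from training into verification and inflate your performance numbers.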
Of course, it wouldn’t be a medical device if you didn’t have to consider patient risk. Let me give you a few examples, from low to high risk. A solution that helps determine when and how to remind patients on therapy adherence might be pretty low risk, or not even be a medical device. A diagnostic aid to help a physician make a better decision, like many imaging solutions, or PhysIQ, as mentioned before, is still pretty low risk. That’s one reason why we’ve seen more in diagnostics – the benefit is high, the risk is relatively low. An instrument that provides a diagnosis independent of a physician, like IDx-DR, which provides an automated diagnosis for diabetic retinopathy, is higher risk. An algorithm that automatically changes insulin dosage based on history, CGM readings, patient inputs, etc. is a lot higher. And one that automatically changes pacemaker settings is even higher. This will affect how much testing and risk analysis you need to do, and how much your algorithm will be under scrutiny.
Last, you need to think about where you will deploy your algorithms. Deploying them in the cloud is in many ways the easiest – you have lots of processing power, you can scale easily, you can define the specific graphics processing unit (GPU) or field programmable gate array (FPGA) that you’re using, likely the same one you trained your algorithm on. That’s the case with something like PhysIQ. A device typically does not have the same processing power, so we see this very rarely. We do see a lot of deployment of algorithms on mobile phones like Quidel Sniffles. If you’ve engineered your apps for modularity, it can be pretty easy to update the algorithms in your apps, but there is a lot more work in translating and testing your algorithms – different smartphones have different floating-point precision in their GPUs, and you may need to do a fair bit to tune your algorithm for specific smartphones.
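To make the floating-point concern concrete, here is a small sketch of the kind of tolerance check you might run when porting an algorithm to a device with lower-precision arithmetic. The tolerance values and the float16 stand-in for a device GPU are illustrative assumptions, not a recommendation:

```python
import numpy as np

def outputs_within_tolerance(reference_out, device_out, rel_tol=1e-3):
    """Compare algorithm outputs from the reference (cloud/training) build
    against outputs from an on-device build, allowing for the reduced
    floating-point precision of mobile GPUs."""
    ref = np.asarray(reference_out, dtype=np.float64)
    dev = np.asarray(device_out, dtype=np.float64)
    return bool(np.allclose(ref, dev, rtol=rel_tol, atol=1e-6))

# Simulate an on-device build that computes in float16 instead of float32
reference = np.linspace(0.1, 0.9, 5, dtype=np.float32)
on_device = reference.astype(np.float16).astype(np.float32)
ok = outputs_within_tolerance(reference, on_device)  # True at this tolerance
```

In practice you would run a check like this per target smartphone model, since different GPUs round differently, and tighten or justify the tolerance as part of your verification evidence.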
There’s a lot that goes into this, but here are a couple of best practices to get you started. First, whether you’re using AI or not, you should design for data capture from the start, so you can use data from instruments, devices, and apps. This includes patient-generated data, including data on user behaviors and smartphone sensors. It also includes third-party data, such as data on outcomes and reference standards. Think about where to put the data, and how to show where it came from and how it was used. This is not just for a clinical trial, but for real-world data when you’re out in production. Second, design for (continuous) improvement. As we’ve seen in other areas, more and better data keeps becoming available, as do better AI techniques. So you need to think about how you design for efficient updates. There are a lot of best practices for automation in SaMD that can be leveraged here.
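As a sketch of what designing for data capture might look like in code, here is a minimal record type that carries provenance alongside each value, so you can later show where a data point came from and how it was used. All field names here are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CapturedReading:
    """One captured data point with enough provenance to trace
    where it came from and how it was used downstream."""
    value: float
    unit: str
    source: str        # e.g. "cgm_sensor", "smartphone_accelerometer"
    device_id: str
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
    usage: str = "unassigned"  # later: "training", "verification", "excluded"

reading = CapturedReading(value=72.0, unit="bpm",
                          source="wearable_hr", device_id="dev-001")
```

The `usage` tag is the piece people most often forget: recording at capture time whether a data point fed training or verification is what lets you demonstrate the separation later.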
I’ll leave you with a quick story from outside of medical devices. In 1997, Deep Blue, a computer program that was hand-coded, beat Garry Kasparov, the reigning world champion, for the first time. It was crushing. Looking back on it, Kasparov has said that he was likely the first knowledge worker whose job was threatened by a machine.
Since then, computers have been steadily improving, mostly through AI. AI programs routinely beat the world’s best human players, and the new challenges are around humans playing programs that start with a pawn down. My brother Dietrich, also Orthogonal’s CTO emeritus, and a much better chess player than I am, has been working on the new Dragon algorithm for Komodo this year. Applying new NNUE (Neural Network Updated Efficiently) techniques, it just beat American Grandmaster Hikaru Nakamura with a 2-pawn deficit – 5 wins and 3 draws. A knight or bishop is next.
Kasparov has had a long time to think about AI since his defeat in 1997. Here’s what he had to say in a recent Wired interview: “I think it’s important that people recognize the element of inevitability. When I hear outcry that AI is rushing in and destroying our lives, that it’s so fast, I say no, no, it’s too slow. Every technology destroys jobs before creating jobs. I think it’s important that, instead of complaining, we look at how we can move forward faster, towards the job creation part. With … future machines, I describe the human role as being shepherds. You just have to nudge the flock of intelligent algorithms. Just basically push them in one direction or another, and they will do the rest of the job. You put the right machine in the right space to do the right task.”
That’s it for my talk. Thanks for listening, I hope you found this useful and I look forward to your questions.