MedTech Teams Should Stop Paying for Compliance Twice

Randy Horton

Published on: June 5th, 2026

Most MedTech software teams do not struggle with compliance because they lack discipline. They struggle because too much of their compliance evidence is created after the software has already been designed, built, and tested.

Software gets built in one system. Tests run somewhere else. Requirements sit in another tool. Verification protocols live in documents. Then, as a release approaches, engineering, quality, and regulatory teams try to reconstruct the full story: what was built, why it was built, how it was tested, what passed, what failed, and how it traces back to user needs, risks, and requirements.

That reconstruction effort is expensive.

The obvious cost is the time spent gathering documentation. The higher cost comes from delayed releases, late-stage rework, avoidable defects, context switching, and engineers spending time proving work after the fact instead of building the next product capability.

For teams building SaMD, connected devices, mobile apps, and cloud-backed medical products, this model breaks down as software complexity grows.

The question is no longer, “How do we document compliance at the end?”

The better question is, “How do we make compliance evidence a byproduct of the way software is built?”

Orthogonal will explore that shift in its upcoming webinar, “Injecting Compliance into Code: Automating Compliance with AI in the MedTech SDLC.”

Late Feedback Is Where the Cost Starts

In a recent conversation about the webinar, Larkin Lowrey, CTO of Orthogonal, pointed to three metrics that show whether test-driven development and behavior-driven development are improving software delivery:

Production defect rate
Late defects found during verification
Developer velocity

Those are business metrics as much as engineering metrics.

A defect found in production does not just damage customer trust. Depending on its severity, it can trigger an expensive and rigid regulatory process. When a user reports an issue, the team may need to treat it as a formal complaint, assess whether it creates a safety risk, determine whether corrective action is required, and document the decision trail. Instead of shipping a quick hotfix, engineering may have to pause roadmap work, support formal verification and validation testing, and generate the compliance evidence needed to deploy the patch safely.

A defect found during final verification can also create a major release cost. During the “run for the record,” a defect may force the team to reset, rerun the entire suite, or spend time proving which subset of tests must be repeated.

By that point, the original work may be weeks or months old. The developer has moved on. The product decision may need to be revisited. A small defect can force a much larger review.

This is where TDD and BDD improve release predictability.

TDD pushes teams to define how code will be tested before it is written. That forces developers to think earlier about edge cases, failure modes, and how the software will be used.

BDD adds the user’s perspective. It asks what condition the system starts in, what action the user takes, and what result should follow. Because users are variable, BDD forces teams to think through the different ways someone might actually interact with the product.

When teams define behavior up front, developers have fewer assumptions to resolve later. QA gets clearer criteria. Product owners can see whether expected behavior matches user intent. Release planning becomes less dependent on late discovery.

The third metric, developer velocity, matters more than many teams expect. Without early feedback, developers often finish a task, move on, and only hear about a defect one or two sprints later. They then have to reload the old context, debug the issue, fix it, and wait for QA to test it again.

That is time they are not spending on new product capability.

The earlier teams find defects, the less rework there is. The later they find defects, the more likely the fix touches architecture, requirements, testing, documentation, and release timing.

Documentation Should Produce Value Before the Audit

Larkin distinguishes between documentation that sits outside the workflow and documentation that development teams actually use.

A verification protocol in Word or Excel may be necessary, but it does not help the team unless someone opens it, reads it, interprets it correctly, and applies it at the right moment. That adds cognitive load for developers because the protocol sits outside their normal workflow.

The same is true for spreadsheets that need manual updates, redlines that drift across versions, and traceability matrices assembled after development is mostly done.

That workflow gets harder to sustain as the product, team, and release cadence grow.

A better approach is to write acceptance criteria and verification logic in a structured format that humans can read and machines can execute. Larkin discussed Gherkin, a plain-language format built around “given, when, then” statements.

The “given” defines the preconditions. What state is the application in? Is the user logged in? Is the phone paired to the medical device?

The “when” defines the action. Is the user pressing a button, entering a password, viewing a screen, or trying to turn on a device function?

The “then” defines the expected result. What should the system do after that action occurs?

A product owner can understand it. QA can review it. Developers can run it. Automation can execute it.
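
As a rough illustration, the sketch below pairs a scenario in this format with a very small Python runner. The scenario wording, the PairingApp class, and the step functions are all hypothetical, and a real project would more likely use an established BDD framework than a hand-rolled runner like this one.

```python
import re

# Hypothetical scenario; the wording is illustrative, not from a real product.
SCENARIO = """
Given the user is logged in
And the phone is paired to the device
When the user presses the start button
Then the device function is on
"""


class PairingApp:
    """Toy stand-in for the application under test (an assumption)."""

    def __init__(self):
        self.logged_in = False
        self.paired = False
        self.function_on = False

    def press_start(self):
        # The function only turns on when both preconditions hold.
        if self.logged_in and self.paired:
            self.function_on = True


STEPS = []  # list of (compiled pattern, handler) pairs


def step(pattern):
    """Register a handler for step text that matches `pattern`."""
    def decorator(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return decorator


@step(r"the user is logged in")
def given_logged_in(app):
    app.logged_in = True


@step(r"the phone is paired to the device")
def given_paired(app):
    app.paired = True


@step(r"the user presses the start button")
def when_press_start(app):
    app.press_start()


@step(r"the device function is on")
def then_function_on(app):
    assert app.function_on, "expected the device function to be on"


def run_scenario(text, app):
    """Run each Given/When/Then/And line against the registered steps."""
    executed = []
    for line in text.strip().splitlines():
        keyword, _, body = line.strip().partition(" ")
        if keyword not in {"Given", "When", "Then", "And"}:
            raise ValueError(f"unknown keyword: {keyword}")
        for pattern, handler in STEPS:
            if pattern.fullmatch(body):
                handler(app)
                executed.append(body)
                break
        else:
            raise LookupError(f"no step defined for: {body}")
    return executed
```

Calling run_scenario(SCENARIO, PairingApp()) executes all four steps. Dropping the pairing precondition makes the "Then" step fail, which is the kind of early signal the format is meant to provide.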

When Gherkin statements live in Git alongside the code, they become part of the team’s daily work instead of a separate documentation exercise. Developers can see the acceptance criteria while they build. Automation can execute the tests. Results can be uploaded into tools like Jira and Xray.

That changes the role of documentation.

Verification protocols, test results, verification reports, and traceability no longer have to be recreated as separate documentation projects. Test results can link to stories. Stories can link to requirements. Requirements can link to user needs, risks, and controls.
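
One way to picture that chain of links is the minimal Python sketch below. It assumes the links have already been exported as simple mappings, and every ID in it is made up for illustration. In practice this data would come from whatever tools the team uses, and the field names would differ.

```python
# Hypothetical IDs and links, for illustration only.
TEST_RESULTS = {"TEST-1": "passed", "TEST-2": "failed"}
TEST_TO_STORY = {"TEST-1": "STORY-10", "TEST-2": "STORY-11"}
STORY_TO_REQUIREMENT = {"STORY-10": "REQ-100", "STORY-11": "REQ-100"}
REQUIREMENT_TO_USER_NEED = {"REQ-100": "UN-1", "REQ-101": "UN-2"}


def build_trace_rows(results, test_to_story, story_to_req, req_to_need):
    """Follow each test result back to its story, requirement, and user need."""
    rows = []
    for test_id, status in sorted(results.items()):
        story = test_to_story.get(test_id)
        requirement = story_to_req.get(story)
        user_need = req_to_need.get(requirement)
        rows.append(
            {
                "test": test_id,
                "status": status,
                "story": story,
                "requirement": requirement,
                "user_need": user_need,
            }
        )
    return rows


def untested_requirements(rows, req_to_need):
    """List requirements that no test result traces back to."""
    covered = {row["requirement"] for row in rows}
    return sorted(set(req_to_need) - covered)
```

With the sample data, build_trace_rows yields one row per test, and untested_requirements flags REQ-101 as having no linked test result, a gap that is cheaper to see during development than at release.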

At the end of development, the team is not recreating the evidence trail. It has been building that trail all along.

Larkin put the end state simply: when the work is already captured in the tools developers, QA, and product owners use every day, many documents become, in effect, “a button press.”

That does not mean compliance work disappears. It means the work happens earlier, in a form the whole team can use.

You Cannot Hire Your Way Out of Software Complexity

MedTech software now has to work across a growing range of phones, operating systems, cloud services, Bluetooth behaviors, cameras, and user environments.

No team can test every possible combination manually.

For Bluetooth-connected medical devices, Larkin described testing real over-the-air communication with a phone and a Bluetooth simulator, sometimes built with Nordic developer kit boards or USB dongles. Instead of mocking Bluetooth behavior, the team can test through the full phone Bluetooth stack and verify how specific phone models and OS versions behave.

That matters because Bluetooth behavior is not perfectly uniform. Error codes can vary by phone model, operating system version, and chipset. A device farm does not need to include every phone on the market. It needs to represent the expected user base well enough to expose the combinations most likely to create risk.

The second layer is field instrumentation. When the software encounters unexpected Bluetooth behavior, it should log the event, report it to the cloud, fail safely, and give the team data to decide whether a particular phone, OS version, or hardware combination belongs on a blocklist.
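
A minimal sketch of the analysis side of that loop might look like the following. The BleErrorEvent fields, the session counts, and both thresholds are assumptions chosen for illustration. The output is a list of candidates for human review, not an automatic blocklist decision.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BleErrorEvent:
    """Hypothetical shape of an unexpected-Bluetooth-behavior report."""
    phone_model: str
    os_version: str
    error_code: int


def blocklist_candidates(events, sessions_by_config,
                         min_sessions=50, max_error_rate=0.05):
    """Return (config, error_rate) pairs that exceed the error-rate threshold.

    `sessions_by_config` maps (phone_model, os_version) to total sessions.
    Configurations with too few sessions are skipped because their rate
    is not yet meaningful. Both thresholds are illustrative assumptions.
    """
    errors = Counter((e.phone_model, e.os_version) for e in events)
    candidates = []
    for config, sessions in sessions_by_config.items():
        if sessions < min_sessions:
            continue
        rate = errors[config] / sessions
        if rate > max_error_rate:
            candidates.append((config, rate))
    # Worst offenders first, so reviewers see the highest rates at the top.
    return sorted(candidates, key=lambda item: item[1], reverse=True)
```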

Camera-based diagnostic features create a different BYOD challenge.

Image quality can vary by phone model, lighting, lens condition, case obstruction, fingerprints on the lens, and user environment. If an AI model depends on image quality, small differences in camera performance can determine whether the feature should be enabled on a given device at all.

The answer is not to test everything.

It is to understand where variation creates risk, automate the tests that cover the highest-risk scenarios, instrument the product in the field, fail safely when something unexpected happens, and restrict unsupported configurations when the data says you should.

That is a risk-based strategy, not a brute-force strategy.

Safe Failure Has to Be Tested Before Release

Automation also changes how teams test failure.

Cloud-connected medical products rely on services, event buses, databases, integrations, and message flows. The system may work perfectly when all services are healthy. That is not enough.

For example, a blood glucose reading may be published to an event bus and then flow to multiple services: one records the reading, another tracks time in range, and another determines whether the user needs an alert.

The harder question is what happens when one service never receives the broadcast.

Does the service retry? Does it return the right error? Does it move into a safe state? Does it preserve the right data? Does a single non-fatal failure remain contained, or does it create a broader system issue?
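
To make those questions concrete, here is a small Python sketch of one possible answer: retry a bounded number of times, then keep the event and signal a safe state. The DeliveryOutcome type, the dead-letter list, and the choice of ConnectionError as the failure type are assumptions for illustration, not a description of any particular system.

```python
import time
from dataclasses import dataclass


@dataclass
class DeliveryOutcome:
    delivered: bool
    attempts: int
    safe_state: bool  # True when the caller should fall back to safe behavior


def deliver_with_retry(handler, event, max_attempts=3,
                       delay_seconds=0.0, dead_letters=None):
    """Try to hand `event` to `handler`, retrying on connection failures.

    If every attempt fails, the event is kept in `dead_letters` (so the
    reading is not lost) and the outcome asks the caller to enter a safe
    state instead of silently continuing.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            handler(event)
            return DeliveryOutcome(delivered=True, attempts=attempt,
                                   safe_state=False)
        except ConnectionError:
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    if dead_letters is not None:
        dead_letters.append(event)
    return DeliveryOutcome(delivered=False, attempts=max_attempts,
                           safe_state=True)
```

An automated test can then force the failure on purpose, for example with a handler that always raises, and check both that the event was preserved and that the safe state was requested.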

Larkin described using infrastructure-as-code tools, such as Terraform, to spin up a fresh cloud environment, run automated tests, force specific failure conditions, and then tear the environment down. That gives the team the same starting conditions every time, rather than relying on a long-lived QA or staging environment that may retain leftover configuration from prior work.

This matters because failure testing is hard to do reliably by hand. Teams need the ability to place the environment in a specific state, trigger the failure, observe the system’s response, and repeat the test under the same starting conditions later.
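
The spin up, test, tear down cycle can be sketched in a few lines of Python that shell out to the Terraform CLI. This is a minimal sketch, assuming a working directory that already holds the Terraform configuration. The function name and the injectable run parameter are illustrative choices, and the test command is whatever the team actually uses.

```python
import subprocess


def run_in_fresh_environment(workdir, test_command, run=subprocess.run):
    """Provision a fresh environment, run the tests, and always tear down.

    `run` defaults to subprocess.run and is injectable so the sequence
    can be exercised without real infrastructure.
    Returns the exit code of the test command.
    """
    run(["terraform", "init", "-input=false"], cwd=workdir, check=True)
    try:
        run(["terraform", "apply", "-auto-approve", "-input=false"],
            cwd=workdir, check=True)
        # check=False: a failing test run should still reach the teardown
        # below and report its exit code instead of raising.
        result = run(test_command, cwd=workdir, check=False)
        return result.returncode
    finally:
        # Destroy even if apply or the tests failed, so partially created
        # resources do not linger between runs.
        run(["terraform", "destroy", "-auto-approve"],
            cwd=workdir, check=True)
```

Putting the teardown in a finally block is the design choice that matters here: the next run starts from nothing, whether the previous one passed, failed, or crashed partway through provisioning.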

Automation does not replace engineering judgment. Teams still need strong architecture, risk analysis, and FMEAs to identify the right failure modes. Automation will not tell the team what can go wrong.

But once those risks are known, automation lets teams test them repeatedly and predictably.

That repeatability makes the evidence easier to trust, review, and reproduce.

The Business Case Is Not Just Fewer Testers

The business case for automation often starts with labor, but it should not end there.

Larkin shared an example from a project with roughly 40 manual testers, where the work was replaced by about 5 software development engineers building test automation. Those engineers cost more per person, but the automation made the team far more productive.

That comparison gets attention, but the deeper point is cost per test.

Manual testing is expensive every time a team runs it. It also depends on people performing the same steps the same way every time. MedTech companies already account for human variability when they design products for users. They should apply the same thinking to internal development and testing processes.

Humans get tired. They get distracted. They miss steps. Manual testing will always carry that risk.

Manual testing also forces teams to ration coverage. They test the most critical scenarios because testing everything they would like to test would require too much time and too many people.

Automation changes the economics because each automated test becomes an asset. Once the team writes it, they can run it repeatedly at very low incremental cost.
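
The arithmetic behind that claim is simple enough to sketch. The function below is a simplified model that assumes a one-time build cost for the automation and a fixed cost per run for each approach. It ignores maintenance, and any numbers passed to it are the reader's own estimates rather than figures from this article.

```python
import math


def break_even_runs(build_cost, automated_cost_per_run, manual_cost_per_run):
    """Smallest number of runs at which automation is cheaper in total.

    Total automated cost after n runs: build_cost + n * automated_cost_per_run
    Total manual cost after n runs:    n * manual_cost_per_run
    Returns None when automation never pays back under this simple model.
    """
    saving_per_run = manual_cost_per_run - automated_cost_per_run
    if saving_per_run <= 0:
        return None
    # Need n * saving_per_run > build_cost, so take the next whole run.
    return math.floor(build_cost / saving_per_run) + 1
```

Past the break-even point, each additional run widens the gap, which is why teams that run their suites on every change see the economics shift so quickly.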

That lets teams run tests more often, add scenarios that would never have made the manual test plan, catch defects earlier, and reduce late-stage rework. It also creates faster feedback during development. Instead of waiting until the end of a project, or even several sprints later, teams can learn much earlier whether a feature works as intended.

Many companies do not simply save the difference. They reinvest it into broader coverage and faster feedback.

The advantage is structural: quality costs less to verify, engineers spend more time on product work, and the team relies less on end-of-project heroics.

Compliance Evidence Belongs Inside the SDLC

The best version of compliance automation is not a shortcut around quality or regulatory discipline.

It is a way to make regulated software development less dependent on manual reconstruction.

It connects requirements, code, tests, results, traceability, risk analysis, and release evidence inside the tools teams already use. Quality and regulatory teams get clearer evidence. Engineers get faster feedback. Product teams get more predictable delivery.

That matters because many MedTech products are no longer standalone devices with occasional software updates. They now depend on mobile apps, cloud services, connected devices, data flows, and software updates that must ship without compromising safety or compliance.

A document-heavy SDLC will struggle to keep up.

In the webinar, Orthogonal will show how TDD, BDD, Gherkin-based documentation, automated testing, resilience testing, BYOD strategies, and AI-assisted workflows can reduce manual compliance work without weakening design controls.
