Introduction

If you've ever set foot in a hospital or been near a medical imaging device, you've indirectly encountered DICOM. The term might sound like jargon, but it is an essential part of how medical images are stored, accessed, and shared. I'm a physician and medical researcher, and although I use these images daily, I realized I knew little about how they were generated or manipulated. That gap in my understanding felt particularly glaring.

What's DICOM Anyway?

DICOM stands for Digital Imaging and Communications in Medicine. In simple terms, it's a two-pronged standard used across virtually all modern medical imaging equipment, including X-ray, ultrasound, CT, and MRI systems. One prong is the file format, which stores not just the medical images but also crucial information about the patient and the specific conditions under which the image was taken. The second prong is the networking protocol, a sort of language that allows different medical systems to talk to each other and share these DICOM files and other types of medical data.
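To make the file-format prong a bit more concrete, here is a minimal Python sketch. It is not Avicenna's code (which is written in Swift), just an illustration. A DICOM file starts with a 128-byte preamble followed by the four bytes "DICM". After that comes the file meta information (group 0002), which is encoded as Explicit VR Little Endian. The function names `is_dicom`, `read_meta_elements`, and `make_sample` are hypothetical, and the sample bytes are synthetic rather than taken from a real scan.

```python
import struct

# In Explicit VR encoding, these VRs use 2 reserved bytes followed by a
# 4-byte length; every other VR uses a 2-byte length.
LONG_VRS = {b"OB", b"OD", b"OF", b"OL", b"OV", b"OW", b"SQ",
            b"SV", b"UC", b"UN", b"UR", b"UT", b"UV"}


def is_dicom(data: bytes) -> bool:
    """Check for the 128-byte preamble followed by the 'DICM' prefix."""
    return len(data) >= 132 and data[128:132] == b"DICM"


def read_meta_elements(data: bytes):
    """Yield (group, element, vr, value) for each group 0002 element."""
    if not is_dicom(data):
        raise ValueError("missing DICM prefix")
    pos = 132
    while pos + 8 <= len(data):
        group, element = struct.unpack_from("<HH", data, pos)
        if group != 0x0002:  # the file meta information has ended
            break
        vr = data[pos + 4:pos + 6]
        if vr in LONG_VRS:
            if pos + 12 > len(data):  # truncated header, stop quietly
                break
            (length,) = struct.unpack_from("<I", data, pos + 8)
            start = pos + 12
        else:
            (length,) = struct.unpack_from("<H", data, pos + 6)
            start = pos + 8
        yield group, element, vr.decode("ascii"), data[start:start + length]
        pos = start + length


def make_sample() -> bytes:
    """Build a tiny synthetic header (illustration only, no pixel data)."""
    version = (struct.pack("<HH", 0x0002, 0x0001) + b"OB" + b"\x00\x00"
               + struct.pack("<I", 2) + b"\x00\x01")
    uid = b"1.2.840.10008.1.2.1\x00"  # Explicit VR Little Endian, padded to even length
    transfer_syntax = (struct.pack("<HH", 0x0002, 0x0010) + b"UI"
                       + struct.pack("<H", len(uid)) + uid)
    return bytes(128) + b"DICM" + version + transfer_syntax


sample = make_sample()
for group, element, vr, value in read_meta_elements(sample):
    print(f"({group:04X},{element:04X}) {vr} {value!r}")
```

Each element carries a (group, element) tag, a two-letter value representation (VR) that says how to interpret the bytes, a length, and the value itself. Patient details and acquisition parameters are stored in the same way later in the file, alongside the pixel data.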

The real magic of DICOM is how it enables different devices from different manufacturers to work together. Before DICOM, getting medical imaging systems to talk to each other was a real headache. It was like trying to get someone who only speaks French to understand someone who only speaks Japanese. DICOM essentially created a common language. Now, when you plug in a new piece of medical imaging equipment at a hospital, it can easily access previous images, share its own, and even follow along with treatments and procedures.

So Why Build Another DICOM Viewer?

Many current DICOM viewers are built on older technologies and rely on the same few libraries, which are slow to adopt new features. The software often feels like it's from a different era and can be hard to debug or modify. I wanted to create something from the ground up that was easier to use, while deepening my understanding of the DICOM protocol. More importantly, I wanted a platform in which I could tailor every aspect, from the UI/UX all the way down to the machine-learning algorithms running inference on the hardware of my choice, including the Apple Vision Pro.

When Apple Vision Pro was unveiled, its capabilities immediately struck a chord with me. It wasn't just a high-performing piece of hardware; it was an innovation that opened new frontiers for augmented reality (AR) in medical contexts, specifically in surgery and radiology. What captivated me most was its fluid transition between mixed and immersive reality experiences. It does this with an almost uncanny knack for understanding human gestures, making interactions in AR feel like second nature. The hardware also packs substantial computational power for a headset, enough to perform complex tasks in real time. This makes it well suited to rendering high-definition medical images and even running real-time machine-learning algorithms during surgery or diagnosis. With Apple Vision Pro, implementing AR in medical settings becomes not just feasible but potentially transformative for how we think about medical imaging and data visualization.

Enter Stage Left: Avicenna

Named after the pioneering polymath in medicine, Avicenna is my attempt at a "toy" DICOM viewer built entirely within the Apple ecosystem with Swift and Metal. Swift offers type safety, performance, and modern syntax, making development smoother and more efficient. Metal provides high-performance, low-level access to the GPU, enabling real-time 3D rendering and complex computations for image and DICOM processing. Together, they give Avicenna a solid foundation, with support across Apple's platforms out of the box through SwiftUI and first-class support for machine-learning inference via Core ML. These modern tools promise not only speed and efficiency but also open the door to AI-driven medical imaging analytics, setting the stage for the next generation of medical imaging tools.

What's Next?

Avicenna already implements a variety of standard DICOM viewer features, such as support for 2D, 3D, and 4D image data, annotations, measurements, and 3D reconstructions. Even so, it's far from a finished product, let alone an FDA-cleared application. So what's on the horizon for this nascent project? One of the first milestones is enabling Avicenna to work with the various DICOM compression schemes. As of now, this is a noticeable gap, and filling it will make the viewer more versatile and capable of handling a wider range of medical images.

Beyond just viewing images, the next iteration of Avicenna aims to incorporate advanced AI algorithms, particularly enhanced dictation, report generation, and support for medical vision-language models such as Almanac Chat. The idea is to have the software not just display images but also provide context and potentially even diagnostic suggestions, enriching the utility of the viewer in a clinical setting.

Looking further into the future, we in the Hiesinger Lab are also exploring the possibility of using Avicenna in an augmented reality (AR) capacity during surgical procedures. The vision here is to overlay real-time medical imaging data onto a surgeon's field of view, providing an additional layer of information during critical moments.

While Avicenna serves as a "toy" viewer for now, the ultimate goal is to meet all the stringent requirements for FDA clearance. This is a complex and time-consuming process but one that would enable Avicenna to be used more broadly within the healthcare ecosystem.