Artificial Intelligence in Abdominal Imaging (2024 ...
M7-CGI07-2024
Video Transcription
All right, hello, everyone. Thanks so much for joining us today. We are going to get started with a session on AI for GI imaging. And to start off, we have Dr. Tessa Cook, and she will be talking about bias in AI.

Thank you, Dr. Magudia. Good afternoon, everybody. Let's start with the definition of bias. Bias is a systematic error; it causes disproportionate weight in favor of or against an idea, an individual, or a group. If you think about bias in health care, sadly, we have quite a few examples of this, and I'll share a few of them with you, which I'm sure you've seen before. The Greulich and Pyle atlas that we've all used for bone age estimation used X-rays, collected starting in 1929, from white children in the US. It wasn't published until 30 years after that, but we use it across the world. Only recently did a lot of our organizations adjust the eGFR calculator, which previously included false assumptions about muscle mass and serum creatinine in Black patients that led to them not getting needed care for chronic kidney disease. I'm a cardiovascular radiologist. The Framingham Heart Study is the basis for a number of cardiovascular risk estimators, but it actually overestimates risk in certain populations.

So if you consider bias in health care, how does that translate to bias in AI? Well, in this very nice paper from Dr. Banerjee and colleagues, they remind us that bias can enter AI models at different points in the pipeline: possibly starting in the data, in the modeling phase where the model is actually developed, and even at the inference step. If we think about bias in AI, we think of two broad categories. There's statistical bias, which occurs when a model's output doesn't represent the underlying population, doesn't represent reality. On the other hand, social bias refers to inequity in care delivery that systematically leads to suboptimal outcomes for a particular group.
And social bias can be caused by a statistically biased algorithm, or in some cases by other human factors, including our own implicit or explicit biases. So let's dig a little deeper into statistical bias. There are three broad categories that I think are important to remember. Again, this is where the model output doesn't capture or represent reality. It can occur from data handling, and this is a nice paper from Dr. Rouzrokh and colleagues talking about all of the different kinds of data biases that could impact an AI model. The actual model development step is also just as important and can be a source of bias. We might overfit or underfit the model to the training data. There's also the idea of a bias-variance trade-off: as bias increases, variance decreases, and vice versa. The ideal situation, of course, which is hard to achieve, is a low-bias, low-variance model. And finally, we have to think about performance evaluation. What kinds of metrics are we using to actually decide how well a model is working? And do those metrics perhaps mask some of the biases in the model? This is a nice example of statistical bias in AI. This is a paper, now a little over four years old, that looked at the sources of imaging datasets at the time that were being used to train AI. And in the US, the researchers found that they came primarily from three states: California, Massachusetts, and New York, with my home state of Pennsylvania running a distant fourth. Now, how do the patients in those three states resemble patients in the rest of the country, or even the rest of the world? So this is a potential source of bias. Another example here: a deep learning segmentation model to identify liver metastases from colorectal cancer. And the researchers found that it consistently underestimated tumor volumes.
So you can see here the ground truth segmentation in red, the model segmentation in green, which doesn't cover the whole region, and even another segmentation here, a manual segmentation of a lesion that was missed. So compared to the expert manual segmentation, the deep learning model was faster, but it required correction. So ultimately, it was not fully automated or significantly more efficient. Coming back to Greulich and Pyle, again, our atlas of X-rays from white children in the US. The winning RSNA challenge bone age model from a few years ago, trained on the Greulich and Pyle data, was found to have maybe up to 16% of predictions that would result in clinically significant errors, because the population that these X-rays represented did not resemble the population in which the model was later tested. And Dr. Yi, who is one of my fellow speakers up here, is the senior author on this work. And so here are four examples of cases where the prediction, in red, does not match the ground truth, in orange. And these were, respectively, from top left: image A, a Black child; image B, an Asian child; image C, a white child; and image D, a Hispanic child. And so you can see here the introduction of statistical bias due to the fact that the training data didn't really represent the reality in which the model would be used. What about social bias? Social bias occurs when the decisions that are based on the output of the AI create inequities in care access or care delivery. And it's typically the groups that are underrepresented in the training data that are adversely affected. Again, statistical bias can lead to social bias, but additionally, some of our own implicit or explicit human biases can contribute as well. This is work from Dr. Ziad Obermeyer looking at an algorithm that was used to identify patients for a so-called high-risk care management program, or set of programs.
And so looking at the graph on the left, you can see on the y-axis the number of chronic conditions in these groups. The curve in purple represents Black patients. The curve in yellow represents white patients. And you can see that the Black patients tended to have a higher number of chronic conditions. And among the patients that were screened for the program, to the right of this line, and even enrolled in the program, to the right of the second dashed line, consistently, the Black patients had more chronic conditions than the white patients. But when we look at the amount that was spent on their healthcare, we see a different trend. Again, the Black patients are in purple and the white patients in yellow. And you can see that consistently, even as the number of chronic illnesses increased, the amount of money spent on Black patients was less than on white patients. So what did the AI learn? It learned that Black patients were apparently healthier, because less money was spent on their care. But what really happened here? These were patients that had less access to care, and that's why less money was spent on their care. But they actually had just as many, if not more, chronic conditions. And so the AI disproportionately disfavored the Black patients in these programs. We're starting to appreciate that there's more information in the data that we work with than we perhaps intend to uncover. There's a seminal paper from Dr. Gichoya and colleagues on predicting race from chest radiographs, a subsequent paper from one of my mentees, Dr. Adleberg, on predicting other kinds of demographics from chest radiographs, and also work from Dr. Pyrros and colleagues on predicting the future development of type 2 diabetes from a chest X-ray. So there is data in the imaging that we work with every day that we don't currently do a lot with, but we also don't fully understand how these models are making these predictions.
And so that even further amplifies the need to be mindful of the potential biases that get incorporated into these models. So we've talked a lot about pixel-based AI, and I wanted to spend a little bit of time considering this for large language models. This is a very nice experiment that took a large dataset of publicly available patient notes describing, essentially, hospital admissions or emergency room visits. And what the researchers did was take the introductory statements from each of these notes and modify them to add hypothetical patient profiles with patients of different races. And then they looked at what the large language model did in terms of generating new notes for these hypothetical patients, including what sorts of diagnoses they got, what sorts of treatments were recommended, and what outcomes were predicted. And you might not be surprised where I'm going with this, but the large language models exhibited bias in the kinds of treatment they recommended, the cost associated with care, the types of hospitalizations, and even the predictions of outcomes. They typically favored white patients, suggesting superior and more immediate treatments, longer hospitalization stays, and better outcomes, better recovery. I'll give you an example. In white patients with cancer, the large language model recommended surgery, and in Black patients with the same cancer, conservative care in the ICU was the recommendation. So for us as the end users of these tools, and even the developers of these tools, we really need to be mindful of the bias that can get incorporated into these models. Now, all is not lost. There are techniques that can be used to actively decrease bias in healthcare AI. This is work in the cancer omics space where the group first trained a model with what we call class imbalance.
So multiple examples of what they call a majority group, and fewer examples, which you can see here in the red, the yellow, and the purple, of data from the minority groups. And they found that they could adjust how they trained the model to actually improve performance in those minority groups, but it took a conscious effort. There's a nice example of this related to pain from osteoarthritis, because we know that underserved populations experience more pain, but that's not explained by standard measures of osteoarthritis severity on imaging. And so this group was able to incorporate data on disparities into their model and actually predict knee pain more effectively from radiographs, adjusting for the disparities. So to conclude, there are multiple sources of bias in AI models. We know that they have the potential to amplify healthcare disparities. Our colleagues who are developing these models need to be conscious of this and mitigate the impact of bias wherever possible. And we, as the end users, need to be aware of this and mindful of it when we use AI for our care. So with that, I thank you very much. I apologize that I can't stay for the Q&A, but I'm happy to have a conversation at a later date. Thank you.

Thank you, Dr. Cook, for that excellent presentation. I'm now going to introduce Dr. Paul Yi, who will be talking about implementing AI in your practice.

All right, thank you, Kirti, for the introduction. Thanks, everyone, for joining us here on a cold but energetic afternoon. So today, I'll be talking about implementing AI in your practice. Anyone who's in this room knows that AI is front and center here at RSNA. In fact, it's in our theme, Building Intelligent Connections. With programs like Radiology Reimagined showing in-person demos of AI from numerous vendors, it's clear that AI is here to stay in our field. But if you're like me, when you step out onto the vendor floor, especially the AI Exhibit Hall, it can be really intimidating.
There are dozens, if not hundreds, of companies, ranging from big tech companies like AWS and Google to up-and-coming startups. In other words, there are so many options, but so little time. And it's enough, if you're like me, to make your head hurt, and it's hard to know where to start. So the spirit of this talk today is really just to give you three principles, so you know where to start and have some grounding. We'll start with finding your why. Next, we'll talk about trusting but verifying. And finally, deploying and monitoring. And just a disclaimer: this is really, like I said, just the tip of the iceberg. These questions are designed to provoke thought at a high level, allow you to do some reflection, and give you a framework for deciding how to best choose and implement AI for your practice and your situation. And hopefully, over the next 12 minutes or so, it gives you a solid foundation to at least get you to the next steps. So let's talk about finding your why. John Maxwell is a renowned leadership guru and coach, and he has a quote: find your why and you'll find your way. And like many things in life, once you know your motivations, your reasons for doing things, your goals, the rest becomes so much easier. And if we consider artificial intelligence as incorporating human intelligence into machines, I'd submit to you that the why for radiologists might be considered augmented intelligence: how do we use AI to improve human performance? But this is really gonna look different depending on who you are. So what might work for me in my practice might be very different for Kirti Magudia, or for Matt Lee, or for Tessa Cook. Each of us has our own unique practice setting, our own specialties, populations we care for, and preferences. And that includes all of you out there. But I think we all share some common goals of augmented intelligence.
Obviously, improving accuracy; improving standardization of what we do, whether it's in radiology reporting or how we acquire images; improving efficiency; decreasing major errors; and improving how we communicate, not only with our ordering providers and colleagues in the healthcare space, but with patients as well. Some key factors for AI success in radiology, to get a little more granular: obviously, front and center is clinical and research value. How do we improve performance metrics like accuracy, detection, triage? But also simplicity. How easy is it to integrate into our pre-existing IT systems? How easy is it to use as a radiologist? And last, but not least, financial benefit. How do we bring back revenue to hospital systems that are increasingly squeezed? How do we increase the efficiency of radiologists and increase patient throughput while minimizing burnout? So it's really a balance of multiple factors. And so for me, again, the why is augmented intelligence. And this is again gonna look very different for each and every one of us. So I'd like to challenge you: just think about this, if not now during this talk (hopefully you're listening to me instead of daydreaming), then maybe afterwards. Just think about your why. And so just as an exercise, our why at St. Jude really depends on some unique characteristics, just to drive home the point. We're a specialized research hospital. We focus on a very narrow scope of diseases: pediatric cancer and other catastrophic diseases like sickle cell disease. And we're very small. We have 77 inpatient beds. Most of our patients actually come in as outpatients. And we have no ED, limited OR, limited ICU. And our work primarily supports clinical trials. The reason St. Jude exists is for this research. And we have a unique operating model: the majority of our funding, about 90%, comes from generous donors.
And so the usual kinds of considerations for billing don't necessarily hit the top of our priority list. And again, for us, my why is very crystal clear: we're finding cures and saving children. And so what this means is that the usual suspects for AI don't work for us. A lot of you probably use algorithms like this: Aidoc triage for pulmonary embolism, intracranial hemorrhage, things that are very time sensitive, particularly in the emergency setting. Or in the pediatric space, one of the few widely used algorithms is pediatric fracture detection. But again, because we don't have an ED, because we focus on these pediatric cancers, these don't really work for us. But when I think about our why, something that does move our needle is: how do we reduce radiation while maintaining image quality? We can do things like deep learning iterative reconstruction to take low-dose scans that otherwise would be very noisy, very hard to read, and make them look a lot better. Or PET imaging, which is a mainstay for cancer staging and evaluation. We look at things like this. We can do a full dose and full scan time of radiotracer. If we minimize the dose, we obviously get less radiation exposure, but the image quality isn't as good. On the flip side, we can decrease scan time to improve comfort for our patients, but again, the image quality isn't so good. This is where we started using deep learning iterative reconstruction to say: let's reap these benefits of shorter scan time and lower radiation exposure, and let's get better quality images. Another example of a tool that works for our why, but potentially not in other places, is tumor response treatment assessment. So this is a company called AI Metrics, and it basically allows you to interact with an image and automatically do measurements for tumor lesions that might be of interest. And again, this might not work for most practices, but for us, because we support these clinical trials, this really moves our needle.
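For readers unfamiliar with what tumor response assessment tools like the one just mentioned automate, here is a minimal sketch of the RECIST 1.1-style arithmetic that gets layered on top of the measurements. This is an illustration only: the function names are hypothetical, it is not AI Metrics' actual API, and the criteria are simplified (real RECIST 1.1 has additional rules for new lesions and non-target disease).

```python
# Sketch of RECIST 1.1-style response arithmetic that automated tumor
# assessment tools perform once target-lesion diameters are measured.
# Function names are illustrative, not any vendor's actual API.

def percent_change(baseline_mm, followup_mm):
    """Percent change in the sum of target-lesion long-axis diameters."""
    base = sum(baseline_mm)
    curr = sum(followup_mm)
    return 100.0 * (curr - base) / base

def recist_category(change_pct, absolute_increase_mm=0.0, all_lesions_gone=False):
    """Map percent change to a simplified RECIST 1.1 response category.
    PD also requires an absolute increase of at least 5 mm."""
    if all_lesions_gone:
        return "CR"   # complete response
    if change_pct <= -30.0:
        return "PR"   # partial response
    if change_pct >= 20.0 and absolute_increase_mm >= 5.0:
        return "PD"   # progressive disease
    return "SD"       # stable disease

# Example: two target lesions shrink from 40 + 30 mm to 25 + 20 mm,
# a change of (45 - 70) / 70, roughly -35.7%, so a partial response.
change = percent_change([40, 30], [25, 20])
category = recist_category(change)
```

The hard part, which the AI automates, is finding and measuring the lesions consistently across time points; the response bookkeeping above is the easy part that follows.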
So these are unique examples that work for our why, but again, for your practice, these may be very different. So again, I encourage you, I challenge you: think about this after you leave this room, and before you step out into the exhibit halls. And one quote from a dear friend of mine, Dr. Amal Isaiah, who's an otolaryngologist at the University of Maryland: if you don't know what you want, you start wanting what you get. In other words, if you don't have an idea of what your motivations are, what your goals are, you're more likely to get swayed by whatever you're being pitched, whether it's on the exhibitor floor or in your LinkedIn inbox. So once we've understood our why and figured out what we're looking for, let's move on to vetting potential tools. And this leads us to our second principle: trust but verify. Like I said, there are so many options, so little time. And it can be confusing, it can be frustrating, and you might be tempted to take vendors at their word. And this can be super challenging if you've got sales pitches flying by at 100 miles an hour. But I think we can take a tip from the late Danny Kahneman, the behavioral economist, to think slowly and deliberately in how we evaluate these tools and their potential benefits, particularly when we're considering making a financial investment. And so how do we do this? Well, I think at a very high level, it's: test AI on your home turf. So if you've got an AI model that's reported by a vendor to have performance at an AUROC of 0.95, really evaluate: how does this work when we actually use it in our particular setting? As a case study, there's a very popular AI algorithm for pulmonary embolism triage and detection, claiming an 80% enhanced identification rate of incidental cases. When we look at this kind of home-turf testing, like this paper from the University of Alabama group that prospectively evaluated it: how does this algorithm actually work?
What we find is, despite a really slick user interface, something that looks really polished and might be pleasant to look at, in a prospective study, when we look at pre-AI and post-AI phases, it actually did not improve radiologist accuracy, miss rate, or examination or report turnaround times. Just as another example from this group, evaluating, again, human performance, not just statistical evaluations: this was intracranial hemorrhage detection. And similarly, this did not improve the diagnostic performance of radiologists, nor did it improve report turnaround times. So it just goes to say that sometimes you have to evaluate: does this work for our hospital? And that doesn't mean that it won't work in all cases, but it might not for your particular setting. It's also important to account for the diversity of patient populations and how AI might perform in those settings. This is a bone age app that is commercially available, and it was actually powered by the winner of the RSNA Bone Age Challenge. And this is the paper that Dr. Cook mentioned, where we basically showed that this algorithm is biased against females, against children at the older and younger ends of the age spectrum, as well as against non-white children. And so again, it's important to consider your geography and unique populations: here is where the training data for that algorithm came from, and the testing data was from Los Angeles, with a more diverse population. Another consideration is robustness, which is basically: how does AI perform under different conditions? You can think of it like testing your car in different kinds of weather. And so if you're an MSK radiologist, like I am, this might just be a normal day: you've got an X-ray, you've got a comparison that's flipped, you've got one that's rotated. But what we found is that these algorithms can often be very misled by just simple rotations, even just 90 degrees. And so again, how do we verify? Test on your home turf.
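As a concrete illustration of this kind of robustness testing, here is a minimal sketch of a harness that reruns a model on rotated copies of each image and flags prediction flips. Everything here is hypothetical: `toy_model` stands in for a real inference call, which in practice would wrap your vendor's system.

```python
# Sketch of a "home turf" robustness check: run the same model on
# rotated copies of an image and flag predictions that shift.
# `toy_model` is a stand-in for a real inference call.

def rotate90(img):
    """Rotate a 2-D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def robustness_check(model, img, n_rotations=3, tolerance=0.1):
    """Compare predictions on rotated copies against the original.
    Returns a list of (rotation_degrees, prediction, flagged)."""
    baseline = model(img)
    results = []
    rotated = img
    for k in range(1, n_rotations + 1):
        rotated = rotate90(rotated)
        pred = model(rotated)
        flagged = abs(pred - baseline) > tolerance
        results.append((90 * k, pred, flagged))
    return results

# Toy "model": scores the mean brightness of the top half of the image,
# so it is deliberately orientation-sensitive, like some real models.
def toy_model(img):
    top = img[: len(img) // 2]
    flat = [v for row in top for v in row]
    return sum(flat) / len(flat)
```

Running `robustness_check(toy_model, some_image)` on a local sample and counting the flagged rows gives a quick, vendor-independent sense of how orientation-sensitive a model is before you commit to it.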
And finally, as we move into the home stretch: deploy and monitor. So there's a concept called performance drift, and the idea is that AI models can worsen in performance over time. And this can result from a number of variables, including changes in scanner manufacturer, new diseases like COVID-19, or changes in the population, for example, if your system acquires a new hospital. So how do we monitor this? Well, you can deploy, make adjustments, have regularly scheduled evaluations, and keep that evaluation cyclical. And that's great in retrospect, right? But what about prospectively, when we don't know the ground truth? How do we evaluate this in real time to know if the AI's going off the rails? Well, there's work being done on the research side that I believe will one day make it to clinical practice, like this collaboration between Microsoft and MGH, where they basically take an algorithm that can characterize imaging appearance, looking at DICOM metadata and some other features, apply it to new incoming data, and evaluate for deviations from those distributions, and basically be able to identify these drifts. And what's really cool in this study is that they were able to identify the state of emergency that coincided with COVID-19. So this is another example of a kind of human drift, if you will, from UT Southwestern, where they looked at three phases of this intracranial hemorrhage detection algorithm, with three different user interfaces. And what they found, basically, was that there were decreased turnaround times for only one phase. So it's important to know that the user interface matters. And just in the interest of time, a couple of notes on human-computer interaction. We'll just go through this. Heat maps are used to explain algorithms. There are many types of explainability mechanisms out in FDA-cleared products.
But what we've shown recently, in a paper published last week in Radiology, is that when you look at different explainability types, using local explainability mechanisms leads to higher accuracy and speed, but radiologists may over-rely on them. And just as a case in point, even Google's algorithms have been shown to have issues with actual user usability. Stakeholder collaboration is key, with not just radiologists, but other doctors, imaging IT, techs, and other folks. So in conclusion, just some takeaway points. First, know your why. This is gonna be unique to you and your practice: what types of algorithms are we looking for, and what's gonna move our needle? Second, trust but verify AI vendors' claims. And finally, deploy and monitor AI routinely and with intention. And if you don't remember anything else, just know: if you don't know what you want, you'll start wanting what you get. So with that, thank you very much for your time. Thank you.

All right, so I'm gonna be going next. We've had a great overview so far of implementing AI and some of the bias considerations for AI. I'm going to be going through some applications as well. So, you know, we're talking about AI for GI imaging, which is the name of our session, but also body imaging; a lot of us, you know, use those terms synonymously. Many of us also read chest, at least occasionally, so we could think about whether chest also applies to us. And then I am going to introduce this topic of opportunistic imaging, which we're going to hear more about in our last presentation as well. So I have no financial conflicts of interest, but I am involved in some other efforts sponsored by RSNA, which I'll touch on at the end. So the objectives of my session are to think about current FDA-approved AI products for body imaging and how to go and find those products, and some of the options for that.
Think about the range of AI research in body imaging, and think about AI applications related to opportunistic screening in radiology. All right, so let's get started with FDA-approved products. So you're at your workstation. And as Paul was saying, you know, you can go to the floor here where the vendors are, at the AI Showcase, and you can look at products that way. But if you're at home, and let's say it's not RSNA at that time, you might be thinking: what AI products are out there that could help me with my current radiology workflow? One option is to go to the ACR Data Science Institute. They have a webpage that you can get to, and this is what the interface looks like. And it's a really interesting place to go, because they have everything organized by topic. So if you haven't gone there, I think it is worth a look. You start on this homepage, and you can click on abdominal imaging, and you'll end up at an interface that looks like this. And so there are literally, you know, these little boxes that show you each of these products. And you can get a sense of, you know, what sorts of products are out there that are FDA approved. If you log into the ACR website, this is another view of the data that you can see after logging in. And you can see that at this point, when I put this presentation together a couple of weeks ago, the number of FDA-cleared products in medical imaging was 333. This has gone up since last year, when there were maybe 60 to 80 fewer products. So, you know, this is changing over time. And again, all of this is clickable. So you can click on the abdominal imaging section, and you can see all of these different metrics in terms of what anatomical regions these products cover, the year that they were cleared, and even the focus of these products.
And so you can see that there are many, many AI products related to prostate, probably prostate MRI, to image quality improvement, and then a number of products related to liver analysis as well. And clicking further, you can actually get into lists with even more information about these products. So this is a way to interface with some of these products. As Dr. Yi mentioned, one product that I think is quite interesting: as body imagers, you know, even for adults, we're reading lots of cancer staging studies in most abdominal imaging divisions. And so this is a product that I think is of interest, because it can help with automated measurements and automated report generation, to produce more structured reports that I think many of our oncologist colleagues would appreciate having access to, even outside the context of clinical trials. So just something to keep an eye on and to consider as well. All right. And then in terms of AI research for body imaging, you know, very briefly, there are so many different applications of research related to AI and abdominal imaging. And I have some of the topics listed here: organ segmentation, lesion detection, detection of incidental findings, characterization of findings, whether they're benign or malignant, opportunistic screening, which we're going to be talking about next, and radiomics, which has been ongoing for years. One that I am in particular really excited about, and something that we can all access, is the TotalSegmentator AI model. If you haven't seen this, there's a website that you can go to, and you can actually upload your own abdominal CT. And this model will segment over 100 anatomical structures. And you can get basically a virtual phantom based on that abdominal CT exam. So it's really powerful in showing that, you know, although some of the papers we're looking at might be: can we segment the prostate?
Can we segment the liver? You can actually do this really powerful segmentation of many different structures simultaneously. All right, so the next topic that I'd like to talk to you about is opportunistic screening. And just a, you know, very brief background on this. This is screening that depends on requests from patients or their providers. So we're taking advantage of when a patient presents for care for another indication. On the right here, I have an example, which is that, you know, if a child goes to a well care visit to their pediatrician, usually they're going to take advantage of that visit and check the child's vision at the same time, at least a basic vision screening. So that would be an opportunistic screening when the patient is already present. And that's in contrast to organized screening, which is coordinated by the public sector. So in the example of vision screening for children, this would be the idea that, you know, perhaps there's an organized program for vision screening in schools. So the children are already at schools, and this is something that's organized by the government. So bringing this to radiology, opportunistic screening in radiology looks at, you know, is there latent data available in medical imaging that's performed for other indications? You know, in body imaging, when you look at the volume of abdominal CTs that are being performed, there were, it was estimated there were over 20 million abdominal CTs performed in the United States in 2016 alone. And I think we all know that imaging volumes are going up, so it's likely that these numbers are higher now. And what information can we gather from all of the CT imaging that is being performed? So, you know, at least initially when I was thinking about AI, you know, we think about these problems like, you know, can an AI model distinguish a cat from a dog? You know, can we predict mutational status of tumors? Can we predict tumor grade? 
You know, these are all interesting questions and quite difficult and challenging tasks, but this is not what AI in opportunistic screening is about. We're really talking about things that I like to think about in two different buckets. One is biomarker development, and the other is identification of incidental findings. And there probably are many more, but we're going to talk about just a few of these. The first, and one of my favorites, is body composition. I hear some chuckles from the side. This is something that I've been working on since residency, so I'm going to give you some background and some of the exciting work that's being done in this space. So, why do we care about body composition? Well, if you look at your electronic medical record, at that face sheet, it's likely that BMI is on that kind of home section of the chart, and no matter where you travel in a patient's chart, you're going to see that BMI number staring you in the face. And that's because, if you go back to that Framingham study that Dr. Cook mentioned, BMI was shown to be associated with survival and cardiovascular risk. So, everyone is really focused on BMI. It's how we determine if a patient is obese or underweight. But we're looking at abdominal CTs. We're radiologists, and we can see, even though we're not reporting this, a patient's muscle, their subcutaneous fat, and their visceral fat. And we can see that although these two patients have the same BMI, their body composition looks very different. There's a very different ratio of muscle to subcutaneous fat between these two patients. And it must be that these patients have different risk profiles, even though that's not being captured by BMI. So, you know, research started with people doing manual segmentation for body composition, looking at single axial CT slices at the level of the third lumbar vertebral body.
That was because prior research had shown that single-slice analysis correlated well with whole-body measurements. You'd end up with a segmentation mask like this, with subcutaneous fat in green, skeletal muscle in red, and visceral fat in yellow. Manual analysis, though, was time-intensive and costly, so it was limited to small cohorts and well-funded research studies. So I, along with multiple other groups, developed models to automate body composition analysis. We built a fully automated, high-throughput workflow: you can give it an entire abdominal CT exam and get back body composition areas. It selects the appropriate series, favoring primary, non-derived series with enough images (excluding, for example, derived dual-energy series); selects the correct L3 slice using one deep learning model; and then uses a second model to perform the segmentation. We also know from other research that body composition varies by age, race, and sex, so our approach was to try to define normal, or reference, values for body composition. We did this by analyzing many thousands of CT exams from a single hospital system and asking what body composition looks like as a function of age, sex, and race. For example, here we have white women and white men, with age on the x-axis for all of these plots. You can see significant variation in body composition by these factors. The first row is skeletal muscle: overall, men have more skeletal muscle than women. Interestingly, visceral fat is also higher in men than in women, which is not something I knew before doing this study, and overall it seems that Black patients have less visceral fat than white patients.
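As a rough illustration of the workflow just described, here is a minimal sketch in Python. The function names and label conventions are hypothetical stand-ins, not the actual published implementation: one model's per-slice scores drive L3 selection, a second model's label mask is converted to cross-sectional areas, and those areas can be standardized against age-, sex-, and race-matched reference values as z-scores.

```python
import numpy as np

def select_l3_slice(slice_scores: np.ndarray) -> int:
    """Pick the axial slice with the highest L3-probability score
    (scores stand in for the output of a slice-selection model)."""
    return int(np.argmax(slice_scores))

def areas_cm2(mask: np.ndarray, pixel_area_mm2: float) -> dict:
    """Cross-sectional area per tissue class, in cm^2, from a label mask
    (0=background, 1=muscle, 2=subcutaneous fat, 3=visceral fat)."""
    names = {1: "skeletal_muscle", 2: "subcutaneous_fat", 3: "visceral_fat"}
    return {name: float((mask == label).sum()) * pixel_area_mm2 / 100.0
            for label, name in names.items()}

def z_score(value: float, ref_mean: float, ref_sd: float) -> float:
    """Standardize a measurement against a matched reference distribution."""
    return (value - ref_mean) / ref_sd

# Toy example: a 4x4 "slice" mask with 1 mm^2 pixels.
mask = np.array([[1, 1, 2, 2],
                 [1, 1, 2, 2],
                 [3, 3, 0, 0],
                 [3, 3, 0, 0]])
print(select_l3_slice(np.array([0.1, 0.7, 0.2])))   # slice index 1
print(areas_cm2(mask, pixel_area_mm2=1.0))
print(z_score(120.0, ref_mean=150.0, ref_sd=20.0))  # -1.5
```

The made-up numbers in the last line show the idea: a muscle area of 120 cm² against a hypothetical reference mean of 150 cm² (SD 20 cm²) gives a z-score of −1.5.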
These reference values can be used to calculate z-scores, similar to bone mineral density, where t-scores drive treatment decisions for osteoporosis and osteopenia. What we found is that this normalized body composition is associated with subsequent cardiovascular events and with overall survival. We're really interested in which clinical outcomes body composition is related to, and this is work being done by many groups across the world in many different domains. To give you an idea of the domains where body composition is significant: cardiovascular disease; cancer, where it's associated with the risk of developing cancer, treatment toxicity, and outcomes; liver disease; inflammatory bowel disease; chronic kidney disease; and even COVID. Some of our latest results show that changes in body composition are also very significant, because many of our patients get subsequent CT exams over time. If your body composition is changing, that can itself be an important marker, and that's what we saw with overall survival: changes in body composition over time predict worse overall survival. A quick plug here: a third-year medical student from Duke is working with me on her research project, and we're looking at whether differences in body composition by race and ethnicity are related to social determinants of health or to lifestyle factors such as diet and physical activity, and whether we should take race and ethnicity into account when interpreting body composition results. That work is in progress. We're also thinking about how body composition varies by geographic area and in race and ethnicity groups other than the white and Black patients we've looked at so far. All right, and really quickly, just to mention coronary calcium scoring.
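For readers unfamiliar with how a coronary calcium score is computed, the standard Agatston scheme can be sketched as follows. This is a simplified illustration of the general method (per-lesion area times a density weight from the lesion's peak attenuation, conventionally on 3 mm axial slices), not the AI model discussed in the talk, and the lesion values below are invented for the example.

```python
def density_weight(peak_hu: float) -> int:
    """Agatston density factor from a lesion's peak attenuation (HU)."""
    if peak_hu >= 400:
        return 4
    if peak_hu >= 300:
        return 3
    if peak_hu >= 200:
        return 2
    if peak_hu >= 130:
        return 1
    return 0

def agatston_score(lesions) -> float:
    """lesions: (area_mm2, peak_hu) pairs for calcified lesions of at
    least 1 mm^2; the score sums area times density weight."""
    return sum(area * density_weight(hu) for area, hu in lesions if area >= 1.0)

# Two hypothetical lesions: 5 mm^2 peaking at 250 HU, 3 mm^2 at 450 HU.
print(agatston_score([(5.0, 250.0), (3.0, 450.0)]))  # 22.0
```

Here 5 x 2 + 3 x 4 = 22, illustrating how a handful of dense lesions can dominate the score.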
The reason this is important is that coronary calcium is a validated predictor of coronary artery disease in patients who are otherwise asymptomatic and at intermediate risk. It's a really powerful predictor, more helpful to us than the other risk predictors out there, and as radiologists I feel we should take advantage of it. There has been incredible work in this area, including a study by this group published in npj Digital Medicine showing that when patients and their providers were alerted to AI-derived coronary calcium results, it could actually change downstream prescriptions of statin medications, one of the primary treatments for elevated coronary calcium. In the interest of time, I won't cover the other applications, but we could do liver fat quantification, or look for abdominal aortic aneurysms or incidental PEs; these are all other options for opportunistic screening. Thank you for your attention. All right, next is Dr. Matt Lee. All right, thank you, Dr. Magudia. Good afternoon. I'm grateful for the opportunity to be here with you. Building on what we've been talking about in this session, over the next 12 minutes I'll cover what opportunistic imaging is, why it's important, and how we can harness everyday imaging data to uncover clinically silent or pre-symptomatic diseases and risk factors for future adverse outcomes. I'll provide a whirlwind tour of the University of Wisconsin's contributions to opportunistic imaging, its impact on the field, and the role of AI tools and technological advancements in driving these efforts forward. Lastly, I'll share some exciting examples of our current projects, our population-level impact, and our plans for the future. So, what is opportunistic imaging?
We've talked about this some already, but opportunistic screening refers to exercising prevention through an unorganized program or chance encounter. As it pertains to imaging, it's the practice of leveraging incidental imaging data unrelated to the clinical indication, generally for the purposes of wellness, prevention, risk profiling, or pre-symptomatic disease detection. I asked ChatGPT and actually got a pretty solid description: we're talking about extracting clinically relevant information beyond the original purpose of the exam. The origins of opportunistic imaging date back over a decade now, and since then opportunistic screening has been widely adopted in the CT literature. Opportunistic imaging encompasses the assessment of clinically silent, unsuspected conditions or risk factors, and it should be noted that this does not imply healthy, asymptomatic patients. The word "opportunistic" is problematic given its largely negative connotations, at least within medicine; more palatable terms may include incidental, fortuitous, serendipitous, opportuneful (an interesting one), value-added, or some combination of these. All imaging exams, of course, reveal incidental findings unrelated to their original purpose, often drawing negative attention for triggering costly, anxiety-provoking, and potentially harmful follow-up. Although radiologists are well versed in managing incidental findings, our position has been to reframe this data as untapped potential rather than as a burden. And although opportunistic imaging can be applied to any imaging modality, the application at the University of Wisconsin, at least, has been to body CT, and this is really related to a confluence of factors.
Those factors are the sheer volume of CT examinations; the accessibility, reproducibility, and objective nature of CT imaging; the emergence of automated and explainable tools, which we have favored; and the increasing emphasis on precision medicine and value-added initiatives. Recent efforts have been made to demystify AI tools in radiology, with interpretable AI meaning transparent models whose decision-making process and predictions can be understood by a human. As such, we have favored automated and explainable body composition tools that have analogous manual or semi-automated measures and can be readily checked for quality assurance. So when it comes to opportunistic imaging, at least at the University of Wisconsin, what are we really talking about? Measures of fat, abdominal muscle, calcified aortic plaque, liver, and bone: these are the CT biomarkers that have bubbled up to the top over a decade's worth of work in survival and cardiometabolic risk prediction. We'll take a quick look at a few of these, some of which have already been touched on, and at how our experience with opportunistic imaging has evolved. Looking at our origins, perhaps somewhat surprisingly for a bunch of abdominal radiologists, the early work was actually focused on bone, specifically opportunistic screening for osteoporosis at CT. Osteoporosis remains an under-diagnosed and under-treated condition, but CT is really quite ideal for it; we find that the spine is the window to bone health, BMD, and fracture risk. Here we see an example of a patient with osteoporosis and myosteatosis who underwent CT colonography for colon cancer screening and subsequently suffered a left proximal femur fracture. Measures of abdominal muscle, including the paraspinal, psoas, and body wall musculature, have shown value for cardiometabolic risk and fracture prediction. Interestingly, though, our early work focused on muscle bulk.
We've actually found that muscle attenuation has repeatedly been shown to be a predictor of survival and mortality risk. This has been shown not just in the CT colonography population but within subgroups, including patients diagnosed with colorectal cancer, where we found that muscle attenuation was a significant predictor of survival in patients without metastases at the time of staging CT. Myosteatosis is also associated with increased future cardiovascular events and diabetes, and appears to be a biomarker of metabolic health. Fat measures, including visceral, subcutaneous, and ectopic fat, are of course also indicators of metabolic health. Finally, I'll touch on calcified aortic plaque as a cardiometabolic biomarker that can be readily identified at abdominal CT and has been very important in our work. Cardiovascular disease is the leading cause of death in the U.S., making cardiometabolic disease prevention a critical public health concern. And certainly in an aging United States population, chronic conditions like cardiovascular disease, diabetes, and obesity are increasingly common and costly, though largely preventable. Historically, the widespread adoption of body composition analysis has been hindered by labor-intensive measurement methods, inconsistent techniques, and a lack of standardization and generalizability. The work that we've done couldn't have happened without years of important partnership and collaboration. The individual tools used in the first publications I showed were developed, trained, and tested at the NIH by Dr. Ron Summers, and subsequently modified and improved at the University of Wisconsin. These tools are now streamlined into a single executable software package that takes a little over three minutes to run on an abdominal-pelvic CT scan.
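A muscle-attenuation measurement of the kind described above reduces to averaging Hounsfield units over the voxels a model labels as muscle. The sketch below is illustrative only; the cutoff is a hypothetical example value, since myosteatosis thresholds vary by protocol and population and no specific number is given in this talk.

```python
import numpy as np

# Hypothetical example cutoff, not a value from this talk.
MYOSTEATOSIS_HU_CUTOFF = 30.0

def mean_muscle_attenuation(hu_slice: np.ndarray, muscle_mask: np.ndarray) -> float:
    """Average Hounsfield units over the voxels labeled as muscle."""
    return float(hu_slice[muscle_mask].mean())

# Toy 2x3 CT slice (HU values) with a boolean muscle mask; the fat-range
# voxels (negative HU) are outside the mask.
hu = np.array([[45.0, 50.0, -80.0],
               [20.0, 25.0, -90.0]])
mask = np.array([[True, True, False],
                 [True, True, False]])
mean_hu = mean_muscle_attenuation(hu, mask)
print(mean_hu, mean_hu < MYOSTEATOSIS_HU_CUTOFF)  # 35.0 False
```

Lower mean attenuation reflects fattier muscle, which is the intuition behind using this measure as a survival and metabolic-health biomarker.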
These explainable tools have shown that quantitative CT biomarkers are predictive of cardiovascular disease and survival, and they replace the more labor-intensive methods you may be familiar with from years past. They can be opportunistically applied to existing exams without additional testing cost or patient exposure. Putting it together, this is what our composite color-overlay QA images look like at the L1 and L3 levels. So what are we doing with these? Over the years we've shown the utility of these explainable tools in patients undergoing CT colonography and in cancer cohorts, but now we can show their scalability as well. We've applied them to very large datasets, recently showing that automated quantitative CT biomarkers predict survival in an adult cohort of over 136,000 patients, validating findings from our smaller studies. Again, we see that lower survival probabilities were observed for patients with myosteatosis, increased abdominal aortic calcification, and higher visceral-to-subcutaneous fat ratios. We feel opportunistic imaging has clear applications to public and population health, and we have directed efforts toward how we in radiology can work to improve health through a population focus. Despite high spending on health care, the United States has the worst health-outcome indices among high-income countries. If we think about the mission of public health, to fulfill society's interest in assuring conditions in which people can be healthy, and the fact that health disparities are exacerbated by limited access to health care and resources, including access to imaging facilities and a lack of primary preventive care, then from a value perspective we really believe that radiology is uniquely positioned throughout the patient care experience to improve value and population health.
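The visceral-to-subcutaneous fat ratio mentioned above is straightforward to derive from the segmented areas. As a toy sketch (cohort numbers invented for illustration), one can compute the ratio per patient and flag the top quartile of a cohort, mirroring the kind of risk stratification described in the talk:

```python
from statistics import quantiles

def visceral_subq_ratio(visceral_cm2: float, subq_cm2: float) -> float:
    """Ratio of visceral to subcutaneous fat area; higher ratios were
    associated with lower survival in the work described above."""
    return visceral_cm2 / subq_cm2

# Hypothetical cohort of ratios; flag patients above the 75th percentile.
ratios = [0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8]
q3 = quantiles(ratios, n=4)[2]           # 75th percentile of the cohort
high_risk = [r for r in ratios if r > q3]
print(q3, high_risk)
```

Note that `statistics.quantiles` defaults to the exclusive method; on real cohorts the choice of percentile method barely matters, but cutoffs would come from large reference populations rather than the sample itself.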
Opportunistic imaging enables us to show the population-level impact of radiology and to assess population-level trends in imaging resource utilization, patient outcomes, and health disparities that may currently be underappreciated. Using measures such as the area deprivation index, a composite measure of socioeconomic disadvantage for the United States, we've shown that socioeconomically disadvantaged patients get their first CTs at an older age, die younger, are more likely to have an initial CT in the inpatient setting, have shorter survival times after their CTs, and have higher-risk CT biomarker profiles for cardiometabolic disease. Among the exciting things we're currently working on is how CT biomarkers associated with aging may impact our understanding of aging and longevity. This is certainly of the moment, as evidenced by the entry of aging and longevity into popular culture. To date, geroscience and longevity-focused omics research have primarily focused on cellular and subcellular epigenomics, proteomics, and metabolomics. However, imaging biomarkers may actually better reflect, or at least be complementary to, these omics through the cumulative effects of aging, genetics, and disease at the tissue and organ level. Chronological age and sex guide many key healthcare decisions, like prevention, screening, and intervention; however, chronological age is an imperfect measure of health span, prompting the concept of biological age: a composite measure of not only years lived but also genetic predisposition, lifestyle habits, and disease, which may be a better predictor of life expectancy. Using the CT biomarkers from our explainable tools, we developed a CT biological age model built solely on imaging data. With this model, we have been able to show distinct survival-probability differences across CT biological age risk quartiles.
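Whatever model produces the biological age estimate, relative age acceleration reduces to the gap between CT biological age and chronological age. A trivial sketch with made-up numbers (not values from this talk):

```python
def age_acceleration(ct_biological_age: float, chronological_age: float) -> float:
    """Positive values suggest accelerated aging relative to calendar age;
    negative values suggest deceleration."""
    return ct_biological_age - chronological_age

# Hypothetical patient: the model estimates a CT biological age of 68
# for a 60-year-old, i.e. 8 years of relative age acceleration.
print(age_acceleration(68.0, 60.0))  # 8.0
```

The sign convention matters for interpretation: a negative result would describe a patient whose imaging biomarkers look younger than their calendar age.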
This CT biological age model can also be used for 10-year survival probability, to generate actual CT biological ages for individuals, and to determine relative age acceleration or deceleration. Of course we're biased, but we think the future is bright, and we're looking toward broader application of existing AI tools to enhance risk modeling for cardiometabolic disease, frailty prediction, biological aging, and cancer prediction. The future of opportunistic imaging involves expanding AI tools to larger and more diverse populations and developing comprehensive normative data, as Dr. Magudia has already done. Efforts like the Opportunistic Screening Consortium in Abdominal Radiology aim to implement tools such as these across the US and the globe, refining the definitions of normal and abnormal and enhancing predictive accuracy. Future directions also include clinical decision support model development, automated reporting, dashboarding, and extending applications to other body regions. In conclusion, opportunistic imaging is a resourceful, purpose-driven approach to utilizing incidental imaging data that would otherwise go unused in clinical practice. We believe that rapid, reproducible, automated, explainable tools can really accelerate the use of imaging biomarkers at scale for potential population-level benefits, and we feel we have a unique opportunity to identify potentially important, clinically relevant diseases and markers of disease risk. Ultimately, the early detection and identification of individuals at increased risk for major causes of human disease and suffering, such as cardiovascular disease and cancer, could translate to improved longevity, quality of life, and clinical outcomes. Thank you very much.
Video Summary
In a recent session on AI in gastrointestinal (GI) imaging, Dr. Tessa Cook highlighted the prevalence of bias in AI, emphasizing that it can enter at any phase of the AI model development pipeline and manifest as either statistical or social bias. Statistical bias can lead to outputs that do not accurately represent reality, while social bias results in inequities in healthcare delivery. Dr. Paul Yi discussed how to implement AI in radiology, emphasizing understanding one's motivations (the "why"), testing AI in one's own practice settings, and monitoring AI performance over time. Dr. Kirti Magudia explored FDA-cleared AI products and research in body imaging, particularly opportunistic screening, which leverages incidental imaging data to identify health risks. She detailed the potential of AI in analyzing body composition and coronary calcium scoring, demonstrating its implications for predicting health outcomes such as cardiovascular risk. Dr. Matt Lee presented opportunistic imaging at the University of Wisconsin, focusing on using CT data to identify conditions such as osteoporosis and cardiovascular risk, and emphasizing AI's role in advancing this field for public and population health. Overall, the session underscored AI's promising role in improving healthcare outcomes through mitigation of bias and adoption in clinical practice.
Keywords
AI in GI imaging
bias in AI
statistical bias
social bias
AI in radiology
opportunistic screening
body composition analysis
coronary calcium scoring
healthcare outcomes
Copyright © 2025 Radiological Society of North America