AI in Neuroradiology: Research, Implementation and Ethics
WEB29-2022
Video Transcription
Good afternoon, or good morning, everybody. My name is Max Wintermark. I'm the chair of neuroradiology at the University of Texas MD Anderson Cancer Center. It is my great pleasure today to be the moderator for this webinar organized by the Radiological Society of North America. The topic of the webinar is AI in neuroradiology: research, implementation, and ethics. And I'm delighted because we have a wonderful panel of speakers. I'm going to introduce each of them. Our first speaker will be Dr. Yvonne Louis, who is an associate professor at NYU and also the associate chair for artificial intelligence in their department of radiology. Our second speaker will be Luciano Prevedello, who is an associate professor of radiology at the Ohio State University and also their vice chair for medical informatics and augmented intelligence in imaging, as well as the medical director for image informatics and for their 3D lab. Our third speaker will be Mark Colley from UCSF. He's a professor of radiology, the medical director of imaging informatics and the associate chair of clinical informatics for the department of radiology. And our last speaker and panel participant will be Dr. Suzy Bash, who is the medical director of neuroradiology at the San Fernando Valley RadNet site and also an expert in artificial intelligence. So we are going to have first three 10-minute presentations. And then after that, we'll have a lot of time for questions and discussion with our panel of experts. This webinar is being recorded. It will be made available later on the RSNA website, and more information on that topic will be given to you in the chat. You'll also have the opportunity to ask our experts all the questions that you want in the chat. So please do so, and we'll tackle all those questions in the panel discussion in the second half of this webinar. So that's it for the introduction. And without further ado, Dr. Yvonne Louis is going to tell us everything about upstream AI. Yvonne?

Thank you so much for the introduction, Max. And it's really a pleasure for me to join this webinar today. So Max asked me to speak about where it all starts, right? Upstream AI. And I actually had to ask him, okay, what exactly do you mean by upstream? So I have this graphic here. Upstream is, you know, up in the mountains. I'm kind of a freshwater, lake, mountain person myself. I guess if anyone wants to guess what little rivulet or stream this is. This is actually the Hudson River all the way up in the Adirondacks. So you can see, you could probably hop across it. Doesn't look so threatening. This is kind of more my speed. Whereas the sea and open waters, I think we're gonna talk a lot about, and some of the other speakers are gonna talk a lot about, the sea and open waters of AI, but it's a wild scene out there. I don't know if anybody who's a surfer or follows big wave surfing knows where this is, but this is a kind of a famous site in surfing. It's in Portugal, called Nazare. It's frightening. Okay, so I have actually no financial disclosures. I do lead a departmental collaboration with Siemens and Facebook AI Research, but I have no financial interests in any of these things. We're gonna touch today on a bunch of different areas where AI can influence and impact diagnostic imaging. So I'm gonna concentrate on this very tiny piece here. How can we revolutionize how we make an image, and can we provide some new information? Why do we need to go faster, or get by with less information?
Why would you even need to do that? Well, for patient comfort, we know that MRI has incredible tissue contrast, right? But people move, people are not comfortable. Even our research subjects are like, okay, now I understand why you need to go faster. Obviously image quality. These are actual clinical pictures that we took. Sometimes you're faced with this thing on the left, right? And then can we translate research to reality? Our functional neurosurgeons are going for the ventral intermediate nucleus of the thalamus. They can't see any roadmaps, right? There's no internal structure that we see well within the thalamus. This is a diagram from a neurosurgical textbook telling them where you need to go. There are pulse sequences where we can do this better, but they take a long time. So can we help translate research to reality? And obviously, it would be helpful in pediatrics, the elderly, people who are claustrophobic. Can we do less sedation? Can we make things more accessible? And cost-effectiveness is really important these days in medicine, with the cost of healthcare these days. And then obviously in terms of going faster, you can always trade, as we know, gains in speed for gains in other things like resolution and image quality. Okay, so let me take a step back and just remind ourselves how do we make a picture in MRI, right? So there is some signal from an object, and this is the object that lives in an XY kind of Cartesian coordinate system, right? We use some process to transform this into the frequency domain, which we call k-space. All right, we're comfortable with that. And mathematically, we write that as applying a mathematical transformation. In this case, it's a Fourier transform applied to the object, and we get out the other side this signal in k-space. And that's really the data acquisition piece, right? So this is the image reconstruction piece. How do we take the frequency space, the k-space, and kind of reconstruct back some representation of the object in the Cartesian coordinate system? And we know mathematically, okay, we could apply an inverse Fourier transform or reverse Fourier transform to go backwards, to go from this red box back to this. Mathematically, that's true. And that's what we call an inverse problem. But the problem is that this k-space, we don't really have all of it, right? We cheat, and it's not the actual representation of this object fully. And so when we apply this reverse Fourier transform, we get back a not very good picture of our object. And in fact, we get less and less information the faster and faster we wanna go, and we get a worse and worse image. And we aren't happy with that. We really want a good image of this thing, right? Except we really only have this. And so how do we do that? It becomes what's called an ill-posed inverse problem, you've probably heard people say that, because it no longer works to use just a reverse Fourier transform. You have to apply some thing, some complex operation, to get the good image back, right? And then the trick for image reconstruction is to figure out what that operation is. And mathematically, it's a matrix, and it's basically solving for the parameters of this matrix. Okay. So in terms of using deep learning for image reconstruction, it builds on the principles of image reconstruction and the advances we've made in terms of iterative reconstruction.
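To make the ill-posed inverse problem described above concrete, here is a minimal illustrative sketch (my own toy example with a synthetic phantom, not code from the talk): it keeps only a fraction of the k-space lines and reconstructs with nothing but an inverse Fourier transform, the zero-filled reconstruction that degrades the less data you keep.

```python
# Toy illustration of undersampled k-space and zero-filled reconstruction.
# Synthetic phantom; illustrative only, not the speaker's code.
import numpy as np

def zero_filled_recon(image, keep_fraction=0.25, seed=0):
    """Undersample k-space of `image` and reconstruct with an inverse FFT only."""
    kspace = np.fft.fftshift(np.fft.fft2(image))           # forward Fourier transform
    rng = np.random.default_rng(seed)
    ny, nx = kspace.shape
    mask = rng.random(nx) < keep_fraction                   # random phase-encode lines
    mask[nx // 2 - nx // 16 : nx // 2 + nx // 16] = True    # always keep the center of k-space
    undersampled = kspace * mask[np.newaxis, :]              # zero out the missing lines
    recon = np.fft.ifft2(np.fft.ifftshift(undersampled))    # "zero-filled" reconstruction
    return np.abs(recon), mask.mean()

# Simple synthetic "brain": a bright disk on a dark background.
y, x = np.mgrid[-64:64, -64:64]
phantom = (x**2 + y**2 < 40**2).astype(float)

for frac in (1.0, 0.5, 0.25, 0.1):                           # faster scans keep less k-space
    recon, kept = zero_filled_recon(phantom, keep_fraction=frac)
    err = np.abs(recon - phantom).mean()
    print(f"kept {kept:.0%} of k-space lines -> mean abs error {err:.3f}")
```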
So you get some undersampled k-space, you get some coil sensitivity maps, and you put it through a reconstruction model, you get a bad reconstruction, you compare it to a reference, and you measure the error, you update the parameters, you make a better image until you're happy with it. And that's how you kind of train this matrix. And you've probably seen images like this where the deep learning reconstruction at six-fold acceleration has really been shown to be such that you can't tell the difference between that and the ground truth. So we've been fortunate enough to partner with folks in computer science who've been working on image processing, like kind of rain removal tasks. We partnered with Facebook AI Research, kind of combining these areas of expertise. We were able to share and engage the entire scientific community and kind of add synergy to advance science by providing this publicly available data set, the fastMRI data set, which now, actually, this is kind of a little bit old. There are over 9,000 unique visitors in the past year from all over the world. And I think in the last year, nearly a petabyte of information. It's like one of the top 10 life science data sets hosted by Amazon Web Services. So that's been really fun. And then we've kind of co-hosted some challenges together with Facebook AI Research. And throwing this problem out there has really advanced science quickly. This is just some recent work we're doing, trying to understand what a variational network actually does if you push it out to a hundredfold. Here's a pretty normal looking brain. It's the zero-filled reconstruction on the bottom, just so you can see just how little information we're actually getting if you do a zero-filled reconstruction compared with the hundredfold. Now, this is a little bit academic because the gains in time are almost nominal between like 64, 80 and a hundred X. But just to show the abnormal case: you do see that there's some kind of pseudo-normalization of the abnormality when you get really far out. But is it merely memorizing? So if you take such tiny amounts of data sampled from the actual person, is it really just learning some average brain and spitting that back out? Well, it turns out it's not, because that was the fear, right? You can see that these three very different looking brains, even at a hundredfold, totally resemble the original person. And in fact, when you add noise to the system, it spits out noise. And if you put in a knee, which this thing was not trained on, it doesn't try to give you an average brain. So what we think is that it's a complex projection of the original data and not just mere memorization or some average.
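As a rough illustration of the supervised training loop she outlines (undersampled k-space in, compare the reconstruction against a fully sampled reference, measure the error, update the parameters), here is a heavily simplified, hypothetical sketch in PyTorch; it uses random synthetic single-coil data and a small stand-in CNN rather than a real variational network or the fastMRI code.

```python
# Simplified sketch of training a learned MRI reconstruction model (PyTorch).
# Hypothetical single-coil setup with random synthetic data; a real system would
# use multi-coil k-space, coil sensitivity maps, and a variational network or U-Net.
import torch
import torch.nn as nn

model = nn.Sequential(                      # stand-in for a reconstruction network
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

mask = (torch.rand(1, 1, 1, 64) < 0.25).float()     # keep ~25% of phase-encode lines

for step in range(100):
    reference = torch.rand(8, 1, 64, 64)             # fully sampled "ground truth" images
    kspace = torch.fft.fft2(reference)               # forward model (Fourier transform)
    undersampled = torch.fft.ifft2(kspace * mask).abs()   # zero-filled input image
    recon = model(undersampled)                      # candidate reconstruction
    loss = loss_fn(recon, reference)                 # error versus the reference
    optimizer.zero_grad()
    loss.backward()                                  # update the parameters
    optimizer.step()
```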
So in real life, we don't have the reference, right? And so we can't really measure the error, and we don't have the similarity measurement. So how do we guard against this thing where it's like the pseudo-normalization that I mentioned, or hallucinations, or just frankly bad? Well, there are kind of different ways you can try to approach this complicated picture. But if you take, this is the ground truth, and you do something to it to get your new image, what you wanna do is measure this error, right? But you don't have this original Z. And so you can't do that. So instead there's a process, kind of like a bootstrapping or Monte Carlo kind of approach, where you can do the same process to Z prime that you did to get it. And then you do have Z double prime. And so if you do this multiple times and average, you can estimate the error. Well, actually you can measure the error between Z double prime and Z prime, which allows you, weirdly, to estimate the original error. So how good is this? It's not great. It's not perfect, but you can see that at least for the PSNR, the true error and the estimated error really follow a close correlation, right? And there are some outliers for the SSIM. But it's an approach like this that I think could be useful in the future to be able to estimate error. And I just wanna skip ahead a little bit to this slide to show that some people are working on adaptive acquisition strategies. So you would not only, this is what we just talked about, train a reconstruction model, but you would use deep learning to decide even which pieces of information to sample. And people have shown that at faster acceleration levels, there is a benefit in adaptive strategies. So in the future, when we think about image reconstruction, hopefully we could go to k-space, sample some data, put it through the reconstruction model, estimate the error. And then if there's a lot of error still, we go back and get more data and then make a better image. Once the error drops down to a satisfactory level, you make the final patient image. And then obviously there's also specific information that could be obtained that would influence the parameters of your model and maybe an adaptive strategy to even getting the information. So I think there are many, many approaches. And then in terms of image enhancement, which we didn't even get a chance to talk about, these are things like super resolution, going from low to high dose, synthesizing CT from MRI and addressing different artifacts. You could see a world where you would add these all up and make your final image. Thanks so much.

Thank you, Yvonne, for this great presentation. So now that we have heard about upstream AI, we're going to turn to Luciano to hear about downstream AI and applications in neuroradiology. Luciano, please.

Hello, everybody. Can you hear me okay? Yes. All right. So yes, I'll be talking about the downstream. I have no personal disclosures to make. I want to acknowledge my team in the AI lab. We collaborated with a bunch of physicians, engineers, medical physicists, technologists, and interns, and some of the stuff that I'll show relates to the work that we've done together. And here's the overview. So instead of presenting all the clinical applications under the sun, I'm going to really more provide you with a framework of some of the things that you should be thinking about when you're implementing these tools in your practice. So things like how to select a clinical application of AI, decisions that were made during development and implementation that will impact those downstream applications, such as cohort selection, metrics, how it's deployed, inference adjudication, and how to improve models. We'll also talk at the end about transferability, which is a very important topic in the implementation of AI. So, how to select a clinical application. First, obviously, I mean, we don't need to talk too much about that. It needs to address an important clinical need. We do see every once in a while an application that is cool, but doesn't have that clinical importance.
I would try to stay away from things like that and really focus on the ones that have that clinical value that is added to patient care, or improvement of workflows and things like that for the physician. But most of all, I think we're more worried about providing better care for our patients. One thing that is super important is to select something for which you will have some sense of human performance and for which there is agreement. There's been a lot of research out there that has shown that if, in the data that you feed an algorithm, there's not much agreement among the annotators, the output is not gonna be great. So, things like if you're trying to develop or use an algorithm that is determining foraminal narrowing in the lumbar spine or spinal canal narrowing, there's a bunch of research that shows that we as neuroradiologists struggle a lot to be in agreement with each other. So you need to be careful, because that AI algorithm is not gonna be better than the neuroradiologists in figuring out how to determine those things if it was fed very heterogeneous and incomplete data. It should improve outcomes, as I mentioned before, and when implementing these tools we should always think about not having a narrow focus. So an example would be, if you're using an intracranial hemorrhage detection algorithm to screen all your CTs, this cerebral abscess will not be flagged as a critical finding because you're just focused on that intracranial hemorrhage. So you just need to think clinically during that implementation to be able to address, you know, concerns with the patient more globally. Throughout the talk here, I'll focus on these examples of studies that we created here at OSU. One is a critical finding identification algorithm for head CTs: almost 50% of our cases, and this is common in academic practices, are stat, they are considered urgent, and there's a prioritization issue that we need to address. And another one that I wanna show is the metastasis detection algorithm that can detect lesions down to a millimeter, which is key to improving the lives of patients with cerebral metastasis, because the earlier we can detect, the more effective radiation treatment will be for these patients. And it's also very difficult to detect that one-millimeter lesion on a brain MRI. So, in that framework that I wanna give you, among the things that you need to consider when you're either creating or purchasing, you need to know where the data that was used to create that algorithm came from. And it's important not to introduce any bias in that data selection. So if the selection was done with only beautiful, perfect images, it will only work for beautiful, perfect images. Is that our reality? No, it's not. There's a bunch of motion-degraded studies and artifacts in our real-world data that need to be accounted for. So in order for you to have an algorithm that will work in a clinical setting, you either create a two-step algorithm, one that filters out all the motion-degraded studies and then passes the beautiful images to the algorithm, or you need to deal with all the real-world data upfront and create an algorithm that can handle that type of suboptimal data set. Suboptimal, but reality. So an example is, if you don't include these motion-degraded studies, what will happen eventually is that your intracranial hemorrhage detection algorithm will say, okay, there's hemorrhage here.
And in reality, it was just motion degradation. We are very used to those things. So it's important to have that data being very representative. So strategies such as using a sequential data set from your PACS instead of a handpicked one are important. You also need to be very careful with inclusion and exclusion criteria, because that will essentially skew your algorithm to detect one thing or another. As far as the development, one of the most important things is the metrics that are used. And I decided to put this slide here because I see a ton of either papers or even vendors talking about accuracy all the time. And accuracy is such an incomplete metric in medicine, I feel, because in medicine, we're very used to unbalanced data sets. So you almost never have an equal number of disease and normal cases. It's almost always gonna be a minority class that will be the positive cases. So the importance of recognizing the issue with accuracy in the imbalanced data set setting is displayed here. In this case, there are 10 cases with cancer and 990 cases without cancer. And the AI prediction said that there was no cancer. In other words, it was kind of useless. The AI prediction was always saying, yeah, this is a normal case. It was not detecting any of the positive cases of cancer, but still the accuracy is 99%. And that is just a reflection of that imbalanced nature and the fact that accuracy cannot account for those things.
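A quick worked version of the imbalance example above, using the numbers from the talk (10 cancers, 990 normals, and a model that calls every case negative), shows why accuracy alone is misleading while sensitivity and specificity expose the failure:

```python
# The imbalance example from the talk: 10 cancers, 990 normals, and a model
# that predicts "no cancer" for everyone. Accuracy looks great; sensitivity is zero.
n_cancer, n_normal = 10, 990

true_positives  = 0          # the always-negative model finds no cancers
false_negatives = n_cancer   # every cancer is missed
true_negatives  = n_normal   # every normal case is (trivially) called normal
false_positives = 0

accuracy    = (true_positives + true_negatives) / (n_cancer + n_normal)
sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)

print(f"accuracy:    {accuracy:.2%}")    # 99.00% despite missing every cancer
print(f"sensitivity: {sensitivity:.2%}") # 0.00%
print(f"specificity: {specificity:.2%}") # 100.00%
```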
So there are other metrics that are very useful. I'm gonna give you an example with the intracranial hemorrhage or critical finding detection algorithm that we created, where we used hydrocephalus, mass effect, and intracranial hemorrhage, and also added normal cases or cases with chronic findings, such as encephalomalacia and other chronic diseases. And we fed an algorithm using deep learning. And that's one of the key things for us to recognize here: these tools rely a ton on the data. So if you don't have your data organized correctly, a large portion of your algorithm development will be in that data selection. That's why I was emphasizing it so much. So here's the metric that we used for this type of algorithm: AUC, the area under the receiver operating characteristic curve. So 0.91. One of the things with this metric is that it's great for analysis and comparison between two different algorithms. But you cannot implement it and know how it is going to perform, because AUCs, or receiver operating characteristic curves, are threshold agnostic. So you don't need to select a threshold. Typically a threshold will be 0.5: below that, you will have a negative case; above that, you will have a positive case. So you need to select that threshold to be able to determine that. Once you do select that threshold, then you can talk about sensitivity and specificity. And those are things that we're way more used to understanding in the clinical setting. So that is a very fair way to report these performances, because first, we understand what they mean, and sensitivity and specificity won't change with the prevalence of the case mix. In other instances, such as the intracranial metastasis detection algorithm, we didn't have a great way to represent it with standard AUCs and standard metrics. So we developed our own metric that uses a combination of sensitivity and false positives. The false positives give you a sense of the workload; obviously, the sensitivity is like a screening tool, and we are very aware of what sensitivity means. But the balance between sensitivity and the number of false positives gives me a sense of: if you have a ton of false positives, you're gonna give me more work going through these cases, through these ROIs, than it is really gonna help me. So I wanna have a very balanced approach here that is very high sensitivity with very few false positives, just to give you a sense of workflow issues and also detection capabilities. So all those things need to be considered when selecting or implementing your tool. As far as the deployment per se, it will all depend on the type of tool that we're talking about. In the screening setting that we discussed previously, that is more of a workflow-related tool. And we use this to actually do worklist prioritization. So what is an example? Here's our worklist. These are cases in a timestamp-based worklist. And as you can see, there is a hemorrhage case down at the bottom of the worklist. That hemorrhage is actually this one here. So it's a subdural with shift. We wanna go and read these cases first. This is a cerebellar mass with hydro, another subdural hematoma, but smaller. And the tool we developed goes through and assigns, there's a light bulb when AI has gone through these cases. And once it has gone through them all, it reprioritizes the most critical cases. After identifying the abnormality in each of the slices and computing the whole abnormality for the entire scan, it bumps up the ones with the highest-criticality findings. On the metastasis detection, we put an ROI, as I mentioned, on the lesions to allow the radiologist to pay attention to the ones that the algorithm thinks are most likely to be metastasis. And here's an example, a little dot there that will make a difference for this patient when radiation treatment is being considered. So here's just a little video showing how that's implemented. You can select to show the AI results. And then as you scroll through that data set, it will show you that circle, the ROIs on each of the metastatic deposits there in the brain, for the radiologist to decide whether to include those as real cases or not, which brings us to inference adjudication. AI is not gonna be right all the time. There are several instances where it will be wrong. And I don't foresee anytime soon that it would be 100% correct all the time. So we have to have a way to have input from radiologists in the mix. An example is this. So the ROI is shown for the real metastasis, but for some reason, it thought that there was metastasis there. I don't agree. So I would go there, I would click that ROI and say, delete that measurement. And what that does is provide a feedback loop to the algorithm to say, okay, it made a mistake there. Next time I'm gonna train with a different batch, with a new batch that was collected by the radiologist. We hope that the AI will do better. And that also gives an opportunity for you to assess how the model is doing. Actually, the FDA is very much in tune with that concept. So once you create an algorithm, you train, you validate, it's all good. And then you'll deploy the model here. That's not the end of the story anymore, because these tools, as I said, they are very particular about the data set on which they were trained.
So you need that other component of model monitoring that will keep an eye on whether there's something, let's say you completely change your scanners and now there's something about the algorithm, the way it was developed, such that it is not recognizing the lesions anymore because your scanner has changed. We describe how to implement that feedback loop in this paper, for those that are interested; it covers research, production, and that feedback loop. Essentially, at the end of the day, what we're trying to do is monitor how the algorithm is doing on a regular basis and also improve the model. Just a word on transferability. Some may be more familiar with the term generalizability. Transferability is just: you created a model at OSU, now you're trying to implement it at UCSF. So the data set is completely different, and we know it's different. So that is called transferability. And here, I think it's important to realize that the algorithms that are most stable for transferability are the ones that have a very heterogeneous data set for training. That's why the RSNA has been working so hard in creating data sets that have representation from multiple parts of the globe, actually. So if you look on the AI page of the RSNA, we've been working on several challenges, but most of all developing data sets that have representation from all continents, which allows for geographic heterogeneity but also the different scanners and all that is needed. So heterogeneity is welcome. And the importance of that has been described in many papers. This is one of the examples that was published: the performance of this intracranial hemorrhage detection algorithm where it was developed was 0.99 area under the curve, but then once they implemented it in another hospital, the AUC dropped to 0.83. And this is something that we see all the time. And there was actually a Forbes piece on why machine learning models crash and burn in production. That relates to this: transferability is something you need to be very attuned to, very focused on and concerned about, so much so that some people, before they implement an algorithm at their site, will do a test run to see, is it really performing the way that I was hoping it was going to perform? Just to give you one example: initially, when we were still trying to figure this whole AI thing out, we developed an intracranial hemorrhage detection algorithm with just a subset of our scanners. And then we tried to run that same algorithm in the ED, which never had representation in the training set. And all the cases from the ED were being flagged as having hemorrhage, even the normal ones. That was because that scanner had this cupping artifact here, with higher density in the periphery of the brain. The radiologists didn't like that, but they were already used to windowing appropriately and determining whether hemorrhage was there or not; the algorithm, however, never had that representation, never had that case, to figure that out. So that's one real-world example of how important it is to be very comprehensive in your assessment, and especially in that cohort selection. The other question...

I'm sorry to interrupt, we need to move on. Can you please wrap up?

Sure, yes. So essentially, it's important to understand that the clinical application will be helpful for the patient and for the radiology department. Downstream decisions will impact downstream applications.
Transferability is a very important thing, and we need to recognize the limitations of these tools for a successful implementation. Thank you very much.

Some of those things we'll tackle again during the panel discussion, so you'll have a chance to make the points you were not able to make. And I think it was a great transition to Mark Colley from UCSF, who is going to talk to us about ethics in AI. And I think from the first two talks, we can see already that there are quite a lot of potential issues, and Mark is going to detail them for us. Mark, please.

Thank you, Max. So I appreciate the introduction. I'm going to talk today about artificial intelligence ethics and implementation and reinforce a lot of the things that other speakers have mentioned. These are my disclosures. None are really directly related to the talk that I'm going to give today. So I'm going to give an overview of some of the terms that other speakers have used today. We're going to talk a little bit about bias. We're going to talk a little bit about risk in implementation. And then we're going to talk a little bit about how you might design or integrate AI into your workflow. So the first thing is bias. And I've learned a lot about this in many different fields over the last several years. So bias is prejudice in favor of or against one thing, person, or group compared with another. And I think that it's important to recognize that all of our artificial intelligence algorithms have some sort of bias in them. And sometimes that bias is needed and necessary, but other times that bias has an unnecessary negative effect to it. So bias is the ability to discriminate between two things. Sometimes that discrimination is between disease and not disease. That's very helpful. But if that discrimination between disease and not disease isn't the same for people from different ethnic groups, for example, then that bias is not going to be helpful to the application that we're looking for. So there are a bunch of different kinds of bias. And I think that many of you who are scientists or researchers, or even just physicians, are used to dealing with bias on a daily basis. So I think it's just important to recognize that the skillsets that you already have in looking for and recognizing bias are important to apply to artificial intelligence algorithms. I wanted to walk through an example. So this is an article that was written by some of my colleagues here at UCSF. And this was about discrimination in a commercial electronic health record algorithm. So basically what they took was a model that was built in and provided by Epic to calculate the likelihood of no-show for all kinds of different encounters, including visits for imaging. And this is something that is provided by the vendor, and you can just turn it on and use it. They looked through the documentation for the model, and it pretty clearly specified that the inputs to the prebuilt model included the patient's demographics, so including things like race and gender and other things. They also included clinical history, prior missed appointments, and the day of the week and other things as inputs. So one of the things is that we have a big interest in health equity here at UCSF. And so when they saw the list of inputs for this algorithm, they were really concerned about including the demographics. As I mentioned, the output really is this no-show likelihood.
I think that it's really important to think about what sort of interventions you might be doing with that no-show likelihood. And I would say that typical interventions are things like overbooking. So if you have someone who is likely to no-show, you might also schedule another patient in that slot. Also, sending reminders to patients is another intervention that you could use as an output from an algorithm like a no-show likelihood. So these inputs that we were concerned about were things like ethnicity, financial class, BMI. If you have patients who are larger, maybe they have mobility issues; that makes it more difficult in our dense urban environment to get to appointments on time, or to get to appointments at all. We were concerned about basically continuing to exacerbate our existing healthcare inequities by implementing something like this. We were concerned about continued marginalization of these vulnerable patients that we serve here at UCSF. So one of the things that we did was we took that model, we actually removed the demographics from the model and retrained it, and created a new version of the model that we implemented. And when we analyzed that, we recognized that that algorithm, even after removing those demographic inputs, was still biased. So we started to ask ourselves, you know, why is this? And the reality is that the algorithm still contained things that ended up being a proxy for those demographic issues or the other issues that were problematic. So even having prior no-shows is essentially a proxy for some of those social determinants of health that lead to patients being at higher risk of no-showing for an appointment. And so we were kind of stuck and wondering, you know, what are we going to do with this information? How are we going to try to improve the rates of people coming for care and still be able to maintain the operations that we want to maintain? And so one of the things that we decided to do here was only to implement patient-positive interventions. So rather than overbooking, which leads to, you know, issues with being able to care for multiple patients at the same time and downstream interruptions to the care process, we decided only to use this signal to implement patient-positive interventions. So additional outreach for transportation, outreach for childcare, flexible appointment times and, you know, better access to telehealth. So I think it's really important to recognize when we're implementing artificial intelligence that these biases exist, and sometimes they exist as a proxy within the inputs and may not be visible from just looking at the model inputs from the outside.
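Purely as an illustration of the proxy effect just described (this is not the vendor's model; the features, data, and coefficients below are invented), a sketch like the following trains a no-show classifier with the demographic variable already removed and then compares predicted risk across groups, showing how a correlated feature such as prior no-shows can still carry the bias:

```python
# Hypothetical sketch: audit a no-show model for group-level bias via proxy features.
# Feature names and data are invented for illustration; this is not the vendor's model.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
group = rng.choice(["A", "B"], n)                               # demographic group (not a model input)
prior_no_shows = rng.poisson(np.where(group == "B", 1.8, 0.8))  # proxy correlated with group
days_until_appt = rng.integers(1, 60, n)

df = pd.DataFrame({"group": group,
                   "prior_no_shows": prior_no_shows,
                   "days_until_appt": days_until_appt})
p_no_show = np.clip(0.10 + 0.08 * prior_no_shows, 0, 1)         # simulated ground truth
df["no_show"] = rng.random(n) < p_no_show

features = ["prior_no_shows", "days_until_appt"]                # demographics already removed
model = LogisticRegression(max_iter=1000).fit(df[features], df["no_show"])
df["predicted_risk"] = model.predict_proba(df[features])[:, 1]

# The model never sees "group", yet predicted risk still differs between groups,
# because prior no-shows act as a proxy for it.
print(df.groupby("group")["predicted_risk"].mean())
```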
Another thing, and this was part of the transferability piece that Luciano was discussing: there have been many articles that have pointed to how brittle deep learning approaches are. This was one that was particularly striking to me. So this is an image that a deep learning algorithm said was a panda with 58% confidence. This research showed that if you add a very specific type of what they call adversarial noise to that image, the deep learning algorithm will recognize this as a gibbon with even much higher confidence. And you know, I don't know about you, but I can't actually perceive a difference between the panda and the gibbon here. So I think that, you know, Luciano showed some examples where you can understand why the algorithm didn't perform well, with the artifact that he showed along the skull edge. But I think that this underscores the need to do local assessments on your data when you're thinking about implementations. The other thing that's really important is to ask, when you're thinking about commercial products, whether the data that they used for training actually matches your patient and scanner population. Many times commercial companies either don't have this data or are unwilling to provide it. But I think as consumers, it's really important for us to be asking about ethnic and racial backgrounds and also whether the scanner populations are represented, and not accepting generalized answers like "well, we got data from a whole bunch of places" as an acceptable answer. And then, just like we did the ongoing monitoring of the EHR implementation's performance, we want to make sure that we're doing ongoing monitoring and validation of algorithms in our practices. So let's move on to the concept of risk. I think as physicians, we are well aware of, you know, the concept of risk, and we're constantly trying to minimize risk and maximize benefit for our patients. But I just wanted to talk to you, just for a moment, about a thought example around the sort of risk that goes with administering a medication. Medications that have a much higher risk of patient side effects, things like chemotherapy, require much higher quality of evidence in order to prescribe that intervention versus something like antibiotics. So, for instance, you would probably want to have a tissue diagnosis before administering chemotherapy, whereas, you know, if somebody has a cough and a fever, you may empirically prescribe them antibiotics for pneumonia while you're waiting for a chest X-ray. Those two things are very different based on the risk of the interventions. How would we apply this to artificial intelligence? Well, you could imagine that if you have an artificial intelligence model that takes images that are acquired from a scanner, looks for a particular finding, whether it's present or absent, and if the finding is present, you ask the radiologist if they agree with the artificial intelligence. If they do agree, then they place that into a radiology report and file that off in the electronic health record. With this human in the loop, you can imagine there's one degree of risk associated with that. If you take that same exact artificial intelligence algorithm and the same images with the same diagnostic performance, but you take the radiologist out of the loop, and the output is that you actually take that finding and enter it on the problem list for the patient in the electronic health record, or you create an order that actually prescribes a medication or something, you can imagine that that same exact algorithm with the same model performance carries a much higher risk, just based on the output and the way that you're implementing that algorithm. It's not only necessary to consider the diagnostic performance of the algorithm, but it's also really important to understand what the algorithm is going to be doing in the way that it's implemented. I think that we're learning a lot more about this. Institutions like mine here at UCSF, we've been building a process to regulate and to govern the implementation of artificial intelligence. We've really been taking some of these themes that I'm just hitting the tops of the waves on and building them out into our process.
We're actually really excited to start to publish some of the questions in the process that we're working through for how we think about AI implementation. I think it's really important to match the controls that you put in place to the risk of harm, because doing full monitoring and validation for every algorithm, I don't think, is necessarily practical. So algorithms where the risk is either low or the risk is borne by the system and not borne by the patient directly, those algorithms may have different requirements for monitoring and validation. I just wanted to talk a little bit about implementation design here.

I'm sorry to interrupt. We're getting close to the end of our time, and I want to keep a little bit of time for the panel discussion. So do you want to make some concluding remarks?

Sure. Thanks, Max. So a lot of the things that I was going to talk about there, I think, were themes that were discussed by some of the other presenters. So we can go ahead and shift that into the discussion. So thank you very much for listening to me about the pieces about risk and bias in AI implementation today. Thank you.

Thank you, Mark. And I think that this just reflects the fact that there are so many things to discuss about AI that we definitely need more than one hour. I think for the last 10 minutes that we have, I wanted to perhaps ask each of you, starting with Susie, if you could describe one application in your clinical practice that you are using AI for, and perhaps also one obstacle, like what you see as the major obstacle to the practical clinical use of those AI algorithms. So Susie, do you want to start, please?

Sure. We actually use a lot of different AI tools in my clinical practice, so it's hard to limit it to just one. I think probably one of the most exciting ones is deep learning for image reconstruction. It allows us to scan our patients 50 to 75 percent faster, but with increased image quality. And so we deploy a vendor-neutral DL solution to all of our legacy magnets within our fleet. And we also use an OEM DL product for some of our newer magnets. We also use the vendor-neutral one for some of our newer ones as well. And we've had tremendous success with this. And really, prior to implementing this, we did pilot studies and we also published trials, multi-center, multi-reader trials. And we found that these trials show that the DL-enhanced fast scans really outperform standard of care across every single quality feature assessed. And it also preserved quantitative integrity. And that's really probably one of the great ones. But again, we also use AI tools for quant. I've been doing that for 16 years. We use it for cancer screening, and we use some triage apps as well, although triage apps are more applicable to a hospital-based setting, and I'm outpatient. So that's just a very, very quick flavor of what we do in our company. I would say there are several obstacles with AI tool implementation. One of them, I think one of the biggest ones, is probably just getting our radiologists on board with the AI tool that we choose. They really want to know if it's going to slow down their dictation speed. Most radiologists work on an RVU basis. And so even if the AI tool adds value, they don't really want to be using something that's going to impact their salary. And it turns out that it's variable depending on the AI product.
So for AI quantitation, for example, in multiple sclerosis patients where you're using a pre-populated reporting template, studies have shown that you can actually read 38% more cases per hour when you use the tool with a pre-populated reporting template. Deep learning for image reconstruction would have no impact on the radiologist's dictation time; that's actually done on the front end, during image acquisition. Things like quant for dementia may add possibly a minute to my dictation, nothing really impactful. Although if I use the pre-populated reporting template, it really doesn't add any time at all. So that's the other thing. And then also just how to teach the radiologist how to use the AI tool. And we tend to do that, in our company, through internal webinars, champions within subspecialty divisions, and just quick eight-minute webinar-type things where the radiologist can log in and learn about how to use the tool and incorporate that. So that's one area of obstacle, and of course there are several others, but I will let the other speakers weigh in as well.

Thank you, Susie. Yvonne, do you want to take it next, your most exciting application and what you see as the most significant obstacle? And Yvonne, you're muted.

Thank you. It's interesting for me to hear Susie's response because, like she said, she comes from a practice scenario and I'm in, I guess, ostensibly a different world, the academic radiology department, but I think the drivers are very similar, maybe more so these days than in the past, than ever before. And I want to say my response would be very similar. I think, you know, if I had to say one thing that we're using, we are using deep learning for image reconstruction. To be honest, everybody's going to be using it, right? All the major vendors at the RSNA, this is last year, so who knows what they're going to show come November, are already rolling out their versions, and they all have their own; each vendor has a little bit of a different take. But I want to say, I mean, every modern scanner that's coming out has something incorporated into it. So that's really exciting. I mean, I think it's impactful on our field. And so we're certainly using that. We, you know, I want to say maybe we're a little bit slow because we're very careful. We have talked with a lot of different stakeholders within our institution: legal, IRB, compliance. There's sort of a whole governance group that thinks about implementation of different AI products, at least in imaging. And, you know, those have engendered really interesting conversations, very useful conversations. But you know, when you talk to legal and compliance, I guess this is the answer to the second question, you know, all the obstacles and why you can't or why you shouldn't or why it's not a good idea kind of come up. And I think it's necessary to think about, and at least in our department, we're very careful and, you know, not politically conservative necessarily, but conservative about implementation. I also think another big, big obstacle is kind of the business model for some of these companies that are out there, startups and other bigger midsize companies that are coming out with AI products for diagnostic imaging. Not all of them have, or I would say hardly any have, a clearly articulated business model that really works, or at least, you know, I don't know. I mean, I'd love to hear from the other panelists a little bit more about what they think. Like how is it viable? Who's going to pay for it?
The patients, the department? Is the radiologist going to take a cut in salary? I mean, how is it actually going to work once it's implemented?

So perhaps we can switch gears a little bit. Luciano, do you want to try to answer Yvonne's questions? Who is going to pay for that?

Yeah, I think it's a very difficult thing to answer. So there are a few ways to think about this. One is, for things that are addressing a workflow issue and improving the efficiency of a group, you know, that's clearly an area that you can invest in and decrease the amount of money you're paying elsewhere. Others are on the quality side, which is where I see us having the hardest time trying to prove the point to leadership that it's worthwhile investing in that. But obviously that is where, you know, when there is no real financial return on investment, we just want to make sure that that quality will be addressed. And I think it's a hard thing to solve. But there have been also, with Medicare and Medicaid, instances where they are paying for AI, you know, as was announced, I think, a few years ago, added payments for the use of AI. So I think we're starting to see a move towards that, when we are able to prove that the utilization of AI in clinical care will impact that patient's care and will improve the quality of care, decrease, you know, length of stay, or improve outcomes for that patient.

Yes, I just want to make one note on that. We have actually looked at our numbers for different AI tools. At RadNet, we use a ton of different AI tools in our practice. For deep learning and image reconstruction, we found at least a sevenfold return on investment. Now, again, applying a vendor-neutral tool to all of our legacy magnets means that instead of retiring that magnet, we can actually make that old 1.5T scanner produce images that look more like 3T quality. So that's a huge impact. And then also, OEMs are operating more on just the newer magnets. But it has been at least a sevenfold increase in ROI. So that's very exciting. And for quant, we've also specifically looked at our numbers for the quant tools, and it has turned out to be a slight net benefit for us as a company. So we do not lose money anytime we use quantitative volumetric post-processing. And even in the instances where insurance doesn't pay, you could always do an alternative model, like a patient pass-through, where, if their insurance is not going to cover it, you could have them pay up front, you know, when they come in to get their scan. But even without the patient pass-through, it's still a net positive. So that is encouraging. And then, as Luciano was mentioning, with something like the Viz LVO tool, CMS is reimbursing, and there's actually an incentive payment, because they have actually proved that it, you know, improves morbidity and mortality factors when you use the AI tool for patients that come in with an LVO in acute stroke cases.

Mark gets to share the final words of wisdom from this webinar. And you'll see there'll probably be many others, because there are so many issues to discuss, but Mark gets to share his final words of wisdom.

I don't know that I have a whole lot to add. As far as the payment question goes, I'm skeptical that Medicare and Medicaid and the AMA are going to create net new CPT codes for us. I do agree with Luciano that we will need to have, you know, we'll need to have outcomes research in order to prove that.
And I just look at our track record in radiology of being able to generate outcomes research, and it's really, really poor. That stuff is very expensive. So I think that we're going to have to be creative in the ways that we try to finance this. And I think following Susie's examples is really the path forward here.

All right. Thank you all. That was a great webinar. So as you can see, a lot of very exciting new developments, still a lot of work to do, and a number of things to address, particularly in terms of implementation and reimbursement. And I'm sure that RSNA will organize other webinars to dive into each of those specific aspects. But in the meantime, thank you all for presenting today. And thank you to all the attendees. You all have a great day. Thank you. Bye bye.
Video Summary
The webinar, moderated by Max Wintermark, explores the integration of AI in neuroradiology, focusing on research, implementation, and ethical considerations with expert speakers Dr. Yvonne Louis, Luciano Prevedello, Mark Colley, and Suzy Bash. Dr. Louis discusses "upstream AI," emphasizing the importance of faster and higher-quality MRI image acquisition using AI, particularly partnering with private sector giants like Siemens and Facebook for enhanced imaging data sets. Luciano Prevedello then shifts focus to "downstream AI," explaining the implications of AI in clinical applications, the importance of unbiased, comprehensive data, and balancing sensitivity and specificity for accurate detection in neuroradiology. He gives examples of critical finding identification algorithms and their workflow improvements, noting the need for local data validation due to algorithm transferability issues. Lastly, Mark Colley addresses AI ethics, highlighting bias concerns, especially demographic data influencing algorithms in healthcare, and the importance of ongoing algorithm validation and risk considerations when implementing AI technologies. The Q&A segment reveals practical applications in clinical settings and stresses major obstacles such as radiologist adaptation and clear business models for AI tools, alongside discussions about financial models, insurance implications, and reimbursement challenges. Overall, the webinar emphasizes that while AI in neuroradiology offers promising advancements, significant challenges in ethical application, business viability, and integration remain.
Keywords
AI in neuroradiology
MRI image acquisition
upstream AI
downstream AI
AI ethics
algorithm validation
bias in healthcare
radiologist adaptation
business models
ethical considerations