Reducing Diagnosis Error in Radiology - Is It Possible?
M8-RCP24-2024
Video Transcription
Hello, welcome everyone. Thank you so much for showing up at 4:30 on a Monday afternoon at RSNA. You are the superheroes of radiology. Pat yourselves on the back. You came here for a topic that's not even imaging related, not an anatomical topic; we're talking about reducing diagnosis error in radiology. Is it possible? The session is sponsored by the RSNA Quality Improvement Committee. So, is it possible? The answer to this question is yes, and here's how. I will give you a driver diagram, which is a tool. I will give you a definition of error in diagnosis. And I want to talk a little bit about the SEIPS model, the Systems Engineering Initiative for Patient Safety. It may be a model you haven't heard of, but it also inspires some thoughts about improving diagnosis.

So let's talk first about this driver diagram tool. Who in the room has heard about a driver diagram or ever used one? Anybody? Hands up. Nobody. Okay, cool. The driver diagram is a tool that is usually used with a group of stakeholders who know the material or know the problem. In the first box on the left side, you put the problem. For example, and I'll show you an example later, if the problem is timely reporting, you put that in there, what our problem is: our reports are delayed, something like that. Then you think about primary drivers. Those are higher-level concepts that lead to delays. Then come secondary drivers: for each of these higher-level concepts, there are more specific ideas about how the delay is caused. Usually, once you're at that level, the secondary drivers translate directly into an idea for improvement. So it's a really wonderful tool. I love it, and I'm excited to show it to you today.

The definition of diagnosis error comes from the report Improving Diagnosis in Health Care from the National Academies, and it's written down here. It is the failure to (a) establish an accurate and timely explanation of the patient's health problem or problems, or (b) communicate that explanation to the patient. So we have these three components: accuracy, timeliness, and communication of the results to the patient. Now, we rely on the providers to make that communication, but we also now have the reports available in MyChart, so patients can read them. And on occasion, when we can't reach anyone else, we may even report critical results to patients directly.

So let's talk about accuracy first for a little bit. In the literature, error rates in radiology are reported to be between 3% and 5%, which clinical specialties envy us for, since their error rate is estimated to be about 15%. I don't know how reliable our source data are, but there we are: we're looking great on error rates. Because of the high volumes, though, that still translates to 40 million diagnostic errors annually worldwide. And what are these errors? They're missed findings, where we didn't see the abnormality, or we saw something abnormal and interpreted it incorrectly, and in the end it leads to a wrong diagnosis. So let's take these three elements and put them into a driver diagram. The first issue is a missed finding. What are the primary drivers for that? It could be the attention on the part of the radiologist, bias, image quality, how the study was ordered, protocol, and technique. And I crammed the ideas in there.
There's probably more. If you really do it with a group you'd get more; I just did this by myself as an example. But let's focus here on bias, for example. Think about satisfaction of search bias: you already had two findings, so you think you've found enough, and you miss the third and the fourth finding. Or satisfaction of report: maybe there was a prior report leading with a diagnosis from a colleague you cherish very much, so you follow that person's interpretation of the finding and miss another differential diagnosis. Once you have these primary drivers, you can think about secondary drivers. Sorry, these were actually the secondary drivers, and they translate directly into an intervention in the last column on the right. If we're talking about cognitive bias, it helps to create awareness and educate people. A routine second read, which I know is almost impossible to do in real life, could help. Structured reporting and checklists can help make sure that we evaluate everything and search the entire screen and every image, or we can use AI solutions to make a diagnosis or catch findings that we might have missed. Similarly, for the other entities, misinterpretation or wrong diagnosis, there's another set of biases that play a role, like anchoring bias and all the ones I've listed there. And again, there are a number of interventions we could consider: provider education about the clinical information they provide, which we all know is often lacking; training radiologists to always give the top three differentials rather than settle on one diagnosis, which could be wrong; using certainty reporting, meaning how sure are you that the diagnosis you're providing is really the one the patient has, and how standardized is our reporting and the language we're using; second reads, which can help with most of these biases; and rad-path feedback. Those are all just ideas that came to mind quickly.

How about timeliness? Here, if we look at the example of how timely follow-up is, we do have issues with patients who don't know that they need a follow-up scan, for example for an incidental suspicious lung nodule. One thing that could help is tracking systems and reminders in radiology. So this tool, the driver diagram, can really help, and you may come up with completely different things here; this is just what I put together for the talk as an example. For communication, I had this: communication as accountability. Who is providing it, is it really a provider task, how much is radiology involved, is it perceived as interference if we communicate with patients directly, do we have the right contact information, and, for all of that, do we use the right vocabulary in our written and verbal communications with patients? For the written communication, do we have images to illustrate the findings? And then, as interventions, can we use hyperlinks and annotated images, and AI tools that can translate reports so that patients can understand them? So you can get a lot of ideas by using this tool for diagnosis error. How do you like these examples? Are they genius? Are they moronic? No? Okay. Well, I can't take credit. I took inspiration from this paper from Australia. They summarize very nicely a lot of the errors and biases in radiology and some of the interventions that can be taken to address them. All right.
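To make the driver diagram concrete, here is a minimal sketch of how the missed-findings branch described above could be captured as a simple data structure. This is purely illustrative; the class names and the specific drivers and interventions are assumptions drawn from the examples in the talk, not part of any standard QI tooling.

```python
from dataclasses import dataclass, field

@dataclass
class SecondaryDriver:
    name: str
    interventions: list[str] = field(default_factory=list)

@dataclass
class PrimaryDriver:
    name: str
    secondary: list[SecondaryDriver] = field(default_factory=list)

@dataclass
class DriverDiagram:
    problem: str
    primary: list[PrimaryDriver] = field(default_factory=list)

    def outline(self) -> str:
        """Render the diagram as an indented text outline."""
        lines = [f"Problem: {self.problem}"]
        for p in self.primary:
            lines.append(f"  Primary driver: {p.name}")
            for s in p.secondary:
                lines.append(f"    Secondary driver: {s.name}")
                for i in s.interventions:
                    lines.append(f"      Intervention: {i}")
        return "\n".join(lines)

# Example based on the missed-finding branch discussed above.
diagram = DriverDiagram(
    problem="Missed findings",
    primary=[
        PrimaryDriver(
            name="Cognitive bias",
            secondary=[
                SecondaryDriver(
                    name="Satisfaction of search",
                    interventions=[
                        "Education and awareness",
                        "Checklists / structured reporting",
                        "Routine second reads",
                        "AI detection aids",
                    ],
                ),
            ],
        ),
    ],
)
print(diagram.outline())
```

Rendering the outline reproduces the same problem, primary driver, secondary driver, intervention chain the speaker walks through on the slide.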
Let's talk a little bit now about this model, the Systems Engineering Initiative for Patient Safety, or SEIPS. It's a mouthful. We are more familiar with this one, the Donabedian model, which says that when we provide care, we need a structure and a process, and somehow that will produce an outcome. What do we mean by that? The structure is what is needed to provide care: we need a building, machines, tools, heaters and coolers and windows; all of that is part of the structure. Then we have people working in it, creating a process; they use tools to schedule the patient, image the patient, and create a report. All of these are processes. And then we have an outcome. In radiology that would be a final result: maybe the patient has a diagnosis, maybe there's going to be a change in management and a better outcome for the patient because of that diagnosis. But I want to hone in on the structure. The SEIPS model looks more closely at how the human being interacts within that structure, and the structure can be divided into the organization, the environment, the tasks, and the tools and technology that we're dealing with. I'm going to give a few examples. I think it's a really nice framework to identify sources or factors that contribute to error. For example, for the organization: do we have policies in place that inform how we should do our work or what we should emphasize? Do we have a just culture in place that makes it easy to deal with errors, talk about errors, and achieve improvement? Is there engagement from everyone and willingness to work on decreasing errors? The environment is very much the physical environment: is it too hot or too cold where I'm working? Employee safety is part of the environment, and so is teamwork, how much people help you with your work or with a task that you're not familiar with. For the task specifically, we can see errors when people are faced with tasks they haven't been trained on, that are not in their job description, or that are not clearly defined. That's where standard work comes in, where we write down exactly how things should be done according to the best practice that we know. And then, of course, tools and technology. We all work with those every day, and we know how the tools can fail. We have workarounds, and we don't want these workarounds because they make us more error prone. So I hope this has been an interesting dimension, or at least some interesting points, for you to consider. We're going to hear from Cindy Lee about interventions that can lead to better diagnostic accuracy, which will deal a little bit with cognitive bias. Then we'll hear about uncertainty reporting from Atul Shinagare, and then about fewer diagnosis errors with AI from Safwan Halabi. So if you're here for that, you have to wait to the very end. With that, I'll hand it over to Cindy.

All right. So, wow, we have more people in the room than when we started. I admire all of you for staying with us, and thank you for coming to this session. My topic focuses on the cognitive errors that Dr. Kaden mentioned briefly in her presentation, and specifically on the solutions, interventions, and strategies that can help you increase your diagnostic accuracy.
In the next fifteen minutes or so, we're going to define cognitive errors and their impact on us as radiologists. We're going to identify the top five and talk about potential ideas and solutions to decrease each type of error. I put the two references there in case you want to read more of the details; Dr. Kaden and I wrote these two, and they're the basis of this presentation. As defined, diagnostic errors are anything that's missed, wrong, or delayed, as confirmed by subsequent imaging or testing. And it's multifactorial: it can be due to cognitive or system factors. So what does that mean for us as radiologists? Cognitive errors are usually related to problems of visual perception: you were scanning for the finding but didn't see it, or you saw it but didn't recognize that it was abnormal, or you interpreted it incorrectly. This is different from the system factors, which are usually related to health care delivery, for example communication of critical findings to the ER physicians, or of incidental pulmonary nodules. The impact of this is tremendous. Cognitive error is the most common source of medical malpractice in radiology, and nearly 75% of malpractice cases can be attributed to some type of cognitive bias or error. And as Dr. Kaden mentioned, 3% to 5%, or on average 4%, of the radiology reports in our daily practice contain some type of cognitive error. And about one fifth of lung cancers, with a median diameter of about 16 millimeters, were missed on chest x-ray. So this is a common and very relevant problem in our practice.

For example, this is a case from an expert witness review, a malpractice case that has been settled. In 2018, this patient presented for her screening mammogram. The radiologist did not see the finding that's circled in red. A year later, she came back and it had gotten bigger. Now they worked it up... let me retract that. On January 8th, 2019, the radiologist still didn't call this abnormal; they said it was benign and let her go. The patient felt that there was something palpable, so she came back four days later and said, no, this is something new that needs to be worked up. And this turned out to be biopsy-proven breast cancer.

So why do we make mistakes? We're all human, right? Making mistakes is natural, is normal. This book, Thinking, Fast and Slow by Daniel Kahneman, is one of the great books written on human cognition and decision making, which is what we do daily in our practice. It explains the idea that we can divide the human mind into two parts. System one is the fast, automatic process that helps you jump to a conclusion and save time, whereas system two is the slower, more logical, more deliberate process that helps you work through the problem more systematically. So here's a fun example, a math problem for you. A bat and a ball cost $1.10 in total, and the bat costs a dollar more than the ball. How much does the ball cost? Somebody want to yell it out? Ah, all right. The flash answer most people jump to would be: the ball costs 10 cents, and the bat is just a dollar more. But if you actually think about it, if the ball costs 10 cents, then the bat costs $1.10 and the total is $1.20. So that doesn't work.
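Spelling out the arithmetic behind the answer that comes next: if $b$ is the price of the ball in dollars, the bat costs $b + 1.00$, so

$$ b + (b + 1.00) = 1.10 \quad\Longrightarrow\quad 2b = 0.10 \quad\Longrightarrow\quad b = 0.05, $$

that is, the ball costs five cents and the bat $1.05.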
So the genius in the room who says five cents is using system two, the more logical, deliberate process, to reach that conclusion, whereas system one is helping you jump to the quick solution, which may not always be right. This is just an example of how it works in real life. In radiology, when we look at the images on a CT, MRI, or chest x-ray, we're looking at the features of the imaging finding, looking for pattern recognition. If you recognize it, then yes, this is an easy case: there's an aortic dissection, there's a PE, there's a DVT, you're done. You dictate your report, next case. That's your system one, the automatic process. But if it's a tricky case, you need to look up a differential, you need to think about it: it's T2 bright, a brain mass, where is it, let me think about it. Then that's your system two kicking in, the more deliberate process. And it's normal to go back and forth between system one and system two when you're making these decisions. Of course, all of this is affected by internal factors, such as being tired or stressed, with the ER calling you nonstop and a decision needed right now, and by external factors like lighting, or glare on your screen so you just couldn't see part of the image.

Okay, so now that we've talked about cognitive errors and their impact, we're moving straight into the top five errors that we see in radiology. A dozen or so have been described in the radiology literature, but these are the five most common. Framing bias is defined as radiologists reading the same imaging finding differently based on the provided clinical context. Right? If I tell you this is a 50-year-old male with dysuria, and that's all I tell you, and on the abdominal CT you're seeing an enlarged kidney, maybe some stranding on the non-contrast images, and then on the nephrographic phase you see two hypodense areas, you're thinking maybe this is pyelonephritis; he's 50 and otherwise healthy. But if I told you this man also has a history of lymphoma, renal lymphoma would be a lot higher on your differential. So an intervention that could be taken to reduce framing bias is to get better history. We all know that better, accurate, available history makes your report so much more accurate and so much better. Suggestions have been made to have the technologists collect the history directly from the patients, if you can, when the patient is conscious and a good historian. Or there are tools that can integrate the patient history from the electronic medical record with what's available in the PACS, and that's the citation I wanted to leave you with if you want to look up such a tool, which I think is really cool. The second most common bias in radiology is called availability bias. It's defined as the preference given to diagnoses that are more recent, more memorable, or more personally felt. So what does that mean? This is actually a 40-year-old female patient, and this is her breast MRI; I'm showing you post-contrast subtraction images. You see irregular non-mass enhancement in the right breast. She had a history of silicone implant rupture, and she came in with a palpable mass. So availability bias kicked in, because we know she has a silicone rupture.
There was extravasated silicone, which is known to be associated, in a small, rare percentage of cases, with anaplastic large cell lymphoma. And because that has been in the news and in an FDA warning, the radiologist thought, oh, maybe this is a case of that very, very rare lymphoma, and failed to consider that this could actually just be a silicone granuloma. So this went on to biopsy, and it turned out to be just a giant cell foreign body reaction. Another example is someone who was previously involved in a malpractice suit for missing a cancer and, because of that, now recalls significantly more patients. The recall rate goes through the roof because you're reacting to what happened to you as a person, as a doctor. As for interventions: this is that same case I showed you earlier. The story behind why the radiologist didn't call back the patient in 2018, despite the growing mass, is that there had been a conversation between the radiologist and his superior in which he was told his recall rate was too high and he needed to bring it down; he was calling back too many patients, and they were anxious and upset. Because of that, it affected the accuracy and the diagnostic ability of that radiologist. So interventions we can take to reduce availability bias include knowing how often each diagnosis actually occurs in nature, which helps prevent overcalling or undercalling, and having access to benchmarks. If you know what your peers locally, regionally, and even nationally are doing, you have a more objective comparison and can see whether you are reading at that level. This is particularly important for breast imaging. We do annual MQSA audits as required by federal law; we look at the recall rate, the cancer detection rate, and even the biopsy yield for cancer for every single radiologist. And the ACR National Mammography Database gives you access to that comparison. You can look within your practice and compare, you can look within your region by zip code, and you can also look across the nation at facilities of similar size with a similar patient population to see how well you're reading in comparison.

The next bias is satisfaction of search. This was mentioned in Dr. Kaden's talk. It happens when additional abnormalities are missed after you find the first one. For example, this is a 28-year-old trauma patient who came in after a motor vehicle accident. The radiologist brilliantly and quickly found the first diagnosis, an aortic dissection, so the patient went to thoracic surgery and was treated and properly cared for. However, there was also a mass in the pancreas that was missed, and this turned out to be a neuroendocrine tumor. Interventions that could be taken to prevent something like that include report checklists, having a macro in PowerScribe, and structured reporting. The ACR has endorsed multiple reporting and data systems. It started with breast and BI-RADS, the most well-known, and now there's Bone-RADS, C-RADS for CT colonography, LI-RADS for liver, Lung-RADS, NI-RADS for head and neck, O-RADS for ovarian, PI-RADS for prostate, and TI-RADS for thyroid. This list of ACR reporting and data systems has really helped us create checklists and communicate clearly with our physicians.
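As a toy illustration of the checklist idea, a structured-report template can track which sections have actually been addressed before the report is signed. This is only a sketch; the section names and the sign-off rule are assumptions for the example, not any vendor's actual macro behavior.

```python
# Minimal sketch of a report checklist: every section must be explicitly
# addressed before the report is finalized. Section names are illustrative.
TRAUMA_CT_SECTIONS = [
    "Aorta and great vessels",
    "Lungs and pleura",
    "Liver",
    "Spleen",
    "Pancreas",
    "Kidneys and adrenals",
    "Bowel and mesentery",
    "Bones",
]

def unaddressed_sections(report_fields: dict[str, str]) -> list[str]:
    """Return checklist sections left empty or missing in the draft report."""
    return [s for s in TRAUMA_CT_SECTIONS
            if not report_fields.get(s, "").strip()]

draft = {
    "Aorta and great vessels": "Type B aortic dissection.",
    "Lungs and pleura": "No pneumothorax.",
    # Pancreas never dictated: exactly the satisfaction-of-search gap above.
}

missing = unaddressed_sections(draft)
if missing:
    print("Hold before signing; not addressed:", ", ".join(missing))
```

In the trauma case above, a blank pancreas entry would hold the report until it is explicitly dictated as normal or abnormal.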
Additional interventions include your peer learning conferences, which can help you identify common misses within the group and hold an educational session for everyone involved; knowing the findings that are associated with certain syndromes; and computer-aided detection, which you'll hear more about, along with the application of AI in this specialty, in Dr. Halabi's talk later.

The next bias is called premature closure. This is defined as failing to consider additional differential diagnoses after making the first diagnosis. For example, this is an 82-year-old male. He had a CT of the chest, and now we see a new pulmonary mass. He has a history of colon cancer, so the first guess was, oh, it must be a metastasis from the colon, right? So he went on and received more treatment, on the assumption that this was recurrent or advancing metastatic colon cancer, but it didn't respond. The next scan showed a slightly bigger mass, and now they did a pulmonary biopsy. This turned out to be a primary lung cancer, not a metastasis. So this goes back to what Dr. Kaden was suggesting: giving the top three differential diagnoses instead of just one would have helped in these situations. Interventions include decision tools that help you generate a list of differentials based on just the lesion descriptors; there are a couple of companies, a couple of software tools, with this capability. And particularly for breast radiologists, feedback from the pathologist is super important, because for every biopsy we do, we have to assess radiology-pathology concordance to decide whether I sampled that lesion correctly. If not, I have to go back and repeat it or send the patient to surgery. So for us it's super important to have a system, for example in PowerScribe, that helps me close that loop. I don't have time to go look up every single case; I want the system to bring it to me as a message or a notification or an email saying, hey, pathology is ready, go look at it, or, hey, the recommended CT chest follow-up for your nodule is ready. So I think it's super important for us to have access to such tools.

Okay, the last bias is anchoring bias. It's the common human tendency to rely too heavily on the anchor, that first piece of information you got. You don't stop and adjust that first impression in light of new information. For example, this is a 32-year-old female. She was found to have new axillary lymph nodes on a CT chest, indicated by that little arrow. She also has a history of lymphoma, and common things are common, so this is probably lymphoma, either her prior, treated lymphoma or a new recurrence. She was told to see her oncologist and came back in six months. The nodes had gotten bigger and had not responded to the lymphoma treatment. It turns out she also had a breast mass that she hadn't told anyone about, and this was actually a nodal metastasis from an undiagnosed breast cancer. So this is an example of the most common, quote unquote, diagnosis not being the correct one. Calcifications are very prevalent, very common on mammograms in all women. In 2018, there were suspicious calcifications in this woman, but they were dismissed. A year later, she came back with biopsy-proven breast cancer. So the most common is not always right. To decrease anchoring bias, avoid early guesses and call for a second opinion if you need it.
And when the symptoms get worse or don't respond to treatment, reconsider and come up with additional diagnoses; there are tools available for this. And resist the temptation to assume that the most common is always right. So, which of the following is not a cognitive error commonly seen in radiology? This is one of your five test questions today, and I'm going to skip through it quickly. All right, take-home points: these are the top five cognitive errors most responsible for malpractice cases in radiology. I appreciate your attention. Thank you so much. And I'm going to introduce our next speaker, Dr. Shinagare. He is an associate professor at Harvard Medical School and vice chair of clinical operations at Brigham and Women's Hospital.

Thank you. Thanks for the invitation to speak here. Any diagnostic test is associated with some degree of uncertainty. It can be test-related, such as limitations of the modality or variable protocols; related to the report, such as report structure, missing details, or the radiologist's experience or expertise; or related to report language, such as variable terminology and how diagnostic certainty or follow-up recommendations are expressed. We have to understand that the uncertainty cannot be completely eliminated, but we can try to minimize it, by creating the best possible protocols and standardizing them, by using structured reports, and through radiologist education. Despite that, some uncertainty always remains, and we express it through our own diagnostic certainty in the report. For example, you see a liver lesion. You're not quite sure what it is, but you think it is a metastasis. You could say, okay, this is most likely a metastasis, or probably a metastasis. So why should we care about this diagnostic certainty? We always say a picture is worth a thousand words, and our job is really to take hundreds of images and convert them into a single text document. In this process, oftentimes we make the diagnosis and we give a differential, but sometimes the report or findings are expressed in a way that doesn't make sense to the referrers or patients, or it's very hard for them to make decisions based on it. So the report is there, but it may not be actionable. Creating such a report is an acquired skill. It's the most important output of any radiology practice, and it requires training, yet we don't spend enough time teaching our residents or ourselves how to create a good report. And what happens when people are stuck in this diagnostic uncertainty? They resort to hedging, and that's what makes a report sometimes not very helpful for the referrers. So hedging and diagnostic certainty are really important for us to understand. What is hedging? It's communication of uncertain findings using terms that are ambiguous, vague, or imprecise. Take this example, from a real report: a 2-centimeter liver lesion "can be reasonably considered to be of benign etiology such as cyst or hemangioma, but it is not completely exonerated", pretty fancy, "of being a metastasis or primary liver malignancy." I'd consider this report thoroughly vague and spectacularly unhelpful, because it gives no direction to the referrers about what to do with this patient. We should avoid this kind of unnecessary hedging and too many differentials. Giving the top one, two, or three differentials is great, but please do not list every known diagnosis to mankind just to be 100% right all the time, because then you're not really being helpful.
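One way to act on this advice is a simple report "linter" that flags vague certainty phrases before a report is signed. The sketch below is an assumption-laden illustration: the phrase list comes from the examples in this talk, not from a validated lexicon, and a real practice would substitute its own agreed-upon terms.

```python
import re

# Phrases drawn from the examples discussed in this talk; any real deployment
# would need a locally agreed-upon list.
DISCOURAGED_PHRASES = [
    "cannot be excluded",
    "not completely exonerated",
    "not inconsistent with",
    "question of",
    "probably",
    "possibly",
]

def flag_hedges(report_text: str) -> list[str]:
    """Return discouraged phrases found in the report text."""
    lowered = report_text.lower()
    return [p for p in DISCOURAGED_PHRASES
            if re.search(r"\b" + re.escape(p) + r"\b", lowered)]

impression = ("2 cm liver lesion can be reasonably considered benign, "
              "but metastasis is not completely exonerated.")
print(flag_hedges(impression))  # ['not completely exonerated']
```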
And that's where the concept of diagnostic certainty plays a role. It's a key component of any actionable radiology report; it helps our referrers make management decisions. We use numerous phrases to convey diagnostic certainty, and we're going to see some examples. This is pretty common: for example, 86% of abdominal CTs and MRs have at least one, and probably more, diagnostic certainty phrases. So if you're using all these phrases, such as probably, highly suggestive, worrisome, concerning, not inconsistent with, do they all mean the same thing? In terms of your subjective diagnostic confidence, do all these phrases mean the same thing to you? Do they mean the same thing to your referrers? For example, if I were to poll this room: if I say "probably metastasis," how many people here think the likelihood of metastasis is, let's say, less than 25%? Okay, I see no hands. Great. How many think it's between 25% and 50%? I see a few hands. Between 50% and 75%? I see more hands. And what about more than 75%? All right. What if I say "consistent with metastasis"? How many think this is more than, let's say, 75%? I see many more hands this time. So simply selecting a different term changes how we express our subjective confidence. And many studies have shown a significant disconnect between how different radiologists use these terms and what those terms mean to them and to their referrers. In our own study, we got input from 142 radiologists. We asked them to pick their favorite term for each of these diagnostic certainty categories, and you'll see that for each one there are multiple options people use. To make things worse, some people use a given term, probably, for example, to express very low likelihood, some use it to express very high likelihood, and everything in between. So if you're a patient or a referrer and you see "probably metastasis," you have no clue, no idea, what the radiologist means by that. And that's a problem. So in our practice, we tried to solve this by creating a diagnostic certainty scale, where we use five standardized terms to express diagnostic certainty, and our referrers really like it. In the beginning, some radiologists were concerned about it, and I'm going to talk about that a little bit, but over time people have really started using it. We also provide a link in the report, and that website has pretty heavy traffic; it seems patients also go and visit it to look up exactly what these terms mean. After implementing this, we noticed the use of the recommended terms went up pretty significantly. We also want radiologists to put the certainty scale at the bottom of the report for easy reference, and that use has also been rising. So there was really good adoption of the certainty scale, and now many other centers have also started using this, either their own version, or our version, or anything else that's available out there. Just for fun, I asked ChatGPT to give me a list of terms that have consistent interpretation versus variable interpretation, and this is what it gave us. Terms like maybe, probably, possibly, suggestive, and suspicious are variably interpreted and should be avoided. Maybe our own work and the existing literature influenced ChatGPT, but our own guidance is the same: avoid terms like probably, possibly, question of, and suggestive of.
These are not really good or reliable terms. Just for fun, I also asked ChatGPT to create its own certainty scale, and this is what it came up with. The point is, it doesn't matter what you use, as long as you are consistent within your own practice and your referrers know what you mean; then it's all good. What are the barriers? Individual expression: radiologists want to have their own individual expression in the report, which is great, but once you start educating them about the problems that can cause, they are willing to change. Now, this change requires help. Any change is hard, and we need to support radiologists through it. Then there are medicolegal concerns: what if I say this is high probability and I'm wrong? Well, that's possible. But the good thing is, anytime you are expressing probabilities, you are acknowledging that it's not 100%, that it's possible you're not right. Our risk management team thinks this is actually a safer approach for radiologists, because you are using some kind of system and you're not dependent on the interpretation of your referrers or patients. Sorry, my voice is a little bit scratchy, an upper respiratory infection that hasn't completely gone away. So, some strategies for change: increasing awareness among radiologists, because training and education go a long way; multidisciplinary consensus to secure buy-in; creating expectations for radiologists and giving them feedback, and we use a lot of AI and NLP tools for that; inclusion of these initiatives in our QI and incentive programs; and strong leadership engagement, both in terms of authority and influence. Now, what are some of the next steps? A team from MIT basically took this and converted these terms into a visual representation of a confidence distribution. You'll see "very unlikely" on one side and "most likely" on the other side, close to one, or high likelihood. As a next step, they took a bunch of chest x-rays, and for each diagnosis they looked at what terms the radiologists used. For each diagnosis, they recorded which terms were used and what the diagnosis ended up being, and then they correlated that with the follow-up CT scan to see how often radiologists were right or wrong. They found that for atelectasis, for example, radiologists were often underconfident: they used lower-likelihood terms even though they were more often right. On the other hand, for infection, they were overconfident: they used high-likelihood terms when they were more often wrong. Similarly, for individual radiologists, some had overall lower calibration and some had higher calibration. Some used more mid-level terms, like "maybe"; some used "likely" or "most likely" and were more confident. The next step was to create feedback for the radiologists. They did not impose any standard uncertainty scale; they just stuck to the terms radiologists were using, but they could say, okay, instead of saying "present," maybe tone it down a little bit, or tone it up a little bit, and that is going to improve your accuracy. This is very early work, but the idea is that we can study the certainty phrases that people use and use that to give them feedback to improve their own accuracy.
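As a rough illustration of this kind of calibration measurement (not the MIT group's actual method or data), one could tabulate, for each certainty phrase, how often the favored diagnosis was later confirmed on follow-up; the phrases and outcomes below are invented solely to show the computation.

```python
from collections import defaultdict

# Each record: (certainty phrase used, whether follow-up confirmed the diagnosis).
# These records are made up purely to demonstrate the tally.
reads = [
    ("possible", True), ("possible", True), ("possible", False),
    ("likely", True), ("likely", False), ("likely", False),
    ("consistent with", True), ("consistent with", True),
]

totals = defaultdict(lambda: [0, 0])  # phrase -> [confirmed, total]
for phrase, confirmed in reads:
    totals[phrase][1] += 1
    if confirmed:
        totals[phrase][0] += 1

for phrase, (confirmed, total) in totals.items():
    rate = confirmed / total
    print(f"{phrase!r}: confirmed {confirmed}/{total} ({rate:.0%})")
    # A phrase implying high confidence but with a low confirmation rate
    # (or the reverse) would be a candidate for recalibration feedback.
```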
So, in summary, diagnostic certainty is a key component of an actionable radiology report. These terms are very commonly used, and we should express diagnostic certainty clearly in our reports. Do not hide in the hedges. Use a certainty scale to improve communication; use one of the existing ones or create a new one, but standardize it. And it's possible to measure and recalibrate uncertainty expression, either per radiologist or per pathology. Thank you.

Thank you. Our last speaker, Dr. Halabi, is an associate professor at Northwestern University and the vice chair of informatics at Lurie Children's Hospital.

My goal, in this short time, is to build on the discussion of errors and talk about why AI is great at reducing variability and improving consistency, how AI can help with early detection and identify subtle abnormalities, and how AI can be leveraged as a mechanism for error flagging, second opinions, and reducing ambiguous terms. Going back to the literature: To Err Is Human, published in 2000, is one of the classic pieces that really brought attention to how many people die because of errors in health systems. So how are we going to bring quality and safety up using technology, while also recognizing that technology itself may be a source of increasing errors, and how do we reduce that? This was already discussed in detail, but I just want to reiterate that the literature suggests up to 20% to 30% of more complex studies, like CTs and MRIs, contain errors in their reports. There is also inter-observer and intra-observer variability: not only do we differ between radiologists, we differ from ourselves. If you are shown the same study a week, or two, or a month later, there may be variability in what you report out. The factors that contribute to those errors were all discussed before: human error and workload. All of you are seeing increased volumes; we're doing more, there's scope creep, there's MRI creep, we're doing more sequences, we're doing more imaging. And as we get into more and more AI quantitative assessment, we're going to be gleaning much more information out of imaging, including bone density and sarcopenia scores; what are we going to do with all this information? So the workload is increasing. Then there's image quality: I work in a pediatric setting where we have motion, so how do we use AI to reduce noise and motion artifacts, and maybe even reduce the anesthesia time that we need for patients? And then there are the cognitive biases influencing the radiologist's interpretation of images, which were really eloquently discussed by my previous colleagues here. Now, diagnostic consistency: these AI algorithms don't sleep, right? They act the same at 9 a.m. and at 9 p.m. There has been so much emphasis on this in the past five years, in the presidential addresses and in the plenary talks at this meeting, and it's all about reducing this error: analyzing lab data, having consistent criteria and consistent terminology, reducing the inter-observer variability among radiologists and even pathologists, and then treatment planning, using evidence-based treatment protocols. So this goes beyond just the radiology diagnostic information.
So: data and clinical guidelines, standardizing care across different providers and locations; risk assessment, treating each patient equally. We have biases, based on what we see, who we're working with, what time of day it is, and whether or not we had coffee that morning. We work differently at different times, so how can AI level that playing field? Clinical documentation, which was also discussed: using NLP, standardizing how we record information, how it's indexed, how it's coded, and how it's communicated. How do we reduce that documentation variability too? Then there is consistency in measurement and classification. This is a straightforward one. We do bone ages on a daily basis in a pediatric setting, and an AI system can give consistent results every time, so you're not getting the second-year resident versus the attending who has been working for two years versus the attending who has been working for 20 years; it levels that playing field. These mundane measurement and classification tasks are where AI can significantly reduce errors and inconsistency. Tumor response: counting nodules and lesions, volumetric measurements of lesions and metastases, sizes and numbers of lesions. We're seeing more and more feasibility of using artificial intelligence to track tumor response, or tumors over time, and I'll talk a little bit more about MS plaques and other things we assess temporally. But, as many of you know, many of the current AI tools do not look at comparisons or differences over time. Automating annotations: lines and tubes on ICU films. It is a mundane task, and we do a lot of imaging in the ICU setting, but we can label where the end of the endotracheal tube is, or automate that, and populate the report with all of these features: where the PICC line ends, where the umbilical arterial and umbilical venous catheters are lying. So you can see here that we can start to have that copilot helping us along the way to reduce those errors and to do it much more efficiently and consistently. This is the lines-and-tubes example, showing how AI can delineate where these lines and tubes lie and report out where they live. Pattern recognition: tissue texture and density, early-stage tumors before they're visible to the human eye, and flagging minor asymmetries in bilateral structures, which goes right into the whole CAD aspect of breast imaging, or musculoskeletal imaging, where you have a comparison, or brain imaging, where you have a built-in side-to-side comparison and we're looking for subtle differences. And even getting outside of imaging, or what we usually term imaging, there are things like heat maps, for instance reading facial temperature for early diagnosis of metabolic disease. We're hearing more and more about using AI for movement, and for subtle changes in retinal imaging, where it's detecting dementia or even predicting potential myocardial infarction and other vascular disease. So, again, we can look at AI as a sort of harbinger, a canary in the coal mine, for other diseases.
And we do have to think outside the medical imaging paradigm and include these other areas where AI can have influence. Temporal comparison, which I mentioned briefly: comparison across sequential imaging studies, identification of gradual changes that may escape visual inspection, and tracking of subtle disease progression. How many of you actually have working hanging protocols in your PACS today, ones that work every time, with every modality, for side-by-side comparisons? Okay, now a few hands went down. We even have trouble with temporal comparison and looking at exams over time. For example, with MS lesion and plaque tracking, why not automate this, display it volumetrically, and track it over time so it can be assessed very quickly, not only by the radiologist but also by the referring physician and the patient, who can see very clearly whether their disease is progressing or, hopefully, staying stable? Improving image quality: if you use AI to reduce artifacts, you get better diagnostic imaging, which ultimately helps reduce errors. You can see here a very noisy image being cleaned up with AI, or a patient with motion, or not enough radiotracer or contrast, that the AI enhances. This is another example of improving image quality, where a lower dose of the nuclear tracer and a shorter scanning time can be compensated for by improving the signal-to-noise ratio, and there are FDA-cleared AI tools to do that. Now, clinical documentation: it was eloquently discussed how much ambiguity, hedging, and waffling, there are so many terms we can use, comes across in radiology reports. But the report is a knowledge resource, so how do we use large language models and generative AI to create differential diagnoses based on the description of imaging findings, so that just based on what you say, it gives you a differential diagnosis, or to use references like nomograms and cancer staging schemas? How many of you have that binder to go back to for whatever cancer staging system you use, to remember how to do it, or for trauma, like the AAST criteria and the other references we look up all the time? And then there's the synthesis and simplification of radiology reports: standardization of report terms, removing unnecessary or ambiguous terms, and automatically generating impressions and recommendations. This example is from the older ChatGPT 3.5: give me a differential diagnosis for a lucent lytic lesion in the proximal tibia in a child. This helps us not omit potential diagnoses that the clinicians, and we as radiologists, would omit if we did not have this list or were not aware of them, or even to simplify: here's a revised version of the radiology report. And we're seeing a lot of the speech reporting vendors, if you visit them on the exhibit floor, not only looking for mismatches within your report, but also checking whether or not you're using the AI recommendations in your report. That also helps validate and look for errors in the AI algorithm.
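To illustrate the kind of LLM query the speaker describes, here is a minimal sketch using the OpenAI Python client. The model name, prompt wording, and five-item limit are assumptions for the example, not a recommendation of any particular product, and the output would still need radiologist review.

```python
# Minimal sketch: ask a general-purpose LLM for a differential based only on
# a finding description. Model name and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

finding = "lucent lytic lesion in the proximal tibia of a child"
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "You are assisting a radiologist. List a brief differential "
                    "diagnosis (top 5) for the described imaging finding."},
        {"role": "user", "content": finding},
    ],
)
print(response.choices[0].message.content)
```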
So, one of the big controversies, and I think this is going to be very interesting as a few papers come out, is whether we should prospectively and retrospectively evaluate for diagnostic errors using AI. For example, a colleague looked back at cervical spine x-rays over about a year, ran an AI fracture-detection tool, reviewed every single study, and found a seven percent miss rate. Do you then report those errors back to the radiologists? Do you start grading people on their diagnostic abilities and creating report cards? Will this increase the risk of litigation for radiologists and health systems by looking back retrospectively? As a patient, I would want somebody to look back at my CT scan or MRI for errors they're now able to find, or diagnoses they're able to make now. It's akin to getting DNA testing, say from 23andMe, and later being told they can test for new panels: would you want to know that new information, whether or not you're going to develop Parkinson's disease, Huntington's disease, or other neurological conditions? Would it be unethical not to use these new tools to assess for errors? This is going to fall under the quality and safety rubric, and we're going to have to work with the informaticians, quality and safety, risk management, legal, and privacy. There are many issues; it's almost a Pandora's box that opens up here. Will we be rating radiologists' performance against AI? Those are all the things I really wanted to discuss today, and hopefully we'll have more questions. But is there better diagnosis with AI? It is possible; we already know that. Faster? Yes. Earlier diagnosis with AI? Yes. Better reporting? We're already seeing those tools come to fruition. And so, those are my objectives. Thank you all.
Video Summary
The RSNA session discussed strategies for reducing diagnostic errors in radiology, exploring tools like driver diagrams and models for improved patient safety. The driver diagram aids in identifying primary and secondary causes of diagnostic delays to propose effective interventions. Key components contributing to errors include missed findings, misinterpretation, and communication gaps, with interventions focusing on raising awareness and utilizing AI. Cognitive errors like framing, availability bias, and premature closure contribute significantly to radiology diagnostic errors. Structured reporting, standardized terminology, and improved patient history documentation are recommended to mitigate these biases.

The session underscored the importance of diagnostic certainty in reports, avoiding vague terms like "probably" or "possibly." A standardized diagnostic certainty scale can enhance communication and actionability of radiology reports, fostering referrer trust. Attendees discussed AI's potential to enhance diagnostic accuracy by minimizing variability, improving image quality, and aiding in standardizing reporting. Challenges remain, including whether AI should retrospectively assess past diagnostic errors, with implications for medicolegal concerns and radiologist performance assessment. Overall, the session showcased strategies and emerging technologies aimed at enhancing radiological diagnostics through consistency, accuracy, and improved communication.
Keywords
diagnostic errors
radiology
driver diagrams
patient safety
AI in radiology
cognitive errors
structured reporting
diagnostic certainty
standardized terminology