Artificial Intelligence in Breast Imaging (2023)
WEB40-2023
Video Transcription
On behalf of the RSNA, welcome and thank you for joining us for today's webinar, Artificial Intelligence in Breast Imaging. My name is Manisha Bahl and I'm a breast imaging radiologist at the Massachusetts General Hospital. This webinar is one of the many ways in which the RSNA supports radiologists at every stage of their careers by offering educational resources and tools to increase knowledge and foster innovation. Today's webinar will be recorded and available on demand in the RSNA Online Learning Center. After the presentations, I will moderate a Q&A session with the speakers. We would like this session to be an engaging and interactive discussion, so please type your questions and comments into the question box and we will respond during the live Q&A portion of the webinar. You can also interact with the speakers and other participants by sending messages through the chat feature. The following slides include important CME information. Please take a moment to review each slide. After attending today's webinar, be sure to click the link in the resources panel to complete the survey to receive credit for your participation. This webinar offers one AMA Category 1 credit for CME. Please review these disclosure statements. Next are the accreditation and designation statements. And last is the RSNA disclaimer. In this webinar, Dr. Emily Conant, Dr. Despina Kontos, Dr. Sophia Zackrisson, and I will discuss the use of AI algorithms for the detection and diagnosis of breast cancer on mammography. Listed on this slide are the learning objectives for today's webinar. We would like to begin by understanding our audience through a few polling questions. About 40% of our attendees are from the United States and 28% from Europe; 57% of us are currently using AI algorithms in our practice, and 58% of us are using tomosynthesis in our practice. I'll begin by providing an introduction to AI in breast imaging. My disclosures are listed on this slide. AI for radiology has rapidly advanced from feasibility and reader studies to clinical implementation. There are more than 250 FDA-cleared AI applications for radiology, including more than 20 for mammography. In the next 10 minutes or so, I'll review key AI terminology and concepts. We will begin with the definitions of AI and its subfields. AI is a branch of computer science dedicated to developing computer algorithms that imitate intelligent human behavior, such as learning and recognizing patterns. AI is a broad umbrella term that encompasses natural language processing, expert systems, robotics, and speech recognition, in addition to machine learning. Machine learning refers to computers learning from data without being explicitly programmed. The machine learning algorithm is developed to maximize the fit between the input, such as images, and the output, such as a classification, and it can then be applied to new data. The machine can learn through supervised or unsupervised learning, which we'll discuss in a moment. Deep learning is considered to be a subfield of machine learning that relies on neural networks with multiple layers to progressively extract higher-level features from raw data, which we will also discuss in more detail. In supervised learning, the algorithm is provided with labeled data. Two examples of supervised learning are classification, in which the output is categorical or a class, and regression, in which the output is numeric or continuous, like age or weight.
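To make the classification-versus-regression distinction concrete, here is a minimal sketch (not part of the webinar), assuming scikit-learn is available and using synthetic feature vectors as stand-ins for image-derived features: the same kind of labeled training is done once with a categorical output and once with a continuous one.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # hypothetical image-derived feature vectors

# Supervised classification: the label is a class (cancer = 1, no cancer = 0).
y_class = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y_class)
print("predicted class:", clf.predict(X[:1]))

# Supervised regression: the label is a continuous value (here, a simulated age).
y_age = 50 + 10 * X[:, 0] + rng.normal(scale=2, size=200)
reg = LinearRegression().fit(X, y_age)
print("predicted age:", reg.predict(X[:1]))
```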
Supervised learning requires a large amount of data for learning, and thus computational power, as well as accurate labels and an agreed-upon definition of the ground truth. In unsupervised learning, the algorithm is provided with unlabeled data, and the machine learning algorithm clusters or organizes the data to uncover underlying patterns. Most AI models in breast imaging utilize supervised learning, in which the computer is provided with labeled data. For example, when developing an algorithm for breast cancer detection, the mammographic images provided to the computer are labeled as positive or negative for breast cancer. While a traditional machine learning algorithm relies on human-engineered or manually designed features based on clinicians' knowledge and experience, deep learning algorithms learn the features that are necessary to classify the mammographic images as positive or negative, improve with exposure to more data, and have the potential to discover features and relationships that are currently unknown or imperceptible to humans. Deep learning relies on neural networks, which resemble the connectivity of neurons in the brain. In a neural network, there are layers of connected nodes, each of which receives inputs from other nodes. Network architectures with numerous and large layers such as this one are deep learning neural networks, as opposed to shallow learning neural networks with only a few layers. Convolutional neural networks are the most common type of neural network used for image analysis. A convolutional neural network, as seen here, has an input layer, hidden layers which extract patterns within the data, and an output layer. Deep learning relies on these neural networks with multiple layers to progressively extract higher-level features from raw data. The algorithm may first learn to recognize pixels and then edges and shapes. It then learns to identify more complex shapes and features. Better performance of neural networks is generally achieved with deeper architectures and with exposure to more data. AI algorithms require both internal and external validation. Internal validation refers to validation of a model using data from the same source as the training data, and external validation refers to validation of a model using data from a source that is different from the training data. A common internal validation method is the random split, in which the dataset is randomly divided into a training set, validation set, and test set. Two methods used to evaluate model performance are confusion matrices and receiver operating characteristic curves. A confusion matrix provides information about the classification performance of a model on test data for which the true labels are known. The information is presented in table format, in which each column represents instances of the predicted label and each row represents instances of the true label, or vice versa. In this example, when the true label is 2, the algorithm correctly predicted the label 2 in 87% of cases. Using the table, the reader can easily visualize whether the model is confusing two classes. An ROC curve is another method by which to evaluate model performance. The curve is plotted on a graph in which the x-axis is the false positive rate and the y-axis is the true positive rate. Each point on the ROC curve represents a different decision threshold with trade-offs between false positives and true positives. The accuracy of the test can be summarized by the area under the curve, or AUC.
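The random split, the confusion matrix, and the ROC curve with its AUC can all be demonstrated in a few lines. This is a minimal sketch with synthetic data (my own illustration, not the webinar's example), assuming scikit-learn is available.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))                      # stand-in feature vectors
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Random split: 70% training, 15% validation, 15% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]          # continuous scores for the ROC curve

# Confusion matrix: rows are true labels, columns are predicted labels.
print(confusion_matrix(y_test, (scores > 0.5).astype(int)))

# ROC curve and AUC: each threshold trades false positives against true positives.
fpr, tpr, thresholds = roc_curve(y_test, scores)
print("AUC:", roc_auc_score(y_test, scores))
```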
The green ROC curve is a perfect classifier with an AUC of 1. An ROC curve with an AUC of 0.5, represented by the red dotted line, represents a random classifier. Models with AUCs above 0.5, as represented by the blue curve, have at least some ability to discriminate between classes. Certain models, such as those based on neural networks, are considered black boxes in that the imaging features or patterns used by the model to make predictions may not be readily evident to radiologists. One method to address this lack of interpretability is heat maps or saliency maps, which are used to indicate the most salient regions within images and thus draw attention to the specific regions that contribute most to the corresponding output by the model. Explainability of AI algorithms is one of the many issues that arise when we begin to discuss the ethical use of AI in radiology. A joint European and North American multi-society statement discusses how we have a moral obligation to consider the ethics of how we use and appreciate data, how we build and operate decision-making machines, and how we conduct ourselves as professionals. I'd like to briefly comment on one of the ethics of practice issues, which is automation bias. Thank you to Dr. Conant for bringing this very recent article published in Radiology to my attention. The study, involving 27 radiologists who each interpreted 50 mammograms, found that radiologists at all levels, but particularly inexperienced radiologists, are prone to automation bias, in which we tend to favor the decisions of the AI algorithm even if the algorithm is incorrect. Automation bias and other effects of human-machine interaction must be considered to ensure safe deployment of AI and accurate diagnostic performance. At present, AI applications for breast imaging are at various degrees of maturity and availability. In the next presentation, Dr. Zackrisson will discuss AI applications primarily for 2D mammography from a European perspective. Then Dr. Conant will discuss AI applications for tomosynthesis from a United States perspective. I will turn it over to Dr. Zackrisson, who is a professor of radiology at Lund University. Thank you so much, Dr. Bahl, for this great introduction, and hello everyone. Good evening from Sweden. Perfect. So, just as Dr. Bahl said, I am about to talk to you about artificial intelligence in breast imaging and its clinical applications in mammography from a more European perspective. These are my disclosures and some learning objectives that you've already seen, I hope. So, just to introduce you to all the many possibilities that we actually do have with AI in breast cancer screening, you can make this list even longer. I just put down some of the very useful things that you can do with AI in breast screening, everything from predicting future breast cancer risk down to actually reducing your workload at your institution. So, in my talk, I will focus on these blue lines here: identification of normal cases, increases in sensitivity and specificity, using AI standalone or as a decision aid, and workload reduction, and I will also give you some highlights of the very recent prospective trials from Europe that have just been published. And just to give you a little bit of insight into how we do screening in many countries in Europe, we have double reading, as you might know. So, in our case, AI is the perfect partner to swap out one of these radiologists and to replace that one.
So, that could be one way of using AI in our screen-reading environment. Normally, in many cases, we have a kind of consensus discussion if there is some disagreement in the rating of the images, either between human readers or an AI reader. And then of course, if there are any still-positive cases there, we do the assessment and we have the final diagnosis. So, one scenario is that you would replace one of the radiologists with AI. So, a human radiologist is the first reader and AI is the second reader, or vice versa, whatever you like. Another option would be that you actually triage by first applying AI to your images or your examinations, and you get a risk score, for example, for the images. And then, depending on the risk score, you actually decide whether this examination should be read by one or two radiologists. And these are just a couple of ideas that have been implemented, or tried out, I should say, in some of the European trials. So, I'll start by going through some of the retrospective studies on AI and mammography. The first studies that came out were often single-center, single-vendor studies, and also on cancer-enriched datasets. But then, as time evolved, we saw more multi-center, multi-vendor trials, also using real-world screening data. Because, as you know, in a true screening situation, most women are actually healthy and don't have breast cancer, while a breast cancer case is a rare event. So, this is usually the way you have to go when you're trying out a new technology, but sooner or later you have to move to something that looks like the real world. But you should keep this in mind when you read the studies, to understand whether a given study used a cancer-enriched dataset or not, for instance. So, one of the first studies that came out from Europe, not a trial but a study, is from a Dutch group, where we were also participating with some reader data. They combined a lot of different sets of reader studies from all over the world, and they ended up with 101 radiologists in this dataset and thousands of readings on very many images. And as you can see here, the ROC curves of one AI system and these 101 breast radiologists almost overlap. So, there was no significant difference between the AI and the average breast radiologist. But if we looked a little bit more into the details, we could see that this AI was better than 60% of the breast radiologists. And fortunately, I should say, or I don't know if it's good or bad, AI was always worse than the best radiologists. So, there is still some hope for work for us in the future. That's good to know. Now, this was a cancer-enriched dataset. But in the same dataset, we also looked at whether we could reduce the workload with AI by sorting out normal cases, which potentially, in a future scenario, would not be read at all by a human reader. So, these are the AI scores for all the exams, and the normal exams, benign exams, and malignant exams are here on the bottom row. And depending on, of course, where you put your threshold for the AI score, in this case, if we put it at 5 out of 9 or 10, you would reduce your workload by 47%. But you would, in this case, miss 7% of the breast cancers. In another scenario, if you push it a little bit lower, to 2, you would miss just 1% of the cancers and reduce your workload by 17%. So, this is, again, like we always have in screening, a matter of balance, you know: how much do you want to find, and how much can you actually accept not finding in this screening situation?
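The threshold trade-off described here can be written down in a few lines. The following is a schematic sketch with simulated AI scores on a 1 to 10 scale (the numbers are illustrative, not the study's data): for each candidate threshold, one counts how many exams would be triaged away unread and how many cancers would fall below the threshold.

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulated AI risk scores on a 1-10 scale (illustrative only, not study data).
scores_normal = rng.integers(1, 11, size=9800)   # screening exams without cancer
scores_cancer = rng.integers(4, 11, size=200)    # exams with cancer, skewed higher

all_scores = np.concatenate([scores_normal, scores_cancer])
for threshold in range(1, 11):
    triaged = np.mean(all_scores <= threshold)        # share of exams removed from reading
    missed = np.mean(scores_cancer <= threshold)      # share of cancers below the threshold
    print(f"threshold {threshold:2d}: workload reduction {triaged:6.1%}, cancers below threshold {missed:6.1%}")
```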
So, this is perhaps not realistic right now, but it's still an interesting thought, right? And then we have to remember that AI is not perfect. I mean, just like a radiologist, it can misinterpret things. So, this is just a case from one of our publications where the AI score was only 3, which is a very low score, but it is a small 7-millimeter invasive cancer, as you see here. So, after these studies, there were quite a few studies published also from the U.S. side. This one contains datasets from both the U.S. and Europe, and it was a Google-based AI that was tried out for different scenarios in this case. And this AI showed that it actually can end up with fewer false positives, but also fewer false negatives, the interval cancers. But depending on what reader scenario it was tested in, either a double-reading scenario from the U.K. or a single-reading scenario in the U.S., it had, you know, differences in performance, as you can see here. But now, this is already some years ago, and I would say that most AIs have developed a lot. So, I'm not so sure that these differences actually still are there. So, we tried out, in a pure screening population, the same idea about sorting out normals, and we could find almost 20 percent workload reduction without any cancers missed, and also a small reduction in false positives. And then my colleague, Dr. Kristina Lång, tried it out on a big set of interval cancers from our region, and she could show that AI could actually detect 58 of the interval cancers that had been missed by the radiologists in the previous readings. So, there certainly is great hope that this technology can help us in some way. And then more and more studies of this kind came out, and as you can see here, the later publications show even better performance in sorting out normal mammograms. A study from Stockholm by our colleagues sorted out 60 percent of the normal mammograms, and the AI could also find the future cancers, both interval cancers and next-round cancers, to a certain extent, much earlier than human readers. And similar things have been found in a study from Copenhagen in Denmark. So, I mean, what I just showed you is all based on retrospective data. So, we need to have some sort of reality check. How would this actually work in a true screening situation? So, that's what I'm going to go into a little bit more now, because what we need is, of course, some prospective studies showing more of the interaction between AI and the radiologist, and not just AI as a standalone reader. And if we see a future with AI in screening, we also need to think about how to validate which AI I should use in my screening setting, and how I actually can validate it over time if I change my equipment, for instance, and then not least the legal aspects, as has been mentioned before. So, I'll highlight some of the recent studies that have come out just now, some of the prospective trials. And the first one has been performed in Malmö, where I work, by my former PhD student, Dr. Kristina Lång, a very nice trial called the MASAI trial. And she used more of this triage design. So, in the intervention group, it was a triage by risk score. If there was a risk score of 1 to 9, it was single reading plus AI, but if it was a score of 10, which means a very high probability of cancer, the images were still double read.
And this was then compared to a control group with standard double reading of mammograms. And this has been performed in 100,000 women in population screening. And she could show here that there was a similar cancer detection rate and similar recall and false positive rates. And she could achieve a 44% reduced workload by using this algorithm here. So, the conclusion of this study is that there is similar performance when triaging to single reading with AI for low-risk cases, which can then arrive at a 44% reduced workload. And the conclusion here is that AI is safe. The final results of this trial will come later on, because the primary outcome is actually interval cancer rates. So, she will also look into whether AI can detect more cancers, and the relevant cancers, compared to the human readers. And then here, you just see some of the metrics, the cancer detection rate and the very low recall rates at which we operate in Europe, around 2.2%. So, it's very, very low. So, a great study. Another study that was just published is the ScreenTrustCAD trial from Stockholm by Karin Dembrower and Fredrik Strand, my colleagues. And so, this is another situation that I showed earlier. So, it's a comparison between the standard of care, two radiologists reading the images, or the new strategy, AI and one radiologist. And this is just some of the results that Dr. Dembrower provided me with. And you should concentrate on the blue bars here, which are the double reading that means AI plus one radiologist. And you can see that it flagged a little bit more than double reading by just two radiologists. But in the end, the recalls were actually a bit lower. Biopsy rates were okay. And with this strategy, as many or more cancers were actually detected with AI plus one radiologist compared to double reading with two radiologists. And they could also show that they stayed good on detecting advanced cancers. And this is just a table showing some of the metrics also in this trial. One thing is that if you used single reading by AI alone, you would actually have an even lower recall rate, but an even higher, or as good, cancer detection rate. Now, this is not a scenario that we can use right now, but it is interesting. So, great results from the Swedish trials. And actually, the Stockholm trial implemented their AI as of this summer. So, they are now replacing one of the two radiologists. But they also make sure that they have experienced radiologists in the consensus discussion, because that's how we can keep the recall rates at a good level. And they also have detailed follow-up, of course, of how this will proceed. So, we'll probably see some nice things coming up there. The Stockholm colleagues also were some of the first who actually compared or benchmarked three different AI algorithms on a retrospective dataset, showing that some AIs are actually better than others. So, that brings us into the validation part, which will be the last part of my talk. Because this will be important if you are considering implementing AI in your setting: how would you know which AI to pick? Because we work in different clinical settings, we have different kinds of workflows, we have different equipment and models, image processing, and not least, what happens with the equipment over time, with the detectors, for instance. And then our populations differ; we might have different ages, densities, ethnicities, and underlying breast cancer risk. So, how can we get to know which AI to choose?
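One way to approach the "which AI should I pick" question is to benchmark candidate algorithms on your own local exams before committing to one. Here is a minimal sketch (my own illustration, with simulated scores standing in for vendor outputs, assuming scikit-learn is available) comparing AUCs with bootstrap confidence intervals.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=2000)                      # local ground truth (1 = cancer)
# Simulated scores from two candidate AI algorithms (illustrative only).
algo_scores = {
    "algorithm_A": y * 0.6 + rng.normal(scale=0.4, size=2000),
    "algorithm_B": y * 0.4 + rng.normal(scale=0.4, size=2000),
}

for name, scores in algo_scores.items():
    aucs = []
    for _ in range(200):                               # bootstrap over exams
        idx = rng.integers(0, len(y), size=len(y))
        if len(np.unique(y[idx])) == 2:                # need both classes in the resample
            aucs.append(roc_auc_score(y[idx], scores[idx]))
    lo, hi = np.percentile(aucs, [2.5, 97.5])
    print(f"{name}: AUC {roc_auc_score(y, scores):.3f} (95% CI {lo:.3f}-{hi:.3f})")
```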
So, I guess in many countries, there has been a call for some kind of validation platform. And I'm just showing one of ours in Sweden, which I have also been part of building up. It's a technical platform where the regions can actually push in their own images, and then we can apply different AIs to them and give back some information, without disclosing the exact relation between the AI companies, but still saying that this is a good achievement, or a good enough achievement, for this or that vendor, for instance, on your specific images. So, you can read more about this in the publication that I'm citing here. And also, I wanted to provide some more reading. There's a great review just out about AI, both for mammography and digital breast tomosynthesis. And this is something that Dr. Conant soon will talk to you about; she's on this great paper. And then, also just the other week, there was this overview of trials of AI in breast imaging, and then also a very nice roadmap on how we should plan for both studies and implementation of AI in breast imaging for the future. So, I really recommend these two articles. So, to sum up, I'm well convinced that AI in mammography screening is here to stay, but we have to keep track of the quality assurance, how to map that, and the ethical and legal aspects. And I do see great potential in the future for AI and risk prediction. And I guess that will be partly covered in the last talk of tonight, or today. So, thank you so much for your attention. And I'd like to leave the floor to Dr. Conant. So, please. Hi there. Thank you so much for that great introduction. Let me set mine going here. So, I'm going to really talk about what's becoming the standard of care in the United States, digital breast tomosynthesis for screening, and the applications of AI there. Some financial disclosures. The learning objectives here are really about tomosynthesis and screening and how AI may help us. Certainly, with the accuracy of interpretations, we all know and sometimes see our false negatives, our false positive rates. We certainly have a higher recall rate in the United States than Dr. Zackrisson was speaking about, possibly due to a different medical-legal climate. But there's certainly room for improvement. Also, efficiency of interpretations. She mentioned beautifully the idea of triaging cases by likelihood of malignancy and, even provocatively, standalone reading. Are there cases that can be read just by the AI, not needing a radiologist for review? And then I'll speak briefly about a risk assessment study using AI algorithms and looking beyond breast density, to set the tone for our last speaker, Dr. Despina Kontos, who's going to speak to us about radiomics, et cetera. So, I think we always need to be grounded in our need to improve patient outcomes. It's not only about our efficiency and things like that. It's really about our patients. So, what about accuracy and efficiency? Here's a paper that I was involved in. It was back in 2019, and it was a very enriched study testing with 24 readers. Look at the number of cancers and those that actually went for benign biopsies out of the 260 cases. And it was a commercially available tomosynthesis-based algorithm. And the typical endpoints were looked at: sensitivity, specificity, reading time, and AUC. You can see here the red line is the average with the AI algorithm versus the dotted blue line without it.
So, the AUC actually increased statistically significantly in this very enriched study, by almost 6%. Sensitivity increased 8.4%. I should mention that I'll show you a sub-analysis. We actually grouped the readers into two groups, based on whether they claimed that they were breast specialists, meaning they pretty much did only breast, versus those who were more generalists and did other things as well. Interestingly, 21 of 24 readers improved in sensitivity, and the improvement was greater for the generalists. The specificity improvement, almost 7%, was actually better for the subspecialists. The recall rate went down, not as much as Dr. Zackrisson's recall rate in Europe, but it went down for us, and the reading time decreased almost 53%, with the greatest reduction for the subspecialists. And here's sort of a dynamic graphing of each of the readers, represented by a circle. The size of the circle is their reading time, and you'll see this is their graphing of reading time without AI, and then it will dynamically change to reading with AI, and you'll see their migration from their AUC without AI to with AI, and the size of their circle decreasing or increasing depending on reading time. So here we go. Look at them, isn't that lovely? They all move up into the upper left, which, as you know, is an improvement in sensitivity and specificity, and in general, the circles get smaller, meaning a decrease in reading time. Certainly there are limitations. This was a very enriched retrospective study, and we need a more diverse data set. Here, when we grouped by the breast subspecialists and the general radiologists, you see the general radiologists' sensitivity increased much more than the subspecialists', and when we looked at the specificity, interestingly, the breast subspecialists' specificity improved more with the AI than the general radiologists'. The decrease in reading time was greater for the subspecialists, 56%, versus the generalists. This is a review of multiple studies, including the one I just presented, that I think is important, looking at efficiency, reading time per tomo study. You know they take longer than 2D alone, and at the AUC of those reads with and without the AI. And what you can see, when you look at the reading time, is that most of them showed a decrease in reading time. There was one that did not; it went slightly up, from 45 seconds to 48 seconds. And when we looked at the AUC, most of these showed improvement. There were some that were pretty similar, and I'm talking about the third decimal point out here as the difference. So very similar. Certainly, these were enriched, retrospective reader studies, and we need prospective clinical trials to really better evaluate this. So what about this triaging idea, triaging cases away possibly for standalone reading by AI, or for single reading rather than double reading in Europe, or just maybe for the end of the day when I don't have my cup of coffee in my hand. And this is a study that we're writing up. This was actually presented at the ECR, and it looked at identifying low-likelihood-of-cancer tomosynthesis studies. It was retrospective, with European and U.S., mostly U.S., sites and four different vendors, again, very enriched, and really looking at the effect of adding different aspects, like the age group of the patient, knowing that cancer becomes more frequent as we get older, and looking at the impact of breast density, non-dense versus dense.
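Before looking at the outcomes, here is a schematic sketch of what such an age- and density-aware triage rule might look like in code. The field names and thresholds are hypothetical placeholders, not the study's actual operating points; in practice the cutoffs would be chosen on local data so that no cancers fall below them.

```python
from dataclasses import dataclass

@dataclass
class ScreeningExam:
    case_score: float   # AI likelihood-of-cancer score, 0-100
    age: int
    dense_breasts: bool

def triage_out(exam: ScreeningExam,
               score_cutoff: float = 10.0,    # hypothetical thresholds, set on local data
               age_cutoff: int = 55) -> bool:  # so that no cancers fall below them
    """Return True if the exam could skip radiologist review under this illustrative rule."""
    if exam.case_score >= score_cutoff:
        return False                           # suspicious enough to always be read
    if exam.dense_breasts or exam.age >= age_cutoff:
        return False                           # higher-risk strata are always read
    return True

print(triage_out(ScreeningExam(case_score=4.2, age=47, dense_breasts=False)))  # True
print(triage_out(ScreeningExam(case_score=4.2, age=62, dense_breasts=False)))  # False
```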
And so I'm just going to show you some of the outcomes here, which I think are very interesting in terms of dividing these groups up and adding these factors of age and density. So here's the overall, all vendors, looking at the case score, and here is where there were no false negatives, so no cases that ended up being cancer compared to readers alone without AI. Here's the AI looked at alone; one could triage out, oh, about 30% of cases. However, look at this. When you add breast density, you could triage more. Now, doesn't that make sense? Because the dense breasts are more complex, more likely to have false negatives, et cetera. And then when you add age, again, separately, just case score plus age, the triage went up to a little bit above 50%. And if you combine score, age, and density all together, here we are all the way up there, and that's about 59%. So this tomosynthesis triage with zero false negatives relative to readers, that is, of course, readers have false negatives, but this is relative to the readers reading independently, triaged out approximately 33% of the screening exams. When we added age and density, the triage rate increased significantly. But of course, again, as was emphasized before, prospective trials are so important because this is a reader study. It's not like our everyday work. But here's an example. So here are the 2D images, and here you can see the tomosynthesis. She was considered to have scattered density, and she had an 8-millimeter cluster of calcifications seen on both the CC and the MLO. The case score was only 23%, and the marking of the program was only on the CC view. I hope you can see faint calcifications associated with sort of a nodular little area. It was only marked on the CC view. I'll show you that here at 17%. This particular algorithm shows you the slice and the likelihood of malignancy at our triage of zero false negatives relative to the readers. This would not have been triaged out, so it would still have been included. However, at the next triage threshold, it would have been triaged out. It was a small microinvasive DCIS, a microinvasive ductal carcinoma with associated DCIS. Dr. Zackrisson mentioned this. This was a meta-analysis of standalone evaluations of both 2D and digital tomosynthesis cases, and this was not including the prospective trials that are ongoing. These were all retrospective. There were 16 studies, of which 6 were reader studies. There were only 4 which were tomo, but there were over a million screens in about half a million women. It was very promising and interesting, but there were things that were lacking in this large meta-analysis, like cancer biology, the subtype of cancer, the interval cancer rate, and the results by individual vendors and versions, et cetera, but here's the data. I know it's small and hard to see, but if you look here at all studies, and this includes the 2D and the 3D, you can see the sensitivity improved with AI and the specificity went down slightly. If we look at the tomo only, remember only 4 studies, the sensitivity went up a lot and the specificity went down somewhat as well. Here are the AUCs. While they look better, again, they're a balance of the sensitivity and specificity. There were some cases comparing to reads where there were actually historic priors to compare, so that's important, and the AUC certainly did not improve in those, because the radiologists had the advantage of priors for stability, et cetera.
In the four tomo studies, the pooled AUCs for the standalone AI were better than the radiologists'. That was an international collaboration. What about risk stratification? I think you're going to hear more about this in detail from our next speaker, but I just wanted to share with you some of the work that we've been doing in an international group looking at leveraging tomosynthesis data from the AI algorithms to predict risk. The question is, who, after a negative or non-actionable benign screening interpretation, will actually present with cancer before the next exam? We were looking at one-year intervals. This was U.S. sites, three different tomosynthesis vendors, 805 cancers in a random sampling and a nested case-control study. Basically, 45% of the women in this study were low-risk, 14% were deemed high-risk, and this was using a sort of adapted USPSTF five-year risk model. Of the women with cancer diagnosed before or at the next screen, 50%, based on this algorithm, were considered high-risk, and in that high-risk group the cancers diagnosed included advanced stage: 59% to 60% were stage 0, but 76% were stage 2 or stage 3. Perhaps this kind of algorithm could help us better decide who needs supplemental screening, and I'm a big advocate of MR. Here are the AUCs. Here's the typical lifestyle-and-density type of model, like a Tyrer-Cuzick algorithm that includes density, and its prediction, its AUC. Here was the prediction using the data from the 2D images alone, and here is the one leveraging the data derived from the radiomics of the tomosynthesis. So compared to lifestyle and density risk algorithms, the tomosynthesis-based risk was about two times more likely to identify the women who, in the short term, may develop breast cancer. So I just want to talk quickly about large language models. What about ChatGPT and large language models? It's all over the press and news, and I think it's very, very interesting. Here is a paper, and I want to call out the middle author there, Dr. Bahl, who's leading this discussion today, really looking at how a chat-based large language model, ChatGPT, could answer some of the questions regarding breast cancer screening and prevention. So what if this were embedded in your Epic or a similar system so that patients could interact with it? And here, of course, is my AI algorithm. Here's a woman. I like that. And the questions were things like: What is breast cancer? What is my personal risk? How should I be screened? How often should I screen? What should I be supplementally screened with? And these were the examples of the questions, and you can see the majority of these were graded as appropriate answers based on the response from ChatGPT. There she is again. However, the unreliable responses were: How can I prevent breast cancer? That's a tough one. How can I get screened for breast cancer? That was unreliable. And this one, well, this was done during the COVID time: How do I plan around my COVID vaccine? And that was considered inappropriate. Anyway, ChatGPT gave the appropriate response approximately 88% of the time. So I think it's very, very interesting. Not all of the time, but 88% of the time. So where do we stand? Next steps: large, useful, heterogeneous, not-so-enriched databases. Diverse populations, so important because we live in such a diverse world, and we need to really take advantage of that. Prospective trials, as I think we heard from Dr. Zackrisson's lecture.
And transparency in algorithms, and I think important is the collaboration, so we can improve reproducibility and generalizability. And standardization and benchmarking of outcomes, and always we have to consider ethical and legal ramifications for safety and regulatory purposes. So thank you so much. I'm very excited to announce the next speaker, who's a colleague of mine, Despina Kontos from the University of Pennsylvania. She's the Associate Vice Chair of Research, and she leads our Center for Biomedical Imaging, Computation, and Analysis. So take it away, Despina. Thank you. Thank you, Emily. Thank you for the kind introduction. And I'll switch gears a little bit, but perhaps not exactly, looking at other ways to quantify imaging data using radiomics and new future directions, especially in the context of breast cancer risk assessment that Dr. Conant just spoke about as well. And it's all related to machine learning and AI, but perhaps not in the way we're mostly used to hearing about it lately. And so with that in mind, some financial disclosures for institutional research funding. And I'll go back to discussing a little bit about breast density, which is a known risk factor for breast cancer and has been conventionally quantified from 2D mammographic data; the last 30 years of research around breast density have demonstrated that there's an increased risk of developing breast cancer for women with highly dense breasts. But so far, perhaps the easier way to quantify something like breast density has been to create a binary percent measure from a 2D mammogram and evaluate those measures in risk prediction. Now, with digital imaging, and especially with three-dimensional digital imaging data, we have an opportunity to quantify breast density perhaps more accurately, both from 3D digital breast tomosynthesis, but also from 3D breast MRI data. And now there are several algorithms available that can do that fully automatically, quantitatively. Some can be integrated in the clinical setting, some are available for research purposes. And the interesting thing along these lines is that, for example, I'm showing here how such an algorithm would operate in a 3D tomosynthesis stack, looking at all the slices, but essentially reconstructing a truly three-dimensional volumetric estimate of the dense tissue volume. And there are studies that have been published around this area, looking at volumetric measures of density from both tomosynthesis and breast MRI, and the increasing evidence suggests that these data can offer better risk prediction than 2D mammography. But with additional data, we can start thinking a little bit beyond just the conventional measure of breast density. Some of you may be aware of some of the earlier studies in this area by John Wolfe in the 70s, who was looking at xeromammography, 2D data, and trying to describe the notion of breast complexity. So beyond just the binary thresholded, kind of more simplistic way of quantifying the parenchymal pattern, we can just look at it visually in this data and see, for example, that the two breasts on the left-hand side could each be assessed as a high-density breast. But you can see that the pattern of the breast is very different, more distributed versus more centralized, and so forth. Similarly, the breast on the right-hand side would be assessed as perhaps a low-density breast, but as we can see, there's still a different amount of complexity, and a difference in that complexity, throughout the breast. So how can we quantify that?
In some ways, some of the convolutional neural networks and the AI models are trying to capture those patterns by analyzing the data at different levels of resolution with the different convolutional layers. But we can also do that with different measures, perhaps more easily interpretable in some sense, and essentially requiring smaller datasets to train models. So this is the field of radiomics, a term which was coined more than 10 years ago now, and it feels like yesterday, where radiomics refers to the extraction and analysis of large amounts of advanced quantitative imaging features with high throughput from medical images. We can draw a parallel for these features with other domains, like RNA sequencing data, where a single assay is used to analyze a large amount of information; in the same way, radiomics allows us to look at different patterns in the image. And how do we do that? We do that by sampling the image at very low resolutions, even at each individual pixel level within the image, or at a higher resolution, but essentially quantifying a lot of different types of features that refer to the morphology of the tissue, the pattern, heterogeneity, the contrast in the image, and so forth. And without going through equations, although there's a lot of literature in this field, in fact going all the way back to the 70s, interestingly, from satellite imagery and military applications, not medical imaging, what this can do is take us from the original image space to pattern images of the parenchymal pattern. Artificial neural networks essentially do the same thing, but at a much, much larger scale, and in a way that summarizes the information in different layers without necessarily providing a very transparent explanation. So we go to the space of the pattern, where we can have pattern images of pattern features characterizing the original scan. And you can see here different aspects of the heterogeneity visualized with these sorts of radiomic algorithms. So then we can quantify the complexity. Again, we can go from the pattern images to numbers characterizing this complexity. We can even derive complexity scores using the signatures. And now you can see we can start putting numbers to that notion of complexity, which allows us to quantify and compare and risk stratify perhaps in a more refined manner. So you can see on the left-hand side, those breasts that were more at the higher end of percent density now have very opposite and very different complexity scores. Similarly, the two breasts that were more at the lower end of breast density now have, again, very opposite and very different complexity scores. And this is an area of increased interest in the field, kind of in parallel with AI, looking at radiomic features, which are in some ways a simplified version of a convolutional neural network, but with the features defined a priori. And we can use these measurements in a similar way, for example, as we're using profiling data, Oncotype DX data, RNA sequencing data, and so forth, to try to understand and interpret a little bit. And here's perhaps an advantage of these methods versus the traditional sort of CNNs and artificial neural networks: what these features mean. So each column in this heat map represents an image, a scan, and each row represents a feature, a feature about the heterogeneity, the morphology, the complexity, and so forth.
Those can be hundreds of features, thousands of features. Essentially what a neural network does is extract thousands or more of these features and, at each layer, summarize this information in a similar manner. We're doing this with a sort of different approach. So here we're asking the algorithm to group these signatures of heterogeneity of the tissue into similar groups and come up with classifications of the tissue based on the radiomic high-dimensional texture pattern. So here you can see how we can characterize the tissue based on different groups, from lowest complexity in light green, lower to medium complexity in darker green, medium to high complexity in darker red, and very high complexity in very bright red. And if you look at this graph here, you can see that although there is an association with breast density, which is kind of expected because density is a dominant feature of the pattern, this is a non-linear association. As we saw in the figure on the previous slides, there can be breasts that have low density but high complexity and vice versa, high density but low complexity. So there's more information in the imaging data that we're not able to capture with the conventional percent density measures alone. And studies increasingly demonstrate that when you include those measures in risk assessment models, and I'm going to move this, I don't know, to make it a little easier here to see, you can get statistically significant differences in the ability of the model to distinguish women who develop breast cancer from women who do not, just by looking at the added value of these complexity features, confirming that there's more information in the image than the conventional percent density measures. More recently, there's some work also being done to scale this and try to see if it validates at the population level. This is a very large study that we recently conducted, which was presented at the San Antonio Breast Cancer Symposium last year, where we looked at a really large case-control sample. We looked at identifying those signatures, first of all, in a sample of more than 35,000 women, and then we took an independent case-control sample and tried to see if they had predictive value. And what we saw again is that these measures tend to offer a little bit better performance than conventional percent density, especially when looking at identifying interval or false negative breast cancers, which may be because these measures better capture the complexity of the tissue, which may be related to the likelihood of masking. So there's a lot more information in the image that we can quantify in different ways, either with AI or with these more a priori features. And a lot of times people tend to think about these as competing approaches, so we either do an AI, deep learning model, or we do radiomics or handcrafted approaches, but I tend to believe that we can combine the best of both worlds.
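Before moving on to combining approaches, here is a minimal sketch of the two steps just described: extracting a handful of a priori texture features from image patches and then grouping the resulting signatures into complexity-like clusters. It assumes scikit-image and scikit-learn are available and uses random patches as stand-ins for mammographic tissue; it is an illustration of the general technique, not any specific published pipeline.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# Stand-ins for small 8-bit grayscale patches sampled from mammograms.
patches = [rng.integers(0, 256, size=(64, 64), dtype=np.uint8) for _ in range(40)]

def texture_features(patch):
    """Gray-level co-occurrence features describing local pattern and heterogeneity."""
    glcm = graycomatrix(patch, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    return [graycoprops(glcm, p)[0, 0]
            for p in ("contrast", "homogeneity", "energy", "correlation")]

X = np.array([texture_features(p) for p in patches])

# Group the texture signatures into complexity-like clusters, as in the heat-map example.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])
```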
And there's evidence to suggest that we can actually use those a priori feature definitions, which could make the job of the neural network easier and reduce the complexity of the computation, but then leverage the strength of the neural network to truly dive into the data and recognize patterns that we may not be able to uncover with conventional statistical methods, and see an even increased performance by combining the radiomic feature approaches with deep learning features and deep learning architectures, which I think is a very promising approach moving forward. And ultimately, an area where I think AI can come together is to help us move beyond thinking just about imaging and imaging markers and risk assessment and diagnosis, and tasks related not just to screening, but also to prognosis and therapy response evaluation. We need to be thinking about how to put all these pieces of the puzzle together using sophisticated approaches like AI and machine learning, including imaging biomarkers and information from the electronic medical and health record that may be related, here, for example, family history, other demographic and reproductive risk factors. There may be modern assays, emerging assays, genomic and genetic assays, for example, single nucleotide polymorphisms or other information, liquid biopsy data, that ultimately down the road could be related to the likelihood of the risk of cancer and be incorporated into screening. So, a great time for AI and data science to help us put all the pieces of the puzzle together and arrive at more personalized screening, prognostication, and therapy approaches in breast cancer. And with that, I would like to pass the baton back again to Dr. Bahl, who will guide the Q&A session. Thank you so much. We have a number of excellent questions in the chat box from the audience. The first question is, could you comment about AI bias in various populations, like the Latino population, Asian populations, et cetera? I can briefly respond to that question. That question reminds me of a study that Dr. Zackrisson mentioned in her presentation, a head-to-head comparison of three AI algorithms published in JAMA Oncology in 2020. Now, these three commercial AI algorithms were actually tested in a European population, and the best performing algorithm had actually been trained on South Korean women. And what was interesting about that particular algorithm is that it was trained with the largest number of cases compared to the other two commercial algorithms and that it also used pixel-level annotations for training. So, it's an interesting example of robustness across different populations due to the training parameters that were used, not necessarily the diversity, the ethnic diversity, of the training set. Does anyone else want to comment on that question? I might call out Despina because of the interesting work that she's been doing on the absolute volume of glandular tissue. In our urban population in Philadelphia, we have a very diverse population, and some of our patients have very large breasts, which one would subjectively call less dense mammographically. Despina, why don't you take it from there and tell us how important the absolute volume is? Yes, definitely. And we've seen differences in different ethnic groups for the absolute volume and other measurements. But I would say that in general, a lot of these AI tools are not developed using ethnically diverse samples, and they may not be representative of different ethnic groups.
And unfortunately, I think also the FDA currently doesn't integrate these criteria in a very strict way in approving new devices for clinical use. And that is a serious concern that can generate biases in how these models operate in the clinical setting. So, I think it's something we should be careful and mindful about, and also something on which both the academic community and industry need to come together, to develop tools that can serve all people, not only selected subgroups. Thank you. The next question is, besides heat maps, what other information could be helpful for radiologists that current commercial CAD systems don't provide? Hmm. Tricky question. Yeah. You know, I know in the tomosynthesis world, the nice thing is that the algorithms tend to show you the multiple slices engaged in the finding, so that you're able to rapidly click to that area, and you also get a feeling for the segmentation, as well as the individual lesion score. Yeah. So, I think it's like that. I was thinking about the fact that only some of the current commercial algorithms provide information about comparisons with priors. Right. Because that's useful information for us as radiologists. Is this unchanged or not? They mark something? Is it like, yeah. So, that's a feature that is coming along, I see, for many of the vendors, but it's not there yet for all of them. I'm also not sure to what extent the available tools are also trained using non-imaging data. So, to what extent they would say that, you know, we're indicating this as a suspicious area or as high risk because there's this appearance in the image, but also because the woman is X years old, with this background, this risk factor in her medical record, and things like that. And in many ways, they lack a more thorough, explainable AI component, which is another field of great interest right now and, I think, will eventually be translated as well. Yeah. Thank you. And I would just add that perhaps providing a level of confidence might be helpful to us. That is, you know, the AI algorithm generates a score or an output and then also conveys its level of confidence in that output. I think there's a lot of research and interest, you know, in that area. For sure. You know, let's answer just one more question. We're approaching the end of the hour here, and I wish we could answer all of the questions, but our time is limited. The last question is, with unsupervised learning, what is the reliability of AI? You know, the research that I'm doing is primarily, or actually 100%, supervised learning. Do any of the other speakers have experience with unsupervised learning? Well, to this, you know, I can actually answer it. I think it depends on the task, you know, with unsupervised learning. It depends on the task. What are you trying to identify? Is it a task to point something out in the image or to predict the future? And unsupervised learning can be very useful in cases where we don't have large datasets with labels available, and you want the algorithm to find patterns in the data without having a label. And that's very much, a lot of the time, the case in a clinical setting. It's very time-consuming to obtain labels. But I don't think it is in some way more or less reliable than supervised learning. It's just a different task that we're asking these networks to do. Thank you. Well, thank you so much. I'm just going to pull up our last slide here. Thank you so much to everyone for joining today's webinar. Thank you to Dr.
Conant, Dr. Kontos, and Dr. Zackrisson for being so engaged in the planning for this webinar and for giving excellent presentations. Thank you to Kim and to the RSNA for providing us with this opportunity. Thank you.
Video Summary
The RSNA webinar on "Artificial Intelligence in Breast Imaging," led by Dr. Manisha Bahl, featured discussions on the implementation and advancements of AI in breast cancer detection and diagnosis. Key speakers included Dr. Emily Conant, Dr. Despina Kontos, and Dr. Sophia Zackrisson, who highlighted the integration of AI algorithms in mammography and tomosynthesis to improve accuracy and efficiency in diagnosing breast cancer.

Dr. Bahl introduced the session by explaining AI's role in radiology, emphasizing its subfields, such as machine learning and deep learning, which are pivotal in developing algorithms that enhance imaging diagnostics. She explained concepts like supervised versus unsupervised learning, crucial in AI development, and internal versus external validation for model performance evaluation.

Dr. Zackrisson provided insights from a European perspective, focusing on AI's capabilities like workload reduction and early cancer detection, supported by recent studies showing AI's potential to replace one of the radiologists in double-reading screening programs. She stressed the importance of validating AI tools for diverse clinical and demographic environments to ensure effectiveness.

Dr. Conant highlighted AI applications in tomosynthesis, especially in the U.S., discussing studies that reveal AI's ability to improve sensitivity, specificity, and reading efficiency. She also mentioned the potential of AI for risk assessment in breast cancer screening, leveraging advanced imaging data.

Dr. Kontos discussed the use of radiomics and new AI directions for better breast cancer risk assessment, noting that AI should incorporate diverse data sources for improved personalized medicine. The webinar concluded with a Q&A session addressing concerns about AI bias and the reliability of unsupervised learning. Overall, the webinar underscored AI's promising role in enhancing breast imaging, addressing ethical considerations, and fostering innovation.
Keywords
Artificial Intelligence
Breast Imaging
AI in Radiology
Breast Cancer Detection
Mammography
Tomosynthesis
Machine Learning
Deep Learning
Radiomics
AI Bias