Healthcare Implications of Large Language Models l ...
WEB37-2023
Video Transcription
I'm now going to go ahead and introduce all the speakers. First, my name is Linda Moy. I am at New York University, and I am the editor of Radiology. Today, we are joined by five physicians who will discuss the benefits, challenges, and implications of ChatGPT and other large language models in healthcare. I'm going to introduce each of the speakers now. First is Dr. Som Biswas. He is at Le Bonheur Children's Hospital at the University of Tennessee Health Science Center, and he authored one of our leading commentaries on the use of ChatGPT. Next, we have Dr. Jonathan Elias. He's at Weill Cornell Medicine and NewYork-Presbyterian Hospital. He's been quite involved with us and, importantly, serves as a referring physician's voice in helping us figure out how to implement these tools. Next, it's my pleasure to introduce Dr. Keith Hentel. He's also at Weill Cornell Medicine and NewYork-Presbyterian Hospital, where he holds a senior leadership position. Next, we have Dr. Felipe Kitamura. He is at Dasa in São Paulo, Brazil. He is one of our leading AI experts, who also wrote an excellent commentary on ChatGPT. Last, I'm pleased to introduce Dr. George Shih. He's also at Weill Cornell and NewYork-Presbyterian. I've had the pleasure of working with him in many venues, including the RSNA AI Imaging Certificate Program. He's another guru in AI. We have a wonderful lineup for you. Without further ado, Dr. Shih is going to give us an intro and overview of ChatGPT and GPT-4. Thank you so much, George.

Thanks, Linda. Today, I wanted to give a brief intro to ChatGPT and the newly released GPT-4 to provide some context for the rest of the webinar. My main experience is with ChatGPT and GPT-4, but I think a lot of what I'll be talking about will apply to applications using other large language models. Here are my disclosures. For OpenAI, I'm an unpaid advisor, so I'm listed on their website as a contributor to GPT-4, but I have no financial conflicts. I don't own stock and I'm not an investor.

So, ChatGPT is everywhere, and it happened in the blink of an eye. It took only two months to reach the 100 million user landmark, which is faster than any other app previously. I would also say it's really the only pure AI product on this chart, even if some of the other apps use elements of AI, maybe with the exception of Google Translate, which has evolved over time. By some estimates, ChatGPT now produces more text every two weeks than all the printed works of humanity, which is pretty astounding. And I agree with this headline that says ChatGPT is the world's best chatbot.

So, let's give it a try. The following example is from the original ChatGPT, or GPT-3.5. As Linda mentioned, she and I were course directors for the first RSNA imaging AI certificate. If you work in radiology and are interested in learning about AI, it's a great way to get introduced to the concepts, and you don't need any programming background. RSNA has been working on part two of this certificate, which is the advanced certificate, and a couple of months ago I was asked to give a talk for part two on the normalization of DICOM images for deep learning. So, I thought it would be fun to ask ChatGPT to provide me with an outline of my talk so I could save some time. Here's the prompt that I used: Please describe the steps for normalization of DICOM images for deep learning training purposes. Where relevant, please include some Python code to help illustrate how this would work.
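ChatGPT's actual output is not reproduced in this transcript, but for readers following along, here is a minimal sketch of the kind of normalization pipeline it outlined, assuming the pydicom and numpy packages; the CT window values below are placeholders for illustration, not clinical recommendations.

```python
import numpy as np
import pydicom

def normalize_dicom(path, out_range=(0.0, 1.0)):
    """Load one DICOM file and normalize its pixels for deep learning."""
    ds = pydicom.dcmread(path)
    pixels = ds.pixel_array.astype(np.float32)

    # Step 1: map stored values to physical units (e.g., Hounsfield units
    # for CT) using the rescale slope/intercept from the DICOM header.
    slope = float(getattr(ds, "RescaleSlope", 1.0))
    intercept = float(getattr(ds, "RescaleIntercept", 0.0))
    pixels = pixels * slope + intercept

    # Step 2: clip to a window of interest (an illustrative soft-tissue
    # window for CT; the right window depends on the task).
    lo, hi = -160.0, 240.0
    pixels = np.clip(pixels, lo, hi)

    # Step 3: min-max scale into the range the network expects.
    pixels = (pixels - lo) / (hi - lo)
    return pixels * (out_range[1] - out_range[0]) + out_range[0]
```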
And by the way, writing good prompts, or instructions, is really the key to getting the most out of these large language models. This is called prompt engineering, and it has quickly evolved to become one of the most desirable skill sets for many companies.

So, let's see how ChatGPT did. Notice how it provides a brief outline of steps in text, followed by some Python code blocks, which are nicely highlighted. And in case you're wondering, I've tried probably most of the code that ChatGPT has generated, and so far it has worked, although I've heard that it's not always 100% correct. So, this was pretty impressive to me. It probably saved me a couple of hours of my time.

So, how does ChatGPT work? This section is specifically about ChatGPT; it probably doesn't apply to other large language models. First, a disclosure: I am not a computer scientist, but I will do my best to provide a high-level overview. This illustration describes how ChatGPT was created, and there are essentially three components that make this chatbot work.

We'll start with GPT-3. So, what is GPT? It stands for generative pre-trained transformer, and the transformer is one of the state-of-the-art neural network architectures that most of these large language models use. This natural language processing (NLP) model performs many different tasks, and I've listed some of the common ones here, like text classification, text generation, summarization, translation, etc. At a very simple level, what GPT and ChatGPT do is predict the next word. For whatever task you ask it to do, say summarization, it predicts many next words for that task. So, in this illustration, if the input is "thou shalt," the output of GPT might be "not." This is similar to how your smartphone keyboard works, which probably everyone is familiar with, except that GPT is a lot better. And this is what we mean when we say that GPT-3 was trained on the internet. I've highlighted Wikipedia on the list here.

The second layer in ChatGPT is called InstructGPT, and it's the basis for the real improvement of ChatGPT over GPT-3. I've listed the reference below, but it uses a technique called reinforcement learning from human feedback, which is summarized nicely in the article. This illustration, which I won't go into in detail, shows how it works. There are three steps. The first two require humans in the loop, and after the first two steps, you are able to train a reward AI model that helps in step three by optimizing the chatbot outputs using reinforcement learning. That third step doesn't require human labelers, so they're able to scale it really nicely and dramatically improve the chatbot.

The other improvement, the third layer, was that they added a safety layer. When GPT-3 was released, it made a big splash, but it was also shown to produce unwanted and sometimes biased or bigoted outputs, so they realized that they had to include some safety features. The goals for ChatGPT were to make it helpful, honest, and harmless, and so they implemented guardrails and safety rules like these to make ChatGPT better and more user-friendly.
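To make the "predict the next word" idea above concrete, here is a minimal sketch using the open-source GPT-2 model from the Hugging Face transformers library; GPT-2 stands in purely for illustration, since ChatGPT's own weights are not publicly available.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Ask the model for the single most likely continuation of a prompt.
input_ids = tokenizer("Thou shalt", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(input_ids).logits  # one score per vocabulary token

next_token_id = int(logits[0, -1].argmax())
print(tokenizer.decode(next_token_id))  # plausibly " not"
```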
So what about GPT-4, which was launched last month? By the way, we're going to take a poll later to see how many people in the audience are using these technologies, but I'll go ahead and summarize GPT-4. They published this report, which provides a lot of technical information about GPT-4. It's freely available to anyone who's interested, and if you scroll all the way down to the end, you'll see my name listed as one of the people who helped to test GPT-4 before it was released. The interesting thing about my name being there, besides the fact that my kids now think I'm cool, is that they specifically wanted someone to help test for healthcare uses. So clearly they believe, as I do, that ChatGPT and large language models will have a big impact on healthcare.

In terms of the model size, OpenAI did not release details, but some have estimated that GPT-4 is about six times larger than GPT-3, and typically we think that a bigger model will perform better. So far, from my testing, that's been unequivocally true: GPT-4 is way better than the original ChatGPT. I've tested it on various prompts about medicine and radiology, and I'll show you one example later, but let's go over a couple of features which I think are the most important improvements in GPT-4.

The first one is that GPT-4 can understand more complex inputs, and the amount of text it can handle in its input prompts has increased almost tenfold. Before, you were limited to about 8,000 words, and now the theoretical limit is about 64,000 words, or 50 pages of text. That's really enough, for example, to include an entire patient medical chart in the prompt so that you can interrogate it and ask questions about that chart.

GPT-4 has multimodal capabilities. In other words, it can do images and text. This is not yet released, but it can accept an image as an input and interpret and understand it just like a text prompt, so you can include text and images, including documents that have both; even sketches and screenshots will work, and I'll show you an example of that later.

GPT-4 has different personalities. They have a new system message where you can tell it, for instance, to be the world's best tax attorney and then ask it questions about your income tax return. And there are many other improvements: it's more multilingual and has improved safety and performance.

I just wanted to show one article that came out, and I think we'll talk more about this in our discussion. This paper came out from Microsoft Research, who got very early access to GPT-4, and their conclusion was that GPT-4 shows some sparks of artificial general intelligence, or AGI. It's really worth looking at the examples they provided; it's fascinating and impressive. So I'll just end with a few examples.

This is an example of a multimodal chat where there's an image, and you can ask it questions like who is this person (Albert Einstein), what is the equation (E = mc²), and what does the equation mean, and it can answer pretty well. This is a radiology audience, so of course we had to try some images. This is a chest x-ray that we put in, and you can see that it understands that it's an x-ray, it understands the body part is the chest, and it even knows that in the left chest there's a pacemaker. So this is a generic model that has no special training in radiology, and it can already understand radiology images to some extent.

This example is for Linda. We put in a mammogram, and it was able to tell that it was a mammogram. It claims it's a normal mammogram, gives a BI-RADS category, and it's pretty certain that it's normal. So I'll defer to Linda to see if it's actually correct or not.

And finally, I wanted to show one last example. I was trying to get a friend to try ChatGPT a few months ago, and he said, I'm waiting for the day when I can just say, generate a CT abdomen and pelvis report with steatosis and uncomplicated sigmoid diverticulitis, to which I responded, why do you think it wouldn't work? I originally ran it in the original ChatGPT, but I just ran it in GPT-4, and I'll show you the answer. You can see on the left here that I put in the system prompt: you're a world-class radiologist interpreting CT scans during a busy ED shift. So I framed it like that. I put in the key findings, which were hepatic steatosis and uncomplicated sigmoid diverticulitis. Then I pasted in a normal CT abdomen template from the RSNA RadReport website, and I asked it to structure the report based on the key findings, rewrite the impression, and also provide the CPT and ICD-10 codes. You can see the output here, and I think it's really impressive. Just from the key findings, it was able to put the hepatic steatosis in the correct liver section, as well as the uncomplicated diverticulitis in the bowel section, and it generated a nice impression. And you see the CPT and ICD codes at the bottom here. So for me, GPT-4 was a big improvement. I'm very excited about this technology and about discussing it further in this webinar. Thank you.
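As a rough sketch of how a prompt like George's could be assembled programmatically, here is an example using the OpenAI Python library as it existed around the time of this webinar (the openai.ChatCompletion interface; newer library versions use a different client). The template file name is a placeholder; in practice the full normal CT template from radreport.org would be pasted in.

```python
import openai  # 2023-era library; openai.api_key must be set first

system_prompt = ("You are a world-class radiologist interpreting CT scans "
                 "during a busy ED shift.")
key_findings = "hepatic steatosis and uncomplicated sigmoid diverticulitis"
# Placeholder: a full normal template, e.g., saved from radreport.org.
template = open("ct_abdomen_pelvis_template.txt").read()

user_prompt = (
    f"Key findings: {key_findings}\n\n"
    f"Normal report template:\n{template}\n\n"
    "Structure the report based on the key findings, rewrite the "
    "impression, and provide the CPT and ICD-10 codes."
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
)
print(response.choices[0].message.content)
```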
George, thank you so much for that wonderful introduction and talk. I think I'd better start updating my LinkedIn profile to say prompt engineer or something. But all kidding aside, I think now is really the portion of the webinar all of you signed up for. We're going to have a fireside chat discussion, which Dr. Biswas and I will moderate, and we're going to ask the panelists a bunch of questions that have come up most commonly in talks, from my colleagues, on Twitter, and in journals. So let's go ahead and get started.

Hi. Hello, everybody. So let's start the discussion with introductory questions. The first question to the panelists is: can you explain to the audience what ChatGPT is and how it differs from other language models? Felipe or George, do one of you want to take that?

Yeah, sure. I guess George already explained a little bit of how it works. I think there is a different effect in these new large language models that we were not seeing in prior language models, which is that the scale of these models makes them better, not just in terms of being able to answer more questions, but also in that the models develop some new capabilities, some skills. There's a nice example that Peter Lee gives: they trained these models to solve some simple mathematical equations, and they were not able to solve more complex ones. But after pre-training them on general text, they were able to generalize the concept and solve more complex equations. So I think this is the kind of capability that we're talking about in these new models.

Yeah. I also want to emphasize that when they trained these large language models on the internet, the most likely output that they predict is not necessarily what the user wants to hear, because some of the answers might be offensive.
And so for me, the big innovation from OpenAI was that they added the safety layer, so that when you ask it questions, it gives you something that is more likely to be what you want to hear, rather than just the most likely output.

Great. Thank you, George. This is the most popular question in the Q&A right now, and I'm hoping a number of you can answer it. It is: how useful will ChatGPT be for radiology reports in the future?

I can start off by saying I think it's going to be very useful for two reasons. One, we've already seen examples of George putting together his outline and even creating a radiology report, so it's going to be a time saver, because it'll be able to aggregate information that may already be in the electronic health record and place it into the report. So I think it's not only going to improve the speed at which we can report, but it's probably, at some point, going to improve the utility of our reports, because we're going to be able to synthesize more information into the report as part of the reporting process. And that's something that I really look forward to. Now, and this really wasn't the question, but I'm a little bit less worried about it replacing us as radiologists and generating its own reports based on the imaging that George showed, because unless I'm wrong, Linda, BI-RADS 0 is not a normal mammogram; that would be something else. So I don't think we're ready to give ChatGPT or GPT-4 its medical degree, but I do think it's going to, in the short term, hopefully improve the efficiency and really the completeness of the reports that we put out. And maybe I'll pass it off to John, if that's okay with you, and see what your feelings are as someone who consumes the radiology report as an ordering provider. What do you think the opportunities are for our reports to improve?

So, as an ordering provider, I'll say: when we're looking at a patient chart and formulating an assessment and plan, we're trying to consume all of the information regarding that patient and distill it into a succinct plan for the care of the patient. A lot of that is radiology reporting, right? These large language models can definitely improve how we formulate our plans, can catch us if we miss something that's really important, and can guide our assessments with, likely, the most up-to-date information. Again, we're not there yet, but that's where this is heading, I believe. So I think consuming radiology reports will become a lot easier for the ordering clinicians. And then the second step of this is: how do we communicate that information out to our patients? The ordering clinicians are usually the ones communicating it out to patients, and if we can put our radiology reports in layman's terms to better communicate that information to patients, all the better. And with the implications, in the US at least, of the 21st Century Cures Act, which has basically mandated us to share imaging reports with our patients as soon as they are resulted, I think this has massive implications, right? This way we can attach a layman's report to our radiology reports to better frame the conversation with our patients.
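As a sketch of the patient-facing idea described here, a report could be wrapped in a prompt like the following. The helper below is hypothetical and uses the same era's OpenAI Python library as the earlier example; any real deployment would also need de-identification and HIPAA-compliant infrastructure, as discussed later in this session.

```python
import openai

def lay_summary(report_text: str, reading_level: str = "8th-grade") -> str:
    """Restate a (de-identified) radiology report in plain language."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": ("You explain radiology reports to patients in clear, "
                         f"compassionate language at a {reading_level} reading "
                         "level. Do not add findings that are not in the report.")},
            {"role": "user", "content": report_text},
        ],
    )
    return response.choices[0].message.content
```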
I find all the use cases you mentioned quite useful. I would also add, for radiologists, that we could ask ChatGPT to summarize the most relevant findings in the patient history before we read the images. You could say: I'm reading this chest x-ray or chest CT; please get me the relevant information so I can read this image better.

Yeah. I wanted to jump in on what Jonathan and Felipe said, because if you think about the radiology report, and Keith and I have talked a lot about this before, there are several audiences, right? There's the referring provider, like Jonathan; there's the patient; there's also the billing aspect of it; but there's also the radiologist, because we want to read our old report to figure out what happened before. So you might imagine that you just dictate one report now and generate those four versions for what you need downstream. I think that will hopefully be a pretty big improvement over what we do today.

I would like to add one thing as a pediatric radiologist. In pediatric radiology, we have measurements like the liver span, which varies with age, and we can just use ChatGPT to find the normal ranges for the liver, or the MRI sequences for acute osteomyelitis. These are some of the side uses I have found helpful as far as pediatric radiology is concerned. An article is also being published in the Pediatric Radiology journal on specific uses of ChatGPT in pediatric radiology based on the age of the patient.

Great. I have to tell you, at Radiology, I've seen hundreds of submissions involving ChatGPT, and the most common is actually using these LLMs for appropriateness of exams and for clinical decision support. So I want to ask: how far away do you think we are, and what would be the most pressing use case for ChatGPT in this setting? I'm going to start with Jonathan, because he probably gets stuck as a referring physician trying to figure out what exam to order, and maybe he can also touch upon pre-authorization, which, in the United States at least, means that third parties, meaning insurance companies, have to approve a particular study.

Yeah. So, I definitely have to work on becoming a prompt creator, because I think that's a part of this. Of course, we formulate a plan and we say, well, we need an imaging order for X, Y, Z indication, but there are always caveats to our imaging orders, or to any order, right? The patient's allergies, the clinical context, the resources available at our institution for imaging. All of that really has to come into play when we decide on the appropriateness of an imaging order for a patient at any specific point in time. ChatGPT and large language models like it hold the promise of taking all of that information, what's within a patient's chart, but also the institutional and resource-level context, and boiling it down to provide the most appropriate imaging for a patient at a specific point in time. Now, what we currently have in terms of clinical decision support is great. We're able to create algorithms that help guide our clinicians to the right imaging order at the right time, and even tailor that to our specific institutions. I think this will just take that to the next level. How quickly that occurs, your guess is as good as mine, but we are moving pretty quickly towards that.
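A sketch of what an LLM-backed appropriateness query like the one Jonathan describes might look like, with patient and institutional context injected into the prompt. The function and its fields are assumptions for illustration, not a validated decision-support tool; guidelines such as the ACR Appropriateness Criteria remain the authoritative source.

```python
import openai

def suggest_imaging(indication: str, patient_context: str,
                    site_resources: str) -> str:
    """Draft an imaging-appropriateness suggestion from clinical context."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": ("You are a radiology clinical decision support "
                         "assistant. Recommend the most appropriate imaging "
                         "study with a brief rationale, flag contrast or "
                         "allergy concerns, and say when no imaging is "
                         "indicated.")},
            {"role": "user",
             "content": (f"Indication: {indication}\n"
                         f"Patient context: {patient_context}\n"
                         f"Available at this site: {site_resources}")},
        ],
    )
    return response.choices[0].message.content

# Illustrative call:
# suggest_imaging("suspected acute appendicitis",
#                 "34-year-old, normal renal function, no contrast allergy",
#                 "CT with contrast, ultrasound; MRI limited after hours")
```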
I think, and John knows this because we worked together on the implementation of appropriate use criteria that we've authored here at our institution, that tailoring it to the user and the institution is key. One of the limitations of guidelines is that they're broad. They cover large populations, whereas the individual patients who come in are all unique. Every patient truly is an N of one. I do think that the more information that can be aggregated as inputs to the decision support algorithms, the more accurate we're going to get in terms of what we recommend. And to your question about the timeframe, I don't think we're that far away from that at all. Many decision support mechanisms, or even algorithms, already have the ability to take in some of these inputs and give out good answers. So I suspect that these large language models should be able to meet, if not surpass, that type of accuracy in relatively short order.

Great. I have another follow-up question, which is appearing in our Q&A: how do we make these large language models HIPAA compliant?

Yeah, I guess I could take a stab at this. There are a couple of things to worry about in terms of sending PHI to APIs, but if Epic, for example, has integrated ChatGPT and you're using Epic, then I would assume that Epic would take care of the HIPAA issues, right? The other potential way to do it is to run a local LLM so that everything stays within your firewall. That's possible today, though not with a state-of-the-art language model, but eventually we might see that. Maybe other people have thoughts on that.

Yeah, I mean, I think in healthcare we tend to be at the tail end of new technology adoption, oftentimes because of concerns with privacy and HIPAA. And I do think that, in this case, we're going to have to rely on our vendors to help implement this technology, because in its current form, where you have to go to a separate portal and copy and paste between it and whatever system you're using, I don't think it's going to be as helpful. So I do think that as we go forward, our vendors are going to implement this technology, and we will have it available through platforms that are HIPAA compliant and that spend a lot of effort remaining HIPAA compliant.

I'll also add, sorry, Linda, that HIPAA is just one policy, right? And it's a large policy, but I think that institutions will have to come up with their own policies as to how they deal with the AI platforms they are implementing. And obviously the FDA is getting more and more heavily involved in this sphere, but at an institutional level, decisions will probably have to be made as well.

Thank you. I don't have an answer, but I'm thinking of large language models the way I thought of it when we first introduced other AI tools to assist us with images. You ideally want them to be used for the more tedious, repetitive tasks. The reports that my referring physicians tell me they have to stay late in the day to dictate really are low-hanging fruit where this would be very helpful. But we do have to bridge this HIPAA compliance issue, and then do what George suggested, which is that you can throw in the patient's entire EMR and have it complete a report, or even tell us the most likely diagnosis and the best imaging tests we can use.
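A minimal sketch of the local-LLM option George mentions, using the Hugging Face transformers library; the model name is a small open placeholder, since 2023-era open models that fit on local hardware were well behind GPT-4 in quality.

```python
from transformers import pipeline

# A locally hosted model keeps PHI inside the institutional firewall:
# nothing is sent to an outside API. "gpt2" is a small open placeholder.
generator = pipeline("text-generation", model="gpt2")

prompt = "Summarize the pertinent history for this chest CT:\n<history here>"
print(generator(prompt, max_new_tokens=100)[0]["generated_text"])
```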
Coming back to what we were discussing with HIPAA compliance: what are your thoughts on the proposed six-month pause on AI like ChatGPT? Is it a good or bad idea? Should we wait for regulatory bodies to come in and then move forward?

Well, I think that's a controversial topic. I'm also curious to hear what my colleagues here think. I think there is a reasonable aspect to this ask, but I think there are also other sorts of commercial interests behind it.

I mean, to me, in some ways the cat is out of the bag, and people have their hands on this technology right now. Pulling it back, while possible, I'm not sure how practical that is. I also think that in some ways it may give us equal footing with other areas of the world, and with other industries, because I do think that medicine will be slower to adopt some of this technology than some other sectors. But again, I just don't know how practical it is to even pause this now that it's so widespread and people are using it and adapting it.

I tend to agree with Keith: it's out of the bag. Everyone is using it in so many different ways; you really cannot police something like this. And another point, as Keith says: this is actually a chance for us as radiologists, and others in the healthcare field, to actually be at the table in developing tools that will help us, whereas usually vendors or other sectors come to us and say, oh yes, we've developed this for you, and we have to make it work. So I'm also a bit ambivalent about that happening. George, what do you think?

Yeah, I think it's very tough. I think there are lots of geopolitical considerations as well in terms of pausing or slowing things down. I just wanted to add that not only is it good for healthcare in general, but within healthcare, I think it potentially equalizes some of the existing disparities in terms of the resources and the expertise that certain places may or may not have. If ChatGPT is better than the average physician, then that means that everyone potentially has access to that kind of expertise. I'm not saying it is better already, but eventually, hopefully, it will be better than the average physician. So in terms of the good of healthcare, I think it's great to keep pushing this.

Well, George, if I can ask a question based on what you just said, maybe I should worry more. Is this technology going to start replacing physicians?

You're asking me? I don't know.

You're the one who said it could be better than an average physician.

Better than the average physician in terms of knowledge, maybe. I mean, think about the things that we know; we're specialized, mostly in radiology, except for Jonathan. But what if it knows about all the fields in medicine? That's why I would say it's going to approach being better than the average physician at some point. There are reports out there that say there are just going to be a lot fewer jobs available, on the order of maybe 100 million, is what I saw in one report. Maybe that will also affect medicine, because if you can do more, I mean, that's the promise of technology: you can do more with less. So I don't think that we're really in a lot of danger in healthcare, because we have a huge shortage of doctors today, but maybe in a hundred years.

Well, hopefully I'll be retired in a hundred years.
I'm not sure about a hundred years, but this brings me to another question, which is the most popular in the chat right now, concerning earlier-stage radiologists. It's a long question, which I'm going to read. It says: what should radiologists in earlier stages of their careers, including those in training, think about in terms of what skills will be useful in a future era of ChatGPT being incorporated into practice? And they imagine that the high-volume reading that we do may not be as useful a skill as other things, like consultancy or teaching skills. So I guess the question really is: with ChatGPT and AI, how does the role of the radiologist change, and what changes should we be making in how we train radiologists right now?

I think what this technology emphasizes is something that a lot of us have been advocating for a long time, and it builds on my question to George: we really do have to be involved directly with patient care, and we really have to be involved with our patients. Otherwise, technology will replace us, or another "-ology" using this type of technology will be able to replace what we do. So first of all, I think that consulting, speaking to patients, being a physician, managing patients is something that we should all have been focusing on before this technology, but now even more so going forward. That would be my primary advice to people entering the field now. I'm sure others have other thoughts as well. Som, what do you think? You're probably closer to training than any of us.

Yeah, this is a concern that all of us trainees have. The joke flying around in pediatric radiology is that we have to stick with procedural work, which is less likely to be taken over by software. But neuroradiology, MSK: I read a paper recently that those are the fields where artificial intelligence is making great strides, although I haven't yet seen a completely accurate diagnosis made from the image in every case; there are some cases, but not every case. So yes, it does bring a little bit of concern to us younger, earlier-career radiologists, but I think we have to go with the flow and change and adapt. And as someone said, we may have to go more into teaching or research, depending on how this AI eventually evolves.

Yeah. I might add that maybe the change will be from quantity to quality. I think ChatGPT makes it a lot easier to find any potential issues and to give you, like Felipe said, a better history, right? So maybe you will be judged more on the quality of your reads. I'm not sure the quantity factor will go away, but you're more likely to deliver something of higher quality. So I would recommend that trainees embrace this technology, help to shape it, and get involved somehow. Maybe this is a different topic, but I'm probably more worried about the radiology companies than I am about the radiologists at this point, because I think this is a huge disruption to a lot of the things that our radiology companies do today. But maybe we'll save that topic for later.

Well, I'm just adding a comment. This year, for the U.S. radiology residency match, every spot was filled. Whereas, I think five or six years ago, we were concerned that, oh my gosh, AI is going to replace radiologists.
You know, at that point, it was AI that was going to segment lesions. But I think that fear subsided. So maybe the same thing will happen with ChatGPT and these LLMs; maybe there are regulatory issues that make them a little bit harder to implement, but it seems as though technology works best when it's something that radiologists or other physicians can use in their existing practice. And I see Jonathan nodding his head, so maybe you want to expand upon this point?

Yeah. So, it goes back to the fundamental theorem of informatics, which was posited, I think, in the early 2000s, which seems like a lifetime ago at this point. It was previously thought that the computer by itself would outpace the clinician at any point in time, and there's the stereotypical image about the fundamental theorem of informatics that I think anyone can pull up, that you can Google, or ChatGPT for that matter. And that was posited to not be true, right? It is the computer plus the clinician that is better than the clinician by themselves. I think that is a really important point here. To George's point previously, this change is coming, right? And has come, in some part, at this point in time. I think the idea is: how can we become crafty with the changes that we have in place? How can we become more efficient with our work? At least as a pediatrician, I see it as a huge improvement to the administrative burden that I have to deal with on a daily basis, and I'm going to use that to benefit my efficiency and to improve the quality of care I give to my patients, sitting in front of them more instead of sitting in front of the computer. Figuring out what that means for radiology specifically is up to those young radiologists who are watching. I think you just have to think outside of the box here. This is not your standard playbook anymore.

That's a very good point. And to your point, I guess this is also in line with what Andrew Ng said a few years ago, which is that jobs are usually made up of multiple different tasks, and AI may help with some of those tasks, or even replace some of them, but not all of them. So I think this is still something that is true today.

As a trainee, I would like to add a point, which is that we don't have artificial intelligence in our curriculum yet. We are not tested that deeply on it in the ABR core exams or ABR certifying exams. I mean, we have a couple of questions, but it's not a major chunk like neuroradiology or MSK. So I think eventually AI will become a major section of the certifying exams of any country, as far as radiology residency and board examinations are concerned. That is something I think should be added soon as part of the curriculum, so that new trainees get used to it, and even the ones who don't know about the AI changes get used to it. If you put it in the exams, everyone has to learn it. So that is something which I think will be important for the younger generation of radiologists: that AI is part of their curriculum.

And obviously, with any change there's anxiety, and disruption causes anxiety. The other way to look at it, for young radiologists, is that this is opportunity.
And I know George alluded to this a little bit, but imaging, just by its nature, the fact that we take pixel data and textual data and synthesize them, makes us as a field so well suited to be leaders in the implementation of this type of technology and in how it's going to apply even more broadly across medicine. So I think with any disruption, and the anxiety that comes with it, there is opportunity. I think it's a good time to be coming into radiology.

I want to make a quick comment about that. I agree that there should hopefully be more opportunity to learn about AI in the training curriculum. I work with a lot of students and trainees, and I usually work with people who are technical and can program. The irony of ChatGPT and large language models is that I don't have to do that anymore, right? Because ChatGPT is really good at writing code, that's not really the main constraint anymore. In fact, someone who is a good writer, maybe an English major, may actually perform better at using ChatGPT than someone who's a good programmer. So I think things are changing a little bit. It's really opened up this technology to a lot of other people.

Okay, great. I have two quick technical questions, which maybe Felipe and George can answer. First question: can ChatGPT read handwritten clinical notes?

I'm not aware of that; maybe George knows.

Yeah, the unreleased version, yes. GPT-4 should be able to, or at least attempt to.

Even in my handwriting? That's a big ask.

I think it's trained on some of your notes, Keith.

Now we're all in trouble; there goes the accuracy.

By the way, the examples I showed you were not GPT-4, right? That was another model that also accepts images. The only example I saw was on the OpenAI live web stream that they did a month ago, but eventually it will be able to incorporate images and handwritten text, and I would expect that it would be very, very good.

We're going to throw up a poll question in a minute, but I'm going to ask my other follow-up question, which is: is there any other LLM like ChatGPT that can be trained with radiology data? Do you want to take that, Felipe?

Yeah, I've seen a couple in the past weeks; you might know others as well. There's one from Google, which is called Bard. There are LLaMA and Vicuna; people are coming up with all sorts of names for these models. Some of those you can run locally on your machine and even fine-tune; others are really big models. There's also AutoGPT, which gives GPT access to the internet and makes it able to execute what GPT says based on the initial goals you set. So these are some of the ones I've heard of recently. I wonder if any of my colleagues here have heard of any others.

I mean, there are a couple of open-source biomedical models, right? I think one is from Microsoft. The other one used to be called PubMedGPT, which they renamed because they got in trouble with the NIH. But you can actually train GPT-4 with radiology data. They let you fine-tune GPT-4 with your own data; I think you need access to the API. But I assume that we're eventually going to have lots of models for healthcare.

Great, thank you.
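For readers curious what "fine-tuning with your own data" involves in practice, here is a sketch of the prompt-completion JSONL format that OpenAI's fine-tuning endpoints used at the time; which base models accept fine-tuning depends on API access, and GPT-4 fine-tuning was the speaker's expectation rather than a feature confirmed in this session.

```python
import json

# Each line pairs an input with the desired output, e.g., teaching a model
# to draft impressions from findings. All examples must be de-identified.
examples = [
    {"prompt": "Findings: hepatic steatosis. No acute abnormality. Impression:",
     "completion": " Hepatic steatosis; otherwise unremarkable examination."},
    # ... many more curated report pairs ...
]

with open("radiology_finetune.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```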
I was hoping we could show our poll question. So I'll go back to another question... I'm sorry, Som, did you have a question you wanted to ask? Go ahead.

Yeah, so coming to the limitations and the medicolegal issues: what do we have to face going forward using ChatGPT in radiology, in medicine, and in any field at large?

Can you rephrase your question? I want to make sure I understand it.

So, what are the limitations, challenges, and medicolegal issues which we will have to face going forward in radiology, in clinical medicine, and in any field in general, using ChatGPT? Since we discussed all the advantages, I would like to know about the...

That's a loaded question for just a few minutes left in...

Let's concentrate on radiology, then.

Yeah, I think it's even hard to answer that right now, because we're just at the beginning of this technology, and it's not even clear to me how this is going to be regulated and what it's going to be considered. I think in many ways, different use cases are going to be regulated very differently. It's very different if it's helping you to actually recommend care, as opposed to helping you more efficiently create reports. So I don't know if anybody else on this call has insights, but it's not even clear to me how this is going to be regulated. I think we're going to have to maintain the basic rules of privacy and respect those types of things, but how it gets regulated is not clear to me.

Yeah. I'm going to jump in, because one of the attendees is Erik Ranschaert, one of our leaders in Europe in AI. He had a comment first, which was: shouldn't we rather stimulate more scientific work, meaning clinical validation of ChatGPT-type solutions, instead of trying to slow down the deployment of such solutions in the medical community? I think we all agree upon that. And his second question dovetails with Keith's point, which is: isn't there a responsibility to call on scientific societies and governments for such research?

That's an easy one to answer. I would say, yes.

Okay. So who should be the one doing the regulating? I'm thinking: do we imagine something like the FDA vetting all of our AI tools? Or what do you see?

I mean, I don't know who's going to do the regulating. I do think it's upon us to do the work to show whether or not it's valid, whether or not it's accurate. And then, hopefully, as physicians, we have the ability to determine that. But yeah, George, I know you've had some interactions with the FDA. Do you think that ultimately they're going to be the group that manages this? Or where do you see this going?

I think for most of the use cases that we talked about today, the FDA probably wouldn't regulate this, just like they don't regulate EHRs today, right? So if you think about EHR enhancements and reporting enhancements, they don't really regulate that. I do think, I mean, there was a nice editorial, I think it was in Nature, that just came out recently that talks about some of the ethical implications of closed large language models. And some of the issues they brought up were that you want to be able to make sure that you can replicate this kind of research, which is related to what Erik was asking. If you're using a commercial product, they might at any point in time update their LLM, and all of a sudden your research or your findings are not relevant, right? Or maybe the applications that you're using are different. So we need a way to fix that. Maybe that's open-source LLMs, maybe it's not.
The other thing that they wrote about was that you're not really sure what was used to train that LLM, and whether or not the content was ethical. In other words, maybe they included content that you typically wouldn't want to include in training a healthcare LLM. How do you know what was in there? So there are all these kinds of potential issues that we do have to sort out for medicine, but that's beyond the scope of a one-hour session.

I just want to add one line: Italy has become the first Western country to ban ChatGPT, because it wants OpenAI to disclose what type of training data has been used and whether it included people's personal data. So it was the Italian government. Most probably, the governments are going to regulate, and there has been news that China and the European Union are trying to draft frameworks regarding the use of ChatGPT in their respective jurisdictions, and it's the governments that are drafting those frameworks. So we'll be seeing them soon, if the media article which I read was true.

Great. Well, I have to say, I am told that this session is over, and I feel that we barely scratched the surface. So I'm thinking maybe we should have another follow-up webinar, maybe after the summer. I just want to thank today's faculty for sharing their expertise and also thank everyone for attending today's webinar. As we said, this webinar was recorded and will be available shortly. Please check the RSNA website for upcoming educational activities and resources. And thank you very much.
Video Summary
The transcript outlines a webinar featuring experts discussing the implications of large language models (LLMs) like ChatGPT in healthcare, particularly radiology. Hosted by Linda Moy of New York University, the session includes panelists Dr. Som Biswas, Dr. Jonathan Elias, Dr. Keith Hentel, Dr. Felipe Kitamura, and Dr. George Shih. Dr. Shih provides an overview of ChatGPT and GPT-4, highlighting improvements such as the ability to handle more complex inputs and multimodal capabilities. The discussion covers numerous potential applications of ChatGPT in healthcare, such as streamlining radiology report generation and enhancing clinical decision support by synthesizing information from patient charts. A key topic is the need for AI and large language models to maintain patient privacy and meet compliance requirements like HIPAA. The discussion also touches on the importance of radiologists remaining directly involved in patient care to complement technology, which is unlikely to replace physicians entirely. Questions from attendees address the future role of AI in radiology education and regulatory responsibilities for these technologies, emphasizing the need to validate these tools scientifically while navigating ethical considerations.
Keywords
Large Language Models
ChatGPT
Healthcare
Radiology
Patient Privacy
AI Compliance
Radiology Education
Clinical Decision Support
Ethical Considerations