Ethics of AI in Radiology (2021)
M5-CIN09-2021
Video Transcription
It's a pleasure to be with all of you. It's a pleasure to actually be seeing people alive. Feels a little weird that I'm not staring at my computer in my attic. So thank you so much for joining us. Just a couple of disclosures before I get started. Beyond the obvious disclosures, I do want to highlight my collaborators on this project, and also want to mention that I have presented this topic before, including at last year's RSNA, so it may look familiar, but there are a number of individuals who requested that I continue to share this as it came out in the middle of a pandemic. And so, if you've seen it before, then it may look familiar to you. So what we're gonna talk about today is the use of clinical data for AI. Let me start the timer. What we mean by clinical data, basically, is data that is initially acquired for the purposes of providing clinical care. So we have an opportunity, then, with AI and other technologies to use that data for secondary uses, but at the same time, the ethical, legal, and regulatory frameworks are still evolving. And so as we at Stanford and others have grappled with what is appropriate in terms of using that data, how to use that data, how to share that data, we put some effort into coming up with a framework that we've published, and we hope that this is helpful to you in terms of how to think about this. So when we're talking about ethics, the field of ethics or moral philosophy involves systematizing, defending, and recommending concepts of right and wrong behavior. It's at the core of our identities as medical professionals, and it's the basis for our laws, our regulations, and our policies. So that's what we mean by ethics in this context. So when we set out to develop an ethical framework, this was our approach, and I found that the approach itself was about as useful as what we actually came up with. We tried to be as systematic as possible, at least as much as would work in our environment. The goal was to create an ethical framework to establish practical, useful norms to guide behavior in terms of data use and data sharing. And the process that we went about was to first articulate the ethical question, try to reduce the problem to its core elements, identify the relevant stakeholders, understand the stakeholders' perspectives, consider the principles that are relevant to them and their perspectives, then choose or develop a practical ethical framework that best fits that situation, test the framework in some way on real-world questions, vet the framework and make the appropriate adjustments, and then publish that framework for consideration by the broader community. So that's the process that we undertook. So as we dove into this concept of how to use clinical data, what is the context, the mental model of how to approach this, there really are two ends of the spectrum. Generally you find some advocates who really feel that the patients own the data, and others on the other end of the spectrum who feel that the provider, the provider institution, owns the data. And just to give our bottom line up front, our position is that no one really owns the data in a traditional sense, but rather that when the clinical data are used to provide care, the primary purpose for acquiring the data is fulfilled at that point.
And so then, in terms of the potential for secondary use, clinical data should be treated as a form of public good to be used for the benefit of future patients. And as we looked for frameworks to build a way of thinking about this, the one we came across that we felt was the most relevant was a 2013 Hastings Center report by Ruth Faden and colleagues, entitled An Ethics Framework for a Learning Healthcare System, and they say right there in the title that it's a departure from traditional research ethics and clinical ethics. So they make some bold propositions, which we support and endorse. This is the key table from that report, and in it they identify seven fundamental ethical obligations that those who participate in the healthcare system have. These include researchers, clinicians, administrators, payers, purchasers, and patients. We would actually pull in industry as well. So these are the seven obligations: respect the rights and dignity of patients, respect clinician judgments, provide optimal care to each patient, avoid imposing non-clinical risks and burdens on patients, and address health inequalities. These are all obligations of those who are on the providing end of care. And then number six that they highlight is to conduct continuous learning activities that improve the quality of clinical care in healthcare systems. So they include that as one of the ethical obligations of those who work in and manage the healthcare system. And we would include AI and other tools and systems that are developed to try to improve care among those types of activities. And then obligation number seven is an obligation of the patients, as they articulate it. And that is that patients have the obligation to contribute to the common purpose of improving the quality and value of clinical care in the healthcare system. So essentially they're saying we're all in this together; we all have the obligation to do what we can to try to improve the system for the benefit of those who follow, just as both providers and patients benefit from those who preceded us. So our position, then, is that patients, researchers, clinicians, administrators, payers, purchasers, and industry all have the obligation to contribute to this common purpose in a manner that's consistent with other ethical principles. That forms the foundation for what we propose as a way to think about data use and data sharing. So what do we even mean by data? Believe it or not, as we went and looked at the literature, this is not a well-defined term. So what is data, or what are data? Our definition in this context is that data are a set of recorded observations of physical properties, phenomena, or behaviors. It's the recorded observations that make the data. And when we think about who or what entities are involved in data, it's the entity or person who is observed, the entity or person who does the observing, and then the entity that maintains the recorded observations. So there are three entities there that we have to consider. So again, our position is that the data, once they're acquired, can be used for secondary purposes, and that this should be viewed as a public good. A public good is a commodity or service that's provided without profit to all members of society, either by the government or by a private individual or organization.
So for example, drinking water is a common public good: the municipalities or entities that provide those services can't profit off the water itself. They can recoup their costs, and maybe a small margin, but they can't hoard the water and then profit off of its scarcity. So there are two ethical obligations that are tied to a public good. First of all, again, no single entity is entitled to profit because of the scarcity of that resource and the need that others have for it. And second, the dissemination of the public good is to be encouraged and facilitated. So we want to get the water out to people, and so we put in the infrastructure and other efforts and make the investments to make that happen. We also need to think about data as property. Now, hopefully it's pretty obvious: data are not a traditional form of property. It's not like a vehicle or real estate. You don't physically possess it in the same way. It's not divided or consumed, and it can be easily, or relatively easily, replicated at full fidelity. So the fact that I have the data doesn't mean that you can't have the data. We can both have the data at the same time. So ownership of data is an imprecise concept, and it's better, we feel, to refer to two elements: the rights to control access to and use of the data, and the rights to share in the profit. Another interesting property of data is that the value of data and information is relative. Exclusive access, insider information, so to speak, may give a relative advantage to one entity because of the competitive landscape in the marketplace, but that advantage may be negated when others also have access. So it's a funny thing that the value that's derived may be relative to what's happening in the broader world. Also, data acquired for one purpose may have value for other purposes, especially when aggregated. So I can use the app on my phone to get directions to downtown Manhattan, and that same app, when used by thousands of other users at one time, can show traffic patterns at a single point in time throughout the city. It's the aggregation of those data, which weren't acquired for that purpose initially, that can provide additional value that didn't exist before. So let's talk about the value of data, information, and knowledge and the difference between those. There's a difference: as data become more processed and more refined, they gain value. So if we think about what happens in imaging, for example, when we're making these observations, there's some natural phenomenon that manifests itself in a patient, which can then be imaged, and we record that observation, and that becomes data, right? So that observation is the activity that creates the value of the data, but then the data can be processed further and can be interpreted, for example, and so that has some meaning. It becomes information that has greater value than just the data for some application, and then that can be fed back to take action. So in this case, at the patient level, for example, this can give information about what's happening and give you the ability to make a diagnosis and potentially intervene. So, for example, you may have a patient who has atrial fibrillation leading to a thromboembolism that then is manifested in that patient as a left MCA thrombosis and infarction.
We gather imaging data through CT and then make an interpretation and develop a radiology report on top of that. So, again, that allows us to understand what's happening and allows us to intervene with the patient. That's what happens at the individual patient level, but this can be aggregated, and so if we have many patients going through this process where you acquire data and information, then that information can be aggregated so that you have real knowledge, more generalizable knowledge and generalizable tools. And at each of these steps there's value-added activity, right? And these generalizable tools can go back and even improve our ability to make observations and our ability to process that information, becoming processing tools. A lot of what we're seeing with AI is right there. That's where the value is, in developing these value-creating activities and tools. So the data, then, are the raw materials. And then the value-added activities, such as discovery, design, and development, create higher-level information and knowledge. These value-added activities at the patient level, again, when we make the observation for the purpose of clinical care, are already compensated. That transaction has been accounted for. And the same, potentially, when the data are interpreted, if it's for the purpose of clinical care, although if it's used for labeling, that may be a different story. But when the data are used for research or development or other activities, that is an additional value-added activity; it creates value and can potentially be recognized and compensated separately. Okay, so that's the underlying basis. So we're gonna address a few specific questions before we wrap up. One question that comes up is the role of third parties when it comes to data. The hundreds, I think, of AI-based companies that you'll see right up on the exhibit floor have access to capital, they have development resources, they have distribution channels, and so forth. And they do that with an expectation of a financial return. So this is a great thing if we're going to bring this to market, but it does require industry participation, and it does require that financial return. So the question is, who should be allowed access to the data, how should we ensure that data are used appropriately, and then who should be allowed to profit? And again, we go back to that basic framework of this multi-party obligation to contribute to the common purpose. One of the critical points is that everyone who participates in this needs to act as a responsible data steward. That means that they take care of the data and they have the same ethical obligations that we providers have when we acquire the data. So we believe that it is ethical to share data with third parties if privacy is safeguarded, if the receiving organizations accept the role of data steward, if they adhere to data use agreements, and if the receiving organization will not share further, at least not outside of any of those agreements that are in place. Now, specifically around the concept of wide release of data: the impact can be increased by making data widely available. At Stanford, we've had a number of data sets released widely. The NIH has as well, and many others have too. We firmly believe that it is ethical to widely release clinical data with some stipulations.
That is, again, those who receive the data accept the role of the data steward. So it's not just throwing it out there without regard to who's receiving it; we know who's receiving it and they agree to abide by the terms. They must safeguard privacy and they must not try to re-identify patients. We've heard some of the major technology companies say, you know, with our new techniques we can actually re-identify patients, and we'll never fully get ahead of that. And so, rather than, or in addition to, protecting privacy, we say, well, it's an ethical obligation: don't do that. Even though you have the capability, you agree not to do it, just like we don't do it; these companies now have to participate in the same type of fiduciary responsibility. They need to identify themselves and agree to all terms and conditions, agree to respond to communications, and agree to report any problems that come up so that they can be addressed. In terms of profit, one question that commonly comes up is, should patients profit? Our position is no; they have already received the care, and it brings in a whole host of incentive and counter-incentive problems that we don't think are appropriate to bring into the care setting. Should provider organizations profit? We believe that no, they should not profit when they share their data. We believe it's unethical to sell the data, especially for exclusive access by individual parties, but we do think it's reasonable to profit from value-added activities like we discussed earlier. Should industry partners profit? Again, we say it's unethical to buy and sell data per se, but it is okay to profit from value-added activities following ethical business practices. Again, I discussed this earlier, this question of protection of privacy. Protection of privacy is of paramount importance, and we need redundant mechanisms to protect privacy. De-identification is not 100% reliable. All methods are imperfect, even if only slightly, and identification technology continues to advance. So again, all who handle the data must be data stewards. What we mean by data stewardship is that those who participate have loyalty to patients, to protect their interests, and loyalty to society, to use data for societal benefit. So all participants must commit to ensuring privacy protection, never seeking to re-identify data, ensuring data are only used in aggregate form, ensuring data are only used for societal benefit, and ensuring knowledge derived from data is accurate, representative, and statistically sound. So again, with these obligations, at every point there is a duty to protect patient privacy. In terms of patient consent, the question comes up, should patients have the ability to consent to the use of their data? Based on the concept of respect for persons, individuals have the right to make their own choices about actions that affect them. We believe that, actually, in terms of secondary use of data, those who use the data are really looking through the data, if you think about it, to the underlying phenomena, not the patients themselves. So we believe that there is not a requirement to gain consent as long as privacy is safeguarded, data are aggregated, there are mechanisms in place to ensure data are used appropriately, and patients are informed of how data may be used. So in summary, data are recorded observations of physical properties, phenomena, or behaviors.
Once images have been used for clinical care, the primary purpose has been fulfilled. We believe data should be treated as a public good, and that it's unethical to buy or sell clinical data. We believe patient privacy is of paramount importance, and that all have the obligation to contribute to the common purpose of improving the quality and value of clinical care. So it's ethical for data to be shared after it's de-identified and aggregated, and all who participate in this must accept the obligations of data stewardship. Thank you very much. I'm going to try to give the next talk on the topic of how you form an academic-industrial partnership with that in mind. Now, I must say that my talk is based purely on my academic experience, and less so on these guidelines, but I think that it will reflect most of the ethical considerations that David just mentioned. So I'm gonna be focusing on academic-industrial partnerships in AI: why, when, and how to do it. Let's first define what an academic-industrial partnership really is. It is about creating solutions and answering challenging clinical questions using shared high-technology resources from both academic institutions and the medical industry, each bringing different resources. It's a coming together of MDs and PhDs, engineers and computer scientists, combining their means not only to add value, but to be a really symbiotic relationship. So one plus one results in more than two. Multidisciplinary approaches usually drive those partnerships, and they're also, hopefully, driven by market needs. And those partnerships are usually mid- to long-term engagements, not something that you do real quick in one project. So why have an academic-industrial partnership in the first place? This is really true across the board, not only for AI. In 2005, the NIH Roadmap for Medical Research and the FDA Critical Path Report strongly endorsed and advocated for sponsored AIP relationships, and the stated goals were to accelerate translational research from bench to bedside, and I think in AI it is proper to say from code to bedside, to improve R&D productivity, and to improve patient outcomes. And the main benefit of such partnerships is that pre-existing, uniquely close ties between clinical experts and vendors can be leveraged to really build something greater. Now, the national economic value of AIPs is great, and the landmark 1980 Bayh-Dole Act allowed academicians to patent discoveries and license them to industry. That is important because it is the foundation for all AIPs. So AIPs significantly contribute to GDP and job creation. A BIO and AUTM study revealed that federally funded AIPs resulted in patents and licenses worth $1.3 trillion and created 4.2 million jobs. So that's encouraging. So what are the general principles when we engage in such partnerships? Pellegrino and Donohue actually postulated that each side must overcome the cultural and communication divide that can impair those AIPs. And the guiding principles here should be a mutual understanding of each other's mission and culture, mutual understanding of the objectives, benefits, and timelines, understanding of each other's capabilities and legal frameworks, and that is important, I'll talk more about that, and respect for each other's limitations, which really exist. So I always say there's going to be a consolidation of players in the field.
But if you look at the number of companies that produce AI products, it's not the usual way, where the big fish eat the small ones. This is actually a swarm of small fish eating the big fish. And I wanted to give you some considerations and pros and cons of what big corporations versus small companies offer. When you work with, or are approached by, a big corporation, a definite pro is having technical resources, large global customer networks and impact, extensive marketing and distribution channels, extensive cash reserves, which is really important, and also extensive experience with regulatory processes. With the smaller companies, you have flexibility, rapid turnaround times, motivation, because members of those companies are mostly also shareholders, more individualized relationships and narrow product design, as well as willingness, in some cases, to part with equity. The cons for big corporations are complex internal bureaucracies, over-managed companies, sometimes inflexibility, and unwillingness to part with equity. And the cons for smaller companies are limited resources and capital, an inability to absorb large institutional overheads, which can be quite substantial, limited experience with academic-industrial partnerships in the first place, and the fact that the longevity of those companies is really uncertain. So what divides us within those AIPs? What does industry think negatively about academia? They think that we have a slow bureaucracy on our hands, that we lack productivity, that they are just being taken advantage of for money, that they will have a low return on investment, that they will have to fight legal battles around our IP, and, worst of all, they fear bad word of mouth and bad recommendations. And academicians, when they engage with industry, usually worry that we're being taken advantage of intellectually and financially, that there may be inappropriate use of raw data, that we may encounter poor potential for scientific publications, and that being perceived as biased or sold out to industry is obviously always a consideration. But at the end of the day, what are the motivations and needs in AI-based academic-industrial partnerships? The industry ultimately needs our expertise. They need raw data and access to it: access to imaging reports, pathology, genomics, clinical outcomes. They need data annotated with our expertise, and prototype testing and prototype validation are where they come to us. So they do benefit from the co-development, and they do need to have our collaboration. Most importantly, from our academic perspective, we do need research funding, and that's quite honest to say. We wanna have first access to exciting new products, we wanna have intellectual participation and interest in all these things, and we do want to be involved in research publications that involve such new products. We also wanna be involved in rapid transfer from innovation to product. Nothing is more satisfying than to take a product that we developed and see it being used in clinical scenarios. And then ultimately, in individual cases, there is also commercialization. Sometimes those partnerships can give us the money to go the startup route. So therefore, it is really important to avoid development in a black box, and there is value in co-development.
When you bring together MDs on the one hand and engineers, computer scientists, and companies with great ideas on the other, you wanna have an iterative exchange that drives you from an idea towards prototype validation and ultimately a product. So before we discuss specific scenarios, I wanted to briefly give you an overview of the grants and contracts that are applicable to AI partnerships, and there are multiple ways that you can fund such partnerships. First of all, there is investigator-initiated, industry-sponsored research, the so-called research grants. This is primarily where industry is giving you funding to pursue your investigator-defined research goals and questions. The investigator should remain independent regarding the outcome, and no predefined outcomes or deliverables should be involved; you merely report to the funding source. Then there are the industry-sponsored contract research grants, where the AIP takes shape within the setting of a predefined goal. Many of those are individualized and very site-dependent models. First of all, you need to sign an NDA and individual consultancy agreements and define those specific goals. Then oftentimes data transfer or data use agreements are involved, and then there are different layers of those partnerships. Sometimes it's just an unfunded research collaboration, for example, where a company gives you an instrument or a server. Sometimes those are sponsored research agreements where, for example, you have milestones and deliverables. And sometimes, when it's a really large partnership with a large company, a master research agreement with exhibits is really meaningful. And then ultimately there are federally funded AIP grants, where you apply for an NIH R01 grant that has an industry partner on board; this is a very small portion of the NIH funds, but it does fund a lot of innovation. Ultimately, build a compelling value proposition when you're working in academic-industrial partnerships. Define the clinical problem and understand if it is worth solving and pursuing. AI cannot solve every problem, although some people here say it can. Not every company, not every investigator, has a good product or a good idea. Don't create solutions in the absence of problems. That's very important. Start with simple issues such as worklist optimization. Start with issues where you have a lot of data; starting with problems that have insufficient data to begin with is essentially a recipe for disaster. Pursue obvious and critical clinical issues rather than speculative, aspirational ones. So to sum it up, AI-specific AIP scenarios really are important to know, and I'm gonna be talking about a couple of them in a minute, but mostly those ethical problems involve the use and transfer of large amounts of data. There is a need for large amounts of high-quality annotated and curated data, but ultimately a large amount of data is not a panacea for a great project. And then ultimately, another issue that you will encounter is that the derived algorithms inherently contain data fingerprints, and you need to be aware of that. And then the patient hopefully does not become currency, but sometimes patient data does, and this may result in some unresolved ethical issues, which I think David elaborated on in great detail. So why then use an AIP? What is the incentive for a company to approach an academic site? Because technically there are free databases, like TCIA, the NIH, or other institutions.
So the pros are obviously simple access to the data, cost-effectiveness, no contractual obligations, no IP and licensing involved, and no HIPAA or IRB issues. You just take something that is online and you go with it. But then, of course, there is a problem with that. The data quality is often uncertain; it has, one could say, lower street cred. There is no walking back and looking deeper into a piece of data. Mostly there is no, or only limited, clinical data that you can use, and this mostly happens outside of an AIP, so the intellectual contribution of the sites is limited. Therefore, and I love to quote Milton Friedman, there is no such thing as a free lunch. If you really want to have a deep partnership, you do need to engage with an institution. And I'd like to walk you through three different scenarios when someone approaches you with an idea. The first setting that I can describe is that a company requests raw patient data for product development, but the academic institution is concerned about data use by a for-profit entity. This is frequently encountered, more so in the past; we now have new strategies to deal with it. But the issues and questions to ask are: for retrospective data, did the patients consent to such data use, especially if it's non-anonymized data? For prospective data, is it IRB approved? Is there a waiver in place? How will the data and algorithm be used after the completion of the project? Is there any data bias involved? Is the anonymization of the data complete? What are the consequences if it's not? And then the solutions are: really consider onsite server infrastructures, for example federated learning, and involve the IRB and legal teams extensively to avoid any substantial damage to reputation and also to privacy. Contractually regulate raw data use, and also clarify who owns, and I know this is an imperfect term, who kind of uses, the data and the resulting algorithm. The second scenario is that a company approaches you and desires to acquire raw data for product development, but does not desire to have a true co-development partnership. Essentially just a quid pro quo relationship. Well, there are tons of issues regarding data ownership or data use and the IRB, similar to scenario one, but what is the value of the data here? What is the fair market value of whatever you share with them? What kind of data types are involved? Is the data use or data transfer agreement exclusive? What is the value of such a partnership for the academic institution? How do you benefit if someone just takes your data and disappears? Is monetization of anonymized data ethical? We think it is, but you still have to think about how you feel about it. As for solutions and precautions here, it's an ethical dilemma, and you now have some guidelines, which you just heard, on how to proceed. My hunch is just to avoid such partnerships. The third scenario is that you innovate on your own, develop algorithms on your institutional raw data and protect your IP, and now you approach a large industry company or an angel investor who promises you funding. Ask yourself: is the startup route really the right one for you? Is development of your IP supported or co-funded by the institution? Do you have the right team and motivation? Are you, as a PI at your institution, permitted to license at all in the first place? And will your institution have shares? Mostly the answer is yes. Who owns the clinical data?
And the solutions here are: really approach your institution, look at your contract really carefully, inquire about possible routes towards a patent, and verify institutional licensing processes before you really approach industry. And lastly, a topic that I'm going to be talking about is the fact that disruptive AI products really require a different approval process. We all know, and we've heard throughout all these meetings in the last couple of years, that AI-based products may have a profound impact on the delivery of patient care. AI may really overcome the gestalt approach in radiology and introduce truly quantitative medicine. And we believe that the potential impact on care and outcomes is poorly understood as of now. So scenarios in which AI-based products really change clinical outcomes are possible, even for simple products like worklist prioritization. So conventional approaches and mechanisms for device approval may be insufficient. And I would like to learn from my collaboration with my medical oncology partners, because they have this four-phase clinical trial setup where they first test safety, then efficacy, then bring it to the market with phase three trials, and ultimately test it live. And this is how I think we need to go about it here as well. We need to focus on the safety of the workflow, on efficacy and the added value of external validation, and then ultimately on outcome. And the FDA clearly has had some very important thoughts. There is not only a voluntary certification process, but also a proposed regulatory framework for modifications to those algorithms, which is discussed in this paper. And these are some of their ideas: how they envision developing AI, deploying it clinically, and then monitoring those models and really evaluating the iterations constantly. And I think it's an important point to note that there are two layers of approval. First, the 510(k) approval for an initial, fixed algorithm, which is then deployed clinically and continuously fed with data and re-evaluated. And if you want to modify that algorithm and send in version 2.0 of your initial version, you have to go through the process again. And that is very important; unlike with other products, where you just push out an update, here we really have to go through the approval process again. That requires meticulous documentation, and it requires a focused FDA review of the changes to the algorithm. That's really making it somewhat complex. So another question that I just want to raise at the end of the day: what about reimbursement? We really don't know. I mean, there are a couple of products out there right now that are being reimbursed. But who will pay for AI? Right now it's really an unclear ethical dilemma. And I think this article that we accepted in Radiology: Artificial Intelligence provides some good thoughts on this topic. So I'm going to leave you with a couple of final slides. Be aware of the dangers, hype, anxiety, and stereotypes about AI when you're dealing with real ethical issues in terms of monetization. Don't use misleading buzzwords and slogans like "AI will replace radiologists," "AI beats radiologists," or "AI will increase revenue." All this stuff really is probably unethical in and of itself. So I would not listen to articles written like that. So my bottom line is: curb the AI enthusiasm, really. Just try to look at the data. Try to focus on the science.
Try to go the ethical route and be clear about what you really try to achieve with this partnership. And then ultimately, remember that those partnerships are mostly collaborative endeavors between MDs and PhDs, engineers and computer scientists. And together, we're more than just the sum of the parts. Choose the right idea and identify the right piece of data. Be aware of the local legal framework and stay on the safe side. Consult with your legal departments and your institutions, and be aware of the AI hype. Don't over-promise and do not under-deliver. And that concludes my talk. Thanks for joining this session. I'm Yvonne Louis, and we're going to talk today about the intersection of privacy and medical imaging in this new era of AI. I have no disclosures. Let's start with a pop quiz question. I'm going to take you back in time. It's the 1980s, and we're in the radiology department. What is this big box, this question-mark box, at the heart of the radiology department? And this, by the way, is a real plan of a department from that era. What was this thing located right in the middle? Is it a patient area? Is it where the radiologist sits? Is it a break room? That would be nice. Maybe it's a brand new, state-of-the-art, single-slice CT scanner from back then. Or a film library? Or is it something else? It was something else. It was the darkroom. And some of you might be thinking, oh, golly, what is that? That's where they used to develop the films, and wet reads were called wet reads because the films were still wet from the developer and wash. So films got interpreted on alternators. Who remembers this? What even is that study? It's a pneumoencephalogram. Films got stored in these jackets, which, incidentally, I found on Amazon. You can still buy these; they're a dollar apiece. And I was super tempted, but I ended up not clicking on buy now because I just couldn't figure out what to buy. And these pencils, who knows what these things are? They're China grease pencils, which is how we marked films. So even though it was possible to, yes, lose or misplace a film, or have neurosurgery take it for the operating room and return it some eons later, and have some kind of breach of privacy, it simply could not happen on anywhere near the scale of what is possible today. Fast forward to 2021: medical images are digitized, stored, sent, and downloaded all over the place. Here it is on my iPhone. They're in ads, they're on the internet. The image at the top is from the New York Times. I still don't understand why, given a 50-50 chance, layout artists always put their radiology image upside down or backwards. The image on the bottom I love; it's obviously from some high-quality data set like the Human Connectome Project, but it has this watermark on it. I don't know if you can see it, but it says Science Photo Library, and you have to pay to download the clean version. This on the right is what comes up when I Google normal CT. The confluence of advances in computer vision and AI with medical imaging poses new challenges in terms of patient privacy. In terms of personal medical information, at one end of the spectrum you've got tissue samples, actual specimens, pathology specimens, which contain a person's entire genetic code. And after the best-selling book came out detailing the actual provenance of the scientifically very important HeLa cell line, the NIH made changes to research protocols that now require what they call explicit consent for the use of any genomic data.
And this, by the way, is an actual HeLa cell division. I mean, it's remarkable. You can see the mitosis happening, but that's for a different lecture. On the other end of the spectrum are the structured text data that make up the bulk of the EHR. Much easier to aggregate, and also easier to de-identify. And these sorts of data are commonly shared by your very own health organization, probably, with all sorts of people, service providers, and partner facilities for many different tasks, things like quality assessment, analytics, and billing. Medical images have a particular combination of easy transferability on the one hand and highly personal information content on the other. And because of this combination, they're very revealing about weaknesses in our current policy and practice. We will talk about some of the specific issues that they bring. Public Law 104-191. What is that? That is HIPAA, the Health Insurance Portability and Accountability Act of 1996. It adopts national standards for electronic healthcare transactions. Here we'll review some of the definitions that are central to HIPAA. PHI we know about; it's information relating to a person's health. PII is personally identifiable information. And then there's this thing called covered entities, and very few organization types count as covered entities: a health plan, a healthcare clearinghouse, and a healthcare provider. The reason is that HIPAA was originally envisioned to cover electronic healthcare billing and the privacy of patient information in that process. HIPAA, as we know, defines 18 identifiers, and if you want to read the full law, I've included the link here. So when does HIPAA apply? HIPAA applies when we have all three of those things we just defined: PHI, PII, and a covered entity. HIPAA doesn't apply in actually many situations. Research is regulated by the IRB. Startups are regulated by the Federal Trade Commission. And medical devices are regulated by the FDA. That means your prospective research study, HIPAA doesn't apply. Aggregated health data, again, HIPAA doesn't apply. Fitness data, HIPAA doesn't apply. There is talk about whether we need to revise HIPAA. Do the 18 identifiers need to be expanded? Or do we need to entirely re-envision HIPAA and make it less constrictive? Do we need to redefine what counts as a covered entity? Maybe. Let's test our knowledge a little bit. Here are some steps that I took. Fitness data, we just said, is not subject to HIPAA. Even though there is personal info, because it can be tracked to my phone, there's no PHI, and Apple's not a covered entity. Okay, what about an app that turns your phone into something that monitors vital signs? Well, that's definitely PHI, and also PII, but again, not a covered entity. So what about this one, data about our volume? These are some data from pre-COVID, and we look at things like volume routinely to do things like optimize staffing and calculate RVUs. Because we're a healthcare system, we certainly count as a covered entity, and this is protected health information, but it isn't personally identifiable. A Holter monitor is prescribed by a doctor, the info goes back to the EHR, and therefore it is subject to HIPAA. Let's go back to this app. How do app providers get away with this? What they do is they have users sign a license agreement, and they effectively get your consent. So it's something like this. You scroll to the bottom, click, click, I agree.
Scroll to the bottom, I accept, hit next, and you get to use the app. Now, if you're like most people I know, that's exactly what you do. You scroll to the bottom of this thing, and click, click, agree. On the one hand, we have these complicated terms and agreements, and on the other, we have to contend with the fact that the most popular apps are downloaded hundreds of millions of times every quarter. This is what I call the fallacy of consent. You've got terms and agreements that are more than 70,000 words. That's approximately the length of the first Harry Potter book, which we just got through with my middle child. But terms and agreements are probably a lot more boring. You've got billions of transactions a year. Do that multiplication, and each one of us would be reading these agreements for literally 76 straight days. So instead, because that's untenable, we do what any sensible person would do: we click, click, agree, and we spend less than one second on that page. In fact, 999 people out of 1,000 spend less than one second on that page. Some academics have actually studied this and come to the obvious conclusion that these documents are too long, too hard to understand, or seemingly too unimportant to take the time to read or to give meaningful assent. Another point I want to make is that the more consent, and the more complex the consent, that's required, the fewer data actually end up being included. And unfortunately for all science, and this includes machine learning, that can introduce unintended biases. It's been shown that higher socioeconomic groups are more highly represented in research requiring complex consent. Let's switch gears a little bit, from the fallacy of consent to the myth of anonymization. In medical imaging, it really isn't just these 18 HIPAA identifiers. We know that, right? We have all this DICOM metadata, the pixel-level info, stuff that's burned into the image, biometry. The Mayo Clinic published this paper in the New England Journal showing that in 85% of cases, standard facial recognition software was able to identify their research volunteers based on their MRI reconstructions. So not only is it possible, it's actually easy. Advances in technology allow for things like really easy surface rendering of medical images. You no longer need expensive licenses or separate workstations. Remember those? Vital, the Vital workstation, I'm on it, you can't have it, the Vitrea, the Leonardo. Now you can perform this surface rendering with a free app on your phone. That's what this is on the left. It's something called Fast STL Viewer. You can download it right now. It's an app intended for 3D printing, shown here on my colleague's Android mobile device. Even thick-section images can take advantage of advanced post-processing methods such as slice interpolation to generate a facial rendering from 2D images. What about the DICOM metadata? Here's just one page from a multi-page document listing the metadata belonging to a single image in a brain MRI. There are literally hundreds of fields. Some are obvious as to what they are and clearly need to be removed, things called patient name and patient ID number. But then there are things embedded in the middle of all of these other parameters, something like this study instance UID, which actually can be used as a unique identifier for the study. And there's no easy way to know that, except if you know that. And oh, by the way, every vendor is allowed to have their own unique fields.
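To make this concrete, here is a minimal sketch of the kind of metadata scrubbing described above, using the open-source pydicom library. The tag selection, replacement values, and file names are illustrative assumptions only, not a complete or validated de-identification profile; the tools discussed next automate and extend this far more thoroughly.

```python
# Illustrative sketch only: scrub a few identifying DICOM fields with pydicom.
# A real pipeline should follow a vetted confidentiality profile and also check
# pixel data for burned-in text and reconstructable facial features.
import pydicom
from pydicom.uid import generate_uid

def deidentify(in_path: str, out_path: str) -> None:
    ds = pydicom.dcmread(in_path)

    # Obvious direct identifiers.
    ds.PatientName = "ANONYMOUS"
    ds.PatientID = "ANON-0001"  # placeholder scheme, assumed for illustration
    if "PatientBirthDate" in ds:
        ds.PatientBirthDate = ""

    # Less obvious: instance UIDs buried among hundreds of fields can act as
    # unique identifiers for the study, so replace them with fresh ones.
    ds.StudyInstanceUID = generate_uid()
    ds.SeriesInstanceUID = generate_uid()
    ds.SOPInstanceUID = generate_uid()
    if getattr(ds, "file_meta", None) is not None:
        ds.file_meta.MediaStorageSOPInstanceUID = ds.SOPInstanceUID

    # Vendor-specific private tags may carry identifying information.
    ds.remove_private_tags()

    ds.save_as(out_path)

# Hypothetical file names for illustration.
deidentify("brain_mri_slice.dcm", "brain_mri_slice_deid.dcm")
```

Even a sketch like this only touches the metadata; as the talk emphasizes, burned-in pixel information and reconstructable anatomy remain separate problems.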
There are many de-identification tools out there. I've included a bunch here for your reference. They do different things. The RSNA CTP tool, that's the clinical trial processing tool, deals with DICOM metadata. Many people use variants of the RSNA CTP tool because, despite being endorsed by the NIH, some feel it could be more complete. There's a method to convert raw k-space MRI data to a vendor-neutral format that is endorsed by the ISMRM. There are different types of skull-stripping and defacing algorithms. And of course there's something very basic left: you can actually manually check, as we did, and as the NIH has done with imaging data sets that were released to the public. Obviously, that's a huge undertaking. Using skull-stripped images sounds great. It nicely removes identifiable facial features from the image. However, skull stripping can negatively impact the generalizability of models developed using those sorts of data. Here we tried to implement one of the top-performing models from the 2017 MICCAI Brain Tumor Segmentation Challenge. And the results are excellent when data are pre-processed with skull stripping and registration. However, when the algorithm is deployed on a standard clinical MRI, the segmentation isn't nearly as good. And in real life, as you all know, we don't work with skull-stripped and pre-processed data. What's in the future? As we mentioned, there's controversy over redefining HIPAA. We need to work with manufacturers to avoid placing identifiable information in proprietary DICOM fields. We need to optimize protocols to minimize the risks to data. We need validated and, frankly, better methods of de-identification. And we need to investigate safer means of data sharing, which I know are being covered in some of the other talks in this session. It will be a balance in the end. Any individual right now may think that the benefit goes to the users of the data: why should I allow them to access my data? But by advancing healthcare, if done well and done responsibly, the goal is really to improve the lives of the individuals in society who are the very sources of these data, so that it does come back in a positive way. We've come a long way since the view box and the China marker. There are new questions that arise because of advances in technology. But can technology also provide some of the answers to these questions? In the Journal of Clinical Neuroscience, Kazim Bora says we need to tap AI itself to ensure the safety of AI systems. So we're following that command, that vision. We're starting to try to do this. Here's an example of a deep learning defacing segmentation algorithm that one of our students is working on, which identifies and ultimately corrupts facial features. Many thanks to my colleagues who have given this topic considerable thought. If you're interested in reading more, I will direct your attention to this article we published in JACR. A lot of the material in this presentation is based on work covered in that paper. In particular, I would like to thank Charlotte Sider for her insight on legal issues and Art Kaplan, who is our bioethicist at NYU. And I'd also like to acknowledge funding for our students, whose work I showed, from the Moore Sloan Foundation. Thank you for your attention. I hope this has shed light on the privacy issues we face in the era of AI in medical imaging.
Video Summary
The provided transcript discusses the ethical and practical considerations in using clinical data for AI, focusing on balancing data utility with privacy. Initially, an ethical framework was established to guide data use and sharing, emphasizing clinical data as a public good benefiting future patients. Ethical obligations for healthcare participants include respecting patient dignity, providing optimal care, and enabling continuous learning to improve healthcare quality.

Key topics include data stewardship, where anyone handling the data must commit to ensuring privacy and using data for societal benefit. The concept of treating clinical data as a public good is highlighted, wherein the primary purpose of the data is fulfilled once it has been used for clinical care, and secondary uses should focus on public benefit without commercialization of raw data. Concerns about re-identification risks and the ethics of data monetization are noted, advocating transparency and informing patients about how their data may be used.

Additionally, the notion of academic-industrial partnerships was explored, emphasizing symbiotic collaborations that align with ethical principles. Challenges like the myth of anonymization and the fallacy of consent showcase the complexities of maintaining privacy while harnessing AI technologies, advocating for better de-identification methods and balanced data-sharing practices guided by ethical AI deployment.
Keywords
clinical data
AI ethics
data privacy
public good
data stewardship
re-identification
informed consent
academic-industrial partnerships
ethical AI