{"podcast_details": {"podcast_title": "The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)", "episode_title": "Privacy vs Fairness in Computer Vision with Alice Xiang - #637", "episode_image": "https://megaphone.imgix.net/podcasts/35230150-ee98-11eb-ad1a-b38cbabcd053/image/TWIML_AI_Podcast_Official_Cover_Art_1400px.png?ixlib=rails-4.3.1&max-w=3000&max-h=3000&fit=crop&auto=format,compress", "episode_transcript": " All right, everyone. Welcome to another episode of the Twin Wall AI podcast. I'm your host, Sam Charrington. Today I'm joined by Alice Shum. Alice is a lead research scientist at Sony AI and global head of AI ethics at Sony Group Corporation. Before we get going, be sure to take a moment to hit that subscribe button wherever you're listening to today's show. Alice, welcome to the podcast. Thank you so much for having me today. I'm looking forward to digging into our conversation. We'll be talking about some of the work you'll be presenting at the CVPR conference, focused on some core tensions that you see around fairness and trust in the computer vision domain. But before we do that, I'd love to have you share a little bit about your background and how you came to work in the field of computer vision. You've got a unique background for the field. You're a lawyer. Yeah, sure. So my path in AI ethics actually started around nine years ago when I was first building my first commercial machine learning model. And this is kind of before a lot of this terminology around AI versus machine learning versus data science was a little bit more settled. And at the time, we didn't really have all of the work that we see today around fairness, accountability, and transparency in AI. And so when I was building this model and realized that there were a lot of concerns around possible skews in the data that we're using such that the model would probably not work as well for certain subgroups and demographics compared to others, there wasn't a whole literature for me to look at in terms of how to address these sorts of issues. There really wasn't even a terminology for this at the time that was now what we think of as algorithmic bias and is now this huge field of algorithmic fairness. And so that's for me what really motivated me to look not only on the technical side of this, but also at the legal and policy side, since it sort of felt like the Wild West back then for myself as a practitioner developing these models, not really having the guidance of how to ensure that they manifested more fair and ethical properties. And so that's what kind of motivated me to also explore these issues from more of a legal and policy landscape and then kind of looping back around to the research space again. And in my role at Sony, I actually have two hats on. So one is leading our research lab around AI ethics, which focuses on these issues of fairness, transparency, and accountability, and in particular on questions of ethical data collection and bias mitigation techniques. And then my second hat is leading AI governance initiatives across Sony Group. So that includes our compliance, educational, and policy initiatives around AI ethics and working with our business units to operationalize our AI ethics principles. Fantastic. Talk a little bit about some of the ways AI is used at Sony and some of the work that you get called on to work on there. Yeah, sure. 
So Sony is a very exciting company to work at in terms of AI ethics-related matters because it is just such a diverse company. So it's very diverse in terms of just the global reach and the types of business units that we have. So we are an entertainment technology company. So if we think of the major business units within Sony, we have our electronics business units that produce everything from cameras to headphones to robotics. And then we have our music company, our Sony Pictures Entertainment company, our PlayStation as well. So there's all these different areas where we are exploring AI technologies at the intersection of entertainment and human creativity. We also have a major financial company in Japan as well. So these are all domains that intersect with AI in very different ways with kind of varying levels of risk and varying types of ethics issues that manifest. But I think it's very exciting that the work that we're doing is very clearly aligned with the overall goal of Sony, which is basically to fill the world with emotion and to help augment human creativity, and AI ethics is a key component of that. Can you talk about how your background in law as well as statistics as well as economics, like, do all of those things express themselves in your work there equally? Are you more focused on one or another of those modalities, if you will? Yeah, great question. So first, maybe starting on the research end, AI ethics is a fundamentally interdisciplinary pursuit. And so it's been quite helpful having the diverse background that I do. If we think about especially AI ethics becoming more developed as a space, it's sort of going from this Wild West that I described before to companies trying to come up with practices, researchers coming up with techniques, to now, where we're getting to a point of more formal regulation coming down the pike as well. And so having that grounding in the legal space, I think, is quite important in that it's important that any sort of AI ethics solution we have is at minimum legally compliant and ideally goes above that. And it sounds very trivial to say that your AI ethics solution should be at least legally compliant. But a lot of my work has explored the ways in which a lot of technical solutions might actually run up against legal barriers. So that's both the case in terms of if you're trying to say, okay, let's do ethical data collection where we collect tons of data of all sorts of diverse people, we have all these labels of people's sensitive attributes so that we can check for diversity and things like that. Well, that's going to very quickly run up against a lot of privacy issues because it's not quite so straightforward to say you're going to do this sort of mass scale sensitive data collection. Similarly, when it comes to issues of bias mitigation, this is an area where there has been a lot of legal debate already on what counts as fairness and to what extent you can actually proactively change things to make empirical distributions fair. So I have some work analyzing this in the anti-discrimination law context, and specifically a lot of the debates we've had around affirmative action in the US and how that might shape some of the conversations we're having on the algorithmic fairness front of what counts as fairness there. So these are just some examples where that intersection is quite important, because if you only know the technical side of things, a lot of the solutions that you propose might not actually be implementable in practice. 
It might actually run contrary to existing legal doctrines. Can you elaborate on the connection you just mentioned between affirmative action policy and algorithmic ethics policy? Yeah, sure. So, I had a paper on this a few years ago called Reconciling Legal and Technical Approaches to Algorithmic Bias, and basically it's charting the course of how we've seen this evolution in anti-discrimination law in the US away from what we call anti-subordination, which is the idea that anti-discrimination law should actively seek to dismantle existing societal hierarchies that lead to discrimination, and instead, with the Supreme Court increasingly having a bit more of a conservative bent, going towards what's called anti-classification, which is the idea that anti-discrimination law should basically be colorblind and the goal should be to have, as much as possible, race-neutral, gender-neutral policies that don't really take into account those dimensions. And the kind of compromise we've seen on the legal front, especially if we look at affirmative action debates in the higher education space, is that the court has been very negative towards any attempts at affirmative action that involve formal quantification of the types of boosts or advantages that might be provided to different groups, but you can still consider sensitive attributes as long as you don't really quantify it or don't really formalize what exactly is going on. Now, this becomes a huge problem when we talk about the algorithmic fairness context because you can't really operate without that quantification there. And when we reduce a lot of the algorithmic fairness techniques that have been proposed in the literature to their core, they have basically involved ideas of point systems or quotas or other sorts of methods to try to rebalance what's happening in terms of the outputs from different models. And the algorithmic fairness context arguably provides a very strong notion of why this matters, because we understand that the data we have is fundamentally biased and the only way to correct that is to make note of and acknowledge the types of biases that are in it. And the only way to do that is to have access to information about the sensitive attributes and then to actively counteract those biases in an empirical way. And so that's basically what folks have been exploring from an algorithmic fairness technical lens, which, to the extent it is eventually litigated, would create some challenges from a legal perspective of, well, are you actually allowed to make those modifications given that they are race or gender conscious? Very, very interesting. The work that you're presenting at CVPR, you are involved in a couple of workshops there, but those presentations kind of revolve around some research that you've published called Being Seen Versus Mis-Seen, which really digs into tensions between privacy and fairness in computer vision. Elaborate a little bit on that kind of core tension that you're calling out in this paper. Yeah, sure. So basically, one of the major goals of this paper was to point out that, from a practitioner perspective in the AI ethics space, what makes things so difficult is not just that we want fairness and we want privacy; it's very easy to kind of list these ethical desiderata and say, yes, please do this as much as you can. The problem is actually that, in practice, some of these things directly conflict or are in tension with each other. 
And so then in the absence of more guidance, either from industry best practices or from regulation, it's really hard to say what we should do in practice. So a few years ago, when I was at the Partnership on AI, we actually did an interview study of algorithmic fairness practitioners to try to see what challenges they're facing in practice in doing algorithmic fairness work. And one of the major aspects that came from that was the challenge of actually accessing data that you'd need for fairness purposes. And for that particular study, a lot of the challenges were around sensitive attribute data. So if the mandate is we want to make sure this model works equally well for men and women and is not systematically downgrading women, how do we do that if we have no idea who a woman or man is in the data, for example? So that kind of set me on this path of, oh, wow, for practitioners, this data availability issue is quite key. And that issue of sensitive attribute data is kind of a problem throughout, not just computer vision, but tabular data and NLP and other areas as well. But then when I joined Sony and started working a lot more in the computer vision space as well, I realized that it's not just a problem of sensitive attribute data. It's a problem of ethically sourced diverse data in general. So we're all very familiar with the idea of garbage in, garbage out. And a lot of the foundational Gender Shades work, for example, in the computer vision space showed that a lot of the biases we see with models not performing well for minorities and for women stem specifically from a lack of representation in the training and evaluation sets for these models. So if we look at the major computer vision data sets that are used very commonly in research, they're all extremely skewed towards Caucasian men. And so it's no surprise then that if, for example, you're training a model to recognize whether there's a face or not a face in the image, that it would have a harder time recognizing the faces of people who are not Caucasian men. The Gender Shades paper was in 2018, and since then, there's been a lot of push for people to consider more representational diversity in the data used. And I think because that sounds like a very simple solution, people assume, okay, well, people must be working on that, it must be solved. But when you actually kind of look at it from the standpoint of what is the status quo, the status quo is quite bleak in terms of, you know, in that five-year period, we haven't seen like the emergence of a ton of new, like really great data sets that address this problem. And we still see these issues of bias in computer vision models. And then if we look at why people do not collect more of this diverse data, that's when we start seeing these tensions with privacy law and the incentives really being misaligned for folks to actually do anything about this problem. What kind of jumped out at me in this core tension, when we're talking about the privacy perspective versus the desire to have more representative data sets, is that particularly for the groups that are underrepresented in those data sets already, they don't necessarily want to be more represented in the data sets. And so you kind of refer to that and call that out as, like, seen versus mis-seen. Dig into that particular, you know, wording and what that means. Yeah, sure. So I use the concept of seen versus unseen to really reflect how data privacy law has thought about these issues. 
And part of this really stems from the fact that we don't really have AI laws for the most part quite yet; we have a lot of data privacy laws that are now being applied to the AI space. And if we think of data privacy, really the harm there is the idea of your data being used in a way that you haven't explicitly consented to or you not being able to sort of track how your data is being used. And to date, that's kind of how we've thought about the harms or the benefits around data. And so right now, that's the primary protection people have in this space when we're thinking about people's data being used for the development of AI models. Now the fairness element adds this interesting dimension in that suddenly there's not just this harm of, oh, your data is being used or your data isn't being used. Now there's this additional potential harm of you either using or having AI models used on you that might or might not be very accurate for you or people like you. So we've seen a lot of harms from this in terms of wrongful arrests, in terms of products having offensive outputs because they have mislabeled people. So now there's this new category of harms that isn't really protected from a legal perspective. So, for example, there's no specific legal protection at the moment for being mis-seen in this way, even though, with the growth of human-centered computer vision technologies in everyday life, this is something that is increasingly happening. In certain areas you might be able to rely on product liability law or on existing protections against wrongful arrests or areas like that. But there's nothing that specifically requires companies to ensure that their products are not biased against particular subgroups. So that then creates this asymmetry between the protections that we have around being unseen, which privacy law very actively protects, versus being mis-seen, which we don't really have specific legal protections around. And so this in the abstract might be fine, except insofar as in the computer vision space, there is this trade-off to some extent between preventing folks from being unseen versus being mis-seen. And what I mean by that is kind of going back to this idea of diversity, representation, and size of data. If the goal really is that we need to collect these huge data sets that as much as possible represent the global population, how do we do that in a context where we aren't creating a lot of problems from a privacy law perspective? And this is not necessarily an unsolvable problem. That's why a lot of the paper does go into potential solutions here in terms of trying to have companies work through third parties for data collection, to develop closer relationships with communities for data collection, to have more of a right around preventing being mis-seen. So these are some of the areas where we might be able to improve upon this issue. But at the moment, we are kind of left with this tension as practitioners without much guidance in terms of how to resolve it. So let me see if I can play that back to make sure I understand it. From a privacy perspective, privacy law essentially rewards unseenness and penalizes seenness. Like that's its measure. But this thing that's rewarded from a privacy perspective, being unseen, is essentially what creates the problem of being mis-seen from a bias perspective. And there's no kind of counterbalancing incentive system. 
That whole legal framework hasn't really developed just yet beyond issues of product liability and other things. So that's kind of the tension between these things. And I think it sounds like the key takeaway is, in spite of the fact that with at least early approaches to algorithmic bias and ethics, there's a desire to just solve an optimization problem. Like, you're fundamentally saying there's no optimization problem here. These are tradeoffs and compromises, and we need to figure out frameworks for managing those tradeoffs. Yeah, exactly. And I think it's easy to say, okay, well, basically, in that case, what needs to happen is we just need to kind of start from scratch with data collection and pay as much money as possible to ensure folks get into this data set. I think there are a lot of open questions there in terms of how can data collection of sensitive or biometric information be done in a way that is in collaboration and is not exploitative? And how do we deal with the fact that with these models as well, a lot of the goal is to have very naturalistic imagery that is similar to what the model would perceive in practice. So basically, if we're thinking about self-driving cars, for example, it's very important for self-driving cars to be able to detect pedestrians. And the type of imagery that a self-driving car is going to perceive is most likely what you would see kind of just driving down the road. And if the goal is that they need to be able to detect pedestrians, then you need to have like a large data set of imagery that's similar to what would be perceived kind of driving down the road, but also with people there so that you can then label those as pedestrians and train the model to learn to avoid hitting the pedestrians. So then, you know, that instantly creates all of these questions of, well, so if I have a self-driving car that's going down the road and collecting all these images and videos of people, well, clearly there's no way to really get their consent in these sorts of contexts because, you know, I can't stop the car every time, have people like sign a waiver, explain everything to them, and then make sure they're paid and then get back in the car. And even if I do that, it's probably going to make things look really staged because at that point, people are just actors. They're not really kind of what you would see in practice as a self-driving car actually in the real world. And so there's that fundamental tension as well of all these ways in which we hope AI can actually interact with the real world. If we actually want to collect the kind of data that would really enable it to do that effectively, there's a realism gap between what's easy to collect from sort of a privacy-preserving perspective versus what might be necessary to ensure that this actually works well in a variety of contexts for a variety of people. Yeah. So essentially saying that the kind of opt-in data collection schemes that some propose as a solution to the privacy problem aren't really tenable because the models fundamentally want to be trained on data that looks like surveillance data, you know, that is surveillance data if we kind of separate ourselves from the connotations of that. It's data captured kind of impromptu, using the same devices often from the same perspective as how that data will be used. Yeah. And to be clear, I'm not saying that the goal should be that we kind of give carte blanche to companies to collect surveillance data. The point is just to point out this tension. 
Unless we resolve it, we're in a very bad situation where the incentives are quite misaligned, in terms of privacy law being quite different in different places. So the incentive is for companies to consider that in terms of where they go and collect data, and then that creates additional fairness issues in terms of certain jurisdictions being represented, certain ones not being represented. Or alternatively, if companies are really trying to basically optimize for privacy as much as possible, then these sorts of concerns we might have about diversity, realism, or fairness kind of get basically thrown off to the side. And so in order for us to make progress on this, we need to acknowledge this fundamental tension and try to find solutions that don't either go completely in one direction or in the other direction. And have you found kind of promising attempts at reconciling this tension, either in policy or other cases that you've come across? So I would say that there still aren't necessarily a lot of great solutions yet. So in the paper, I talk a lot about potential solutions. I think, pursued more and with more careful thought, they could become promising solutions. So for example, to the extent that this problem can be at least addressed to some extent by throwing a lot more resources at this problem and ensuring that we really have ways of collecting data that truly is with informed consent and provides appropriate compensation for individuals and where they have appropriate control over how their data is being used. It's something that doesn't really currently exist, but theoretically with enough investment from companies, from civil society, from governments, we could try to imagine a different data regime like that that is able to collect larger scale data that might still suffer from this realism problem but at least would be better from the fairness and the privacy perspective. So there's nothing I think that solves completely for all of these, but that's one version of solutions. The second dimension is on the incentive front of things. If at least there is more of an incentive to prevent being mis-seen, if we have a legal right to that, then that will automatically force practitioners to try to balance these two dimensions a little bit more. Not that they have great solutions at the moment, but if incentives are aligned, there's at least more potential for good solutions to emerge in this area. And then the final consideration is kind of digging a little bit more in terms of what we mean by surveillance in this context, because I think there's often this misperception that any sort of ingestion of data by a machine learning model is equivalent. And there's different kinds of data ingestion there. So if data is just being ingested purely for training a base model, for example, then that means the data is being used to train the model on how to do basic tasks like perceive different categories of objects or people, and that doesn't necessarily mean that this model will then be used to surveil those same individuals. The surveillance component typically requires actually being included in some sort of reference set of identified individuals that are then compared with individuals perceived in deployment. 
And so if we take, for example, the facial recognition context, if your image is used in a facial recognition data set for training purposes, basically that will help teach a model how to perceive differences between different types of people, but does not necessarily mean that that facial recognition model will then be used to surveil you in particular. Whereas if you are included in a reference set, then that does mean that when the facial recognition model is deployed, your face in the reference set will be compared with the deployment data, and if you are then perceived, then a match will be made. So that's a little bit of a difference there in terms of where we want to be more careful. So from a privacy perspective, we should be a lot more careful in terms of where this technology is deployed, who it can perceive in deployment, and who can be included in a reference set. And that's much higher risk than just being included in the training set. Is that always a mutually exclusive distinction? I'm thinking of the example where someone is included in some public data set, and that public data set is being used to train a model, and then some other organization, a law enforcement agency, for example, takes that public data set and does other things with it. From a data science perspective, there's all kinds of data leakage and reasons why you wouldn't necessarily want to do that, but I don't know that that necessarily translates to, if not practice, the concerns of the people that are in the data set themselves. Yeah, great question. So yeah, I would say, for one, those things are not mutually exclusive, which is part of that, but it's important to acknowledge the distinction and see to what extent the data set is reused for a different purpose than originally intended. But there is one additional element of that. So part of the question is whether the data is identified or not, so this is also a distinction that is not necessarily always acknowledged right now within existing privacy laws. So if you have an image that is also connected with your name, for example, then that's very useful for a reference set, because then if the technology perceives you in deployment, it can say, oh, this is an image of Alice, because I have in my reference set an image of Alice, and then I see someone who looks like Alice. Whereas if it's just that my image was used in training and it's not identified, and then that image is also not used in the reference set, then those are kind of distinct things at that point. Of course, if my image is taken from that training set, put into the reference set, and then connected with my name, then that's where the problem would emerge. So fundamentally, you're trying to articulate the different privacy dynamics associated with being in a training data set upon which a foundation model is trained versus having your data used as part of, like, a driver's license database or something like that, that's fundamentally tied to your identity. Yeah, exactly. And I'm very careful with any possible solutions in this space, because there's nothing that really is a silver bullet where I would say, yeah, this is great in every dimension. But one other area where you could try to reconcile these things would be to say, if an image is only being used for training for a foundation model, and is not being used anywhere else, it's not super clear that there's a super strong privacy harm to that. 
But that could substantially improve the fairness and accuracy of the model. But there should be a lot more protection around being included in things like driver's license databases or other databases, where you can then be identified. And I think one of the challenging things with this as well is that a lot of folks after a while have a bit of privacy nihilism, in part because it's very common now for people to just put images of themselves with their names online. And with kind of existing technologies, it's quite easy with an unlabeled image to re-identify who someone is. And so then the question becomes, how do we kind of draw these sorts of guardrails such that not everything becomes murky and, by agreeing to one thing, suddenly the data is being used for everything else. But this is where I think we need, basically, more legal guardrails to prevent this sort of leakage, so that we're protecting the areas where there are high privacy risks, and in areas where there are lower privacy risks, there's more guidance in terms of how that should be handled. Are these conversations happening to try to kind of address these tensions that you're speaking to? Certainly at conferences like the one that you'll be presenting at and at other academic conferences around fairness, bias, ethics, all those things. But in terms of policymaking venues and other places, where do you see the conversation happening? Yeah, that's a great question. So part of the motivation for this work actually is my concern that a lot of these conversations are sort of happening separately at the moment. So we are starting to see, with the EU AI Act and also regulatory activity in the US, a lot more interest in regulating AI. And of course, that's a very important step, as I discussed in terms of this idea of actually creating incentives around preventing being mis-seen. Hopefully those forthcoming regulations can help a bit with that. At the same time, though, a lot of these existing regulations kind of assume that we can just sort of pile on top another layer of requirements without addressing the ways in which these new requirements might create tensions or conflicts with existing requirements. And so part of the hope of this work is to point that out: okay, we do have, in the US and EU and some other countries as well, quite strong data privacy regulations at the moment. And it's great that there's more consideration now and more discussion and debate now about how to appropriately regulate AI and within that hopefully address some of these issues of fairness and accuracy as well. But if we don't address these tensions head on, then we might be in a situation where the incentives are quite misaligned, such that the ways in which people might address these issues might be quite suboptimal. And I think this kind of goes back to the discussion earlier about kind of the pros and cons of the different solutions in the space, because there's no solution that clearly solves for every aspect. What will naturally happen is whatever solution is picked will involve sacrificing some component. And I think it's better to have that discussion through actual policy debates in terms of is that actually the part that we want to sacrifice, versus it just sort of happening on an ad hoc basis when folks are trying to implement these policies in practice. So what do you think that leaves for the practitioner? Do they just kind of need to sit on the sidelines and wait for policymakers to hash this out? 
You know, potentially, of course, being involved in some of those conversations? Or are there things that you think recognizing these tensions would lead the responsible practitioner to implement, you know, as they approach these types of issues? Great question. I'd say that I think a very important area for practitioners and researchers to work on now, while we're still waiting for more regulatory guidance, is how do we actually operationalize ethical data collection in practice, considering all of these, like, very challenging dimensions? The computer vision space in particular has historically had pretty bad practices in this area. For a long time in computer vision, the standard was just kind of web scraping data. And especially now as we see a lot of generative AI technologies emerge, that problem has only been further exacerbated, where it's not just web-scraped data. It's, like, as much as possible the entire internet that a lot of these generative AI models are ingesting. And of course, if we're thinking about that as the training data, that's a very, very low bar, where there's been no consideration of the privacy of the people in the images being scraped. There's not sufficient consideration of IP and artist rights. So we're at a point right now where I think there's some growing realization that these sorts of practices are inappropriate, even though they're still quite ubiquitous. And so I think the question is how do we push this forward such that people are actually doing things in a better way? And of course, regulation will help affect some of the incentives there. But even with the existing regulations, that's not necessarily changing how exactly these large models are being developed. And so I think there's a lot of room in terms of, okay, if we are going to try to sit down and say that we need to have appropriate consent, we need to have appropriate representation and diversity, we need to have appropriate compensation for individuals, how do we structure that in a way that, of course, it's never going to be perfect, but at least pushes us beyond some of the really problematic historical practices that have basically helped this field develop. Awesome. Beyond that, which is substantial, are there any particular research directions that you're excited about or looking forward to seeing develop? Yeah, so certainly that's kind of a huge can of worms to tackle. But yeah, in addition to that, my team is also looking more broadly at ways in which we can do fairness evaluation and bias mitigation that helps address some of the legal considerations that I mentioned earlier as well. So for example, one challenge that practitioners often have in practice is that it sounds very easy to tell someone to go collect diverse data or to check if your model is fair. But in practice, that's actually quite difficult, because let's say you collect a data set that is very appropriately sourced, but if you don't have any labels for people's demographic attributes, how are you supposed to answer the question of whether your data set is diverse? And then in terms of a fairness evaluation, how are you supposed to answer a question of does your model perform better for this group versus that group if you have no idea which group people are in? So part of this again is a regulatory question of should companies be allowed to collect sensitive attribute data for the purposes of fairness evaluation? But we're also looking into other possibilities in this space as well. 
So we had a recent paper at ICLR, for example, which was examining to what extent we can come up with diversity or similarity metrics that are not about specific labels, but are just more generally based on what people would perceive to be similarity or difference in human faces. So basically for that paper, we trained a model on a bunch of judgments that people made where they saw images of three people and were asked, like, which person is more dissimilar from the other two. So they were never asked to label the person based on gender, race, or any other demographic attribute. They were just asked about similarity versus difference. And so now we have, based on those similarity judgments, a way to assign a similarity slash diversity score to a large group of images that is more along the lines of this group of people looks more similar to each other than this group of people, but without specific labels. So it doesn't solve for everything, in that, of course, in certain cases, you do care specifically about particular types of biases and mitigating those. But at least this gives us a way, in contexts with unlabeled data sets, to get a sense of whether one data set is better than another in terms of diversity and hopefully enable more checks along those lines. So that's just kind of one example where we're trying to figure out some solutions that might be able to accommodate this complicated space we're in, where there's really kind of no singular right answer. Yeah, awesome. Well, Alice, thanks so much for taking the time to share a bit about your perspective on this and the research you've done in the field. Really interesting stuff. Yeah, thank you so much for having me and yeah, really enjoyed this discussion. Thank you. Alright everyone, that's our show for today. To learn more about today's guest or the topics mentioned in this interview, visit twimlai.com. Of course, if you like what you hear on the podcast, please subscribe, rate and review the show on your favorite podcatcher. Thanks so much for listening and catch you next time."}, "podcast_summary": "Alice Xiang, lead research scientist at Sony AI and global head of AI ethics at Sony Group Corporation, recently discussed the core tensions around fairness and trust in the computer vision domain in an episode of the TWIML AI Podcast. Alice's research explores the challenges of addressing algorithmic bias and creating more ethical and fair AI models. She highlights the need for diverse and representative data sets in training computer vision models, but also acknowledges the tension between privacy and fairness. Current data privacy laws protect people from being seen, but not from being mis-seen, which is the risk of AI models being inaccurate or biased for certain subgroups. Alice suggests potential solutions, including robust data collection practices that prioritize informed consent and compensation for individuals, as well as legal guardrails to protect against the misuse of data. She also discusses the importance of operationalizing ethical data collection and evaluation practices to address fairness concerns. 
While there are no simple solutions, Alice emphasizes the need for interdisciplinary collaboration and policy discussions to navigate these tensions and find a balance between fairness, privacy, and accuracy in AI systems.", "podcast_guest": {"name": "Alice Xiang", "org": "Sony AI", "title": "", "summary": "Not Available"}, "podcast_highlights": "- Highlight 1 of the podcast: \"The problem is actually that, in practice, some of these things directly conflict or are in tension with each other.\"\n- Highlight 2 of the podcast: \"We need to acknowledge this fundamental tension and try to find solutions that don't either go completely in one direction or in the other direction.\"\n- Highlight 3 of the podcast: \"There's no solution that clearly solves for every aspect. What will naturally happen is whatever solution is picked will involve sacrificing some component.\"\n- Highlight 4 of the podcast: \"We need, basically, more legal guardrails to prevent this sort of leakage, so that we're protecting the areas where there are high privacy risks, and in areas where there are lower privacy risks, there's more guidance in terms of how that should be handled.\"\n- Highlight 5 of the podcast: \"If we don't address these tensions head on, then we might be in a situation where the incentives are quite misaligned, such that the ways in which people might address these issues might be quite suboptimal.\""}