Ethics and proper application of AI from a data scientist’s point of view

By Ahmed Zewain, Prayson Wilfred Daniel

Welcome to our eleventh episode of AI Watch!

In this episode Ahmed Zewain, lead product data scientist at 2021.AI, interviews guest speaker Prayson Wilfred Daniel, principal data scientist at NTT Data. Ahmed and Prayson dive deep into the ethical considerations in machine learning development. If you would like to gain actionable insights and steps for implementing ethical AI in your projects, this is the episode you’ve been waiting for.

Haven’t got time to watch the entire video? No problem; we’ve compiled a list of key highlights for you.

  1. Understand the overarching business objectives before diving into machine learning projects.
  2. Prioritize traceability and accountability for data.
  3. Assess the ethical implications of using sensitive data attributes, employing techniques like differential privacy when necessary.
  4. Include stakeholders in the entire machine learning development process, fostering transparency and accountability.
  5. Clearly communicate model limitations and potential biases through disclaimers, encouraging an open conversation with end-users.

Getting started

Ahmed: Hello everyone, and welcome to this episode of AI Watch. My name is Ahmed. I am lead product data scientist at 2021.AI. Today I have with me Prayson, who’s the principal data scientist at NTT Data, and we are going to touch upon the ethical considerations in machine learning development and cycles. Welcome.

Prayson: Yes, thank you. I have this joy of also leading a group of brilliant minds, data scientists, machine learning engineers and developers, where we help our clients in solving their machine learning problems. So that’s very short about who I am, and what I do.

Ethics and ML development

Ahmed: Back in the day, machine learning products were not always making it into the final solution, or the cases were quite simple. But now we’re seeing machine learning solutions coming into decision making, and the public is more aware of them, meaning ethical considerations are becoming a topic, and very soon also law.

What is your take on the ethical considerations when developing machine learning solutions? How does that journey usually start for you?

Prayson: Yeah, so I think I will go wider.

So I consider myself lucky because I have a background in philosophy and theology. So when you approach machine learning and when we start building things, we build it in a different way, because I have another package which I can bring into life.

So for me, whenever technology comes into intersection with humanity or with people, you cannot avoid the question of ethics. This is not just in machine learning. This could be with what we have seen in medical devices. So when medical companies make their products, they have to go through an extensive review.

Data science and machine learning is going through the same process because I think it’s becoming closer to the people than it was before.

To answer your question, like, when we develop machine learning, what do we usually take into consideration?

So we usually ask a question: what is it we’re trying to do? The client will come and say, hey, we are trying to optimize this process, and somehow this process includes some sensitive data.

So the first question we need to ask, even before writing out the first line of code, is to understand the business goal. That’s number one.

And then we ask, do we have data to answer this business goal? And from that stage, ethics comes into play.

Yeah, because we say, okay, we’re going to receive data. So question number one, who’s going to see this data? Are they allowed to see this data? So they…

Ahmed: Am I even allowed to see this data?

Prayson: Am I allowed to see this data? We are not even talking about the content of the data. So here we are trying to get the traceability of who is responsible.

We tend to see this kind of thing a lot. We say we keep the data safe, but somebody downloaded it, pulled it into a Jupyter notebook, did some analysis, and never deleted the notebook. You say, oh, we don’t have the data, but the data is actually there.

Ahmed: Yeah. With no traceability around that.

Prayson: No traceability. So all this brings accountability and traceability of who has data.
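The traceability question raised here (who has downloaded the data, and was it ever deleted?) can be illustrated with a minimal sketch of an append-only access log. This is only an illustration under assumptions: the file name `data_access_log.jsonl`, the function names, and the "download"/"delete" action labels are all hypothetical and do not come from the conversation.

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("data_access_log.jsonl")  # hypothetical log location


def record_event(user, dataset, action, log_path=LOG_PATH):
    """Append one access event (e.g. 'download' or 'delete') to the log."""
    event = {"ts": time.time(), "user": user, "dataset": dataset, "action": action}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def outstanding_copies(dataset, log_path=LOG_PATH):
    """Return users who downloaded `dataset` and have not logged a delete since."""
    holders = set()
    if not log_path.exists():
        return holders
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            event = json.loads(line)
            if event["dataset"] != dataset:
                continue
            if event["action"] == "download":
                holders.add(event["user"])
            elif event["action"] == "delete":
                holders.discard(event["user"])
    return holders
```

Calling `outstanding_copies("customers")` would then answer the "oh, we don’t have the data" claim with a list of people who may still hold a copy. A log like this only helps if downloads actually go through it, which is an organizational question as much as a technical one.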

And then the second part is we usually ask, what does the data contain? What kind of algorithms are we going to use?

We usually start by building models that will first and foremost try to avoid sensitive or protected attributes. These are things like our gender, our religious preference, our sexuality, our location and our political opinions. These are what we call protected attributes: they need to be protected because they can be abused or misused.

Do we have those, and how are we going to handle them? Then comes the question: what do you do? In some cases, we use a technique that helps overcome these problems.

So for example, if I have age, I want to remove this age, but substitute with another age, but introducing this substitution as a noise that I know I can remove.

Ahmed: This is linked to differential privacy.

Prayson: Exactly. That was the word I wanted. Differential privacy.

Ahmed: Are you testing me?

Prayson: Good that you’re here because it’s differential privacy.
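As a rough illustration of the idea named in this exchange, here is a minimal sketch of one standard building block of differential privacy, the Laplace mechanism, applied to a mean-age query. It is not a description of how Prayson’s team implements it. The function names, the age bounds of 0 to 100, and the assumption that the number of records is public are all choices made for this example. Note that in this formulation the noise is not removed afterwards; it is calibrated so that aggregate results stay useful while any single person’s age has limited influence on the output.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean_age(ages, epsilon, lower=0.0, upper=100.0, rng=None):
    """Return an epsilon-differentially-private estimate of the mean age.

    Assumes the number of records is public and that ages are clipped to
    [lower, upper], so changing one record moves the mean by at most
    (upper - lower) / n.
    """
    if not ages:
        raise ValueError("ages must be non-empty")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    clipped = [min(max(a, lower), upper) for a in ages]
    true_mean = sum(clipped) / len(clipped)
    sensitivity = (upper - lower) / len(clipped)
    # Smaller epsilon means a larger noise scale and stronger privacy.
    return true_mean + laplace_noise(sensitivity / epsilon, rng)
```

For example, `private_mean_age([23, 35, 41, 58, 67], epsilon=0.5)` returns the true mean of 44.8 plus random noise. Choosing `epsilon` is the difficult trade-off: smaller values protect individuals more but make the answer less accurate.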

From policy to practice

Ahmed: Interesting, because for me this was introduced as a theoretical topic in school, and it doesn’t always come into play. So I’m also interested in how you’ve actually implemented it, because sometimes it’s easier just to remove that attribute if you know you’re not going to be able to use it, or to use other simpler techniques. Differential privacy has all these parameters that are difficult to play with. It’s really interesting.

Prayson: It’s very interesting because sometimes if you remove some variables, then suddenly something does not make sense.

After we build this model, we have to ensure, okay, can we explain what happens when these attributes are included? And here we usually do something called counterfactual testing. So we will switch gender to see whether it affects the model and how, and we do the same with location.

Ahmed: It’s going to reveal the bias if it’s there or not.
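The counterfactual test described above can be sketched in a few lines: swap the protected attribute on each record and count how often the prediction changes. This is a minimal sketch under assumptions. The function name, the dictionary-based records, and the two toy models are hypothetical stand-ins, not the models discussed in the episode.

```python
def counterfactual_change_rate(predict, records, attribute, swap):
    """Fraction of records whose prediction changes when `attribute` is swapped.

    predict: callable taking a dict of features and returning a prediction.
    swap: mapping from each attribute value to its counterfactual value.
    """
    if not records:
        raise ValueError("records must be non-empty")
    changed = 0
    for record in records:
        flipped = dict(record)  # copy so the original record is untouched
        flipped[attribute] = swap[record[attribute]]
        if predict(record) != predict(flipped):
            changed += 1
    return changed / len(records)


# Toy stand-in models (hypothetical, for illustration only).
def model_ignoring_gender(r):
    return r["income"] > 50_000


def model_using_gender(r):
    return r["income"] > 50_000 and r["gender"] == "m"
```

A rate of zero means the swapped attribute never changed a prediction on these records; a non-zero rate shows the model depends on it. Whether that dependence is acceptable is then a matter for the conversation with stakeholders, not something the number decides on its own.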

Prayson: Sometimes we kind of need the biases. If you sometimes remove these biases, then your model is useless.

Ahmed: I can see, it doesn’t reflect the real world.

Prayson: It doesn’t work in the real world. I usually give this example that when we’re buying a house, I build my model, which has to give me what’s the best price, how can I do the bargain.

But one of my features was the number of foreigners in the location. And I’m a foreigner, right?

Ahmed: I completely understand.

Prayson: But I knew that the number of foreigners, which is a very protected attribute, will affect the price. So were I to remove this attribute, then I’m cheating myself.

Ahmed: Transparency and traceability generally, right? It doesn’t mean just shut down. It should be a conversation.

Prayson: It should just be a conversation. It’s there. I was watching this Netflix documentary about a woman who was dark-skinned, with African heritage, in, I think, England. She tried a facial recognition tool and it didn’t recognize her because of her skin tone. And then she went all on fire, which is correct.

But what if this facial recognition had a big disclaimer that says, we have trained our model mostly on non-dark skin tones. So if you have dark skin—

Ahmed: Please don’t be angry.

Prayson: —please know that there’s a limitation of this device. Do you think that will escalate to that level? I don’t think so.

Ahmed: Be honest that this is not your intention.

Prayson: Exactly.

Ahmed: Otherwise your intention could very well be what she’s saying. But if the disclaimer says, listen, it’s just a limitation of the data, I don’t have that data, then she might be more understanding, and it might open a way of giving input: maybe you need my skin tone, then please take more photos of me.

That’s an interesting argument because it’s a very simple one, and I’ve seen that even in the Assassin’s Creed game that I like to play. It says that this game was developed by a very diverse team.

Why is that important? I don’t know, but it made something, as a user for me. It’s like, okay, I definitely know they don’t have whatever stereotypes I might eventually put in them.

Prayson: And one thing that usually scares me the most is the comment that developers are building tools that they themselves do not know how they work. This leads to laws being made by politicians who might not really understand everything. And I usually say, if we make decisions based on fear, then we are most likely to shoot ourselves in the foot.

There are so many things I don’t understand, but yet I believe in them. I don’t understand how my brain works.

Ahmed: Okay.

Prayson: But I trust.

Ahmed: But I trust it.

Prayson: And the same thing. You just say my model is transparent. I go like, yeah.

Does that then entail, then it’s ethical?

Philosophy and AI

Ahmed: Your background is in theology and philosophy, which obviously gives you some advantage over me. I would say, in that sense, they are pre-installed. If you were an environment, that was pre-installed before your data science career started.

How will you be able to, for example, give me tools to do that?

Obviously, you can tell me, go study philosophy. But how would you translate that for me into steps and guidelines for a way of thinking, beyond just this dialogue?

Prayson: I think, where you work today, I can see there is an emergence of different tools that are helping data scientists to know what to follow.

Given that, okay, we have the rules, like the trustworthy AI, I think EU rules and everything. And then if one starts there, the guidelines.

Ahmed: But that is already translated into law. And we might have just agreed that law can be just an outcome of some fear.

So the question is, do you think that is a big problem, or are some of the things posed by the EU AI Act already quite in line with how you would have seen it?

Prayson: So some of the things are spot on, like when it comes to data traceability, when it comes to the ability to explain how the model came up with these predictions. How do we ensure that these protected attributes are kept safe? All this, I think, is spot on.

What is our job? Our job actually is not building models. Our job is to answer specific business problems.

So in the end, I usually say everyone is in the business of people, not machines. In the end, we are serving them, and by including them in this process, they become part of what we are building.

Then this helps a lot. It also helps in what you call a handover. Show them how to use this model, because they’ve been part of building this model.

Actionable points

Ahmed: So if we should summarize the question of ethical considerations in machine learning development, put them as points that are actionable from tomorrow, not waiting for some future things to kick in.

But what can I do tomorrow? As one, two, three.

Prayson: So number one is, before you begin a project, just go back and say, what are we trying to build?

So that’s already part of our dashboard. So remember, our dashboard is not going to show our model is 98% accurate.

Ahmed: No.

Prayson: What is that? Right, because that’s not the aim of…

Ahmed: Aim is the business case. The story that we told in the beginning.

Prayson: Yes, the story. And then after that you go like data, what data do we have?

Traceability, accountability, who has seen it? When was it deleted? And here comes, given the question, what kind of modeling do we need to do?

Do we need all the data? Do we need some of the data? And if we do, are we going to perform differential privacy or not? Or drop them or do something about it. Right?

And then, number two, are we including the stakeholders in the whole journey, so that they are part of this? Right? And also being transparent with our disclaimers and explaining the limitations of what we have built.

So once all these things are in…

Ahmed: In a backpack.

Prayson: In a backpack.

Ahmed: Then you’re ready for the journey.

Prayson: Then you’re really ready to go.

Ahmed: Drop the ring in the mount.

Thank you, Prayson, for the talk. This was a pleasure, and you made very insightful points. Your background in philosophy is really interesting to me as input, too.

So yes, thank you for watching AI Watch. Please subscribe to watch our latest videos.

Ahmed Zewain

Lead Product Data Scientist, 2021.AI

Ahmed Zewain is a Lead Product Data Scientist at 2021.AI with an MA in mathematical modeling and computing, and extensive knowledge of several data engineering tools. Ahmed’s skills include building ML POC projects and taking them further into production for a wide variety of clients.

Prayson Wilfred Daniel

Principal Data Scientist, NTT Data

Prayson helps organizations leverage ML to boost business, build loyalty, grow revenue, multiply efficiency & make data-driven decisions. He is passionate about building innovative solutions with IaC, Microservices, Test-Driven Development, SOLID Principles & DevOps pipelines.
