AI: Expectations, Ethics and Innovations
TechTalk SMB Host Charlie Guarino and data scientist Thomas Decorte talk all things AI, from machine learning to the wide world of LLMs
This transcript has been edited for clarity.
Charlie Guarino: Hi everybody, this is Charlie Guarino. Welcome to another edition of TechTalk SMB. Today, as you can see, I am thrilled to have Thomas Decorte with me. Thomas is a PhD researcher and data scientist with a passion for accounting, programming, data analytics, and data science, which is a big description, Thomas, but I think it really encompasses so much of what you do. We’ve had so many great conversations about AI, and I am thrilled to have you here today. I really, really am. I know you’re in Antwerp, Belgium right now, which is where you live. Of all the things I’ve read about you, the most important one to me is the ability to call you a friend, because I’ve had the great pleasure of presenting sessions with you in the US and in Europe, and I’m so grateful for that. So welcome to our podcast today.
Thomas Decorte: Thank you. Thank you so much for having me. It’s really an honor to be a guest on the podcast, and thank you also for the kind words in the introduction.
Charlie: Well, we try so hard, but you know what? They’re from my heart, so that’s important.
Thomas: Thank you. Thank you so much. Yeah, really excited for it.
Charlie: Me too, thank you. Thomas, we’ve had some pretty interesting discussions about AI, and this is certainly a topic that has captured the imagination of the world’s population. Within about two years, AI has become the first thing anybody in IT is talking about, and I think that’s a fair statement. Obviously AI is not new; it’s been around for decades. One of the things I want to talk about to start our conversation is that, while it still feels so new in many people’s minds, one of the problems I hear over and over again concerns the capabilities of AI. There seems to be a lot of misunderstanding out there about what AI can do and what it cannot do. People think it’s this new magic thing, maybe some voodoo in fact, capable of doing things we can’t even imagine. So I want you to speak to that just to get the conversation started: maybe some examples you’ve seen, or just in general how people might have unfair expectations of AI.
Thomas: Well, that’s a really good question and an excellent point, actually. It’s one of the most common things we see currently. In real life you meet other programmers who tried something like ChatGPT or one of the other large language models, tried to code with it, saw that it wasn’t always what they expected, and came away very disappointed. But it also happens quite often with customers. An IT manager or somebody in the IT department uses ChatGPT, they see it on the news, they’re told from higher up that the board requires them to do something with AI, and they immediately expect it to be a silver bullet that will solve every one of their issues, while in reality they don’t really know what to expect or how it would work. There’s quite often a disconnect between something like ChatGPT and the more traditional AI or machine learning models that can be deployed in a business context, so you see that expectations can be very different, because people assume they can use something like ChatGPT for everything.
Charlie: We mention ChatGPT because that’s the one everybody knows about, but obviously there are so many large language models out there. ChatGPT is the most popular one, of course, and I heard recently about a study finding that for many people it’s now the first place they go to learn about a new topic. It has replaced Google, in fact.
Thomas: Yeah, yeah, indeed.
Charlie: I mean what are your thoughts on that? Is that fair? Is that an appropriate thing?
Thomas: Well, it can give you an easy answer, but a lot of problems and issues have arisen because of it, right? There are many cases in which the LLM hallucinates and gives an output that’s incorrect, or gives a seemingly correct answer to a user who might not know what the actual output should be. More interestingly, if I may continue on this a little: there was recently a study in which one group of students was given ChatGPT and a task to program something small; another group was given an LLM more focused on generating code, with the same task; and a third group had to do it the old-school way, using Stack Overflow, Google, the internet, and their course textbooks to solve the same problem. The ChatGPT group was of course the fastest, because they could just write prompts and have code generated. The code-specific group was a little slower but still had a pretty good time, and of course the slowest were the ones who had to do everything in the traditional manner. But when they were later asked to recall how they solved the issue and how it actually works, the ChatGPT group wasn’t able to answer any of the questions sufficiently; the ones who used the code LLM were a little better; and especially the ones who had to research everything themselves were a lot better. So I do believe there is still a lot more value in Googling something and searching on your own, because you’re a lot more involved in the thought process. You can see how you could implement something and how you get to a solution. And even when you’re just searching for regular information, I feel that people are a lot more critical when they read something on the internet, whereas if they ask an LLM, they immediately assume the answer is right and correct when it can be a hallucination.
Charlie: That’s an interesting point, because you can give me literally any topic and I can go to ChatGPT and get a response immediately, but I won’t have the same ability, to your point, to synthesize that data and get a deeper understanding of it. I hadn’t considered that until just now, but it’s a very interesting point, and it will always support doing deeper research on your own. There’s nothing like doing it yourself to really make things stick.
Thomas: Yeah, that’s definitely true, and more importantly, you also know the sources it comes from, right? If you search for something yourself, you know where you got it from, and you know a little bit about whether it’s reliable or not, whereas something like ChatGPT might use an unreliable source underneath, and it’s very difficult for the end user to trace that.
Charlie: There’s also the expectation, which I mentioned briefly a few minutes ago, that AI can solve everything, that it’s this all-knowing oracle capable of doing anything. Even people who are in IT run into this; it’s a common problem for anybody in IT. People who are not in our industry think that any one person, a single developer or whatever the case is, can do anything IT related, no matter what it is. It’s very similar, I think, to the medical field: you have someone who specializes in a certain field of medicine, oncology, dental, whatever it happens to be, yet people who don’t know the industry expect them to know everything about medicine. I think there’s a correlation with AI. Have you seen that?
Thomas: Yeah, definitely. It’s something very common that you also see in research, where various subdomains tend to work together because no one has all of the expertise. For example, you can have somebody who’s really specialized in AI fairness: how bias in AI works, how you can combat it, how you can actually measure something like that. Compare that to somebody who specializes in building LLMs, or somebody who does more traditional machine learning with data in a tabular format, or computer vision, where somebody writes programs that learn from images or videos. Even within the AI space there are so many subdomains and specializations, and if you extend that to IT generally, it’s very normal that some programmers don’t always know the answer to every specific issue. It’s very similar within the AI field, right? If you have a programmer who mainly does back-end development, it becomes a bit of an unusual question if they suddenly have to fix something in the front end. It’s not that you don’t know anything about it, but it’s also not exactly your specialization and what you do day to day.
Charlie: So how do you handle it when people ask you questions that are not necessarily in your scope? Do they just need more education on AI in general? How do you give them a credible response?
Thomas: Yeah. I believe the main thing you see across companies is that people learn about AI from the news, or they heard about somebody doing a project, or they use an LLM like ChatGPT to ask a question, or their children use it for homework, but they don’t really know that much about AI itself. So if you get asked a question outside your field, it’s usually a general AI question. If you get very detailed questions, then you have to research it yourself or ask somebody who’s more specialized. That’s one of the advantages of working in research: you tend to know a lot of people in related fields, so you can easily ask about something more detailed. But I think most people learn or hear about AI without necessarily understanding the underlying research that led to it, or how everything works.
Charlie: You’ve already mentioned a couple of key words I want to expand on, because this is so fascinating to me. You’ve said bias, and you’ve used the words ethics and ethical, and those are such important topics in any conversation about AI, because they have such great influence on the data we get back and on how AI is used in general. There always should be concerns about ethics when you’re dealing with an AI system, if that’s even the right way of saying it. So to kick off this part of the conversation: how do you define ethics, and what role do ethics play in the world of AI?
Thomas: Yes, well, it’s a very broad spectrum, but the ethics is mainly related to something like bias. Maybe to explain it for the audience: if the training data on which you train your model misrepresents the actual situation, then a model trained on it will of course output biased predictions, based on the training data it was given. In most cases that’s how an AI model ends up unfair, biased, and ethically irresponsible to use, for example in loan approvals: there might be a bias in the original training data that then leads to biased predictions in the end. A lot of effort has gone into measuring how biased a data set is, into constructing unbiased data sets, and, more importantly, into making sure end users actually understand the deployed model, because you can build a fair model in one context, but if it is then used in another one, it might be biased. And then there’s data privacy: the ethical concern that an AI model might be trained on your data, or some form of your data obtained from a third party or collected directly, so it’s also very important to be transparent about which data a model is actually trained on. I think IBM is one of the leaders in that space. They are among the most transparent about which data is used to train their LLMs, how it is curated, and what is part of it. I think you can even find the exact documents that were used, so it’s very transparent and very open, which I think makes it a perfect candidate for an enterprise-grade application.
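[Editor’s note: a minimal sketch of the kind of bias measurement Thomas describes, on invented loan data. The column names and the 0.8 “four-fifths rule” threshold are illustrative, not something discussed in the episode.]

```python
# Sketch: check a loan-approval training set for group bias before
# training on it. All data and column names here are invented.
import pandas as pd

# Toy stand-in for historical loan decisions used as training data
df = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "approved": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Approval rate per group
rates = df.groupby("group")["approved"].mean()
print(rates.to_dict())  # {'A': 0.75, 'B': 0.25}

# Disparate impact: lowest approval rate divided by the highest.
# A common rule of thumb (the "four-fifths rule") flags ratios below 0.8.
ratio = rates.min() / rates.max()
print(f"disparate impact ratio: {ratio:.2f}")  # 0.33 -> the data is skewed
```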
Charlie: And that’s the Granite model we’re talking about?
Thomas: Yeah, those are the Granite LLMs, the IBM suite of large language models. There are variations in size but also in purpose, so IBM opts to offer more focused models alongside the very large ones; for example, a model specifically for writing COBOL code. Yup.
Charlie: Which is an interesting discussion, because a very broad LLM is not the same as a narrow LLM, as you just mentioned. There are of course different LLMs for different purposes, and I imagine there are also some for particular industries, like one for health care, different things like that.
Thomas: Yup, definitely. Especially right now there are so many variations based on an industry or a specific purpose, and also on size, because these models are obviously quite resource intensive, and the resources an LLM uses should be monitored and managed, right? In the case of the Granite models, for example, they’re generally meant to have lower resource requirements than some of the competitors.
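[Editor’s note: a rough sketch of trying one of the smaller open models locally, in the spirit of Thomas’s point about resource requirements. The Hugging Face model ID below is an assumption; check IBM’s current Granite releases before relying on it.]

```python
# Sketch: run a small instruction-tuned model locally with the
# Hugging Face transformers library. The model ID is assumed.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ibm-granite/granite-3.0-2b-instruct",  # assumed ID; verify it
    device_map="auto",  # places the model on a GPU if one is available
)

result = generator(
    "In two sentences, why can smaller, focused LLMs be cheaper to run?",
    max_new_tokens=80,
)
print(result[0]["generated_text"])
```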
Charlie: Let’s talk about the bias-
Thomas: So choosing the right LLM is a very complex decision. I’m sorry, I’m sorry.
Charlie: No, that’s perfectly fine, but I think it only sinks in when people realize that, because again, the general population thinks ChatGPT is the one everybody should be using, which is good if you’re ChatGPT, but-
Thomas: Yeah, definitely. What we tend to notice is that it’s better to mix and match. There’s a lot of open source out there, so try and see which model works best for the type of solution you’re aiming for, because what works very well for one person might not work in your use case. It’s very case dependent.
Charlie: Let’s go back to the example you mentioned about somebody applying for a loan of any sort, where the lender is using AI to help decide on somebody’s financial worthiness, whether to give them the loan or not. From a pure ethical perspective, there are questions about that: for example, should the person applying for the loan be told that AI was even used in the decision? And is AI the one making the decision, rather than just helping a person make it? That’s really where some of the ethics come into this conversation for me.
Thomas: Yeah, that’s definitely true. The aim is to be very transparent about the data that is used, where it is used, and in some use cases how a decision was even reached. In the loan use case, it’s required that you know how you get to a certain decision, a yes or a no, or a suggestion toward a yes or a no, because obviously if you’re turned down for a loan, you want to know why, and not just because the computer says so, right? There’s a decision process that needs to be explainable, and that’s where XAI, explainable AI, comes in; it’s very big right now. How can we actually explain how a decision is made? You also see this in the EU AI Act: if you want to apply a model in a critical industry, for example in health care, anything with the government, or military operations, it should be monitored and in some use cases explainable, because there are obviously risks and ethics involved in those use cases. Those systems are monitored and subject to certain regulations: which data is used, how it is anonymized, how we deal with it, how transparent we can be with the end user. And the act is specifically focused on the providers of the models, so if you are a company working in health care and you built your own healthcare model, you are required to adhere to these regulations, even from outside the EU. If you’re a company in the US and you sell the model into the EU, you need to be compliant with the regulations. Yup.
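[Editor’s note: a minimal sketch of the explainable-AI idea Thomas raises, using the open-source shap library on a toy “approval score” model. The features and data are invented for illustration.]

```python
# Sketch: attribute one applicant's predicted score to the input
# features, so a decision can be explained rather than just asserted.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))        # toy applicant features
y = X[:, 0] - 0.5 * X[:, 1]          # toy "approval score" rule
feature_names = ["income", "debt_ratio", "tenure"]

model = RandomForestRegressor(random_state=0).fit(X, y)

# SHAP values: how much each feature pushed this applicant's score
# above or below the average prediction.
explainer = shap.TreeExplainer(model)
contribs = explainer.shap_values(X[:1])[0]  # one applicant -> (3,) array

for name, value in zip(feature_names, contribs):
    print(f"{name:>10}: {value:+.3f}")
```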
Charlie: So you’ve brought up another topic that’s interesting to me, and that is the European Union AI Act that was just put into place early this year, in 2024.
Thomas: Yeah.
Charlie: That act is quite interesting because, and correct me if I’m wrong, I think it’s the world’s first real attempt to add some governance to this whole topic. Do I have that correct?
Thomas: Yeah, at least as far as I know. It’s the first that’s really transnational, because it applies across the entire EU, so it’s the first effort at broader governance with certain bodies in place. There are EU AI offices that have been set up to actually monitor this. So yeah, it’s the first, and I think in the future there’s probably going to be even more governance.
Charlie: I think the AI Act is going to be used as a template for other places. I can’t imagine that some form of it would not be put into place in the United States. I know a lot of companies right now are trying to self-govern, there’s a lot of self-governance they’re trying to implement, but I don’t know, maybe the government does need to get involved on some level. And it goes beyond just bias; I know there are even concerns about power consumption, for example.
Thomas: Yeah. The act divides systems by significance, depending on how risky the model is, because if you have something very local, like a sales forecasting or customer churn model, you’re not going to regulate that much; it’s more of a local implementation. But as soon as you start selling things that actually use customer data, everything needs to be monitored a lot better. And indeed, they are looking to add certain things to the AI Act. That’s the main thing, right? It came into force on August 1, 2024, and it’s gradually expanding, I think every six months, though I’m not sure about the exact cadence. It’s mainly meant as a starting point; they already have plans out to something like 36 months. So it’s just the starting point now and will expand from there, I believe so as not to push everything through in one go, because that would be a lot of work.
Charlie: Well, that’s the thing. How do companies, or any consumer of AI, get into compliance if the restrictions run too deep too quickly? How do you adopt the governance?
Thomas: Yeah, it’s true. Currently they are mainly targeting providers in order to protect end users. It’s also in there that it needs to be disclosed when an AI model is used, how it is used, and so on. In healthcare, for example, it’s very relevant that you actually know how a decision is made and which data is used if a physician uses a model for something. It’s mainly focused on the provider, the builder of the model, whether that’s internal IT or somebody selling it, because there are a lot of companies selling models, or selling the outputs of models trained on a lot of collected data, similar to how ChatGPT works, right? You can buy tokens and then use the ChatGPT model online to produce outputs somewhere else.
Charlie: Do you think the AI Act is too restrictive, or is it appropriate for what we have right now?
Thomas: I think it’s appropriate, yes. I definitely think it was important that something came into place and that they really started looking at this, because it’s a very relevant issue. This was a very unregulated space before, and now they’re really starting to build something together. I think that’s very important, especially given the speed at which this is going. Writing regulations, putting everything in place, and making sure everything is okay takes a lot of time, and if you see how fast LLMs iterate, how fast improvements come, regulators really need to catch up in that space as well.
Charlie: I just wonder if they will ever be able to get full control, because of exactly what you just said. It’s moving so quickly, and it takes a long time to introduce new regulations-
Thomas: Yeah.
Charlie: So it’s always a cat-and-mouse situation.
Thomas: Yeah, indeed. It’s going to be very difficult, I think. Right now they’re at least starting on it, which is the best part, but I think it will be a very difficult situation in the future, especially if this succession of new models and newer capabilities continues.
Charlie: I don’t think it’s going to stop. If anything, the overall interest of the world will just continue to grow. This is one area that really feels different to me from other areas in IT, just because of how quickly-
Thomas: Yeah.
Charlie: And how it’s capturing our imaginations with what AI is capable of. It’s really fascinating.
Thomas: It seems a lot like when the internet started. There were some issues at the start then as well, so I think what’s happening now is very similar to what happened then.
Charlie: Let’s go back to the healthcare discussion for a bit. We talked about the ethics of that. Just like the loan situation, is it appropriate for a doctor to tell you that they are using AI to help come up with a diagnosis, for example, or a prognosis?
Thomas: Yeah, that’s a very difficult question, of course, because it’s also a bit of a personal opinion. What I see is that currently there’s a lot of research going on in this space, where they try to build models that are trained on a lot of data but can output individual predictions, which is very interesting: you learn something from a group but still get an individualized prediction. At the same time, it sparks a controversy: do you want a physician who uses an AI model, and in which way does it influence the decision? In all the papers it’s always stated that the physician needs to make the final decision and that the model is just a tool, right? It’s similar to how ChatGPT would be used by a programmer. You’re not going to have an entire program written by ChatGPT and just push it to production and be done with it; that’s probably not going to work. So it’s a difficult question to answer clearly, but I think it’s definitely helpful if a model can offer another insight, or pick up a pattern you didn’t really think about, or give you a slightly different perspective. Similar to a programmer asking ChatGPT to write a function: you might say, I didn’t really think about doing it this way, but this seems like an interesting approach and maybe I should look into it.
Charlie: Yeah, you’re right. I can imagine there are some patients in this case who would only go to a doctor who relies heavily on AI, and another patient who would say, no, no, no, I want my doctor’s best knowledge and expertise in this. And we come back to the same point we started with: bias is also something to be concerned about here, and the bias might-
Thomas: Yup.
Charlie: Might influence an incorrect diagnosis.
Thomas: Yeah, definitely. And it’s one of the issues we sometimes see when clients label data themselves: they introduce bias into their data. For example, with fraud detection or a customer churn model or something similar, sometimes they have to input some data themselves, mainly the target variable, and label it. Then it’s the opinion of one person, or a couple of people, that gets introduced into the model through the labeled data. That’s one of the issues that can arise: the model actually learns the opinion of the one person who labeled the training data.
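[Editor’s note: a small sketch of one way to catch the single-labeler problem Thomas describes: have two people label the same sample independently and measure their agreement. The labels here are invented.]

```python
# Sketch: if two labelers disagree a lot, the "ground truth" is
# largely personal opinion, and a model will learn that opinion.
from sklearn.metrics import cohen_kappa_score

# The same ten transactions labeled independently by two reviewers
labeler_a = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]  # 1 = fraud, 0 = legitimate
labeler_b = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

kappa = cohen_kappa_score(labeler_a, labeler_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 0.40 here; values well below 1.0
# mean the labels encode individual judgment, not a shared ground truth
```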
Charlie: Right, so you’re mostly getting that one person’s opinion, which can-
Thomas: Yeah.
Charlie: Right, back to bias again.
Thomas: Yeah, indeed. It’s all a little bit connected.
Charlie: It sure is. So, as we head toward the end of our conversation, and by the way, this is just so fascinating to me, it truly is, one of the points I want to drill home for people listening is that this is really just a tool, and we need to look at it as a tool, not as a replacement for us, right?
Thomas: Yup, indeed. It’s mainly there to help you do things faster, to give you extra insights, to basically enhance what you’re already doing. It’s not meant as something that will replace everything. It’s a tool to help you work more efficiently, to give you extra insights, maybe to do something in a way you wouldn’t have thought of in the first place, as in my earlier example. Some people claim it will replace everything very soon, and there will always be some use cases where AI becomes the main way to do something because it’s more efficient and faster, but at the same time it creates so many new opportunities. I believe it is best used as an enhancement for what you do. In the healthcare use case, for example, you would want it to enhance an opinion rather than be a pure replacement for it.
Charlie: So what would be one of your final messages to companies, to CIOs, to anybody in the C-suite? Because surely everybody is talking about this; it’s front of mind for many corporations today, if not most. What would be your message for those who have not really begun their journey but are entertaining the idea, maybe around getting their data properly aligned, or whatever your message might be, for somebody beginning this journey?
Thomas: That’s a very interesting question too. What I believe is that the main way to start with something like data science and AI is to start with a really small, well-defined problem. You tend to see that most people want to go for a big-bang idea that will solve everything. I think it’s very important to start small and really learn, because these types of projects work differently than your regular software project. Slowly build out some small use case that actually gets adopted in deployment, where end users have to interact with it and need to know something about it. That also creates a culture within the company that’s AI aware, which is very important as well. Then you can expand on that successful project and move toward bigger, more complex things, but at least you have experience with how it all fits together. So: start small, then scale up to more ambitious projects. In the smaller projects you’re forced to look at your data. What does our data actually look like? How clean is it? Do we need a lot of data transformation? On a smaller scale, you immediately get a feel for how well the company might adopt an AI project, because for companies that have already curated their data, have a clean database, and have everything well documented, it might go fairly smoothly, but for companies that have more difficulty there, quite often 90-95% of the time is spent just trying to sort everything out, and then you still have to do the actual AI project and build a model that outputs something.
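[Editor’s note: a rough sketch of the first-pass data audit Thomas recommends before any modeling. The file name and columns are placeholders for whatever extract a company starts from.]

```python
# Sketch: a few quick checks that reveal how much cleanup work a
# dataset needs before an AI project can realistically start.
import pandas as pd

df = pd.read_csv("customers.csv")  # hypothetical extract

print(df.shape)  # rows and columns
print(df.isna().mean().sort_values(ascending=False))  # share missing per column
print(f"duplicate rows: {df.duplicated().sum()}")
print(df.dtypes)  # columns stored as the wrong type are a common cleanup cost
```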
Charlie: There could be a significant commitment up front just to get even the smallest project started. We talked about data cleansing; you may have to spend a significant amount of time just to get to a starting point.
Thomas: Yeah, just to have a decent starting point to move forward from. That’s why I always advise starting with something small and very feasible, and then moving forward.
Charlie: I guess in the end it goes back to understanding what you’re trying to accomplish, you know, because-
Thomas: Yup.
Charlie: I mean, we started the conversation with somebody misunderstanding what AI can actually do, and I would imagine there are some projects that are just out of scope for AI. It’s not really an AI-type project.
Thomas: Yeah, indeed, or something that can simply be done without AI. Because it’s such a buzzword, sometimes people try to put it into everything, and sometimes it’s okay to just do a project with regular programming. It doesn’t always need to rely on AI, and I think that’s a very important takeaway. Also important is being able to actually measure how successful an implementation is, rather than just trying something. Decide up front: this is how we will measure it and see how successful it is, rather than implementing something and never being sure whether it actually worked.
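[Editor’s note: a minimal sketch of Thomas’s point about measuring success: define the metric up front and compare the model against a trivial baseline on held-out data. The data here is synthetic.]

```python
# Sketch: if the model does not clearly beat a naive baseline on a
# held-out set, the implementation is not succeeding, however
# impressive the technology sounds.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

print(f"baseline F1: {f1_score(y_te, baseline.predict(X_te)):.2f}")
print(f"model F1:    {f1_score(y_te, model.predict(X_te)):.2f}")
```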
Charlie: Fascinating. Thomas, I’m so glad we had this conversation. Thank you so much for your time today. This is one topic I’d like to have you back for at some point in the future, because it’s so interesting, and we really just focused on one small piece of this whole AI thing. Just like the medical doctor we talked about earlier, this is one small fraction of a larger space, so I’m putting you in my calendar, Thomas, maybe for the next six months. Hope you don’t mind. No, I’m joking [Laughter].
Thomas: No, it’s okay. I’m very glad that you invited me and very happy to talk about this. It’s very nice. Thank you so much.
Charlie: Oh my gosh, the pleasure is mine, for sure. Thank you. So we will wrap it up here, with my thanks on behalf of myself, TechChannel, and the IT community at large who listens to these podcasts. Thank you so much, Thomas, always a pleasure, and I’m sure I’ll be seeing you again down the road in the near future. I look forward to it as always.
Thomas: Thank you. Thank you so much for inviting me.
Charlie: Absolutely. Thanks, and thanks everybody. See you at the next podcast. Bye now.
Thomas: Thanks everyone. Bye.