IBM Power Champion Birgitta Hauser on SQL and Modernization
Charlie Guarino: Hi everybody. Welcome to another edition of TechTalk SMB. Today I am joined by the queen of SQL for Db2 for i, a world-renowned expert on SQL, somebody who freely gives so much of her time in the IBM i forums, somebody who I've had the great pleasure of speaking with in many cities in the US and abroad. She is an expert on database modernization and education and is an independent consultant. In addition, she is also a Redbook author and more recently an IBM Power Champion. If that doesn't give it all away, I think by now you know who I am speaking about: none other than Birgitta Hauser. Birgitta, thank you so much for joining me here today.
Birgitta Hauser: Thank you. It's a pleasure for me to speak with you. That's great. Thank you, Charlie.
Charlie: Great. Thank you Birgitta. Oh, it's always a pleasure to chat with you and you know that. So Birgitta, we talk about database modernization. This is obviously your area of expertise, and in preparation for this little discussion we're having today, I did speak to several developers, and what I found, I'm sure you have found similar, is that the level of expertise really runs from beginner to very advanced. What I find is that the beginner developers are somewhat intimidated. They don't even know where to start, and of course they don't even know perhaps what the word modernization even means in relation to the database, so give me a little bit of background on that first. How do you describe database modernization? What actually is it, because I have a functioning system today with DDS-defined files, for example? What is modernization, and what is my goal, my reason for doing that?
Birgitta: So, the final goal would be to redesign our database. Unfortunately, we are living today in a world where we have a grown application, and the database has also grown over the years. You have a lot of redundancies in it just because 20 years ago you could not do things that you can do today, but on the other side, we have an application. We have hundreds of programs that access the same table, so it is difficult to then say OK, let's redesign the database and then let's wait five years and then we can continue. So, for me the first step in modernization is to move as much business logic as possible out of the programs into the database. The easiest way to do so is to use SQL views: you just write your SQL statement, more or less complicated, and then finally you wrap it in a view, and in your program you use only this view. That's the first step, and in this way you can do so many things today with SQL. You can build subtotals and totals with SQL. You can even access files in the IFS; you can access a web service; you get data returned in JSON and XML format; you can decompose it, and all these things are not so trivial, but if you have someone who is able to read this data, who can access this data, wrap this data into an SQL view so you have only the view in your program. The easiest way to use it is just SELECT field1, field2, field3 FROM view, a few WHERE conditions and then ORDER BY. In this way, you have one or two guys who are able to write this complex SQL statement, and the average programmer should not have to care about that. He just uses the view. That's one thing. The other thing is performance: the guy who can write this complex SQL statement can also build the right indexes for it.
Another thing is that today we do not use only a single programming language. We have, let's say, RPG programs; we have COBOL programs; we have Java programs; we have PHP and so on, and everybody accesses the same data, so if we have all these things in our programs instead of in a view, then we have to reinvent the wheel over and over again, and so this is the first step. In this way we can first reduce our source code to a minimum, and then we can also externalize data access. That means just to wrap UPDATE, INSERT and DELETE in procedures, and so we can start externalizing, and if we go this way, we are finally in a state where we are able to redesign our database. That means remove redundancies, build new tables, implement things like referential integrity, etc. So this is it altogether, but the first step in my opinion is to concentrate on moving the business logic out of the programs into the database.
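[Editor's note: As a rough sketch of the kind of view Birgitta describes, a complex join with a running total can be hidden behind a view so the average programmer only ever writes a simple SELECT. All library, table and column names here are invented for illustration; the window-function syntax assumes a reasonably current Db2 for i release.]

```sql
-- Hypothetical example: one or two SQL experts write this once...
CREATE OR REPLACE VIEW mylib.order_overview AS
  SELECT h.order_no,
         h.order_date,
         c.customer_name,
         d.item_no,
         d.quantity * d.unit_price AS line_total,
         SUM(d.quantity * d.unit_price)
             OVER (PARTITION BY h.order_no
                   ORDER BY d.line_no)  AS running_total
    FROM mylib.order_header h
    JOIN mylib.customer     c ON c.customer_no = h.customer_no
    JOIN mylib.order_detail d ON d.order_no    = h.order_no;

-- ...and every other programmer, in RPG, COBOL, Java or PHP,
-- only needs a plain SELECT with a few WHERE conditions and ORDER BY:
SELECT order_no, customer_name, line_total, running_total
  FROM mylib.order_overview
 WHERE order_date >= CURRENT_DATE - 30 DAYS
 ORDER BY order_no;
```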
Charlie: That's a lot of information you gave me there, and I absolutely agree with you that the ultimate goal is to make our database data-centric, where we have the business logic in there, but that's a big first step in my view, because in all my programs I have much of the business logic, so if I simply move that logic to the database itself with constraints and triggers and the other things that you've mentioned right there, what about my code? I can't just leave that behind. I have to update my code at the same time. Is that true?
Birgitta: You have to rewrite your code. That's clear, but on the other side, if we talk about modular programming, we just write the right function, write a procedure which will do that, and then we can start to revamp our programs. For example, what I always do for each table or each physical file: I have a single function which will do the insert, a single one which will do the update, and a single one which will do the delete. All the checks are there: can I delete this record? Do we still have inventory if I want to delete an item from our table, and so on. So what we can do is start with that without touching our programs. Just say OK, I write for this table my update routine, my insert routine and my delete routine, and then I can do a lot of things already in these routines. Let's say for insert, we can set default values; we can compare something; if I have two fields, do they match, etc., because in the first step it might not be possible to put all these things directly in the table, but if we concentrate them already in a function, then we can go step by step. Oh, in this program I do an insert. OK, let's replace it. Oh, in this function I have the delete, and with this delete I have, whatever, 25 checks on whether I can delete this record, so I now have concentrated it in a single procedure and I can reduce my source code step by step. Generating these functions can also be partly automated: I have skeleton programs, so in the first step I can just take one of my skeleton programs and replace the file name in it, so I already have the basic procedures generated, and then I fill them with life, and then I go step by step and move the logic from my programs into my procedures, or call my procedures instead. It's the same thing as doing some modularization in our programs: we just pull it out and then go step by step and rewrite our programs.
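[Editor's note: An externalized delete routine of the kind Birgitta describes might look roughly like this as an SQL procedure; the table names, the check and the SQLSTATE value are invented for illustration, not taken from the interview.]

```sql
-- Hedged sketch: all the delete checks live in one procedure
-- instead of being repeated in 25 programs.
CREATE OR REPLACE PROCEDURE mylib.delete_item (IN p_item_no INTEGER)
  LANGUAGE SQL
BEGIN
  -- one of the many possible checks: is there still inventory?
  IF EXISTS (SELECT 1
               FROM mylib.inventory
              WHERE item_no = p_item_no
                AND quantity > 0) THEN
     SIGNAL SQLSTATE '75001'
        SET MESSAGE_TEXT = 'Item still has inventory - delete refused';
  END IF;

  DELETE FROM mylib.item_master
   WHERE item_no = p_item_no;
END;
```

Any program, in any language, then calls `delete_item` instead of deleting the row directly, so the rule cannot be bypassed by forgetting a check in one of the callers.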
Charlie: This seems like a lot of—a lot to do and I—I recognize that the ultimate goal of what you're describing is the best solution but maybe we can give the people listening to this podcast a very—very first step and I—the first thing I generally hear and do as well is just first get rid of the DDS. I mean that's even before we do anything else. That should be even our first step. Do you—is that a valid point?
Birgitta: Going from DDS to DDL is a valid point, because there are some differences. We can use a DDS-described table in the same way as we can use a physical file, but internally they are not 100% the same. For example, if you try to copy something into a DDS file, let's say just copy file with no check, you can copy everything into your physical file, no check at all. If you try the same thing, the same copy file statement with no check, into the same table but this time SQL-described, you cannot insert any invalid data. This is because when we write into a DDS-described file, there is no check internally, but if you read from a DDS file, then the data is checked. In SQL, it's just the inverse: if we write into the table, then the data is checked; if we read from the table, the data is not checked. So just a small thought: how often do you read and how often do you write or update? Maybe 80% read and 20% write, so even if we only convert from DDS to DDL without doing anything else, our programs might be slightly faster, because there is no check anymore when we read, but it's checked when we write. On the other side, we then can run into some problems. Let's say we still have old programs and we use a data structure to write the data and the data structure is not properly initialized; then you get a crash. OK, we then can fix it relatively fast, because you know here is the problem, but as I said, as soon as we convert it, we have this advantage. And the conversion itself, if we do nothing else than convert from DDS to DDL, we can do relatively easily, because we have ACS, and within ACS we have an option which is Generate SQL. By the way, there is also a stored procedure which can do that. Generate SQL will generate the SQL code for our DDS file. If we do not have a key on the DDS file, that's very good; then we just can run this CREATE OR REPLACE TABLE statement.
Then the table is converted from DDS to DDL, and what's even better, as long as we do not add a field or something like this, the format level is the same. So there is no need to recompile any program, but this is a very first step. Now we have converted from DDS to DDL, and now we are able to use all the good new features that are available for SQL, such as an identity column, such as adding auditing columns, etc. We can include them now in our table, and then, OK, then we have to recompile if we have native I/O. That's it. Something we also can convert, by the way, is a logical file: a DDS-described logical file into an SQL index. An SQL index cannot be referenced in an SQL statement itself, but it can be used in an RPG program like any keyed logical file, so after converting from a DDS logical file into an SQL index, your program still runs, no need to recompile anything.
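[Editor's note: The steps above can be sketched as follows. `QSYS2.GENERATE_SQL` is presumably the stored procedure Birgitta alludes to; the file, library and column names are invented, and the generated DDL shown is a simplified stand-in for what the tool would actually emit.]

```sql
-- Reverse-engineer the DDL for a DDS physical file
-- (same function as the ACS "Generate SQL" option):
CALL QSYS2.GENERATE_SQL('CUSTMAST', 'MYLIB', 'TABLE');

-- Running the generated statement replaces the file with an SQL table.
-- As long as no columns are added, the record format level stays the
-- same, so existing programs do not need to be recompiled.
CREATE OR REPLACE TABLE mylib.custmast (
   customer_no   DECIMAL(7, 0) NOT NULL,
   customer_name CHAR(40)      NOT NULL,
   city          CHAR(30)      NOT NULL
);

-- A keyed DDS logical file can likewise become an SQL index,
-- which RPG programs can still use like any keyed logical file:
CREATE INDEX mylib.custmast_by_name
    ON mylib.custmast (customer_name);
```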
Charlie: OK, but if I just convert my files to tables, and you've already said it, that potentially will give me data errors, because the older-style programs may in the past have put bad data into my tables, where now it's no longer allowed. So if I'm going to do that as my very first step, convert to DDL, which I think is a valid first step, how long do I let that sit before I start the next step? How long do I let that simmer and cook for a while? I have to make sure that every program that is active is exercised properly, so is there any kind of formula or recommendation you might have before I go to the next step?
Birgitta: What I'd suggest is just to do one table after the other and then just let it run. As I said, it will crash immediately if you have wrong data, or you may say OK, I still have old tables and they might not be properly initialized, so let's have a look at it and check it. What I have done with customers was: we have a test environment where we have the same programs, etc. and the same data, and we say OK, let's convert and now let's run it. Let's test it, and if something goes wrong, then they know OK, here we have to fix it, here we have to fix it.
Charlie: But do you do that with the entire database or do you do it by application or by master tables and then transaction tables? Is there any particular method that you do that first or is it just—
Birgitta: No, normally we start with the tables that are not critical, just to get it going, and then we plan: OK, which table do we want to convert? Which is important? Then we do one after the other, so there is no need to say OK, I have thousands of files, I have to convert them all this weekend. We can do it step by step, so let's say this weekend we convert three to five files, or we convert them first in our test environment. We test it. If everything works, good, then we convert the same tables on the production machine and then have a look. For the next week, we prepare the next five or ten tables, so it's not a problem. For customers where we converted the tables, I wrote some additional functions so that you do not have to do things like this manually. OK, I did a little more than only conversion. I also said OK, we convert the numeric dates into real dates, so I added date fields. I also generated a trigger program that keeps the data consistent, etc. That's what I have done with my customers. We sit together. We have a look at the database. We describe what we do, and then I wrote them some individual programs which automate this, so you can still have some exceptions, but after this was set up, I had written the source code so we could convert a table and, depending on the number of records we had, generate all the new files and the trigger programs in 15 to 30 minutes. So you can do it very fast and then test it. If something goes wrong, you just reset it.
Charlie: What do you say to the developers, or even the managers or the executives, when you have a hard time convincing them of the benefit of finding a real budget for this process? Because this could be a process that does require a significant amount of time and testing. First of all you have to find a time budget to do this in between your everyday job, of course, but there's also a financial budget, so how have you spoken to management in finding the real benefit of this, the return on investment, so they understand and want to really get behind this whole project?
Birgitta: Yeah, that's sometimes not very easy. If you add a new interface, a nice interface, the management can see it, but this work is invisible, so it's really difficult to convince them. One thing you can say is: OK, we have an old database. It is 20 or 25 years old, and times change. We have to be able to access our data very fast. We have not only our data but also foreign data, and we have to bring them together, so the faster we can access the data, the faster we can build new reports, and the less we have in the source code and the more we have in the table, the better. Also security. If you add things like constraints or referential integrity or row and column access control, these are very important things, because the IBM i is no longer an island. If someone comes in and circumvents my programs, in an old table someone who has *ALLOBJ authority can do with the data whatever he wants. Everybody who can access the table or the data can do what he wants, and if someone has come in and tries to destroy my data, there is no rule set to stop him. We have to make sure this is handled, because someone who comes into our IBM i and wants to hack something, steal some data, etc. will not go through our programs. It will be things like SQL injection, etc., so that means we have to make our database secure. These are all things we move, again, from the program into the database. This time we do not talk about views, which are a very helpful step. We talk about constraints, referential integrity, row and column access control. These are very important to protect your data, because your data is the most important thing you have. If you lose a program, you still have your data. If you do not have a web application, you may still have your green screen, but if you lose your data, then you have a problem.
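[Editor's note: A minimal sketch of the row and column access control (RCAC) Birgitta mentions. The schema, table, permission name and group profile are invented; the point is that the rule is enforced by the database for every interface, including ad hoc SQL that bypasses the programs.]

```sql
-- Only members of the (hypothetical) PAYROLL group profile may
-- see rows in the payroll table, no matter how they reach it:
CREATE PERMISSION hr.payroll_row_access
   ON hr.payroll
   FOR ROWS
      WHERE VERIFY_GROUP_FOR_USER(SESSION_USER, 'PAYROLL') = 1
   ENFORCED FOR ALL ACCESS
   ENABLE;

-- Nothing happens until row access control is activated on the table:
ALTER TABLE hr.payroll
   ACTIVATE ROW ACCESS CONTROL;
```

Once activated, the rule applies to every SELECT, UPDATE and DELETE, whether it comes from an RPG program, a Java application or an interactive SQL session.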
Charlie: You know, even as a developer I can tell you that I understand there is also a rewrite of the code. Certainly, we have to do that as well, but it's also a larger change to the code, because once you introduce SQL into your applications, the way you process the data is so different. RPG traditionally is record-level access, RLA we say, but that's not the case with SQL. It's a complete shift in our paradigm, you know, set processing, things like that, and we need to be prepared to handle—
Birgitta: Yes.
Charlie: Have the programs handle that methodology.
Birgitta: Yeah. You have to rewrite this, and it's really about the thinking: we have to learn to rethink. Let's say in the old days a lot of programs were written this way: we have our native I/O; we read first the order header; then we chain to the address master; then we read the positions, chain to the item master table, and finally we have a nice screen or a nice list, but then you have to test it. What I have learned in my experience is: how do they test it? They write queries. Instead of writing the queries after we have written the program, we start differently. We write the queries first and say OK, we need exactly this data. How do we get this data? Then we can prepare it, so we just have to rethink, and the easiest way is to start rethinking with our new programs and then continue, also from a performance perspective. Some guys will say OK, I want to use embedded SQL instead of native I/O. OK, you can do it, but if you start translating the native I/O one to one into embedded SQL, that's not a good idea, because, as I said, the RPG program reads first the order header, then the chain, etc. In SQL you would then have to do four full opens, and a full open is very time consuming, because the ODP has to be opened, and this is an expensive step. If you compare a program written with native I/O and one with embedded SQL one to one, RPG will win, at least in the first or second run, because RPG does no optimization while SQL has to optimize. In RPG, you just specify a logical file in the F-spec and this logical file is taken without any questions. If you iterate over hundreds of thousands of records, it's your problem. SQL is much more flexible on this point. It can join tables together; it can optimize; it can even use multiple indexes for a single table in a single SQL statement.
It has a lot of different ways, and if you rethink and do it in this way, SQL should be much faster than RPG native I/O, because with native I/O you only read two records, the current one and the next one, because you need to know if it's the last record or not. In SQL, it works completely differently. It optimizes; it decides which index; then it goes into the database, takes blocks of the data and returns them. Also, you should not use SELECT * FROM, because in RPG you always bring back the whole record, but in SQL you bring back only what you want. If you take the whole record and need only 20 characters, you throw away the rest. Otherwise, you go into the database, get these blocks, you get much more information in a single block, it comes back and you can then use it. But as I said, you have to rethink, and with older programs it might be more complex, but all new programs you should write in this way. The best thing is to put this complexity in SQL views and then say OK, here's the view for this report. I will use it.
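[Editor's note: The set-based rethinking described above can be sketched like this. Instead of four chained native-I/O reads (order header, address master, positions, item master), one statement returns only the columns the program needs; all names are illustrative.]

```sql
-- One set-based statement replaces four chained reads per order;
-- the optimizer chooses the indexes, and data comes back in blocks.
-- Note the explicit column list: never SELECT * here.
SELECT h.order_no,
       h.order_date,
       a.city,
       p.quantity,
       i.item_description
  FROM mylib.order_header   h
  JOIN mylib.address_master a ON a.customer_no = h.customer_no
  JOIN mylib.order_position p ON p.order_no    = h.order_no
  JOIN mylib.item_master    i ON i.item_no     = p.item_no
 WHERE h.order_no = ?;
```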
Charlie: And as you just said, you're getting the benefit of the SQL Query Engine, the SQE, which is terrific, all the extra benefit that you're giving up by staying in traditional native I/O. You're getting all this extra benefit that you don't even have to worry about. It happens for you. So that's—
Birgitta: Yeah so—
Charlie: That's the benefit here.
Birgitta: Yeah, it happens for you, and that's one thing that a lot of guys do not know. The team in Rochester did a really good job on Db2 for i. Even if we are working with SQL, we do not have to care about a lot of things that otherwise would have to be done. A lot of guys do not even know that we have statistics, that the statistics are updated permanently, that we have index advice we can have a look at. We have really good analysis tools in ACS, so we can get all the information about our SQL statements and what happens. We have Visual Explain, which is a central part of these performance tools and shows the access plan. We see exactly what steps are done, how long each step takes, and then we can try to rewrite. As I said, there are only two things that we can affect with SQL if we talk about performance. One thing is we create the right index, but the other thing, which is even much more important, is that we write the SQL statements in the right way. That means if I write the SQL statement in this way, I get the result. Great. If I write it in a different way, I also get the same result, but in the first case the optimizer can use an index; in the second case not. As long as you have only 500 records in a table, it does not make a big difference, but if you have 500 million records in a table, it makes a big difference whether you read the complete table or you can access it directly, because SQL is a wide range and it is not as easy as it seems. IBM always preaches you should have a DBE, a database engineer, or better two database engineers, and the database engineers care about the database. They care about the SQL views, so a regular programmer says OK, I need this and this data, and then the guy who is the DBE says OK, let's do it. He writes the SQL statement as a test, and he knows the right syntax.
He is also able to create the right indexes, so he can already say OK, I know I need an index like this and this, and then we also have tools where we can say OK, let's run the query. OK, I know we want to add this and that WHERE condition, so we need another index. And the average programmer just says OK, here is the view for my program, I add my WHERE conditions and that's it. The average programmer, independent of which programming language he speaks, does not care about the database. Here we have these guys, the DBEs, who care about the database. They prepare the SQL statements; they will also be responsible for optimization, for redesign, etc., and the programmer just takes what he gets. RPG and COBOL guys are the only ones who ever cared about the database themselves. The Java guys just take what is there, and if you ask them, oh, that's the DBA who cares about that, and today we are moving to do it in the same way: to have one or two guys who are responsible for the database, who will do these things, who can do these things, and the programmer just writes his program. SQL is so powerful, we can not only access our data in the same way as native I/O. We can also access a web service. Compared with RPG, for example, in SQL we have around 180 built-in functions. RPG has 50% of that. We have so many powerful things, such as building all kinds of subtotals and totals that we can do in SQL.
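[Editor's note: One example of the "subtotals and totals" capability mentioned above is GROUP BY ROLLUP, which produces group subtotals and a grand total in a single statement, something that needs level breaks and accumulators in a native-I/O program. Table and column names are invented.]

```sql
-- Per-customer totals, per-region subtotals (customer_no is NULL on
-- those rows) and a grand total (both columns NULL), all in one pass:
SELECT region,
       customer_no,
       SUM(amount) AS total
  FROM mylib.sales
 GROUP BY ROLLUP (region, customer_no)
 ORDER BY region, customer_no;
```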
Charlie: You know I think about all the programs that I've written or have modified in my entire career and certainly a large percentage of the lines of code in any one program is just for data validation—
Birgitta: Yeah.
Charlie: And that would all potentially go away, so now what remains is the pure business functions and application, and that really moves us along to a more standard and industry-accepted process.
Birgitta: Yeah, that's the next thing. You get young guys who have never seen native I/O. They all learned at least a little SQL at school, so they feel more familiar if they see: oh, that's SQL, I understand it, now I can do it, and the big logic is in the database itself, hidden in a view, in constraints. I have seen it. It was already long ago; I worked for a company on a warehouse management system, and it was me who wrote all the strategies to find a pallet, to find a place in the warehouse, etc., and I changed them from native I/O into SQL. At first they were all skeptical about that, but then I could reduce some of the programs from more than 2,000 statements. What we did: we had a lot of views, and the idea was, try one strategy at a time. Take this view. OK, found? Great, stop. If this does not work, take the next one, and so on, and finally, the last case, you could not find anything, but you had a program with, let's say, a few hundred statements instead of thousands of statements. Even better were all these views. At that time, we had a lot of Query/400 users, and if something went wrong, they always called: why did you not find this pallet, etc. So what we did, we just gave them Query/400 queries based on the same views we used and said OK, you can look here. That's this, and here, and here, and so we reduced all the phone calls to a minimum, because they could have a look in Query/400: OK, this one, this one and this one. Ah, here it is. OK, yeah, it's crap. And if the user could not find it himself and called us, we used the views, checked it and said oh yeah, and we had it in five minutes. Before, we debugged and debugged and debugged, and finally after two hours we found out: oh yeah, it's crap. So this is also one thing which reduced work.
Charlie: Wow. So Birgitta, let me tell you where we are. You know, right before we started this conversation, you and I both said how quickly time goes by, and it sure has. I encourage anybody who is listening to this podcast: what I would recommend you do is listen to it and listen to it again, and maybe even a third time, because we have compressed hours and hours and hours of content into this short amount of time, and it's worth hearing every single thing that Birgitta has said, because what she has just offered us here are some really interesting and advanced techniques and certainly a roadmap that we should all be striving towards ultimately. So Birgitta, you give workshops that are hours and days long, you know, multiple-day workshops. There is no way we can possibly fit that in, no matter how small we compress these little bits of information, but this has been a great start. I just want to thank you because this is just incredible information. It really is. If we're going to wrap this up now, what would your final piece of advice be to anybody who is just starting on this journey? What would be your final piece of advice just to get them started and to take that first step moving forward?
Birgitta: So, the good thing is you can start at multiple points. The most important thing is you have to start somewhere. My suggestion is you can start with converting from DDS to DDL, or you can start with SQL views. You can start with externalizing your data access, preparing functions, etc. These are all things where you can start, and the most important thing is: let's start. If you create a new table, then make it correct. That means add an identity column; add auditing columns; add constraints; add referential integrity, etc. With a grown application, it is more difficult, so we can only go step by step, but as I said, starting to hide complexity in views, starting to rethink: these are points where you can start, and you have to start.
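[Editor's note: A new table "made correct" in the sense Birgitta describes, with an identity column, auditing columns, a check constraint and referential integrity, might look roughly like this. All names are invented; the row-change-timestamp clause assumes a current Db2 for i release.]

```sql
CREATE OR REPLACE TABLE mylib.customer (
   customer_id    INTEGER        GENERATED ALWAYS AS IDENTITY
                                 (START WITH 1, INCREMENT BY 1),
   customer_name  VARCHAR(60)    NOT NULL,
   country_code   CHAR(2)        NOT NULL,
   credit_limit   DECIMAL(11, 2) NOT NULL DEFAULT 0,
   -- auditing columns maintained by the database itself:
   created_at     TIMESTAMP      NOT NULL DEFAULT CURRENT_TIMESTAMP,
   changed_at     TIMESTAMP      NOT NULL
                  GENERATED ALWAYS FOR EACH ROW ON UPDATE
                     AS ROW CHANGE TIMESTAMP,
   PRIMARY KEY (customer_id),
   CONSTRAINT credit_not_negative CHECK (credit_limit >= 0),
   CONSTRAINT customer_country_fk FOREIGN KEY (country_code)
      REFERENCES mylib.country (country_code)
);
```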
Charlie: And of course, this is a journey definitely worthwhile taking of course because the benefit—
Birgitta: Yeah.
Charlie: On the other side, the benefit is tremendous.
Birgitta: It is.
Charlie: Terrific.
Birgitta: It goes on. With each technology refresh we get so many new things in SQL and the database. It's really great, so why use old techniques? I'm an RPG programmer, but why should I write something with native I/O if it's much easier to do it with SQL?
Charlie: And can be maintained by a broader set of developers as well.
Birgitta: Yeah, yeah.
Charlie: For sure—for sure.
Birgitta: Sure, and if we get young guys, I do not teach them native I/O anymore. We just have a look at SQL. Some time ago I had someone who was a COBOL programmer and she had to write very easy RPG programs, so what I taught her was: OK, use embedded SQL, that's the same thing in COBOL and RPG. Then a few ifs and thens and a loop. OK, that was everything she had to learn, and then she could write her simple programs in RPG even though she had never seen it before.
Charlie: Wow. Well, Birgitta, as always, you've given far more information than I expected you could fit into this short amount of time, so thank you very much for that. It is always, always a pleasure to chat with you. As I said in the beginning, I've had the good fortune to present with you in many, many different places, different conferences—
Birgitta: Yes.
Charlie: And it's always been a lot of fun so I do hope we can return to those days. I think—I'm optimistic that we will be in the next year so—
Birgitta: I hope so too. I also got several invitations from user groups in the US. I hope we can travel again and meet up in person.
Charlie: I think that's a good possibility.
Birgitta: It's much more fun.
Charlie: Of course it does.
Birgitta: You know, we met in Denmark at the first conference, at least for Europeans, which was in person, and it was much more fun.
Charlie: Of course. It always does. It's always better in person. That's my story and I'm sticking to it. I love it, nothing like being there so it's great. Birgitta always a pleasure. Thank you so much for doing this. Thank you for sharing.
Birgitta: Thank you too.
Charlie: Just a fraction of your vast amount of knowledge in this topic so thank you very much.
Birgitta: You're welcome.
Charlie: Thank you everybody for listening to this podcast. As I said, if you want to follow Birgitta on Twitter you can do that. Her handle is Birgitta Hauser. You'll find her on Twitter. She's very active there, very active in the forums and as I said in the beginning very freely giving of her time so you can reach out to her or post questions and I'm sure you'll get a response at some point. Birgitta thank you so much, always a pleasure and we will end it here. Thank you so much.
Birgitta: It was also a pleasure for me truly.
Charlie: Great. OK, bye now.