Kai Stroh on Db2: A Not So Simple Relational Database
Kai Stroh, head of mainframe development at UBS Hainer, on the inherent complexities of Db2
Reg Harbeck: Hi, I'm Reg Harbeck and today I'm here with Kai Stroh, who is the head of mainframe development at UBS Hainer. Welcome. Kai, tell us how did you end up on the mainframe?
Kai Stroh: Hey Reg, thanks for having me. How did I end up on the mainframe? It's actually funny because I started with something similar to the mainframe when I was about 19 years old or so, which is I guess pretty young to be working in that environment with the mainframes and all that. Back in 1999, I had finished school and back then over here in Germany where I'm from, we had conscription. So everybody basically after finishing school had to either spend a year or so in the military or do something else, and there were a couple of options that you had, joining the firefighters or work at a hospital or something like that. And I ended up working at a hospital. Now, usually when you do this, what you do is you take care of the patients there. But what I did, and I guess I had just some connections there because I knew some people that worked at that hospital, and they knew that I knew my way around computers. So I ended up in the IT department and they didn't have mainframes, but what they had was the AS/400, which is a little bit like the mainframe—it's also an IBM machine. I think nowadays they call it the IBM i. And that's basically my first contact with the green screen. And when I was done with that, I went to university instead, and in between semesters I was looking for a job just for 2-3 months to get to earn some money. And I ended up at a company—which was actually not that far from where I live—called UBS Heiner. I had not heard much about that company, but it was only like five kilometers or so from my house. So I just went there and asked if they could use somebody like me, and that's how I started here. And that was a year later in—well, 2000—when I started at university. It must have been 2002, 2003 when I spent time between semesters at that company, and I stayed here after I finished and got my diploma.
Reg: Oh, so that's celebrating two decades there. That's pretty impressive.
Kai: Yes, I am actually.
Reg: So now I guess that UBS Hainer had been a mainframe software company by the time you joined them? How long had you been around before you joined?
Kai: Couple of years. I think they were founded in '97 or '98 and they were founded with—actually the first product that they had was a fairly simple product, but it was selling extremely well. And it was 1997 or '98 around that time. And you can probably imagine what that product was. It was something that helped mainframe customers make that Year 2000 transition. So it was something that would simulate the transition to the year 2000, and you could do that a year ahead of time basically and see if your JCL and all your programs were able to work with the four-digit year. And that thing sold so well that the entire company basically got started off of that. And that gave us the funding basically to do all other products as well.
Reg: Cool. Okay. Now my impression is that you folks really are focused on Db2, but I gather there's a lot more than Db2 in that case. So maybe you can talk about some of your products there.
Kai: Db2 is certainly the main focus, especially nowadays. It used to be that we had more products that were not so much focused on Db2, like the one I mentioned. Now interest in that dropped quickly after 2000. You can probably imagine that maybe you should bring that back for the looming 2037 problem, when the 32-bit integer time value is going to overflow. But yeah, we had some other products that would help you convert between currencies of different kinds, and also keep track of programs that were being executed on the mainframe for doing inventory, and also for seeing if you had maybe programs license that you were not using anymore. That gave you some ammunition that you could go to that vendor and say, hey, I'm only using that on this one LPAR and I'm paying for the whole system, so can we renegotiate maybe. But there have also been Db2 products pretty much from the start, and one of the first ones was actually one that would clone an entire Db2 system with everything that was in it. And it would do it in a very fast and efficient way, which was great, but if for some people it was just not flexible enough. Some people said, we don't want to clone the whole Db2. We want to clone parts of it, just a couple of tables. And so from that product basically we built other products as well that added more flexibility, sacrificing a little bit of speed. But that's basically how it all got started.
Reg: So one of the things I've observed dealing with various mainframe software vendors is how, unlike other platforms, that the software in the mainframe really is a function of customer demand first that leads to the innovation. And so there's a lot less speculation about whether or not it's going to play because people are actually asking for it before you get developed. And so you get some really interesting journeys that the software takes of becoming this, that, and the other thing. Maybe you could just take a look at one of your key products and give us a sense of its life journey, how it started out one way and became something else.
Kai: Well, the one product that I actually know extremely well is called BCV5. And that actually was started because we had some demand from an IBM customer who was looking for ways to make quick copies of their database tables, which is more complicated than you would think on the mainframe system, on the mainframe Db2. You don't just copy your file and then you have a copy of your database table. It's much more complicated than that. And IBM has something that is part of Db2 that allows you to do that—it is however, not exactly easy to use. So you have to code a lot of JCL and you have to know exactly which file needs to go where. And there's this thing in the Db2 database that inside those files that hold the data, there's also like hidden IDs that have to match exactly certain numbers that Db2 assigns to your table. So you have to translate those IDs when you make a copy. And IBM had something for that, but it was really awkward to use. So these guys approached us and they said, isn't there a better way to do this? And first of all, we need to automate this whole thing. We need you to find these hidden numbers automatically and translate those. But we also need you to make sure that we even have the definitions for our target system—like the table itself, the logical object, the database table—that needs to be created. It's not sufficient to just put a file somewhere. You need to tell the database system there is a table with that name and it has these columns, and then there's a file that goes to with that table, and you override that file with new data. That's basically how it works. So that was customer number one. Basically we built BCV5—I guess you could say specifically for them—but we already tried to make it into a product that other people could also use. We're trying to make standard software, so we couldn't make it too specific to that one customer. We had to make it generic so that we could sell it to other people as a full product basically. And there was actually—I think I can think of at least two or three products that started out that way. Another one would be a lock analyzer that was started because people were saying, well, there are two or three lock analyzing tools on the market. They're super expensive, though. Isn't there a way to have something cheaper? It doesn't have to have all the bells and whistles—just give us this and this information from the Db2 lock. Can you build something for us? And that's how another product got started, both in the Db2 area. And those are just two examples basically.
Reg: Now, one of the interesting things about Db2—I get a kick out of how on the one hand, the theory of Db2 is just, it's like a spreadsheet. It's just a simple relational database. And the practice of Db2 is just mind-bogglingly complex, and especially because there's all these really hidden things. We talked about the hidden numbers that refer to the tables. There must be just so many things that you have to track in order to do—oh yeah, AEG legitimate copy and especially real time copy. Maybe you can talk about some of those issues.
Kai: Well maybe before I do that, maybe let's have a look real quick at the reasons why it needs to be so complicated. Because you would think, why does IBM have to make it like this? Why did they design it that way? Why couldn't it be easier or simpler? And I guess the main reason is that Db2 really focuses on data integrity, and also that your data is basically safe when something bad happens like a crash or a disk failure or something like that—which to be fair doesn't happen a lot these days. But a lot of these mechanisms and a lot of these things that seem over-complicated come from the fact that Db2 really tries to keep your data in order and keep everything—keep the integrity and everything like that. So one of the things that IBM does, for example, they have some sort of timestamp built into every page of—well, the files that are sitting on disk. Now a page is typically 4-kilobyte block that can be 8, 16, or 32 kilobytes. And each of these blocks has a built-in timestamp that tells Db2 exactly when this block of data was last touched. Now when you take this and copy it to another table—and that table could be hosted on a different Db2—the other Db2 will see that copy of the data, and it has this an internal clock of its own. So when I say timestamp, it's not really a timestamp—it's more like a counter. And when it compares the counter in the file with its own internal counter, it could very well be that the target Db2 says, well, I see the data, but I also see that this data seems to come from the future. So that's definitely a problem, and I'm not going to touch that data. So that's just one example. There are more, like when something bad happens—I mentioned that could be a disk failure or something—Db2 needs to be able to recover the data that maybe was just being updated as the problem occurred. And it's not just Db2; basically every database system has mechanisms like this. So there are things like the Db2 log and there are things inside the files that Db2 can use to see exactly what was already done when the crash happened, and what was maybe done by the user but it had not yet made it to disk because it was maybe stuck in a memory buffer and Db2 didn't get the chance to write it to disk before the system went down, something like that. And I think these mechanisms are mainly responsible for a lot of the complexities that we see in seemingly simple things like making a copy of Db2 data.
Reg: So there's obviously a whole lot of really interesting considerations here, but I guess my sense is then one of your challenges is that—on the one hand, your utility is trying to be efficient, but on the other hand in trying to, for example, copy directly from this versus going through Db2, you can be faster, but then you have to do a whole lot of figuring stuff out that Db2 isn't telling you because you're figuring it out yourself. Maybe you should talk about some of the things you have to scramble and figure out just so Db2 will accept what you've copied.
Kai: Oh yeah, definitely. So I mean a lot of things are actually documented. There's official documentation from IBM. It's not publicly available, but you can get it if you are an IBM—what is that? Like a business partner—I believe it's what they call it. Then you have access to certain internal information, and there is a manual that is called the Db2 Diagnosis Guide and Reference, and that actually has some internal information that you just cannot find in at any other placeyou certainly cannot find it publicly on the internet. And I even believe as a normal Db2 customer who is maybe working at a bank and just keeping their data in Db2, I'm not even sure if you have access to that kind of manual, but this manual describes a lot of the internal structures that we find inside the files. And you are absolutely right. It's basically like reverse engineering a file format. It's like—imagine you were trying to create new spreadsheet software from scratch. Obviously you'd want it to be compatible with Excel, because that's what everybody's using. So you want to be able to read and write Excel files. That's basically the same thing. You need to understand what an Excel file looks like inside, how it's structured, and you need to be able to read and write properly and also take care of all these little edge cases. So for Db2, one example—maybe an it's an ugly example, at least I find it's an ugly example because it's in my opinion, one of the things that is not very well implemented in Db2 is the way that Db2 keeps track of schema changes. A schema change is when you already have a table—it has columns, it has rows—and now you decide that you want to have a new column for that table. So it used to have 10 columns; now you need an extra one. You want to have 11 columns. From an outside perspective, what you do is you say alter table, add column, and then you specify the name and if you want it to be a numeric column or a string column. Then you hit enter and the new column is there immediately. You will see it; you can query it. You will even see that new column for all the existing rows that you already had in that table—it could be a billion rows in that table already. You say add column and it happens instantaneously. And of course you need to specify some sort of default value that Db2 should assume for all the existing roles. And then for all the new roles, you are able to specify new values. So how can Db2 do that? How can Db2 make that happen instantaneously? The way Db2 does that is they call it versioning. Db2 keeps track of a version number of a table. When you create a table, it starts with version number 0. That's my example: I have 10 columns, that's my version 0, and now I decide that I want to have another column. So I add one and now I get table version 1. Sometime later maybe I add another column or I make an existing column longer—that gives me version 2 of that table and so on. As an end user, I always see the most recent version. I never see the old stuff; I always see the newest format, the newest layout for all the roles, even the ones that are very old internally. Db2 can make it happen instantaneously by basically assigning a version number to each row. So all the existing rows that I already had on my table, they're now version 0 and all the new rows that I add later, they're version 1. And when I access row, Db2 will see if it's the most recent version, which is great. Then it has all the columns that it needs, or if it's an older version, which means that columns may be missing or columns may be too short physically on disk. And Db2 is of course clever enough to then—they call it materialize the row. So they convert it internally real quick into the most recent version by just showing me the default value for that column. That doesn't really exist in that row and that's what my application sees or what I see when I do a select on that table. So from my point of view, that new column is there immediately and it's there for all the existing roles as well. Internally we have these two different versions, so we have two different little formats. So for each row that I have in my table, when I work with that file or a file system level, I need to be able to tell if this is a version 0 or a version 1 row, and I need to treat it correctly when I make a copy. And of course that's not enough, because the description in Db2 itself—what that table looks like—it has all the columns, but there must be some sort of history so that Db2 knows that that table used to have 10 columns, now it has 11. But it used to have 10. And that is the ugly part because that description is inside an internal Db2 structure, but it's also inside the file where the roles are. And when you copy that over into a different Db2, you may be copying from a production system into a test system. So you need to be very, very, very careful and also do some sort of translation and adjustments to make that target Db2 accept that, because the target Db2—maybe I created the table from scratch with 11 columns from the start, so the target Db2 doesn't even know that there used to be a time when that table had only 10 columns and not 11. So that's the kind of internal things that are going on there. And we need to adjust this structural information. We need to make sure that there's no discrepancy between what's in the file and what's in Db2's internal structures. What Db2 knows about the file and what's actually in there—that must always agree 100%. Otherwise, when you try to access that target table, anything might happen, from incorrect results to your application getting a weird error from Db2.
Reg: Well of course on top of that, then there's just a question: If you're copying a live Db2 database, what do you do about copying insufficiently written information—when you sort of copy halfway through a transaction? What do you do about that?
Kai: Yeah that's a fun one, because that's when we need information from the Db2 lock. And the Db2 lock is also documented, but the documentation, when you look at it, it's kind of an Assembly language structure that you need to be able to understand. So it's not like written documentation, it's not written in English, but basically you need to understand Assembly language to see what's going on there. And what happens is that when you allow people to make a copy while other people are actively using a table and inserting roles, deleting roles, or changing existing roles, that's when you need to be extra careful, because you can end up in a situation where while you are copying that file, somebody changes it and there's this added difficulty of changes. They usually don't go directly to disk because that would be too inefficient. Db2 has buffers in memory—they call it buffer pools. And when you insert a row, it goes into that memory buffer first. So it may be on disk, maybe not yet on disk—it depends on the exact second when you copy that file. Also a single insert or a single delete or something like that is usually part of a larger transaction. So maybe I'm inserting a thousand rows and you caught me while I was just inserting row no. 567, and that's when the file was copied. So the first 567 rows were in that file, and the other ones that I inserted later were not yet in that file, so they never show up in the target. However, all 1,000 rows belong to one transaction. So from a database point of view, I either want to have all 1,000 rows or none of them, but I can't have some of the rows from a transaction and some I don't have, because that would violate the transactional integrity. So that is something that we also need to look at, and that's when we start looking at the Db2 lock—because the Db2 lock is that big journal that has just every single change that happens in to any of your tables. There's an entry in that journal in the Db2 lock, and there are also entries that tell us when a transaction starts, and then we see all the inserts in my example, and there's a little marker there that says: this insert belongs to this transaction. And then at the end there's going to be a special lock record that says this is where the transaction ends. And then in this case our BCV5 product goes in there and says, okay, I see that this table was copied while it was being used, so let's have a look at the lock. Okay, I can see that the copy took place while the transaction was open. So what the product is going to do, it's going to roll back all those changes—all the roles that had already made it to disk when we made the copy—but from a database perspective, they were not yet committed, so they should not be visible to anyone, so we take them out of the target. That of course requires some extra processing. We copy the thing, and then when everything is copied, we need to do some post-processing to get rid of those open or those half-committed changes, if you will, to make sure that everything is consistent. And of course, that's not just for one table, because people typically copy not one table, but maybe 1,000 or 5,000 tables, and every table has an index or two or three or five. So there are a lot of objects to take care of, but it works beautifully. It takes some time depending on how much activity you actually had while you were making that copy. If nothing happened, then it can be very, very fast. If it needs to undo a billion changes, then yeah, it'll take a while, but it will get the job done.
Reg: Well I'm going to ask next, I think, just about what you do for speed, but before that there's a word you keep using that is so important. That's integrity, and that word shows up in so many important ways when you deal with DVG tools. One of them is of course referential integrity. Maybe if you can use that as just a way to illustrate the importance of integrity and what your tools have to be consistent with.
Kai: Sure. I don't even need to get to a complexity of my previous example, where I cut transactions in half or so. Let's just say we're doing an insert one single row, but that table happens to have an index. So what happens is that Db2 inserts that new row into my table, but of course it also has to add a new entry to that index, because the index always needs to reflect what's in that table. So that's actually two changes to two different files. The table has a file and the index also has a file on disk. So two different files get changed. And if I have a big copy process that starts at—let's just say it starts at 8 o'clock and you cannot really predict the order in which we copy the files. We have 10,000 files to copy, and maybe at the very beginning of the process we just happen to copy the file that has the data from the table, and when that's all ready, we copy that thing over to the target, and then 20 minutes later, that's when the insert happens. So in my production system, Db2 inserts a new row into my table file and a new entry into my index file. And then another couple of minutes later, our copy program, BCV5, gets to the point where it's now copying the file that belongs to the index. And at that point the new entry is in that index file, so that gets copied to the target. So what do we have in the target? We have a table that doesn't have the row yet, because the file that belongs to the table was copied before the insert was made, and we have the file that belongs to the index and that has the entry for the new row because that was copied after the insert had been done. So now we have a discrepancy there. It's not a referential integrity problem because we're not talking about two tables where one is a parent and one is a child table, but this is actually an integrity problem between a table and its index. So the index has an extra entry that we don't have in the table. So you might think this is just a test environment. I don't care if there's an extra index entry that I'm not going to even use, or maybe it's going to affect 0.01% of my tests. But the truth is that when you have a situation like that and you run a query on your table, you can get different results based on how Db2 decides to access the data. Because Db2 knows it has an index, and two, it also knows it has a table, and it knows a little bit of how many rows there are in that table. And when I retrieve rows from that table, Db2 could say, it doesn't really make sense for me to use that index because of certain reasons. I'm just going to go straight to that table. And in that case I will not see that row because it's not in the table file. However, if Db2 says for this query that you're giving me, it actually makes sense for me to access the index and then fetch other information from the table, that's when I get that extra row, because I have something in the index for that. I get that when the information in the index is sufficient to answer my query. But when Db2 has to go to the index first and then retrieve additional information from the table, that's when there's a big problem. So I mean, there was a problem right now, because depending on whether I go to the index or not, I see this extra row or I don't see it, but when I want to have column information from a column that's not on the index, Db2 goes through the index because it says, well, you said where ID = 1, 2, 3, so I can use the index for that. It finds that entry, the entry points to the file in of that belongs to the table, but the point, the offset word points to is empty because we copied that thing when the road wasn't there yet. And that's when Db2 will tell us there's a big integrity problem. I think what you get is an SQL error. I think it's a -911 or something like resource unavailable. And the error description will basically say we have an index entry that has no corresponding role on the table. That's a big, big, big problem. Processing is going to stop. Your application is most likely to crash because Db2 is really trying to guarantee data integrity. And there are mechanisms built into Db2. When Db2 is not sure if everything is okay, it will actually restrict access to your tables. However, when we go ahead and do something on a file system level, we're basically working behind the back of Db2. Db2 doesn't know that we did some magic there, so it cannot put the objects in any restricted states. That's what they call them. So Db2 thinks everything is fine until it gets to that point where we actually want to retrieve data, and that's when Db2 detects a big problem, and that's when my application crashes.
Reg: So all these different things of integrity, I start to appreciate why Db2 can move so ponderously slow. And so the question is why does your product not move ponderously slow?
Kai: Because we're not going through the Db2 engine, if you will. We're basically bypassing a lot of Db2. We're operating directly on the files that Db2 writes to disk, and that avoids a lot of conversion back and forth. So the normal tool that you would use to copy data from A to B, it's called the unload utility—and the load utilities, those are fine utilities. There's nothing wrong with them except for the fact that they're a little slow. What they will do is they pull data out of a table stored in a normal file, and then that's what unload will do. And then load takes data from that file and copies that into a different table, and this goes through the Db2 engine. So Db2 sees every row—every row passes through Db2 and Db2 can do all sorts of conversions and checks and make sure that the index is always up to date, which is great for integrity, but it costs time. And when we copy the thing on file system level, we avoid all these conversions, all these different checks that Db2 does, and that gives us a lot of raw copy speed. But we need to be very careful that we don't violate any integrity between tables and indexes or between two tables that have a relationship. And to do that, obviously you can stop the source tables, then make a copy, then start them again. So essentially don't allow anybody to modify the source while we're making a copy. Or you can do what I just mentioned where we take information from the log—some people call it a dirty copy, right? Because there could be little problems in there, but then we will get rid of those problems by taking the log information and bringing everything in the target back to a consistent state.
Reg: When you talk about source, of course I think of programs, and one of the interesting things about Db2 is—I mean it it's nothing if not extensible. There's even things like user-defined functions and all kinds of other stuff. That must have been an interesting part of your journey in terms of doing a business relevant efficient copy and yet dealing with every last little detail that Db2 offers. Maybe you could talk about some of the interesting challenges that you've had of keeping up with Db2's complexity.
Kai: It turns out that the actual data movement that's really just—I was always saying tables up until now. What I should have said was table spaces, because the table space is basically the file on disk, and the table is then the logical object that's in the table space. But for the sake of simplicity, I was always talking about tables. But it turns out that on disk there's really only table spaces and index spaces. And no matter what other objects you have—if it's a view, an alias, a trigger, a user-defined function or procedure—those are all logical objects that only live as descriptions in what they call the Db2 catalog. So it's just internally Db2 structures, but it's not reflected physically on disks. There's no extra file for your trigger, there's no extra file for your alias. So the data movement itself, that's really just table spaces and indexes and that's that. But you're absolutely right there's more, especially when you look at an application. Because an application needs the tables obviously, but the application might also need views because certain queries refer to views to make things easier for the application programmer or the application relies on the fact that there is a trigger on the table that will do stuff when I insert a row or a function that I can call or a procedure or something like that. So in addition to the raw data, basically we also need to make sure that the logical objects are all there. And that can be also a challenge because you often have your target environments—well, often they already exist, but they may not be up to date. So tables may be missing, some indexes may look different, a table may exist, but it doesn't have all the columns that we have in production because production was migrated to version whatever last week. And now we have a couple of extra tables and a couple of extra columns, and we didn't do that migration yet in our test system. So now we want to make a copy with BCV5. How do we handle that? And what I didn't actually mention is that when you copy Db2 data on a file system level, you better make sure that the logical objects—the tables in your source and target, which is usually production and test systems—are 100% identical. If they're not, then on Db2 the target, which is usually the test environment, is going to struggle with the data. It's not going to accept it—it's going to give you error messages because it says, well, my table definition says that table should have 10 columns, but the data, it has 15 columns. So that's not right. Something is wrong here, and I'm not going to continue. So that's another aspect basically—which is also built into BCV5, the whole thing with structures—taking care that you have the same structures and source and target and the ability to create all those tables, all those indexes, all those views for you if you don't have them already. And also compare if you have them, compare if they're compatible or not. And if they're not, do something about it. Make them compatible, align the structures and test and production basically. So that's also something that we do.
Reg: Well, that's obviously very important. But one of the weird things that happens when you copy data from production to test is sometimes you don't want some data copied, and that obviously must complicate things even further if you want to avoid certain PL/I or at least make sure it's handled properly. How do you deal with that?
Kai: Basically we call that masking data. Masking is a term that we use for that, and it is something that BCV5 can do. It's maybe not the main focus of the product—the main focus is clearly making fast one-to-one copies. But there are situations where you just don't want to or even are not allowed to make a one-to-one copy of production data and use it for testing. And the obvious example—you just mentioned it—is if you have PL/I in your production tables. And that could be anything from Social Security Numbers, credit card numbers—even something like a postal address is considered personal information—and there are basically rules and regulations in every country on this planet, what you can and cannot do with this kind of information. And that's why we have added the ability to BCV5 to modify data while we copy that—and I mean, there are some restrictions. What you can and cannot do can't be too complicated. But for certain things—like replacing names or postal addresses with fake data that looks real or replacing credit card numbers or Social Security Numbers, social insurance numbers—with numbers that have, for example, a valid check digit credit card numbers have those, that is something that BCV5 can do. Speed will suffer. I'm going to be honest here, because that's when we can't make a one-to-one file system level copy anymore. We can't even do the straight unload load anymore. But now we actually have to go into every single row and look at the data and do something with it, and maybe even go far as retrieving a different address or a different name from a table that has just a big list of names somewhere. And that takes time, unfortunately. It can be done. BCV5 is able to do that, but it's not going to be as fast as a straight one-to-one copy. And as much as I hate to say it, you also lose the ability to make those copies where people can modify your tables and you still get integrity, because when you change data while you make that copy, the information that you will later find in the log no longer reflects what's in the target table, because we changed that. So we can't apply the lock anymore. The solution for something like that is to make it into a two-step process. So first step is to make a copy without modifying data, but where people can make changes to production. So once you have that copy, you need to restrict who can access that. Not everybody should be allowed to access that, so put some restrictions on that table. But then when you have that first copy, that's a table that nobody is really working with, so that table can be stopped, you can be sure that nobody's making any modifications, and then you can make that second copy where you then get rid of the PL/I that you have there.
Reg: So many considerations for really meeting business needs. So on the one hand, personal identifiable information—PL/I is one area for that matter—just the whole question of regulatory compliance. But there must be such an incredible range, because this is about doing business. Maybe you can think of some of your favorite customer examples of how they've been able to take this and actually get a substantial business value from the full range of features and functionality that you offer.
Kai: Well, the prime example is always when we have somebody who has an existing process to copy. It's mostly copying data from production to tests. That's the main thing that people use BCV5 for. And when you get to a customer, or somebody's interested in BCV5 and you look closer at what they're currently doing, everybody has a process for copying data from production to test. Everybody needs to do testing, and to do testing, you need data. Where does that data come from? Where it often comes from production. So everybody has an existing process, and in most cases that would be a process that is based on unload/load, because that is something that comes with Db2. It's not fast, but it's solid. So it's easy enough to use and it usually doesn't give you any trouble in your target—unlike that other tool that I mentioned in the very beginning, which could copy on a file system level but was really difficult to use, which was what got PCV5 started as a product. So let's focus on unload/load. Let's say you have a process that moves an awful lot of data from production to test. Typically that takes several days to get everything done, maybe a week or two. So that's not unusual, and there can be reasons for that. Maybe because you know can't take data from the real production tables because they're constantly changing. So unload is not really equipped to deal with that. That's the part where we will take the lock, but unload can't do that. So what people end up doing is they have image copies—that's what they're called. They're basically backups of your table. Db2 needs them to be able to recover if something bad happens, but you can unload those backups and that's where people take the data from. Now you need to identify where those backups are, what those files are called, and you need to tell unload, okay, this table, please unload it from this backup file and the other table unloaded from that backup file. And you do this 10,000 times because you have 10,000 tables. That's a lot of jobs and a lot of job steps that you need to write and also maintain. If you add a table, you need to modify your process. If you drop a table or add something else, you need to modify your process on top of that. It creates an awful lot of temporary files that are then just sitting there containing production data. So you need to think of protecting those files from access. And then the second half is of course loading all that data then into your target tables. And I mean load is reasonably solid, but there are things that can go wrong. Load goes through Db2, so Db2 will do a lot of checking and for example, if you have two tables, they are in a parent-child relationship: referential integrity. And you load the parent table, what Db2 will do is it will restrict access to the child table because it says, well, you have just loaded new data into the parent table, and the load is usually executed in a way that integrity is not checked for every row, but instead you accept the fact that load is a little faster, but then access is restricted until you check the integrity of every row. So access is limited to your target tables. You need to get rid of that. There's an awful lot of things that you need to do on top of just making sure that the jobs run these things, like getting the order and then doing additional things in the target—some people call it babysitting the whole process. That takes a lot of time, and you don't have all that with BCV5. And so when we go into a company like that and we show them BCV5, we say you can basically use this as a drop-in replacement for your existing load process, because you specify the tables that you want. And instead of 500 jobs they suddenly only have six jobs, because that's how BCV5 is designed. It's always the same number of jobs no matter how many tables you're copying. So they only have six jobs to execute, and they run much faster too because they're operating on file system level. They don't produce any temporary files, they go straight from source to target. So straight from production to a staging area or maybe to test even if those are on two different machines, by the way. So we can copy over network if we need to. And that saves a lot of time, saves a lot of distance space and a lot of headaches for the DBA. And typically we can replace a process that I mentioned, like a week or 10 days maybe with a BCV5 process that finishes it for you. You started in the morning and it finishes before lunch.
Reg: Wow. Yeah, I just saw the business value flash right across my perception of all this. That is spectacular. Now, I gather that BCV5 is not an island, but it's sort of in the context of a bunch of related solutions. Maybe kind of take a 5,000-foot view of all of your solutions and how they bring a business value proposition to your customers.
Kai: Yeah, I'll try. So BCV5, the way it's built, it's working on file system levels. So it's great if you want to be a little selective and you say, I want to copy of this table, but not that one. That's great. If you say, I want to copy the entire Db2 with everything that's in it and it's maybe 50 terabytes worth of data, that's where you get to a point where it doesn't make sense anymore to copy that on a file system level because you want everything anyway. So it makes more sense to clone entire disks. And there are specialized tools for that that do it extremely fast. They work similarly to how a rate system is set up. So basically you push a button and then logically the copy is done in a second. Physically behind the curtains, the disk cabinet keeps working until everything is copied. But logically you see those new disks immediately and you can work with them and they are already independent from the original disks. So there are ways to make these extremely fast copies. It requires hardware support—IBM sells it as FlashCopy. Other vendors have different names for that, so there's things like TimeFinder, but essentially they all work in the exact same way logically. They make an extremely fast copy which you can use immediately. And when you want to clone an entire Db2 with everything that's in it, that's a good alternative because it's much, much faster than BCV5. I mean, BCV5 is already so much faster than unload/load, but a volume-based clone is even faster. So 50 terabytes in two hours, not a problem. If you look in the other direction, maybe you don't want to copy everything that's in a table, but maybe you want to be more specific. You say, I want only 10% of the roles, or maybe you say I only want customers from a certain ZIP Code area. Then it gets a little more complicated because it's usually not only one single table that you're looking at, but it's other tables that have relations with that table. And if you're picking the customers from a certain ZIP Code area, that also means that the invoice table—you need to be very specific here as well and only copy the invoices from those customers in that ZIP Code area, because the other ones you couldn't even insert because those customers are missing in the parent table, so integrity would be violated for that kind of kind of stuff. We also have specialized tools. Actually, it's called XDM. It's designed to do this. And when you think of what needs to be done there—look at every single row and decide whether it needs to be copied to the target or not—you can imagine that it can never be as fast as a file system level or even a disc clone, but it gives you so much more flexibility. And since when we were developing this product, this XDM product, we said you know what? Since we're looking at every single row anyway, we might as well use standard SQL statements here like select and insert. And that way we're no longer limiting ourselves to Db2 for z/OS, because everything that we talked about so far was always limited to the mainframe platform. So the BCV5 product can only work on Db2 for z/OS. The one that I mentioned that is the full disk clone, that's also mainframe-only. But with XTM, we actually opened up and we made our very first steps on the non-mainframe environments, so the Linux, UNIX, Windows, databases like SQL Server, Oracle. There's of course also Db2 for Linux, UNIX and Windows, Postgres, SQL, MySQL, MariaDB. So these things can be supported by a tool like XTM easily because it's using standard interfaces. And that's basically on the one end of the scale. There's raw speed—you can't get faster than cloning disks. On the other end of the spectrum, there's this super flexible tool which uses select and insert. So it's slower, yes, but it gives you that extra flexibility. So you're always sacrificing one or the other, and the middle ground is kind of BCV5, which is very, very fast—not as fast as a full disk clone, but much, much faster than copying individual rows—and still gives you a lot of flexibility when it comes to making copies without stopping the source. And even doing things—I haven't mentioned that so far, but renaming your tables: Your tables could have different names in your test environment, different schema, different table names. That's not a problem for BCV5 or XDM. For a full system clone, that's not possible unfortunately, because the full system clone always includes the Db2 catalog where all these logical descriptions are, so you can't change those.
Reg: So basically, over I guess a quarter century, your products have grown up with the evolving nature of Db2 and obviously other database usage. As you take a look at that journey and you sort of look into the immediate and not so immediate future of databases in Db2 and how your company is going to continue to enable business value, what are some of your thoughts about where we're going next?
Kai: Well obviously something like BCV5 is a very mature product, so we're not reinventing or inventing new stuff every other month or so. That's something that exists; it does its job extremely well. That doesn't mean that we have nothing going on there. I mean, we keep track with IBM. When IBM releases a new version of Db2—or maybe a new function level for Db2, version 13 or so—it could be that we need to make certain adjustments to BCV5 in order to support new features or in order to make sure that everything still works as expected. So there is ongoing development there. And also still to this day, there's sometimes customer requests for features that we simply didn't have because nobody was asking for them. And one example—and I believe you even mentioned that before—it was user-defined functions. It's something that, I mean, you could argue that it's just a logical object you created once and then it's there and then you can use it. So maybe there's not a need for a process that can copy those functions every time you push a button, but there is actually some interest in that. And an existing BCV5 customer approached us and said, I have a couple of UDFs here. And I guess the way they work is they often clean up their systems. So it probably has something to do with the whole DevOps thing going on. You create an environment, you use it for a while, and then you throw it out and they actually want PCV5 to copy these UDFs too, which for us meant adding support for, in this case, just the DDL generation. So DDL is the language that you use to create objects because there's no physical file behind the function. It's just a definition inside of Db2, but we need to create the statement that we can then use to add that function to the target Db2 so that applications or maybe views or triggers can use that. And that's something we have added. We didn't have that. Nobody had asked for it in over 20 years—now somebody's asked for it, so we're adding it in general. Yeah, I mean in general—like I said, the mainframe products are pretty mature. There's a lot of stuff going on in XDM. I guess the non-mainframe world is maybe, I don't want to say moving faster or something like that, but it's maybe not as homogeneous as the mainframe. So there's a lot of more different database systems, and people regularly ask us for support for a certain database system that we've never heard of, and then we need to check that it actually works with standard interfaces and such. And there's a lot of things going on there in the whole DevOps department where people create very, very small environments and they don't want to copy, not even for 30 minutes or so, huge amounts of data, but they need five tables with 50 rows, and then like five minutes later when the testing is done, that table gets dropped again. So that's something that it's getting to the mainframe, but we see it more on the non-mainframe platforms. It's a little more Agile maybe, if you will, where the mainframe is maybe the more conservative platform where things move a little slower, but it's rock solid.
Reg: Well Kai, this has been absolutely fascinating and very informative. Are there any closing thoughts you wanted to share with us?
Kai: Closing thoughts? I mean, when it comes to our products, if the things that I talked about hit a chord and you say I want to see how that could maybe help me in my environment because I have these unload/load jobs and I'm always tasked with copying this environment from A to B, and it always takes me so long. Or maybe you're saying I wish I could have my refreshes done much more frequently, because right now we're only doing refreshes of our test environment maybe once a year. Testers would like us to copy that much more often, but we just can't because it always takes three weeks, so we can't do it every other week. Then maybe if you want to look at PCV5, get in touch with us. We offer free trials; that's not a problem. We help you install it. Get started on your z/OS platform. If you are not working with z/OS but a different platform, maybe look at XDM if that sounds like it could be something for you.
Reg: Thank you very much. Really enjoyed this.
Interested in learning more? Check out UBS Hainer's website.