The Balancing Act Between Security and Maintenance
Vanguard exec Milt Rosberg on patch management and other security conundrums on the mainframe
Reg Harbeck: Hi. I’m Reg Harbeck, and today I’m here with Milt Rosberg, global VP at Vanguard Integrity Professionals, to talk about patch management. Milt and I are both mainframers by background, so we have a particular experience of what the rest of the world calls patch management, one that on the mainframe is somewhat focused on SMP, and it’s interesting how the two have grown up together over the years. So, Milt, maybe I could start by asking you to give us a picture of what patch management is in the mainframe world today?
Milt Rosberg: Well, historically, when you had a mainframe system, let’s say in the banking or insurance community, the last thing you wanted to do was fool around with it if everything was working. They just didn’t want to apply anything to it. We’ve actually worked with clients who claimed they were going to get off the mainframe and wouldn’t touch it for five or six years, and then they came back to us and said, you know what? We changed our mind. The application didn’t work on whatever other platform they were trying to use, and we had to bring them up to speed. Traditionally, maybe every year or every 18 months, they would apply what was called maintenance. Maintenance came in two forms. One came from the vendor, with all their recommendations, and you would load that maintenance on. The other was PTFs, fixes that were pretty critical to make sure the system worked correctly, but that was more on the operational side than on the security and compliance side. In my opinion, over the last three years or so, patch management in the z/OS environment has been pushed down from the board of directors, who want to make sure that all the systems in the enterprise have the latest releases and patches, because they don’t want to introduce anything into their systems that could possibly cause an outage for any of the applications they’re running. In my early days, I remember nobody wanted to touch it if it worked. In fact, shops would stay on older versions of their hardware because they didn’t want to move to a new version and take a chance it would go down. Now that has changed: you want to put in the very latest fixes you can to make sure you don’t have any vulnerabilities in your software, or, if you do, that you correct them as fast as you can.
Reg: I remember one of the things I was taught as a junior systems programmer was the whole process: receive, apply, test, test, test, test, accept, and then roll out to production. There was also the whole discussion of leading edge versus bleeding edge and, as you point out, a hesitancy to apply any maintenance that hadn’t proven itself in other environments or hadn’t had further fixes added to it. The contrast between that and the urgency of security-related maintenance almost seems to create two separate things, but the problem is that everything touches security. So maybe you could give us some thoughts on how the balance has changed. How has it shifted?
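The receive, apply, test, accept, roll-out sequence Reg describes can be pictured as an ordered set of stages that no fix may skip. The following is a minimal Python sketch under that assumption; the Stage and Fix names are hypothetical, and this is not SMP/E itself, which has its own RECEIVE, APPLY, and ACCEPT commands. It only shows the ordering rule, with repeated test passes allowed before acceptance.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical maintenance stages, in the order Reg lists them."""
    RECEIVED = 1
    APPLIED = 2
    TESTED = 3
    ACCEPTED = 4
    PRODUCTION = 5


class Fix:
    """Tracks one hypothetical fix (for example, a PTF) through the stages.

    Not SMP/E: this only enforces that stages are taken in order and that
    testing may be repeated ("test, test, test") before acceptance.
    """

    def __init__(self, fix_id: str) -> None:
        self.fix_id = fix_id
        self.stage = Stage.RECEIVED
        self.test_runs = 0

    def advance(self, target: Stage) -> None:
        # A test pass is allowed right after apply, or again after a previous test pass.
        if target == Stage.TESTED and self.stage in (Stage.APPLIED, Stage.TESTED):
            self.stage = Stage.TESTED
            self.test_runs += 1
            return
        # Any other move must go to the immediately following stage.
        if target != self.stage + 1:
            raise ValueError(
                f"{self.fix_id}: cannot move from {self.stage.name} to {target.name}"
            )
        self.stage = target


if __name__ == "__main__":
    fix = Fix("FIX0001")  # hypothetical identifier
    fix.advance(Stage.APPLIED)
    for _ in range(3):
        fix.advance(Stage.TESTED)
    fix.advance(Stage.ACCEPTED)
    fix.advance(Stage.PRODUCTION)
    print(fix.stage.name, fix.test_runs)

    rushed = Fix("FIX0002")
    try:
        rushed.advance(Stage.PRODUCTION)  # skipping straight to production
    except ValueError as err:
        print(err)
```

The second example shows the failure mode of jumping straight to production, which the ordering rule rejects.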
Milt: It’s interesting you bring up the whole change management concept. We’ve worked with clients where a systems person applied what they thought was a fix without going through the change management process. Normally you would put it on a test system, then on a preproduction system, then on an LPAR where you check it out even more, and only then go live. If instead someone applies something straight to a production LPAR without going through the organization’s normal channels, they can bring that LPAR down, bring the network down, or bring the whole system down. We’ve been called in when that has happened, not necessarily involving our tools; we just get notified that somebody made a tragic error and loaded something without going through the process. So you still have to follow the very same process you’re talking about. Everything has to go through change management control, making sure the software is correct and that these are the changes you actually want. There’s another part of this that really needs to be discussed, and that is making sure the code you’re putting on the system has been scanned for vulnerabilities in the code itself. KRI had this code vulnerability scanning software, then they were purchased by Rocket [Software], so now Rocket has it, and I believe IBM also provides it. In my opinion, that capability should also be part of making sure that whatever you put on the system is correct. Then there’s the other half of this: how do you determine which patches you want implemented on your system, when every patch becomes a priority and so no patch ends up being prioritized? It’s almost taken on a life of its own now.
In the past on the mainframe, you knew that every six months you were going to apply the maintenance from your vendors and from IBM for the z/OS environment. You would go through a very careful change control process: looking at the software, making sure it works correctly, loading it up, putting it on the test LPAR, and then gradually distributing it across your enterprise. What we’re finding with our clients is that they’ve gone from every 18 months to every 12 months, to every six months, to once a quarter, and in some cases to once a month. So what do you do if you have an environment with, say, just 60 LPARs? That’s really not that large a system; we have clients with over 100 LPARs. But even with 10 LPARs, how do you get the software loaded onto all 10 with great confidence that you have the correct changes in there, get it distributed across the LPARs, and validate it along the way? One client we’ve worked with has an enterprise-wide environment of roughly 100 LPARs. They take the software they want to load, put it on each of their test LPARs, which are DR LPARs, and run it there. Then a group of people, I’d say anywhere between 8 and 12, loads it onto the rest of the LPARs, and then they make it production. So if you do the math on the hours you have to put in to keep the systems up to date with the “patches,” it’s a tremendous amount of work. Historically we’ve gone from a place where 18 months was great, once a year was really fast, and every six months was pretty aggressive, to once a quarter, and now in some cases once a month.
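Milt’s “do the math” point can be made concrete with simple arithmetic: the number of LPAR-level rollouts per year is the LPAR count times 12, divided by the months between cycles. The Python sketch below also assumes a hypothetical, fixed number of staff hours per LPAR rollout (HOURS_PER_LPAR); that figure is a placeholder, not a number from the interview, so only the relative growth is meaningful.

```python
# Months between maintenance cycles, following the progression Milt describes.
CADENCES = {
    "every 18 months": 18,
    "yearly": 12,
    "every six months": 6,
    "quarterly": 3,
    "monthly": 1,
}


def lpar_updates_per_year(lpars: int, interval_months: float) -> float:
    """Number of LPAR-level maintenance rollouts needed in one year."""
    return lpars * 12 / interval_months


def staff_hours_per_year(lpars: int, interval_months: float,
                         hours_per_lpar: float) -> float:
    """Annual effort, assuming a fixed (hypothetical) cost per LPAR rollout."""
    return lpar_updates_per_year(lpars, interval_months) * hours_per_lpar


if __name__ == "__main__":
    LPARS = 100           # roughly the client size Milt mentions
    HOURS_PER_LPAR = 4.0  # hypothetical placeholder, not a figure from the interview
    for label, months in CADENCES.items():
        updates = lpar_updates_per_year(LPARS, months)
        hours = staff_hours_per_year(LPARS, months, HOURS_PER_LPAR)
        print(f"{label:>16}: {updates:7.1f} LPAR updates, {hours:8.1f} hours/year")
```

With 100 LPARs, moving from an 18-month cycle to a monthly one multiplies the yearly rollouts by 18 (from about 67 to 1,200), whatever the real per-LPAR cost turns out to be.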
Reg: Hmm. Looking at the concept is important, I think, because the idea of a patch is that you’re fixing something, but not all maintenance fixes things. It often introduces new features and new functionality, and with them, potentially new security exposures. Beyond thinking about the prereqs and all that sort of thing, what process works best in the mainframe world, where we have this whole established approach, for focusing especially on the security-related maintenance while giving the functional changes the appropriate elbow room to prove themselves?
Milt: That’s a really great question. What we’re finding is that some clients don’t want any of the new capabilities. They just want whatever PTFs need to be taken care of, because those problems came in as trouble tickets, if you will, and then the patches come in for them, and that’s the space they’re focused on. They prefer that to all the new features. So you really have two parts. One is all the new features a vendor could provide, which could introduce new security risks because you don’t have the framework in place to evaluate all of them. The other part is more like continuous maintenance, where you want to make sure the latest maintenance is applied and that what’s applied addresses the things that really matter for that environment. So what is the environment? If you think about it for a few minutes and look at the whole enterprise, it’s going to have all kinds of different applications up and running, and one part of the z/OS environment is this whole patch management concept. In the mainframe environment you’re going to look at things like: Is my database protected? Is my baseline reporting working? Do I have all my libraries in place? What about automated security and monitoring? What about privileged access monitoring and backups? What about my MFA? Are my passwords working correctly? What about encryption? Am I encrypting data in flight? Am I encrypting data at rest? Then you have this thing called “patch management.” How do I get this installed and implemented? It’s not a simple task, because you’re looking at all these other spots in your enterprise that you want to make sure are protected, and each one of those is its own area. Let’s just take backups as an example.
Let’s say you’re using a backup system, and you have to implement patches for it to make sure it’s working absolutely correctly, because you have to send those backups off to disaster recovery. When do you put that patch in, and why? What is the reason you put it in? You don’t want your backup system to be working incorrectly, because your disaster recovery system is there precisely for a disaster. So should that be the very first thing you make sure is absolutely correct, given that the system you rely on in production all day long is the one that’s already working? You now have this whole science of how you prioritize to make sure your enterprise environment is secured, protected, and operating efficiently, while at the same time reducing as much risk as possible. It’s not a simple task, and we’re probably not going to cover all the points in 20-30 minutes, but I think it’s almost like a new science emerging for the mainframe environment. How do we get all the patches in? How do we get them in correctly, and how do we manage them? How do you prioritize which are the most important ones to implement?
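One way to avoid the situation where every patch is a priority and so none is prioritized is to score each pending fix on a few explicit factors and sort by the result. The Python sketch below is a hypothetical scheme, not a method described in the interview: it weights a vulnerability severity (for example, on a 0-10 scale) by how critical the affected component is, and gives security fixes a flat boost so they are not buried under functional changes. The Patch fields, the weights, and the example patches are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """A pending fix with hypothetical scoring inputs."""
    name: str
    severity: float       # e.g. 0-10 vulnerability score; 0 for purely functional changes
    criticality: int      # 1 (low) to 5 (critical): how much the business depends on the component
    fixes_security: bool


SECURITY_BOOST = 10.0  # hypothetical weight that lifts security fixes above feature work


def priority(p: Patch) -> float:
    """Severity (plus a boost for security fixes) scaled by component criticality."""
    boost = SECURITY_BOOST if p.fixes_security else 0.0
    return (p.severity + boost) * p.criticality


def rank(patches: list[Patch]) -> list[Patch]:
    """Highest score first; ties are broken by name so the order is repeatable."""
    return sorted(patches, key=lambda p: (-priority(p), p.name))


if __name__ == "__main__":
    pending = [
        Patch("backup-dr-fix", severity=6.0, criticality=5, fixes_security=True),
        Patch("new-report-feature", severity=0.0, criticality=3, fixes_security=False),
        Patch("db-auth-fix", severity=9.0, criticality=4, fixes_security=True),
    ]
    for p in rank(pending):
        print(f"{priority(p):6.1f}  {p.name}")
```

With these made-up inputs the backup/DR fix scores 80 and edges out the database fix at 76, which is one possible answer to the question of whether backups should come first. Changing the criticality values changes the answer, which is the point of making the weights explicit rather than leaving them implicit.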
Reg: Now this is interesting, because it parallels another journey we’ve had on the mainframe. For a very long time, IBM and the various ISVs have tried to offer solutions that front-end SMP, if you will, to simplify putting together and applying all the maintenance, and obviously z/OSMF (or, as I like to say, zed/OSMF) is one of the more recent ways they’ve made maintenance a lot simpler and more straightforward. But you’ve got some additional challenges; you’re talking, for example, about ensuring the provenance and reliability of given patches. So what’s the current state of the art for dealing with all these challenges on the mainframe?
Milt: One approach we’ve been asked to look at is putting it all into one file, if you will (I’m going to call it the gold file), which then gets processed across the z/OS enterprise. Then we ensure that each LPAR it goes to receives the correct information intended for that LPAR. We apply some math to it to make sure each one is loaded correctly and that we get the latest updates in. What I’m finding is that if you don’t have a really well-organized, automated process to get this across your enterprise, the amount of manpower it takes to make this work is tremendous. And it really shouldn’t be dependent on the external security manager. Whether it’s RACF, ACF2, or Top Secret, you should be able to send this across your whole enterprise environment. What we’ve found, and what our customers have found, is that you need a way of aggregating the information and a way of delivering it. You have a pipeline that updates each of the LPARs and each of the databases in a very organized fashion, you apply math to make sure everything has been implemented properly, and that becomes your gold standard going forward. We actually didn’t think of this. A couple of large clients came to us and said, we really need some help with this. We’re not getting it done fast enough or correctly enough, and we’re nervous because auditors are stepping in. They audit the whole enterprise. They’re saying okay, on your Active Directory systems you’re doing well; the Microsoft systems, the open systems, IBM i, UNIX, Linux, all of these are getting their patches put in really quickly. Then you get to the mainframe environment, and the auditors are looking hard at it.
They’re asking: are all the latest releases of software and all the latest patches being put in in a timely fashion? When they find that they’re not being implemented quickly and properly, it becomes an audit finding. In the past, when somebody came in to do an audit (I don’t mean to throw stones), the auditors would ask a mainframe person, are you doing this, this, and this? They had a checklist, because a lot of the auditors didn’t even know what they were looking at. They’d typically talk to a systems person, and the systems person would say, oh yeah, that’s good. The authorized libraries are protected. Oh yeah, we have that taken care of. We encrypt everything in flight. Yup, we have no open ports; all of those are taken care of. Yes, everybody went to security training. We automatically make sure our passwords are changed every 30 days. We don’t recycle our passwords at all. That’s kind of what happened. The auditor got this list and just checked the items off. Now we’re finding that firms like Deloitte are coming in with very seasoned, smart z/OS people who look at these systems to make sure they’re following the rules that have to be applied for the audit, so it’s not just a checklist anymore. I believe what’s happened is that so many systems are being hacked (it’s almost like something shows up every day) that auditors are paying closer attention. [But] that’s not because we don’t have good security on the mainframe. It’s the most securable platform there is. You just need to apply all the capabilities you have. We talk to shops that, whether they’re using our tools or not, implement only some percentage of all the capabilities that we provide, or that IBM or Broadcom provides. So getting the software updated, delivered, and tested, and making sure it gets across the whole enterprise, is one more of those requirements the auditors are really starting to pay attention to.
We spend a fair amount of time as a company with ISACA and going to audit events. When I talk with the people who are actually doing the audits, they are paying a lot more attention to the mainframe than they used to. The people doing the work are skilled, and we frequently get called in by someone like Deloitte to look at things, for example after a migration, where they want to talk to us about whether it was done correctly. Do they have the latest release of the software, with the right stuff on it? Do they have any homegrown code that hasn’t been checked in years? All of those questions have to be looked at, and then you want to make sure the software gets dispersed correctly across the enterprise.
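Milt does not say what “math” is applied to the gold file; one common way to confirm that every LPAR received exactly the intended content is to compare cryptographic digests. The Python sketch below assumes that approach: it builds a SHA-256 manifest of the gold file’s elements and reports, per LPAR, which elements are missing, which differ, and which are not in the manifest. The element names, the in-memory dictionaries standing in for installed libraries, and the function names are all hypothetical; real z/OS data sets, SMP/E zones, and security databases would need their own extraction step, which is not shown.

```python
import hashlib


def digest(content: bytes) -> str:
    """SHA-256 fingerprint of one element (for example, a module or config member)."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(gold: dict[str, bytes]) -> dict[str, str]:
    """Map each element name in the hypothetical gold file to its expected digest."""
    return {name: digest(data) for name, data in gold.items()}


def verify_lpar(manifest: dict[str, str], installed: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare what one LPAR actually holds against the gold manifest."""
    missing = sorted(n for n in manifest if n not in installed)
    mismatched = sorted(
        n for n in manifest if n in installed and digest(installed[n]) != manifest[n]
    )
    unexpected = sorted(n for n in installed if n not in manifest)
    return {"missing": missing, "mismatched": mismatched, "unexpected": unexpected}


def verify_enterprise(
    manifest: dict[str, str], lpars: dict[str, dict[str, bytes]]
) -> dict[str, dict[str, list[str]]]:
    """Return only the LPARs that deviate from the gold standard."""
    report = {}
    for lpar, installed in lpars.items():
        result = verify_lpar(manifest, installed)
        if any(result.values()):  # any non-empty list means a deviation
            report[lpar] = result
    return report


if __name__ == "__main__":
    gold = {"MODA": b"release 2", "MODB": b"release 7"}
    lpars = {
        "LPAR01": {"MODA": b"release 2", "MODB": b"release 7"},
        "LPAR02": {"MODA": b"release 1", "MODB": b"release 7"},
        "LPAR03": {"MODA": b"release 2"},
    }
    for lpar, problems in verify_enterprise(build_manifest(gold), lpars).items():
        print(lpar, problems)
```

Only LPAR02 (a stale MODA) and LPAR03 (no MODB) would be reported. Whether “unexpected” elements matter depends on how much of each library the gold file is meant to cover.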
Reg: You’ve hit on something there when you talk about homegrown code. One of the things that really strikes me about the mainframe is that it’s about 59 years old, going back to System/360, and we’ve got shops running code written back in the 1960s and 1970s that is still in use and whose author may be long gone. IBM and ISV software long ago added functionality to their products that you could move to, but people are still maintaining their own in-house code. Add to that the need for reliable provenance for the maintenance they get from IBM and all the various other vendors, and the need to track all of these things to ensure they’re not creating exposures simply by having stuff that isn’t properly controlled. There must be some really important tooling in place to ensure the reliability of what’s being done.
Milt: That’s a really great point. This is one of the key things we run into with some of the mainframe shops we’re working with. They’ve developed their own delivery system, a lot of it written in REXX and Assembler, and maybe it’s maintained by some guy in the corner who’s getting ready to retire in three years. An example is a client we’re working with now. They’re actually not using our tools. Their systems were written by a very sharp individual who lives in another country, and he’s getting ready to retire. He’s run into some health issues, so now they’re nervous, and they’re one of the key sponsors and architects of the system we’re building, because they said, we’ve got to find another way to make sure we don’t have this problem. They ended up in a serious situation: if anything happens to him, they have code that handles all the delivery of software across their enterprise, and only one person who developed it and knows it intimately. That’s a single point of failure, which is a really bad thing to have. We run into other organizations with situations similar to what you’re talking about, Reg. I went to a client recently where the person we’re working with is a systems guy and a developer, and they wanted him to write the compliance checks. He said, I don’t want to do that. I’m not going to write the checks. They said, no, you have the capability with this software we have from this vendor. You can write the checks and modify them yourself. He said, I’m not doing it, and they ended up doing business with us. They wanted a vendor’s tool with all the checks already in place, and after we’d spent some time with them, I asked him why. He said, because I don’t want to be responsible for maintaining the checks against all the regulations that are coming out. Let’s say I tell them we’re compliant, and then somebody gets into our system and hacks us. Who do you think they’re going to hold responsible?
He said, I’m the person they’re going to hold responsible. I’m the systems guy. I’ve been here 30-some-odd years, and I’m the smartest person on this system. I know it extremely well, but I don’t want to be responsible for maintaining all the compliance requirements that come out, because that puts me in a position where I’m telling the company I work for (it’s a bank) that everything is okay when I don’t know that it’s okay, because I’m not on the front line discovering all that stuff on a daily basis. So the thing you’re talking about, Reg, is moving so quickly that it has to depend on vendors who have the capability to look at the latest developments, track them, and then provide an easy method for customers to go to the customer zone, pull the updates down, get them onto the system, make sure they don’t have any holes in them, and then get them broadcast across the network, whether it’s for compliance, encryption, password protection, or whatever it happens to be. That’s the space where we think we, as a vendor, need to help the market. This isn’t about ripping and replacing IBM or ripping and replacing Broadcom. It’s just a capability you add to your system to meet a wild requirement you have today, and the question is how you go ahead and implement it. You’re absolutely right. These systems have been around a long, long time. We get calls where somebody says, I’ve got some REXX code a guy wrote 20 years ago. It’s running this application. Can you help figure out what it is? They’ll ask us to take a look at it, and we’ll dissect it, because we’re a development shop. We’ll figure it out, document it, and tell them it’s a good thing they didn’t cut it off. But the very thing you’re talking about is not a trivial task. The auditor comes in and asks what that code does, and the systems guy who’s been there 15 years says, I have no idea.
How do you apply maintenance to that?
Reg: Now, one additional dimension to our conversation, before we bring it to the apex of our topic, is the concept of vulnerabilities, which has been implicit in so much of what you’ve said. On the one hand, there’s the vulnerability of in-house code, where the vulnerability is perhaps the departure of the person who wrote it as much as the code no longer covering all possible circumstances. But then there are also the vulnerabilities that recent PTFs come out specifically to address, such as zero-day vulnerabilities, and that leads to some really radical behaviors, like something called Patch Tuesday. Can you talk about some of the additional perspectives that come with the whole concept of vulnerabilities?
Milt: Yeah, the whole idea of Patch Tuesday. Do you know where that originated? Patch Tuesday?
Reg: I’m not sure I do. I’m thinking of Patch Adams, but I’m sure that’s not it.
Milt: That’s not it [laughs]. No, Patch Tuesday, I believe, was when Microsoft would come out with its updates on a Tuesday [the second Tuesday of each month] and you had to update your computer. That was the whole concept of Patch Tuesday. We use a lot of PCs here and a lot of Microsoft tools, and we go through the same patching process internally. Look at your phone. How many times do you get an alert that you need to download a patch? The way hackers are working today, if they could find a way to get into the Vanguard software and into our coho (which is impossible to do, because we don’t have a single telephone line connected to it), get into our source code, and figure out how to implement some kind of hack, some kind of code that lets them compromise it, they would do it. So we’re very, very careful to prevent that. But there are a lot of vendors out there. They’re getting software from all around the world; I have no idea where they’re getting it from, and it’s being built in all kinds of places. Take a look at SolarWinds. Was it SolarWinds that had that big hack?
Reg: That sounds right.
Milt: Yeah, and so what they did is they went to the supplier of the software. They looked at the supplier of the software to figure out how they could do it. Take a look at what just happened at MGM. They got into their system, and it was Okta data I believe. They got into the file transfer system of Okta data and they were able to do whatever they needed to do. I could be wrong about that but I’m pretty sure I’m close to it and so—
Reg: Yeah, that sounds right.
Milt: Right. The reason they do that is that it’s much easier for a hacker to get paid a ransom than it is to steal data and try to sell it on the dark web. If you go online and take a look, there are actual companies right now that run ransomware as a service: you hire them and say, okay, I work at this organization and I want to attack it. You pay per month, or pay a fee or a commission, and they do all the ransomware work for you. They’ve outsourced theft, and so, in my opinion, it’s just a matter of time before the mainframe falls under the same requirements.
Reg: This is so important. You know, it’s funny. People run the whole spectrum, from “the mainframe is long gone” to “the mainframe is so reliable we don’t need to worry about it,” but it all amounts to “it’s somebody else’s problem; I don’t want to worry about it.” Yet it’s the system of record, so at some point, hopefully before a bunch of disasters, we need to be conscious of the fact that it is the system we have to be most responsible for.
We always have to anticipate possible vectors of vulnerability and deal with them. So maybe you could tie this all together for us now and talk about how we can approach managing all these things on the mainframe, everything from the latest maintenance to the latest vulnerabilities, in a way that allows for the fact that the people doing this probably won’t have 20-30 years’ experience. They’re going to be relatively new, but they have to be effective.
Milt: Yup. It’s an area we end up talking about as a company with CEOs and CISOs, and the very thing you’re bringing up applies: many of the people we’re talking to are 15-20 years younger than me, they’ve come from an open systems background, and they assume that because it’s the mainframe, z/OS is automatically secure. But we get called into environments all the time where people are getting in and doing nefarious things. We had a university where the students figured out how to hack the system and changed their grades. We had another organization that had outsourced some code, and the people they outsourced it to weren’t vetted properly; they put some bad code into the system and were able to pay themselves a fair amount of money before it was discovered. So it has to be looked at like any computer system where you want to make sure everything is correct. My first recommendation for the people looking after these systems is to do a thorough assessment. There are a lot of good companies that do it. We’re one, but other companies provide that capability too. Get in and do a full assessment of your system. Sometimes it takes a week or two; a team comes in and looks at every single vulnerability on the system. Once that’s complete, we do a pen test: we take all the vulnerabilities we found, map them through, and ask for a low-level ID. Then we see if we can take that low-level ID and elevate its privileges to get through their system. After doing the assessment, we have never failed to take a low-level ID and gain control of the system.
That’s amazing, and what it tells me is that it’s not because the people using the system don’t have money, aren’t spending money, or aren’t interested. It’s that these are very complicated systems doing a lot of hard work, and it only takes one spot in the system to have a vulnerability. If you’re not applying the maintenance your vendors and IBM give you, and you’re two or three releases back on your software, remember there’s a reason they come out with new releases. You’ve got to apply this stuff and make sure it’s in, and you should have an annual health check done on your system by a third-party organization that you completely trust, one that has to answer to somebody else. Pay the money, get it done, and get a complete assessment of your system, including pen testing, and make sure there are no back doors. We have always gone into systems and found places that needed to be fixed. That’s my recommendation: make sure you get everything applied and configured, from open ports to authorized libraries. Things are just set up wrong or not taken care of. And today, with the number of people who are leaving and retiring, what organizations are doing is handing this to an outsourcer, one of five or six major outsourcers. You cannot outsource your fiduciary responsibility. If you’re a bank and you outsource it, you’re still responsible to the people whose money you are protecting.
There are tools out there to help you lock it down, create a whitelist, run compliance checks, and make sure people have password protection. One gap we see a lot on the mainframe, which falls under this whole idea of patch management and having all the latest stuff in, is privileged users: they need multifactor authentication. We think maybe 60% of mainframes do not use multifactor authentication for privileged users. So what difference does it make if you put all the latest stuff in, if the people with full access aren’t using multifactor authentication, and if you don’t have software in place to follow what they’re doing, that is, privileged access monitoring? You need all of that in place so you can confirm they’ve put the patches in and spread them across all the LPARs where they need to be, and so you can watch their activity to verify, from an audit perspective, that they’re doing exactly what you expect. So that’s my 40 years in 30 seconds. This is a complicated, complicated system. It’s doing really hard work. It’s running the critical infrastructure of the world. I think it was Broadcom that put out a figure of something like 70-80% of the data still being housed on mainframes. It’s a high number, and it’s used in critical infrastructure.
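Milt’s point about privileged users without MFA lends itself to a simple automated check. The Python sketch below assumes user profiles have already been exported from the external security manager into a simplified, hypothetical record format; it flags users holding any of the RACF system-wide attributes SPECIAL, OPERATIONS, or AUDITOR who do not have MFA enabled. Which attributes count as privileged (and their ACF2 or Top Secret equivalents) is a local policy assumption, and the export step itself is not shown.

```python
from dataclasses import dataclass, field

# RACF system-wide attributes treated as privileged in this sketch (a policy assumption).
PRIVILEGED_ATTRIBUTES = frozenset({"SPECIAL", "OPERATIONS", "AUDITOR"})


@dataclass
class UserRecord:
    """Hypothetical, simplified view of one exported user profile."""
    userid: str
    attributes: set[str] = field(default_factory=set)
    mfa_enabled: bool = False


def privileged_without_mfa(users: list[UserRecord]) -> list[str]:
    """User IDs that hold any privileged attribute but have no MFA configured."""
    return sorted(
        u.userid
        for u in users
        if u.attributes & PRIVILEGED_ATTRIBUTES and not u.mfa_enabled
    )


if __name__ == "__main__":
    users = [
        UserRecord("SYSPRG1", {"SPECIAL"}, mfa_enabled=False),
        UserRecord("SYSPRG2", {"OPERATIONS"}, mfa_enabled=True),
        UserRecord("CLERK01"),
        UserRecord("AUDIT01", {"AUDITOR"}, mfa_enabled=False),
    ]
    print(privileged_without_mfa(users))
```

A check like this only tells you who lacks MFA; the privileged access monitoring Milt mentions is a separate control that watches what those users actually do.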
Reg: Yes. Well, we’ve obviously proven we can keep talking about this stuff. It’s important, it’s interesting, and it’s fun to talk about because it’s so compelling. But we’ve got a limited amount of time, so where can people go to find out more?
Milt: Well, they can go to go2vanguard.com to find out more. Another place they really need to look is SHARE. That’s where all the mainframers go, the systems people, and I highly recommend it. I also recommend ISACA. If you’re not an ISACA member, you should sign up, get your audit certifications, or at least get an idea of what auditors are looking at. We provide all kinds of training, and whether you’re our customer or not, you can go online and get free training. IBM has a lot of free material you can use to learn how to best support your system, and NIST, I think, is an excellent place to look at what’s available today to help lock your systems down. Their checklist program points not only to the DISA STIGs but also to the CIS benchmarks, which I think you should take a look at and implement on your system. Think about it like flying an old 747 with all those dials. You want to make sure the dials on your system are set so that when you walk away from it, it keeps flying where you want it to. These are complicated systems, and it takes a lot of work and maintenance to make sure they are correct. You can’t pass anything up. I would use NIST as a reliable resource, along with the training you get through SHARE and ISACA. GSE is coming up in the UK, so if you’re in the UK and want good training, they have great, great training. The Vanguard Security Conference, which we just had, is all about security. We had IBM there, Broadcom, and KRI, which was bought by Rocket. We get into encryption. It’s not a demonstration of tools; it’s about tightening your system up for your organization. There are a lot of resources; you just need to work them. And you don’t have to be a Vanguard user to call us up and get help. We’ll help you no matter what you have.
Reg: Well, this has been outstanding. I always value and get a lot out of our conversations. So that said, I’ll be back with another podcast next month, but in the meantime check out the other content on TechChannel. You can also subscribe to the weekly newsletters, webinars, ebooks, solutions directory and more on the subscription page. I’m Reg Harbeck.