
COBOL: What’s XML Got to Do With IT?

When I first encountered the works of Thomas Aquinas, which I’m embarrassed to say wasn’t until the beginning of my Master’s degree, commenced 29 years after I completed my bachelor’s degree in Computer Science, the first thing I thought was, “This could be put into XML!” Little did I know that XML was a direct result of these works—a fact that took me another half decade to figure out.

XML Origins

The reason I reacted that way to the massive and highly structured literary output of Aquinas was precisely its structure. As you wander, for example, through the vast troves of the Summa Theologica, you discover that it is formatted in a precise and consistent manner that lends itself exactly to the kind of structured repository that XML provides.

Fast forward … oh, are we talking XML or Aquinas here? Let’s say Aquinas first of all. Aquinas was still active when he passed at the paltry age of 49 (at which age I hadn’t even begun my Master’s degree), and the dates for the composition of his Summa, according to Wikipedia, were 1265-1274. While he wrote other works of a similar nature and format, all of them available online at the INDEX THOMISTICUS (title all in uppercase because it was put together before lowercase was common on computers), the Summa is certainly seen as his ultimate achievement – at least chronologically.

So, let’s see: 2021 minus 1274, according to Google, is 747. So the history of XML stretches back a jumbo jet’s worth of years! Makes COBOL seem kinda young by comparison, doesn’t it?

COBOL Origins

Oh, but Rear Admiral Dr. Grace Hopper was born before either COBOL or the origin story of XML in computing really got going. In fact, the good maternal progenitor of so much that is human in modern computing was born on December 9, 1906, and by 1949 she was working for the Eckert–Mauchly Computer Corporation (EMCC) that built UNIVAC I. Within three years she had become one of the originators of the idea of a compiled programming language, removing the need to enter numeric machine code by hand.

Meanwhile, the year that Hopper started working for EMCC, Dr. Roberto Busa, S.J., met with IBM CEO Thomas J. Watson, Sr., at the IBM Manhattan headquarters to ask him to make alphabetic characters available on IBM’s 80-column punch-card machines so he could use them to create a lemmatized index of the works of Aquinas.

Watson pushed back, saying his people had told him it would be impossible. Fortunately, in the lobby of the IBM building, Busa had noticed and picked up a punch-card-shaped advertisement for IBM on which was written, “The difficult we do right away; the impossible takes a little longer,” and he now played this card. Watson folded, saying, “provided that you do not change IBM into International Busa Machines,” according to Busa’s telling of the tale (taken from “Roberto Busa, S. J., and the Emergence of Humanities Computing” by Steven E. Jones).

As a consequence of this, not only did the required machines become available with alphabetic characters for the Index, and not only were alphabetic characters now firmly entrenched as part of the computing context, but a markup language was needed for the creation of the Index, to bring the textual data and electromechanics together in a functional result. So, at the very birth of modern electronic computing, alphabetic characters and a markup language were introduced in order to bring computing accessibility to the structured works of Aquinas, a mere 675 years after the passing of that great intellect.

Bridging Decades: Connecting COBOL to XML

Take a breath, now slowly exhale as you whisper “COBOL.” Yeah, I hadn’t forgotten where I was going with this. Fast forward 10 years. It’s 1959, and Hopper is part of the team designing COBOL, a compiled language whose name, like most of its early code, was written entirely in uppercase. That wouldn’t remain the case forever.

As COBOL became the language for automating business processes in electronic computing, building on the humanizing effects of alphabetic characters and English-like syntax, the way its variables were structured encouraged both an intentionally structured approach to programming and a rigorous approach to data handling (see my previous article).

Meanwhile, as Busa’s project moved to completion, his early work on a markup language was one of the threads leading to more advanced markup languages. Remember Script and Generalized Markup Language (GML)? Eventually, as the internet hit prime time, HyperText Markup Language (HTML) became the standard for browser-compatible markup languages. Go to the INDEX THOMISTICUS page and do a “View Source” and you’ll see plenty of it. But there was a need for more: a language that went beyond HTML, one that was not just about telling a browser how to display text but about storing textual data in a structured manner usable in any context where such structured data could be processed.

In 1998, then, five years after the original HTML standard was released (1993), 38 years after the COBOL definition went GA, 49 years after Busa met with Watson and Hopper joined EMCC, and 724 years after the passing of Aquinas and the conclusion of his Summa Theologica, the standard for eXtensible Markup Language (XML) was initially released, allowing structured data to be passed between programs and systems in a consistent and usable way.

Then something happened: COBOL started speaking to internet-connected systems and applications, even directly from established systems like the IBM Z mainframe. And if it was going to do so, it made sense to use a lingua franca of Internet textual data exchange.

Of course, as I’ve already said, one of the nice things about COBOL is how naturally structured its approach to data is. Well, another nice thing is that this makes it natural for COBOL to share that structured data using a structured communication approach such as XML. So it was that XML functionality was subsequently introduced into COBOL, allowing it to express its structured variables as XML when sending their contents, for example, over the internet, and to parse received XML back into those same variable structures.
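
To make that two-way mapping concrete, here is a minimal sketch, written in Python rather than COBOL so it can be run anywhere. It assumes a hypothetical CUSTOMER-RECORD group item (the record name, field names, and values are all invented for illustration) and treats group items as parent elements and elementary items as text. In IBM Enterprise COBOL the corresponding statements are XML GENERATE and XML PARSE; this sketch only mimics the idea and says nothing about how a COBOL compiler actually implements them.

```python
import xml.etree.ElementTree as ET

# Hypothetical record standing in for a COBOL group item, e.g.
# 01 CUSTOMER-RECORD with subordinate 05- and 10-level items.
# All names and values here are invented for illustration.
customer_record = {
    "CUSTOMER-ID": "000123",
    "CUSTOMER-NAME": {
        "FIRST-NAME": "GRACE",
        "LAST-NAME": "HOPPER",
    },
    "BALANCE": "0001500.75",
}


def record_to_xml(name, value):
    """Turn a nested dict into an element tree.

    Nested dicts (group items) become parent elements;
    anything else (elementary items) becomes element text.
    """
    element = ET.Element(name)
    if isinstance(value, dict):
        for child_name, child_value in value.items():
            element.append(record_to_xml(child_name, child_value))
    else:
        element.text = str(value)
    return element


def xml_to_record(element):
    """Rebuild the nested dict from an element tree.

    Elements with children become dicts; leaf elements become strings.
    """
    children = list(element)
    if not children:
        return element.text or ""
    return {child.tag: xml_to_record(child) for child in children}


# Express the structure as XML text, then parse it back again.
xml_text = ET.tostring(
    record_to_xml("CUSTOMER-RECORD", customer_record), encoding="unicode"
)
round_tripped = xml_to_record(ET.fromstring(xml_text))
```

Printing `xml_text` shows the record's nesting carried over as nested elements, and `round_tripped` comes back equal to the original structure. The sketch keeps every value as a string and ignores repeated fields (what COBOL would declare with OCCURS), attributes, and empty groups, all of which a real exchange format would need to handle.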

And now you know the connection.