Hashes and How to Create Them on z/OS
Hashes can be really handy. Want to confirm that a file was received intact? Compare sender and receiver hashes. Want to see if a file has been updated recently? Compare hashes generated over time. Want a fast key for a lookup table? Use a hash as the key.
But what are hashes, and how can they be created on z/OS?
What Is a Hash?
A hash is data generated from other data. Ideally, hashes will be much smaller than the original data, and can be generated fast.
Suppose we have two really big files, and we want to know if they’re the same. Rather than comparing each and every byte, we could generate a hash (or “checksum”) for each, and compare those. Much faster.
Hashes ideally have the following features:
- The same: generating a hash multiple times for the same input should always produce the same output
- One-way: the original data cannot be recreated from the hash. So, we can share the hash without sharing the original data: security. This is especially important when hashes are used for security purposes, such as storing passwords or signing data.
- Well hidden: we can’t find out information about the original data from a hash. For example, hashes are usually the same length regardless of the length of the original data. We can’t determine the length of the original data from our hash.
- Almost unique: there is a very low chance that different input data will create the same hash code, avoiding “hash collisions.”
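These features are easy to see with a standard algorithm. The following Java sketch uses SHA-256 (one of the standard algorithms covered later in this article) through the JDK's MessageDigest class. The class name, helper name and sample strings are ours, chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashProperties {
    /** Returns the SHA-256 hash of the text (encoded as UTF-8) as lowercase hex. */
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String shortText = "Hashes can be really handy.";
        String longText = shortText.repeat(1000);   // requires Java 11 or later

        // The same: hashing identical input twice gives identical output.
        System.out.println(sha256Hex(shortText).equals(sha256Hex(shortText)));

        // Well hidden: both hashes are 64 hex digits, whatever the input length.
        System.out.println(sha256Hex(shortText).length() + " " + sha256Hex(longText).length());

        // Almost unique: changing one character gives a different hash.
        System.out.println(sha256Hex(shortText).equals(sha256Hex("Hashes can be really handy!")));
    }
}
```

The one-way property cannot be shown by running code: it rests on the design of the algorithm, not on anything we can print.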
Creating a Simple Hash
So, how do we create a hash? We could create our own hash algorithm. For example, the following REXX code will create a hash by merging the length of a string with the first and last 10 characters:
hash = Left(string,10) D2c(length(string)) Right(string,10)
If we apply this to the first three sentences of this article, we would get the hash “Hashes can > er hashes” (the greater than sign is the EBCDIC representation of the length).
This algorithm is OK, but not great. We include some of the original text, and a small change in the original text won’t change the hash much (if at all).
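To make the weakness concrete, here is a rough Java port of the same idea. It is a simplification, not an exact translation: it prints the length as a decimal number where the REXX version uses D2c to produce a single character, and it does not pad short strings the way the REXX Left and Right functions do. The class name and sample strings are ours.

```java
final class SimpleHash {
    /** First 10 characters, the length, and the last 10 characters, separated by blanks. */
    static String simpleHash(String s) {
        String first = s.substring(0, Math.min(10, s.length()));
        String last = s.substring(Math.max(0, s.length() - 10));
        return first + " " + s.length() + " " + last;
    }

    public static void main(String[] args) {
        // Two different strings of the same length, differing only in the middle.
        String a = "Hashes can be really handy. Compare sender and receiver hashes.";
        String b = "Hashes can be really risky. Compare sender and receiver hashes.";

        System.out.println(simpleHash(a));
        System.out.println(simpleHash(b));
        // Both strings produce the same hash: a collision.
        System.out.println(simpleHash(a).equals(simpleHash(b)));
    }
}
```

Any change that keeps the length and leaves the first and last 10 characters alone goes undetected, which is exactly what a good hash should prevent.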
Stronger Hashes and ICSF
Math geeks have done a lot of work to create “standard” hash algorithms that are far stronger than our attempt above.
One of the most famous of these is the MD5 algorithm, which creates a 128-bit hash. We could create our own code to generate an MD5 hash. A better idea is to use the API provided by z/OS Cryptographic Services (ICSF).
The following COBOL program shows how this can be done:
      * -----------------------------------------------------------
      * Variables we use in our program
      * -----------------------------------------------------------
       Working-Storage Section.
       01  HashStuff.
           02 RetCode                Pic 9(8) Comp-5.
           02 ReasCode               Pic 9(8) Comp-5.
           02 Exit_Data_Length       Pic 9(8) Comp-5.
           02 Exit_Data              Pic X(04).
           02 Rule_Array_Count       Pic 9(8) Comp-5 Value 1.
           02 Rule_Array             Pic X(8) Value 'MD5'.
           02 MyStringLen            Pic 9(8) Comp-5 Value 112.
           02 MyString               Pic X(112) Value
            'Hashes can be really handy. Want to confirm that a file was
      -    ' received intact? Compare sender and receiver hashes.'.
           02 Chaining_Vector_Length Pic 9(8) Comp-5 Value 128.
           02 Chaining_Vector        Pic X(128).
           02 Hash_Length            Pic 9(8) Comp-5 Value 16.
           02 Hash                   Pic X(16).
           02 ICSF_API               Pic X(08) Value 'CSNBOWH'.
      * -----------------------------------------------------------
      * Procedure Division
      * -----------------------------------------------------------
       Procedure Division.
           Call ICSF_API Using RetCode ReasCode
                Exit_Data_Length Exit_Data
                Rule_Array_Count Rule_Array
                MyStringLen MyString
                Chaining_Vector_Length Chaining_Vector
                Hash_Length Hash.
      * Output the results.
           Display 'RetCode : ' RetCode
           Display 'ReasCode : ' ReasCode
           Display 'Hash_Length : ' Hash_Length
           Display 'Hash: ' Hash
This produces the output:
RetCode : 0000000000
ReasCode : 0000000000
Hash_Length : 0000000016
Hash: D Ypö k
The hash is displayed as unprintable binary data. In hexadecimal, it is x’7431 FEC4 FB01 E897 CC29 9D0C 2B92 74EA’.
The good news about MD5 is that it is a standard algorithm. So, anyone should be able to create an MD5 hash of our paragraph, and get the same results. Let’s check this using one of the many MD5 generators online (https://www.md5hashgenerator.com). This produces x’E173 73A8 1931 8EDA 56D8 C7FE 225A 4C7A’. But this isn’t the same!
Our string on z/OS was in EBCDIC; the website we used stored the string in ASCII. So, we have different data. If we want to share our MD5 hash, we may need to convert our string to another code page before generating the hash.
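The effect of the code page can be reproduced off the mainframe. The Java sketch below hashes the same text twice: once encoded in an EBCDIC code page and once in ASCII. We assume that the IBM037 charset is available in the Java runtime (it ships with standard JDK builds, but may be missing from trimmed-down runtimes) and that it matches the code page used on the z/OS system. The class and method names are ours.

```java
import java.nio.charset.Charset;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class CodePageHash {
    /** MD5 of the text after encoding it in the named code page, as uppercase hex. */
    static String md5Hex(String text, String charsetName) {
        try {
            byte[] data = text.getBytes(Charset.forName(charsetName));
            byte[] hash = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02X", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "Hashes can be really handy. Want to confirm that a file was "
                + "received intact? Compare sender and receiver hashes.";
        // Same characters, different bytes, so different hashes.
        System.out.println("EBCDIC (IBM037): " + md5Hex(text, "IBM037"));
        System.out.println("ASCII          : " + md5Hex(text, "US-ASCII"));
    }
}
```

If the assumptions hold, the two printed values should correspond to the z/OS and website hashes quoted above. We have not hard-coded them, so treat that as something to check, not a guarantee.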
More Secure Hashes
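MD5 was originally designed as a cryptographic hash, but it has since been “broken”: it is now practical to deliberately construct different inputs that produce the same MD5 hash. Today, it should only be used when security isn’t that important, for example to detect accidental corruption.
If we want more secure hashes, we need a better algorithm. ICSF supports several stronger algorithms, including SHA-256, SHA3-512, RPMD-160 (RIPEMD-160), and SHAKE256.
The COBOL code above can be easily modified to use some of these algorithms. For example, to generate a SHA-256 hash, we make sure that Hash and Hash_Length are large enough for the 32-byte (256-bit) result, and change the Rule_Array declaration to: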
02 Rule_Array Pic X(8) Value 'SHA-256'.
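Different algorithms produce hashes of different lengths, so the field that receives the hash (Hash and Hash_Length in the COBOL program) needs to be sized to match. As a quick cross-check that does not involve ICSF at all, this Java sketch asks the JDK for the output length of a few algorithms. The class and method names are ours.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestLengths {
    /** Returns the hash length in bytes for a JDK algorithm name such as "SHA-256". */
    static int digestLength(String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm).getDigestLength();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        // These are JDK algorithm names, not ICSF rule array keywords.
        for (String algorithm : new String[] {"MD5", "SHA-256", "SHA-512"}) {
            System.out.println(algorithm + ": " + digestLength(algorithm) + " bytes");
        }
    }
}
```

MD5 fits the 16-byte Hash field used earlier. SHA-256 needs 32 bytes and SHA-512 needs 64.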
Other ICSF Hashes
The CSNBOWH API we called in our program is part of what IBM call the Common Cryptographic Architecture (CCA). The idea is that programs on different systems and platforms can use the same APIs. If a program on z/OS system A encrypts data using CCA, another program on Linux system B can decrypt it using the same CCA APIs.
ICSF offers another CCA API to generate Modification Detection Codes (MDC). These are also hashes, using different algorithms. MDC algorithms supported by ICSF include MDC-2 and MDC-4: the CSNBMDG API is used to generate these.
An alternative to CCA is the PKCS#11 standard, sometimes called “Cryptoki.” ICSF also offers PKCS#11 APIs, including the CSFPOWH API to create hashes.
Commands and Files
So far, we’ve used application programs that call ICSF APIs to generate our hashes. But that’s not the only way. z/OS also provides z/OS UNIX commands. In the following example, we use the md5 command to generate an MD5 hash for a z/OS UNIX file:
DZS:/: >md5 /u/dzs/test.txt
MD5 (/u/dzs/test.txt) = 09217b768dcdc32dcc3aed3f1ba73c47
Let’s pipe a string to this MD5 command:
DZS:/: >echo 'Hashes can be really handy. Want to confirm that a file was received intact? Compare sender and receiver hashes.' | md5
MD5 (-) = 09217b768dcdc32dcc3aed3f1ba73c47
The two hashes match because the file holds exactly what echo wrote: our string followed by a newline character (x’15’ in EBCDIC). The od command shows the file contents:
DZS:/: >od -cx /u/dzs/test.txt
0000000000   H   a   s   h   e   s       c   a   n       b   e       r   e
              C881    A288    85A2    4083    8195    4082    8540    9985
0000000020   a   l   l   y       h   a   n   d   y   .       W   a   n   t
              8193    93A8    4088    8195    84A8    4B40    E681    95A3
0000000040       t   o       c   o   n   f   i   r   m       t   h   a   t
              40A3    9640    8396    9586    8999    9440    A388    81A3
0000000060       a       f   i   l   e       w   a   s       r   e   c   e
              4081    4086    8993    8540    A681    A240    9985    8385
0000000100   i   v   e   d       i   n   t   a   c   t   ?       C   o   m
              89A5    8584    4089    95A3    8183    A36F    40C3    9694
0000000120   p   a   r   e       s   e   n   d   e   r       a   n   d
              9781    9985    40A2    8595    8485    9940    8195    8440
0000000140   r   e   c   e   i   v   e   r       h   a   s   h   e   s   .
              9985    8385    89A5    8599    4088    81A2    8885    A24B
0000000160  \n
              1500
The md5 command also accepts z/OS datasets. Here are the results for four datasets holding the same text:
DZS:/: >md5 "//'DZS.TEXTN'"
MD5 (//'DZS.TEXTN') = b4aceea10e6339d9f9cb7033bde6d38e
DZS:/: >md5 "//'DZS.TEXTNO'"
MD5 (//'DZS.TEXTNO') = 5c4acb2f5f7481b7568532eae26133e3
DZS:/: >md5 "//'DZS.TEXTFBL'"
MD5 (//'DZS.TEXTFBL') = 500c3a496e1cce729dee831ecb6649e2
DZS:/: >md5 "//'DZS.TEXTVB'"
MD5 (//'DZS.TEXTVB') = 7431fec4fb01e897cc299d0c2b9274ea
Different every time. TEXTN is RECFM=FB, LRECL=126. z/OS pads the end of these records with spaces, resulting in different input data to the md5 program. TEXTNO is the same, but has been edited in ISPF with NUM ON, adding a sequence number in the final eight characters of each record.
TEXTFBL is also RECFM=FB, but with LRECL=256: a longer record with more spaces at the end. Different data, different hash. Finally, TEXTVB is a RECFM=VB dataset. Variable blocked datasets have information at the beginning of each record with the record length. The hash for TEXTVB is the same one our COBOL program generated.
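The effect of record padding is easy to reproduce anywhere. The Java sketch below pads the same text with blanks to two fixed record lengths, and also appends a newline as echo does, then hashes each variant. For simplicity it encodes the text in ASCII, not EBCDIC, so the hash values will not match those shown above. The point is only that each variant hashes differently. The class and method names are ours.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class RecordPadding {
    /** Encodes the text as ASCII, padded on the right with blanks to the record length. */
    static byte[] fixedRecord(String text, int lrecl) {
        if (text.length() > lrecl) {
            throw new IllegalArgumentException("Text is longer than the record length");
        }
        String padded = String.format("%-" + lrecl + "s", text);
        return padded.getBytes(StandardCharsets.US_ASCII);
    }

    /** Returns the MD5 hash of the data as lowercase hex. */
    static String md5Hex(byte[] data) {
        try {
            byte[] hash = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "Hashes can be really handy. Want to confirm that a file was "
                + "received intact? Compare sender and receiver hashes.";
        System.out.println("No padding : " + md5Hex(text.getBytes(StandardCharsets.US_ASCII)));
        System.out.println("With newline: " + md5Hex((text + "\n").getBytes(StandardCharsets.US_ASCII)));
        System.out.println("LRECL=126  : " + md5Hex(fixedRecord(text, 126)));
        System.out.println("LRECL=256  : " + md5Hex(fixedRecord(text, 256)));
    }
}
```

When comparing hashes between a dataset and a file, or between systems, both sides need to agree on exactly which bytes are hashed, including padding and line endings.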
This MD5 command isn’t the only z/OS UNIX command for hashes. Others include sha256, sha512 and rmd160.
Other Hash Creation Options
There are a couple of other options that can be used to create a hash. We could call these z/OS UNIX commands from batch using the BPXBATCH or similar utility. For example:
//GENHASH  EXEC PGM=BPXBATCH
//STDOUT   DD SYSOUT=*
//STDERR   DD SYSOUT=*
//STDPARM  DD *
SH sha256 "//'DZS.TEXT'"
SAS users can use hash functions like md5 and sha256 provided with SAS. So, we could code:
Data MYDATA1;
   Set INFILE.DATA333;
   Hash = md5(THREESENT);
Run;
Java programmers can use the MessageDigest classes to create hashes. For example:
// firstThreeSentences is a byte[] holding the text to hash
MessageDigest md5Digest = MessageDigest.getInstance("MD5");
md5Digest.update(firstThreeSentences);
byte[] outHash = md5Digest.digest();
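For large inputs, update() can be called repeatedly before digest(), so the data never has to sit in memory all at once. The following self-contained sketch feeds a byte array to MessageDigest in chunks and formats the result as hexadecimal. The class name, method name and chunk sizes are ours, chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChunkedDigest {
    /** MD5 of the data, fed to the digest chunkSize bytes at a time, as lowercase hex. */
    static String md5Hex(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (int offset = 0; offset < data.length; offset += chunkSize) {
                int length = Math.min(chunkSize, data.length - offset);
                md.update(data, offset, length);   // add the next chunk
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {           // digest() completes the hash
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "Hashes can be really handy.".getBytes(StandardCharsets.US_ASCII);
        // The chunk size does not affect the result.
        System.out.println(md5Hex(data, 1));
        System.out.println(md5Hex(data, 4096));
    }
}
```

The same loop works when the chunks come from a file or network stream instead of an array.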
Assembler programmers can use the CKSM (Checksum) instruction to generate a simple 32-bit checksum. For example:
         LA    R4,STRING          R4 -> String
         L     R5,STRINGL         R5 = String Length
         XR    R2,R2              Zero R2
LOOP     CKSM  R2,R4              R2 = Hash (checksum)
         BNZ   LOOP               If CC <> 0, have more to do
         …
STRING   DC    C'Hashes can be really handy. '
         DC    C'Want confirm that a file was received intact? '
         DC    C'Compare sender and receiver hashes.'
STRINGL  DC    AL4(*-STRING)
This program returns the (hexadecimal) number x’8E51768C’.
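For readers without an assembler to hand, here is a Java sketch of what we understand the CKSM instruction to compute, based on our reading of the z/Architecture Principles of Operation: the data is treated as a series of 4-byte unsigned integers (the last one padded on the right with zeros if needed), which are added together, with any carry out of the top bit added back in at the bottom. Treat it as an illustration of the idea, not a verified emulation. The class and method names are ours, and we have not checked the sketch against the result above.

```java
final class CksmSketch {
    /** 32-bit sum of 4-byte big-endian words with end-around carry; short tail is zero-padded. */
    static int checksum(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i += 4) {
            // Build the next 4-byte word, using zero for bytes past the end of the data.
            long word = 0;
            for (int j = 0; j < 4; j++) {
                int b = (i + j < data.length) ? (data[i + j] & 0xFF) : 0;
                word = (word << 8) | b;
            }
            sum += word;
            if (sum > 0xFFFFFFFFL) {
                // Carry out of the top bit is added back into the lowest bit.
                sum = (sum & 0xFFFFFFFFL) + 1;
            }
        }
        return (int) sum;
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x02};
        System.out.println(String.format("%08X", checksum(data)));
    }
}
```

A checksum like this is fast, but it is much weaker than MD5 or SHA-256: swapping two 4-byte words in the input, for example, leaves the result unchanged.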
Standard Hashes and ICSF
There are several ways to generate a hash on z/OS. However, we’ll probably want a standard hash, and that’s where ICSF comes to the rescue. It can be used from a program through its CCA or PKCS#11 APIs, or from a z/OS UNIX command, and it can generate many different types of hashes.