
Hashes and How to Create Them on z/OS

Hashes can be really handy. Want to confirm that a file was received intact? Compare sender and receiver hashes. Want to see if a file has been updated recently? Compare hashes generated over time. Want a fast key for a lookup table? Use a hash as the key.
But what are hashes, and how can they be created on z/OS?

What Is a Hash?

A hash is data generated from other data. Ideally, hashes will be much smaller than the original data, and can be generated fast.

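Suppose we have two really big files, and we want to know if they’re the same. Rather than comparing each and every byte, we could generate a hash (or “checksum”) for each, and compare those. This is much faster when the files are on different systems, or when one hash was generated earlier: only the small hashes need to be exchanged, not the files themselves.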
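
As a rough illustration of that idea, here is a minimal sketch in Python (not a language used elsewhere in this article). It assumes the files can be read as byte streams, and it uses SHA-256 purely as an example algorithm. The helper names file_digest, same_content and write_temp are made up for this sketch.

```python
import hashlib
import os
import tempfile

def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_content(path_a, path_b):
    """Treat two files as identical when their hashes match."""
    return file_digest(path_a) == file_digest(path_b)

def write_temp(data):
    """Hypothetical helper for the demo: write bytes to a temporary file."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path

# Demo: two identical files, and one that differs only in its last byte.
first = write_temp(b"x" * 1_000_000)
second = write_temp(b"x" * 1_000_000)
third = write_temp(b"x" * 999_999 + b"y")
print(same_content(first, second))  # True
print(same_content(first, third))   # False
for path in (first, second, third):
    os.remove(path)
```

In practice, each side would run something like file_digest on its own copy and only the resulting hex strings would be compared.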

Hashes ideally have the following features:

  • The same: generating a hash multiple times for the same input should always produce the same output
  • One-way: the original data cannot be recreated from the hash. So, we can share the hash without sharing the original data. This is especially important when using hashes in cryptography.
  • Well hidden: we can’t find out information about the original data from a hash. For example, hashes are usually the same length regardless of the length of the original data. We can’t determine the length of the original data from our hash.
  • Almost unique: there is a very low chance that different input data will create the same hash code, avoiding “hash collisions.”
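
Some of these features are easy to see for ourselves. The following is a small Python sketch, using SHA-256 from Python's standard hashlib module as an example algorithm (the article itself doesn't use Python). The function name sha256_hex and the sample strings are just for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of some bytes as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

short_input = b"Hashes can be really handy."
long_input = short_input * 1000
changed_input = b"Hashes can be really handy!"

# The same: hashing the same input twice gives the same result.
print(sha256_hex(short_input) == sha256_hex(short_input))      # True

# Well hidden: the hash length doesn't depend on the input length.
print(len(sha256_hex(short_input)), len(sha256_hex(long_input)))  # 64 64

# Almost unique: changing one character gives a different hash.
print(sha256_hex(short_input) == sha256_hex(changed_input))    # False
```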

Creating a Simple Hash

So, how do we create a hash? We could create our own hash algorithm. For example, the following REXX code will create a hash by merging the length of a string with the first and last 10 characters:

hash = Left(string,10) D2c(length(string)) Right(string,10)

If we apply this to the first three sentences of this article, we would get the hash “Hashes can > er hashes” (the greater than sign is the EBCDIC representation of the length).
This algorithm is OK, but not great. We include some of the original text, and a small change in the original text won’t change the hash much (if at all).
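
To make the weakness concrete, here is a loose Python rendering of the same idea. It is a sketch rather than an exact port: it shows the length as a decimal number instead of a single byte, and it doesn't pad short strings the way the REXX Left and Right functions do. The name simple_hash is made up for this example.

```python
def simple_hash(text: str) -> str:
    """First 10 characters, the length, and the last 10 characters."""
    return f"{text[:10]} {len(text)} {text[-10:]}"

TEXT = ("Hashes can be really handy. "
        "Want to confirm that a file was received intact? "
        "Compare sender and receiver hashes.")

print(simple_hash(TEXT))  # Hashes can 112 er hashes.

# A change in the middle of the text that keeps the length the same
# goes completely unnoticed: a hash collision.
altered = TEXT.replace("intact", "INTACT")
print(simple_hash(altered) == simple_hash(TEXT))  # True
```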

Stronger Hashes and ICSF

Math geeks have done a lot of work to create “standard” hash algorithms that are far stronger than our attempt above.

One of the most famous of these is the MD5 algorithm, creating a 128-bit hash. We could create our own code to generate an MD5 hash. A better idea is to use the API provided by z/OS Cryptographic Services (ICSF).

The following COBOL program shows how this can be done:

      * -----------------------------------------------------------
      * Variables we use in our program                                  
      * -----------------------------------------------------------     
       Working-Storage Section.                                        
       01 HashStuff.                                                    
          02 RetCode               Pic 9(8) Comp-5.                        
          02 ReasCode              Pic 9(8) Comp-5.                        
          02 Exit_Data_Length      Pic 9(8) Comp-5.                        
          02 Exit_Data             Pic X(04).                              
          02 Rule_Array_Count      Pic 9(8) Comp-5 Value 1.                
          02 Rule_Array            Pic X(8) Value 'MD5'.                   
          02 MyStringLen           Pic 9(8) Comp-5 Value 112.            
          02 MyString              Pic X(112) Value                      
           'Hashes can be really handy. Want to confirm that a file was 
      -    'received intact? Compare sender and receiver hashes.'.
          02 Chaining_Vector_Length  Pic 9(8) Comp-5 Value 128.            
          02 Chaining_Vector       Pic X(128).                             
          02 Hash_Length           Pic 9(8) Comp-5 Value 16.               
          02 Hash                  Pic X(16).                              
          02 ICSF_API              Pic X(08) Value 'CSNBOWH'.              
 
      * -----------------------------------------------------------
      * Procedure Division                                        
      * -----------------------------------------------------------
       Procedure Division.                                         
           Call    ICSF_API     Using  RetCode                     
                                 ReasCode                          
                                 Exit_Data_Length                  
                                 Exit_Data                         
                                 Rule_Array_COUNT                  
                                 Rule_Array                        
                                 MyStringLen                       
                                 MyString                          
                                 Chaining_Vector_Length            
                                 Chaining_Vector                   
                                 Hash_Length                       
                                 Hash.                             
 
      * Output the results.
            Display 'RetCode : ' RetCode  
            Display 'ReasCode : ' ReasCode
            Display 'Hash_Length : ' Hash_Length

            Display 'Hash: ' Hash

This produces the output:

RetCode : 0000000000 
ReasCode : 0000000000
Hash_Length : 0000000016
Hash:    D  Ypö    k

The Display output looks garbled because the hash is 16 bytes of binary data, not printable text. In hexadecimal, it is x’7431 FEC4 FB01 E897 CC29 9D0C 2B92 74EA’.
The good news about MD5 is that it is a standard algorithm. So, anyone should be able to create an MD5 hash of our paragraph, and get the same results. Let’s check this using one of the many MD5 generators online (https://www.md5hashgenerator.com). This produces x’E173 73A8 1931 8EDA 56D8 C7FE 225A 4C7A’. But this isn’t the same!

Our string on z/OS was in EBCDIC; the website we used stored the string in ASCII. So, we have different data. If we want to share our MD5 hash, we may need to convert our string to another code page before generating the hash.
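
The effect of the code page can be sketched in a few lines of Python. This assumes Python's cp037 codec as a stand-in for the EBCDIC code page used on z/OS; for the letters, spaces and punctuation in our string, its byte values agree with the hexadecimal dump shown later in this article. The two digests printed at the end can be compared with the two values quoted above.

```python
import hashlib

TEXT = ("Hashes can be really handy. "
        "Want to confirm that a file was received intact? "
        "Compare sender and receiver hashes.")

ascii_bytes = TEXT.encode("ascii")
ebcdic_bytes = TEXT.encode("cp037")  # assumption: an EBCDIC code page

# Same characters, different bytes.
print(ascii_bytes[:6].hex())   # 486173686573
print(ebcdic_bytes[:6].hex())  # c881a28885a2

# Different bytes, different MD5 hashes.
ascii_md5 = hashlib.md5(ascii_bytes).hexdigest()
ebcdic_md5 = hashlib.md5(ebcdic_bytes).hexdigest()
print(ascii_md5)
print(ebcdic_md5)
```

Converting the text to an agreed code page before hashing, on whichever side is easier, is what makes the hashes comparable.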

More Secure Hashes

MD5 was originally designed for cryptography, but has since been “broken”: it is now practical to construct two different inputs with the same MD5 hash. Today, it’s only used when security isn’t that important.

If we want more secure hashes, we need a better algorithm. ICSF supports several stronger algorithms, including SHA-256, SHA3-512, RIPEMD-160 (which ICSF calls RPMD-160), and SHAKE256.

The COBOL code above can be easily modified to use some of these algorithms. For example, to generate a SHA-256 hash, we enlarge Hash to Pic X(32) and Hash_Length to 32 (a SHA-256 hash is 256 bits, or 32 bytes), and change the Rule_Array declaration to:

02 Rule_Array            Pic X(8) Value 'SHA-256'.
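
Each algorithm produces a hash of a different size, so the field that receives it (Hash in our COBOL program) has to be big enough for the algorithm chosen. As a quick cross-check, this Python sketch asks the standard hashlib module for the size of three of the algorithms mentioned above. The algorithm names are hashlib's, not ICSF's.

```python
import hashlib

# hashlib's names for MD5, SHA-256 and SHA3-512.
sizes = {name: hashlib.new(name).digest_size
         for name in ("md5", "sha256", "sha3_512")}

for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size * 8} bits)")
# md5: 16 bytes (128 bits)
# sha256: 32 bytes (256 bits)
# sha3_512: 64 bytes (512 bits)
```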

Other ICSF Hashes

The CSNBOWH API we called in our program is part of what IBM call the Common Cryptographic Architecture (CCA). The idea is that programs on different systems and platforms can use the same APIs. If a program on z/OS system A encrypts data using CCA, another program on Linux system B can decrypt it using the same CCA APIs.

ICSF offers another CCA API to generate Modification Detection Codes (MDC). These are also hashes, using different algorithms. MDC algorithms supported by ICSF include MDC-2 and MDC-4; the CSNBMDG API is used to generate these.

An alternative to CCA is the PKCS#11 standard, sometimes called “Cryptoki.” ICSF also offers PKCS#11 APIs, including the CSFPOWH API to create hashes.

Commands and Files

So far, we’ve used application programs that call ICSF APIs to generate our hashes. But that’s not the only way. z/OS also provides z/OS UNIX commands. In the following example, we use the md5 command to generate an MD5 hash for a z/OS UNIX file:

DZS:/: >md5 /u/dzs/test.txt                             
MD5 (/u/dzs/test.txt) = 09217b768dcdc32dcc3aed3f1ba73c47

/u/dzs/test.txt holds one line of text: the same three sentences we’ve used above. If you’ve been paying attention, you’ll note that the hash produced by the md5 command is different to the hash we obtained from our COBOL program before.

Let’s pipe a string to this MD5 command:

DZS:/: >echo 'Hashes can be really handy. Want to confirm that a file was received intact? Compare sender and receiver hashes.' | md5                           
MD5 (-) = 09217b768dcdc32dcc3aed3f1ba73c47

The same hash, and still different to our COBOL program. But why? Let’s get a hexadecimal dump of our test.txt file using the z/OS UNIX od command:

DZS:/: >od -cx /u/dzs/test.txt                                              
0000000000     H   a   s   h   e   s       c   a   n       b   e       r   e
                C881    A288    85A2    4083    8195    4082    8540    9985
0000000020     a   l   l   y       h   a   n   d   y   .       W   a   n   t
                8193    93A8    4088    8195    84A8    4B40    E681    95A3
0000000040         t   o       c   o   n   f   i   r   m       t   h   a   t
                40A3    9640    8396    9586    8999    9440    A388    81A3
0000000060         a       f   i   l   e       w   a   s       r   e   c   e
                4081    4086    8993    8540    A681    A240    9985    8385
0000000100     i   v   e   d       i   n   t   a   c   t   ?       C   o   m
                89A5    8584    4089    95A3    8183    A36F    40C3    9694
0000000120     p   a   r   e       s   e   n   d   e   r       a   n   d    
                9781    9985    40A2    8595    8485    9940    8195    8440
0000000140     r   e   c   e   i   v   e   r       h   a   s   h   e   s   .
                9985    8385    89A5    8599    4088    81A2    8885    A24B
0000000160    \n
                1500

See that last line? z/OS UNIX ends the line with a newline character (x’15’ in EBCDIC; the trailing 00 is padding that od adds to fill out its last two-byte group). This character is included in the input data that md5 uses to create the hash, giving a different hash. The echo command appends the same newline, which is why the piped string produced the same hash as the file. Any other control characters in the data are also included when generating the hash.

This isn’t the end of the issues with files. We created four z/OS datasets with the same three sentences, and used the z/OS UNIX md5 command for each:

DZS:/: >md5 "//'DZS.TEXTN'"                             
MD5 (//'DZS.TEXTN') = b4aceea10e6339d9f9cb7033bde6d38e  
DZS:/: >md5 "//'DZS.TEXTNO'"                            
MD5 (//'DZS.TEXTNO') = 5c4acb2f5f7481b7568532eae26133e3 
DZS:/: >md5 "//'DZS.TEXTFBL'"                           
MD5 (//'DZS.TEXTFBL') = 500c3a496e1cce729dee831ecb6649e2
DZS:/: >md5 "//'DZS.TEXTVB'"                            
MD5 (//'DZS.TEXTVB') = 7431fec4fb01e897cc299d0c2b9274ea 

Different every time. TEXTN is RECFM=FB, LRECL=126. z/OS pads the end of these records with spaces, resulting in different input data to the md5 program. TEXTNO is the same, but has been edited in ISPF with NUM ON, adding a sequence number in the final eight characters of each record.

TEXTFBL is also RECFM=FB, but with LRECL=256: a longer record with more spaces at the end. Different data, different hash. Finally, TEXTVB is a RECFM=VB dataset. Variable blocked datasets have information at the beginning of each record with the record length. Interestingly, the TEXTVB hash is the same as the one from our COBOL program, which suggests that md5 was given only the 112 bytes of text: no record length, no padding and no newline.
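
We can mimic some of these cases away from z/OS with a short Python sketch. It assumes the cp037 codec as a stand-in for the EBCDIC code page, x’15’ as the newline and x’40’ as the padding space. The NUM ON case is left out because the exact sequence number isn't shown here. Treat the printed hashes as an illustration of the effect, not as a reproduction of the values above.

```python
import hashlib

TEXT = ("Hashes can be really handy. "
        "Want to confirm that a file was received intact? "
        "Compare sender and receiver hashes.")

record = TEXT.encode("cp037")  # assumption: an EBCDIC code page

# The same text as it might reach the hash routine in each case.
variants = {
    "text only": record,
    "text + newline": record + b"\x15",
    "FB, LRECL=126": record.ljust(126, b"\x40"),
    "FB, LRECL=256": record.ljust(256, b"\x40"),
}

digests = {}
for label, data in variants.items():
    digests[label] = hashlib.md5(data).hexdigest()
    print(f"{label:15} {len(data):3} bytes  {digests[label]}")
```

Every extra byte counts: a single newline or a run of trailing spaces is enough to give a completely different hash.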

This MD5 command isn’t the only z/OS UNIX command for hashes. Others include sha256, sha512 and rmd160.

Other Hash Creation Options

There are a couple of other options that can be used to create a hash. We could call these z/OS UNIX commands from batch using the BPXBATCH or similar utility. For example:

//GENHASH  EXEC PGM=BPXBATCH
//STDOUT   DD SYSOUT=*
//STDERR   DD SYSOUT=*
//STDPARM  DD *
SH sha256 "//'DZS.TEXT'"
/*

SAS users can use hash functions like md5 and sha256 provided with SAS. So, we could code:

Data MYDATA1;
   Set INFILE.DATA333;
   Hash = md5(THREESENT);
Run;

Java programmers can use the MessageDigest classes to create hashes. For example:

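import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
// Assumes firstThreeSentences is a String; the charset chosen changes the hash
MessageDigest md5Digest = MessageDigest.getInstance("MD5");
md5Digest.update(firstThreeSentences.getBytes(StandardCharsets.UTF_8));
byte[] outHash = md5Digest.digest();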

Finally, Assembler programmers can use the CHECKSUM (CKSM) instruction. This doesn’t produce a standard hash like MD5, but is still a lightning-fast way to create a 32-bit hash. Here’s some code to do it:

         LA  R4,STRING                  R4 -> String
         L   R5,STRINGL                 R5 = String Length  
         XR  R2,R2                      Zero R2             
LOOP     CKSM R2,R4                     R2 = Hash (checksum)
         BNZ LOOP                       If CC <> 0, have more to do 
 
…
STRING  DC  C'Hashes can be really handy. '                  
        DC  C'Want confirm that a file was received intact? '
        DC  C'Compare sender and receiver hashes.'           

STRINGL DC  AL4(*-STRING)

This program returns the (hexadecimal) number x’8E51768C’.
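
For readers without an Assembler handy, the arithmetic behind the instruction can be sketched in Python. This is an approximation based on the instruction's documented behavior: the data is treated as a series of 32-bit big-endian words that are added together, any carry out of the top bit is added back in at the bottom, and a short final word is padded on the right with zeros. The function name cksm is made up for this sketch, which makes no attempt to model registers or condition codes.

```python
def cksm(data: bytes) -> int:
    """32-bit checksum: add 4-byte words with end-around carry."""
    total = 0
    for i in range(0, len(data), 4):
        # Pad a short final word on the right with zeros.
        word = data[i:i + 4].ljust(4, b"\x00")
        total += int.from_bytes(word, "big")
        # Fold any carry out of the 32-bit sum back into the low end.
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total

print(f"{cksm(bytes.fromhex('00000001')):08X}")          # 00000001
print(f"{cksm(bytes.fromhex('FFFFFFFF00000002')):08X}")  # 00000002
print(f"{cksm(bytes.fromhex('01')):08X}")                # 01000000
```

Because this is simple addition, reordering the 4-byte words doesn't change the result, which is one reason a checksum like this is not a substitute for MD5 or SHA-256.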

Standard Hashes and ICSF

There are several ways to generate a hash in z/OS. However, we’ll probably want a standard hash, and that’s where ICSF comes to the rescue. It can be used from a program that calls an API, or from a z/OS UNIX command, and can generate many different types of hashes using CCA or PKCS#11 APIs.

