I recently had two enlightening and divergent telephone discussions that illustrated how important rigorous error handling is to a business application. First, unbeknownst to me, my cable company, one of the world’s largest, double-processed a payment I made late at night. I suspected something was wrong when my password was also revoked, so I called them, waited for 40 minutes, then was told it was because maintenance was underway. I was also warned to ignore a message telling me to fix my signon, because following the posted directions would get my access revoked. I only discovered the double-billing when I checked my bank statement days later.
Finding this, I called for an explanation, and the customer service representative (CSR) said it was likely due to their website coming down while I scheduled the payment. I asked why there’d been no warning, and he didn’t know. I asked that the extra payment be refunded, and he declined, saying they did maintenance after midnight. This company was a worldwide enterprise, and this practice affected many people. I suggested the problem could be fixed with simple procedures and programming, then asked him to submit my suggestion. He declined, saying it was too expensive, and I knew further debate was moot.
The next call was much less contentious and much more productive. I was placing a Christmas order for some friends from a mail-order company that specialized in top-shelf meats and other culinary delights. Having earned a couple of special offers from previous purchases, I’d reached a conundrum of how to use these discounts with items that offered free shipping. When I selected one option, the other one disappeared, but I couldn’t tell why. So I gave up and called for help, explaining my situation to one of their CSRs.
She explained the two options were mutually exclusive, and I suggested it would be nice if a message had explained that while I was ordering, eliminating the confusion. She replied that was a good idea, and she’d forward it to her fellows in IT, because they welcomed improvements. I further suggested the message be displayed in a real-time pop-up window while ordering, and again, the CSR enthusiastically agreed and thanked me, stating the company was very eager for these ideas because internal studies had shown ordering usability was directly related to profitability.
It’s Not Fun, It’s Not Exciting, but It’s Important
Obviously, building programs with effective and comprehensive error handling facilitates error determination and resolution, but it also simplifies usage for employees, customers and other users. Ironically, many programmers avoid the coding because it’s time-consuming, difficult and hard to test. IT management disdains it because it can cause delays and cost overruns and adds complexity, but in the end, error handling reduces overall cost. If an application has inherent defects, they’ll have to be fixed sooner or later, and almost invariably, sooner is cheaper, especially during development.
IT management must set standards and establish a productive error testing environment by establishing a coherent and mandatory methodology for dealing with errors and confusing or cumbersome process flows or functional interactions. Standards must be defined, testing procedures developed and easy-to-use tools provided to enable an application programming staff. Test results must be vetted and testing steps should be continuously reviewed so the testing process itself can periodically be evaluated for effectiveness, effort and overall cost. Improvements should be continuous so the entire process evolves in simplicity, accuracy and functionality.
Let Your Constituency Be Your Guide
If structured properly, it’s not just IT that’s responsible for integrating usability into business processes; it’s the entire organization, both in terms of implementation and participation. Users should be encouraged, even incented, to report and document the problems they encounter. Similarly, programmers should be encouraged or incented to work with users to understand reported problems, have the patience to work with a technically illiterate audience, determine where a defect or procedure falls short or malfunctions, identify a resolution and work with the user to validate the correction.
IT management’s response to reported problems is often programmer-punitive and minimizes errors’ importance, and it’s partly because they’re measured on project completion time, cost, effort and function. It’s only when errors or catastrophic failures cause major outages, functional inconsistencies, degraded availability or poor response time that application integrity becomes an issue. Often, that’s too late, and a company—led by IT—goes into crisis mode. Suddenly, all that matters is fixing the problem ASAP, regardless of cost or effort. The company is forced to do what should have been done at the start.
Preemptive Error Testing Prevents Failures
A way to reduce disastrous project rollouts and avoid hindsight fixes is to build as much error handling as possible into the application during the development and testing phases. I’ve been intimately involved with various products that have extensive error-handling interfaces built into them (e.g., Integrated Cryptographic Service Facility (ICSF), Db2 and IMS Database Control (DBCTL)), but the product I know best is CICS Transaction Server and its younger cousins on midrange and server platforms, called TXSeries. These error-handling interfaces provide the capability to preemptively manage errors.
CICS in all forms offers the Command Level Interface, a prebuilt set of routines that application programs invoke with commands embedded in their source code. To set up error handling, a program codes a command of the form “EXEC CICS HANDLE CONDITION condition(label) ...” in COBOL, Assembler, PL/I or another supported language, and a facility called the Command Level Translator converts each command into the appropriate programming language code (COBOL, PL/I, etc.) for compilation. Conditions that can be coded for include errors such as record-not-found (NOTFND), duplicate-record (DUPREC) and length-error (LENGERR). The name within each set of parentheses is the label of a routine in the source code that handles that error. Here’s an example:
EXEC CICS HANDLE CONDITION
ERROR(ERRHANDL)
DUPREC(DUPRTN)
LENGERR(BADLENGTH)
NOTFND(MISSING)
END-EXEC.
Either next or elsewhere in the program:
ERRHANDL.
MOVE BAD-ERR-MSG TO OPERATOR-NOTICE
EXEC CICS SEND
FROM(TERMINAL-MESSAGE)
LENGTH(80)
END-EXEC.
*
NEXT-STEP.
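The control flow that HANDLE CONDITION sets up can be pictured outside CICS as a table of routines registered before the work runs. The Python sketch below is only an analogy, not CICS code: the `Condition` exception, the `handlers` table, `write_record` and `run` are hypothetical names, and only the condition names (DUPREC, LENGERR, NOTFND, ERROR) come from the example above.

```python
class Condition(Exception):
    """Hypothetical stand-in for a named exceptional condition."""

    def __init__(self, name):
        super().__init__(name)
        self.name = name


def dup_rtn():
    return "Record already exists; nothing was written."


def bad_length():
    return "Record length is invalid."


def missing():
    return "Record not found."


def err_handl():
    return "Unexpected error; please contact support."


# Registered up front, like the HANDLE CONDITION example: condition -> routine.
handlers = {
    "DUPREC": dup_rtn,
    "LENGERR": bad_length,
    "NOTFND": missing,
    "ERROR": err_handl,  # catch-all for conditions with no specific routine
}


def write_record(store, key, value):
    """Hypothetical operation that raises a condition instead of failing silently."""
    if key in store:
        raise Condition("DUPREC")
    if len(value) > 80:
        raise Condition("LENGERR")
    store[key] = value
    return "Record written."


def run(operation, *args):
    """Run an operation; on a condition, pass control to its registered routine."""
    try:
        return operation(*args)
    except Condition as cond:
        handler = handlers.get(cond.name, handlers["ERROR"])
        return handler()
```

Calling `run(write_record, store, key, value)` twice with the same key would route the second call to `dup_rtn`, while a condition with no entry of its own falls through to the general `ERROR` routine.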
There’s an alternative way of error handling that uses the RESP and RESP2 options instead of HANDLE CONDITION. RESP and RESP2 name two storage areas on a CICS command; after the command has executed, CICS populates them with numeric values that identify the outcome, including the different error conditions. Here’s an example:
EXEC CICS
READ DATASET('TNTOKE')
INTO (WS-PCI-COMPLIANCE-RECORD)
RIDFLD(WS-CONTROL-RECORD-KEY)
UPDATE
RESP(WS-RESP)
RESP2(WS-RESP2)
END-EXEC.
*
IF WS-RESP IS NOT EQUAL TO DFHRESP(NORMAL)
    PERFORM 5050-READ-CONTROL-ERROR THRU 5050-EXIT
    GO TO 9000-RETURN
ELSE
    ADD 1 TO WS-CURRENT-TOKEN-VALUE
END-IF.
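The RESP check above follows a pattern that can be shown in language-neutral form: the call hands back a response code, and the caller tests it immediately rather than relying on a routine registered earlier. This Python sketch is an analogy under stated assumptions; `Resp`, `read_for_update` and `next_token` are hypothetical, and the enum values are arbitrary rather than the numeric codes CICS returns.

```python
from enum import Enum, auto


class Resp(Enum):
    """Hypothetical response codes; the values are arbitrary, not CICS's."""
    NORMAL = auto()
    NOTFND = auto()


def read_for_update(records, key):
    """Return (resp, record) rather than raising, mirroring the RESP style."""
    if key not in records:
        return Resp.NOTFND, None
    return Resp.NORMAL, records[key]


def next_token(records, key):
    """Increment the token counter only when the read came back NORMAL."""
    resp, record = read_for_update(records, key)
    if resp is not Resp.NORMAL:
        # Error path: hand the code back to the caller without touching the data.
        return resp, None
    record["token"] += 1
    return resp, record["token"]
```

The design choice is locality: the error test sits directly beneath the command it belongs to, so a reader does not have to hunt for a handler declared elsewhere in the program.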
With an error handling interface like CICS Command Level, it becomes possible to anticipate errors and process them appropriately: avoid a transaction failure, give the end user text that explains how to recover from the error, provide failure information an IT professional can use to fix the problem, or take alternative actions that bypass or resolve it. Preemptive error handling is a non-disruptive way to deal with errors and failures, often enabling normal processing despite things having gone wrong.
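As a rough illustration of those outcomes, the Python sketch below turns an anticipated error into three things: a message the end user can act on, a diagnostic line for IT, and an optional alternative action. Everything here is hypothetical (the `Outcome` type, the `PLAYBOOK` table, the `soften` function and the message wording); it shows the shape of the approach rather than any CICS facility.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    user_message: str        # what the end user sees
    diagnostic: str          # what IT needs to fix the problem
    fallback: Optional[str]  # alternative action, if one exists


# Hypothetical playbook: anticipated condition -> (user text, fallback action).
PLAYBOOK = {
    "NOTFND": ("We could not find that account. Check the number and try again.", None),
    "DUPREC": ("This payment was already received; no second charge was made.", "skip_write"),
}


def soften(condition: str, program: str, detail: str) -> Outcome:
    """Convert a condition into an Outcome; unknown conditions get a generic message."""
    user_text, fallback = PLAYBOOK.get(
        condition,
        ("Something went wrong on our side. Please try again later.", None),
    )
    diagnostic = f"{program}: condition={condition} detail={detail}"
    return Outcome(user_text, diagnostic, fallback)
```

A duplicate-record condition handled this way would tell the customer no second charge was made, log enough detail for a programmer to trace it, and name an action that skips the second write.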
When Things Go Wrong, They Can Be Fixed
Errors can be managed, anticipated, controlled and sometimes resolved if an IT development unit is structured to not only deal with them, but also build applications written to handle them. Not all errors or malfunctions can be dealt with, because there’s a plethora of things that can go wrong. But with a development infrastructure designed to use error handling facilities, a development philosophy that expects errors to occur and consequently creates logic to handle repetitively-occurring miscues, and general-purpose routines that provide information, hard errors can become soft errors.
There’s a cost to establishing an error-handling architecture in terms of tools, programmer time and effort, establishment of a programming infrastructure sensitive to error impact and resolution, end-user enablement, and improved usability. Applications should be designed to be simple, straightforward, dependable and flexible sources of productivity and foundations of customer satisfaction. They should be the softener of errors’ impact—a key component of availability, responsiveness and usability. Building an error-handling methodology that’s versatile and comprehensive is key to useful applications.