BLOBs, CLOBs and RPG
A look at BLOBs (binary large objects) and CLOBs (character large objects) in RPG.
By Jon Paris03/02/2006
In the last few months, we've fielded several questions about how to deal with large objects in the database from within an RPG program. Often these are related to accessing data that exists on or has been ported from other databases. Since other folks may need this information, we decided to take a look at the answers to these questions.
First of all, what are "large objects" and why would they appear in our databases? The data types we're talking about here are known as BLOBs (binary large objects) or CLOBs (character large objects). A third large object type, DBCLOB (double-byte character large object), is only used for languages that require a double-byte character set. Another related data type is a DataLink. All of these data types can only be created and accessed using SQL. There's no DDS or native RPG support for them.
A CLOB is similar to a character field except that it is, well, large. How large, you ask? A character field defined using DDS or SQL is limited to 32K. By contrast, a character field in RPG IV can be defined up to 64K. So if you were to have a long product description, for example, that might exceed 32K in size, you could deal with it in RPG (up to the 64K limit), but to store that description in a database record, you would need to use a CLOB.
You may wonder what you would do if your product description were longer than 64K. You could still store it in a CLOB, as long as it wasn't larger than 2 GB. Hopefully that will be large enough for even the largest of your product descriptions. Of course, there's now the issue of how to deal with those larger fields in your RPG programs. We'll come to that later.
The difference between CLOBs and BLOBs is that CLOBs, like other character fields, have a Coded Character Set Identifier (CCSID) associated with them. This can be used to translate the data from the character set in which it's stored into the character set in which it's being used. In the product description example we're using here, perhaps the lengthy product description is developed and maintained using a workstation-based program, which would typically be using an ASCII character set. The description could be stored in the database file in ASCII format, even though the rest of the "ordinary" character fields in the file were in EBCDIC format. If that description, or a portion of it, were to be used in an iSeries host-based application, it would be translated to EBCDIC automatically by the database. Typically, the CCSID of all character fields, including CLOBs, would be the same and defaults to the CCSID of the file.
This brings us to BLOBs, which may also be up to 2 GB in size, but which have no CCSID. Therefore, no character translation is done on the data in these fields, because it isn't character data. Images, music, compiled programs, compressed data, etc., are typically stored in BLOBs. It's far more likely that we will need to deal with CLOBs from an RPG program. But, since the techniques used to deal with both types are similar, we'll deal with CLOBs in our examples.
We also mentioned a DataLink as an alternative way of storing this type of data. Using a DataLink, a reference to a file in the IFS is stored in the database rather than storing the actual data itself. While using DataLinks in outside the scope of this article, be aware that they are another option for storage and management of this type of data.
To create a CLOB, we'll create a product description file with a product code, a short description and a long description. The long description is defined to be up to 70K with 1000 bytes allocated initially. (Note: We specified "Not Null" for all of the fields to simplify our program examples. In reality, CLOBs are more likely than many of the traditional data types to require null capability.)
CREATE TABLE PARTNER400/PRODUCT
(PRODCODE DEC (5 ) NOT NULL WITH DEFAULT,
PRODDESC CHAR (30 ) NOT NULL WITH DEFAULT,
LONGDESC CLOB (70K ) ALLOCATE(1000) NOT NULL WITH DEFAULT)
Now that we know what CLOBs are and how to create them, how do we deal with them from RPG? There's no data type to code in column 40 on the D-spec to correspond to CLOB. Instead, we use a keyword to define these fields: SQLTYPE. As mentioned previously, we'll only be able to access CLOB type fields using SQL. As a matter of fact, if you even include an F-spec in your program for a file containing a CLOB, your program may compile (with warnings--but who looks at those?), but it will fail to open the file at run time. You could natively access the rest of the data in the file in RPG only by creating a logical file omitting the CLOB. So the first thing you must understand before accessing CLOB data is how to use embedded SQL in RPG. If you haven't used embedded SQL before, you may want read "Bringing the Power of SQL to Your RPG."
Four techniques can be used to deal with CLOBs in RPG.
- Use the SQL CAST function to retrieve the data into an "ordinary" RPG field
- Bring the CLOB contents into a data structure using special CLOB support
- Access the CLOB contents via a LOB Locator
- Access the CLOB contents via an IFS file using a file reference
In this article, we'll examine examples of the first three techniques.
These first two techniques are most useful for accessing CLOBs that are no more than 32K in size (or where the data needed is located in the first 32K of data). Of course, data of that length could be stored in a character field, so why use a CLOB to store the data in the first place? This often happens when data is ported from other systems where CLOBs were more prevalent.
For the first technique, you don't need to define anything special in the D-specs for the character field that's to receive the data. Simply define a character field, or, better yet, a variable-length character field. Why is that better? See the article, "The Versatility of Variable-Length Fields." Then simply use the CAST function in SQL to translate the data into a character field, as illustrated below:
C+Select ProdCode, ProdDesc, CAST(LongDesc as VarChar(32000))
C+Into :ProdCode, :ProdDesc, :ClobDescChar
C+From ProductWhere ProdCode = :ProdCode
Using this technique, it would also be possible to retrieve data from the CLOB beyond 32K by using multiple SQL statements and the SQL substring function inside the CAST function. That way you could retrieve one 32K piece of the CLOB at a time, placing the data into one or more RPG fields.
The second way to bring CLOB data of 32K or less into your program is the way that's shown in the SQL manuals from IBM. The first thing needed is a definition of the structure where the CLOB data will be placed. This is defined in RPG IV using the CLOB parameter to the SQLTYPE keyword on the D-spec. The second parameter is the length of data you'll retrieve from the CLOB, which cannot be more than 32,767 in length. While you'll define it as a single field, the SQL precompiler will expand the definition into a DS with the character data and the length of the value in the character field as separate subfields. So if we put this into our program:
D ClobDescSSQLTYPE( CLOB : 30000)
The SQL precompiler generates this:
In your program logic, when you refer to the data from the CLOB, you reference ClobDesc_Data. It's important to note that you must interrogate the ClobDesc_Len field to determine the length of data that SQL put into ClobDesc_Data, since it won't clear the field or pad with blanks when loading the CLOB data into the DS. That's one of the reasons that the first technique is often better than this--you can use a variable-length field to receive the data and there won't be "old" data at the end of the field.
The SQL statement to put the CLOB data into the ClobDesc data structure would look something like the following.
C+Select ProdCode, ProdDesc, LongDesc
C+Into :ProdCode, :ProdDesc, :ClobDesc
C+From Product Where ProdCode = :ProdCode
The third way to access CLOBs is by using a LOB Locator. The locator is defined using the SQLTYPE keyword again, but this time, we use the parameter value CLOB_Locator. The code in our source member looks like this:
D ClobDescLocSSqlType( CLOB_Locator )
The SQL precompiler generates an unsigned integer field:
D CLOBDESCLOCS 10U 0
When using a LOB Locator, the CLOB data contents are accessed using this locator without bringing the entire CLOB content into your program. While a locator isn't actually a pointer data type, it's used in a similar way. If the CLOB is very large and/or if you only need to get to a specific portion of the content which isn't necessarily at the beginning, then using a CLOB locator may be a better way to go.
Having defined the CLOB locator above, the first SQL statement we use to access it is very similar to the one above:
C+Select ProdCode, ProdDesc, LongDesc
C+Into :ProdCode, :ProdDesc, :ClobDescLoc
C+From Product Where ProdCode = :ProdCode
However, we now don't really have any data from the CLOB in our program-just a locator to the data. To get the first part of the CLOB into a character field in our program, we might use the LEFT SQL function in conjunction with the Locator as shown below:
C+Set :DspDescClob = Left(:ClobDescLoc, Length(:ClobDescLoc))
Of course, as demonstrated earlier, there are easier ways to get the first 32K of data from a CLOB into our program. The CLOB Locator is better suited to locating and retrieving some specific subset of the data. Suppose in our long product descriptions we know we always have a section in the data that begins with the characters "Specifications:" which is immediately followed by a section that begins with the characters "Warranty:". We could use the CLOB Locator to find the starting position of the "Specifications" section and the beginning of the "Warranty" section by using the PosStr (Position String) SQL function (much like the SCAN function in RPG). Then we could use the Substring (Subtr) SQL function to retrieve the data from the CLOB between the beginning of one section to the beginning of the next. It would look something like the following:
C+Set :StartPosClob = PosStr(:ClobDescLoc, 'Specifications:')
C+Set :EndPosClob = PosStr(:ClobDescLoc, 'Warranty:')
C Eval LenClob = EndPosClob - StartPosClob
C+Set :DspDescClob = Substr(:ClobDescLoc, :StartPosClob,
It's important to note that CLOB Locators may NOT be used in programs compiled with COMMIT(*None). Commitment Control is required. If you don't normally use commitment control and journaling, this is an extra consideration. Also remember that you may not get a compile error if you specify COMMIT(*None), but the program won't work properly. You may or may not get a run-time error. SQL condition-handling procedures should always be in place and are the only way to programmatically check for this and other run-time errors that would otherwise only appear in the job log.
We covered examples of three different ways to access CLOB data from RPG. The fourth option allows for the CLOB data to be put into an IFS file. An RPG program could then process the data from the IFS using the IFS APIs. In addition, the DataLink data type allows the data to reside in the IFS (or some other file system) and simply be linked to the database records. We don't have the space to go into these other options here, but if this is a topic of great interest, please let us know and we can explore these additional options in a future article.
Jon Paris is a TechChannel technical editor.
See more by Jon Paris