The Difference between Binary and Integer (And Why You Should Care)
To avoid surprises when using binary format in databases (i.e., Int or SmallInt in SQL terms or B data type in DDS), you must understand how it all works.
By Jon Paris and Susan Gantner11/14/2017
We have a chart in our presentation on the new RPG free format declarations that contains the comment "DO NOT USE!!!" next to the data type BinDec—the replacement for type B in fixed format. Shortly after delivering that session recently an attendee contacted us to ask how he could avoid using binary when his SQL-created tables (aka physical files) all seem to have binary data types despite the fact that he had described them in SQL as Integer or SmallInt.
Good question! Our answer was broken down into three parts.
- What is the difference between Binary and Integer?
- What's wrong with type B or BinDec in RPG?
- If I have integers defined externally in SQL tables (aka files), how should I deal with them in RPG programs to avoid the B/BinDec issues?
What's the Difference?
Regardless of whether you define a variable in RPG as BinDec (short for Binary Decimal), Int (short for Integer), or using fixed format declarations, define them as either B or I data types - the data is always stored in the same binary format. The difference is in how RPG handles those binary values. We'll get to that in a moment.
First, let's have a quick refresher on different formats of numeric data. Most RPGers are well versed in the difference between our 2 decimal formats—packed and zoned—but less familiar with binary format numbers.
A zoned decimal number occupies one byte for each digit. Simple. Packed decimal numbers take half a byte, otherwise known as a nyble (yes, really) for each digit. This is possible because each of the 10 possible values for a digit (0-9) can fit within the capacity of four bits. In fact four bits can accommodate the range of values from 0 to 15, so some of the capacity of the four bits is being "wasted" when we use it for packed numbers. Packed also requires a nyble for the sign of the number so a single digit packed field requires one byte. A 2 or 3 digit packed uses two bytes, 4 and 5 digit numbers three bytes and so on.
Both packed and zoned are decimal—i.e., base 10 representations. Binary values use base 2, so each position contains only a value of 0 or 1 versus 0 thru 9 for decimal values. Whereas we said the packed decimal was a bit wasteful, binary format is very efficient - it uses every bit (pun intended) of space. It takes only 1 BIT (that’s 1/8th of a byte) to store a binary value. So a 2-byte signed binary field contains 16 positions which can be used to store 15 1's and 0's plus a sign position. That's 15 binary positions. Translating that to a decimal value, those 2 binary bytes can hold a value of up to 32,767 plus a sign.
So in 2 bytes, we can store a 2-digit zoned decimal value up to 99, a 3-digit packed decimal value up to 999 or a 15-position binary value of up to 32,767. If we step up to 4 bytes, we could hold a zoned decimal value of 9999, a packed decimal value of 9,999,999 or a binary value of 2,147,483,647.
Since we as humans only really "think" in decimal, does it matter how may bytes are taken up by a binary value? In some cases it is important - if only to know how to translate API documentation where parameters are described as Bin(2) or Bin(4). If that's all you really want to know about binary, here's a table (below) you may find helpful.
As you can see from the table, when we express numbers in pure binary there is no longer a strict digits to bytes correlation. And this is where the trouble begins. Now we'll move on to how this works in RPG.
What's Wrong with B or BinDec?
We mentioned earlier that we humans think in terms of decimals. So with an eye toward being human-friendly, back in the mists of time IBM decided that even when a value was stored as binary, it should still be defined in RPG (and in COBOL too) in terms of decimal digits and it should still only store numbers where every digit could run the full range.
Problem is, although a 2 byte binary can hold a value up to 32,767 (5 digits) it can't hold the maximum 5-digit value of 99,999, so originally in the RPG language, a 2 byte binary variable had its maximum value restricted to 4 digits or 9,999.
This restriction is enforced by converting the binary value to a 4 digit packed decimal value. If the binary value couldn't fit into the 4 decimal digits, it was treated as an overflow error in RPG.
So the problem with B/BinDec definitions in RPG are two-fold:
- RPG refuses to accept the full range of values that are technically possible in a binary field of a given number of bytes. E.g., a value of 10,000 "fits" into a 2 byte binary field but RPG rejects that value as too big.
- The constant conversion from binary format to packed decimal format and back again is inherently inefficient.
Fast forward to V3R2 when RPG added proper Integer (and unsigned integer) data types. Those were defined back then as I or U data types. Now in free format, we use Int or Uns. This did away with the restrictions and inefficiencies of the old B data types, but only when developers actually used them rather than continuing to use the older methods.
So all is good now with RPG and binary data, right? Things are certainly better—at least as long as RPG developers define internal RPG variables using the new I/Int (or U/Uns) types rather than continuing to perpetuate the old B data type habits of days gone by.
But what about externally defined data?
What about Db2 binary data used in RPG?
Binary format data can be defined to Db2 using either DDS or SQL DDL. In our experience, it’s pretty rare to run into DDS-described binary data. But with the advent of more and more shops using SQL to create their tables (aka files), they seem to be more likely to use Integer (4 byte binary) or SmallInt (2 byte binary) data types when the values in them clearly don't require decimal positions. So we're seeing more binary data in databases these days.
What is the effect of using those tables in RPG? Sadly, the answer is "it depends."
By default, a 2, 4 or 8 byte binary column (aka field) defined as SmallInt or Int in Db2 will be brought into RPG with the same range restrictions and inefficiencies as the old B or BinDec data type. There is a Ctl-Opt (or H spec) keyword, however, that changes that behavior; we'll cover that in a moment.
But first, how much of a problem is this in real life? The vast majority of the time, it's not likely to cause a major issue. After all, how likely is it that those columns really need the full value range of values possible in an integer of the specified length? Unless you use the binary variables very frequently or have many integer data types in your databases, the inefficiencies of the extra conversion may not have a noticeable impact. However, we've seen situations where databases converted from other platforms contain many integer values and the performance difference was obvious.
Just so you know, here’s what happens if RPG reads a record that does contain a decimal equivalent value of, for example, 12,345 in a column defined as SmallInt in SQL. That is a valid value for a SmallInt but too big to fit into an RPG B/BinDec variable. By default, if that happens and that record is read using READ or CHAIN in RPG, the value will be truncated to fit into 4 digits, so the value will be read as 2,345.
Not good, but it gets worse. Suppose the RPG program subsequently does an UPDATE operation to that record? Even though the value of that field may not have been explicitly changed in the RPG logic, the truncated value will be written back to the database (assuming that %Fields was not used to restrict the columns being updated.)
Surely we must have a way to prevent this from happening, even though it may be a rare occurrence.
Note that you could force the program to produce a run-time error rather than truncating the value when this type of overflow occurs during I/O operations by specifying TRUNCNBR(*NO) - which is not the default and may impact any old ADD?/SUB/MULT type operations left in your code. But a better way to solve this problem would be to prevent the overflow from occurring at all by having RPG treat these externally-defined variables as integer data types.
To do this, on the Ctl-Opt statement or the H spec, simply include the keyword EXTBININT or EXTBININT(*Yes). This will allow the full range of integer values in those externally described binary fields. It will use a length consistent with an integer definition of either 5 or 10, depending on the length of the external definition. Binary fields of length 1 to 4 are treated in RPG as Int(5) and those of length 5 to 9 are treated as Int(10).
Note that this keyword only affects database binary format fields that have 0 decimal positions defined. You may think that surely all binary fields have 0 decimal positions. But when using DDS to define them, it would be possible to have defined decimal positions.
You can read more about the details about the use of EXTBININT in the RPG reference manual and also in this support link from IBM.
One detail we haven't mentioned so far in this discussion is that, by default, RPG redefines as packed decimal all numerics (including binary) that originate from externally described files. This has always been true of RPG, even with zoned decimal data. As a result, while the use of EXTBININT allows the full range of integer values, it does not restrict your code from utilizing a range of values beyond that which an integer field of that length should contain. It will, in other words, allow a value of 99,999 into a field defined externally as SmallInt, which should actually contain a maximum of 32,767. However, this behavior can be circumvented by including the definitions in a data structure in the RPG program. This will force the compiler to honor the external field definitions and therefore to be defined and treated the same as internally defined RPG integers.
The moral of this story is that if you want to avoid "surprises" when using binary format in databases (i.e., Int or SmallInt in SQL terms or B data type in DDS) you need to understand how it all works. In particular it is advisable to take action to ensure the appropriate range of values are allowed in those fields. This can be done partially by using the EXTBININT keyword to ensure RPG allows the full range of values without applying the restrictions from the old binary data types. By including a data structure with the external fields you can go even further to ensure that only the appropriate range of values is allowed for those fields. The easiest way to do this is to define an externally described data structure using the ExtName or LikeRec keywords.
Well, That was Complicated
This is likely way more information than the questioner really expected (or wanted) to hear in answer to what probably seemed to them like a simple question. Many of you probably already understood much of the background information here - or at least enough of it to get your jobs done!
Even with the length of this discussion, we haven't covered all the details. For example, we haven't really discussed the fact that integers can be unsigned as well as signed. Nor have we talked about SQL's BigInt data type because the rules are different for that. And we have only touched very briefly on the definition of binary format data using DDS. If you have more to add to this topic or additional questions to ask, please use the comment section below.
We will continue our explorations on this topic and we'll share additional information in our blog and/or in later articles.
Jon Paris and Susan Gantner are technical editors for TechChannel.
See more by Jon Paris and Susan Gantner