Accessing the Data in Core Dumps
If you're like most UNIX administrators, you probably have a cron job or some other housekeeping program that regularly searches your systems for core dumps, backs them up (maybe), and then deletes them.
This leaves you to try to puzzle out their cause from information contained in the AIX error log. Of course, about half the time, core dumps aren't flagged, so other methods must be used to discern their cause, such as examining them with a utility like strings or some other reader.
Chances are, that's as far as you'll get. You'll dutifully inform your application developers or DBAs about the location of the core dump and when it occurred, and then move on to other duties. Meanwhile, the core dump will sit in its archive, forgotten about. And that's unfortunate.
Core dumps contain a snapshot of the memory address space for an application — or a database, utility or middleware — at the time of the event that caused the core dump to happen. That's a wealth of data that, if diagnosed, could help you avoid future core dumps. So how do you access this data? It can actually be done quickly and fairly easily.
Seriously, this information can be gathered in about 10 minutes. By entering a few commands, you'll know not only which program caused the core dump, but also the file or function at fault and the working thread that was interrupted.
First, you need the proper tool to examine your core dump. The dbx utility is a standard debugger that ships with AIX; it's included on the operating system distribution DVDs. Part of the bos.adt.debug fileset, dbx is usually installed by default. It has many uses, from running programs in a controlled environment to diagnosing misbehaving processes live and stepping through a program one line at a time. And yes, dbx can be used to examine core dumps. Here's how: To start your examination, change your working directory to the core file's location and call up the debugger on the core file:
dbx -C ./core
Your output will be similar to this:
[lpar:/home/raym] dbx -C ./core
Type 'help' for help.
warning: Object file is not specified, and the object file "perl5.10.1" mentioned in the core file doesn't exist in the current directory or doesn't match with the core file (ignored). Some info may not be available.
[using memory image in ./core]
reading symbolic information ...warning: cannot open perl5.10.1
warning: no source compiled with -g

Segmentation fault in extend_brk at 0xd011119c ($t1)
0xd011119c (extend_brk+0x2dc) 90040004 stw r0,0x4(r4)
(dbx)
In this example, I'm using a core dump generated on a system that was running a 32-bit version of Perl and had recently been updated to AIX 7.1. The initial screen above is basically a summary of the conditions in memory at the time of the core dump. Note the object file name in line 3; it will likely point to the executable that caused the core dump. If you're only looking to identify the program that caused the dump, this screen will give you your answer most of the time. However, this information can be a bit nebulous, so it's often worth digging deeper.
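If you collect this start-up output from many core files, the fault summary line can be pulled out mechanically. The following is a minimal sketch in Python, not part of dbx itself: the function name parse_fault_line and the regular expression are illustrative assumptions, and the pattern is modeled only on the single sample line shown above, so other dbx message formats may not match.

```python
import re

# Pattern modeled on the sample line from this article (an assumption,
# not a documented dbx format):
#   Segmentation fault in extend_brk at 0xd011119c ($t1)
FAULT_LINE = re.compile(
    r"^(?P<fault>.+?) in (?P<function>\S+) at (?P<address>0x[0-9a-fA-F]+)"
    r"(?: \((?P<thread>\$t\d+)\))?\s*$"
)


def parse_fault_line(banner):
    """Return a dict describing the first fault line found, or None."""
    for line in banner.splitlines():
        match = FAULT_LINE.match(line.strip())
        if match:
            return match.groupdict()
    return None


# Start-up output abridged from the example above.
sample = """\
Type 'help' for help.
[using memory image in ./core]
warning: no source compiled with -g

Segmentation fault in extend_brk at 0xd011119c ($t1)
0xd011119c (extend_brk+0x2dc) 90040004 stw r0,0x4(r4)
(dbx)
"""

summary = parse_fault_line(sample)
print(summary["fault"], "in", summary["function"], "at", summary["address"])
```

The result is a small dictionary (fault, function, address, thread) that can be logged alongside the core file's name and timestamp.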
The last line of the output, (dbx), is the dbx utility's command prompt. This is where you'll enter all of your commands. To determine which program caused the dump, enter the corefile command at the (dbx) prompt:
(dbx) corefile
Process Name: perl5.10.1
Version: 430
Flags: FULL_CORE | CORE_VERSION_1 | UBLOCK_VALID | USTACK_VALID | LE_VALID
Signal: SEGV
Process Mode: 32 bit
At this point you'll be presented with several lines of information, including the name of the process that core dumped, the signal that caused the core dump (in this case a segmentation violation, SEGV, which is the signal behind most core dumps) and whether the process was executing in 32- or 64-bit mode. The identified process will, of course, point back to the executable having problems. But where in the programming code of that executable did the fault happen? To determine this, issue the dump command:
(dbx) dump
extend_brk(internal error: assertion failed at line 3915 in file frame.c ??, internal error: assertion failed at line 3915 in file frame.c ??, internal error: assertion failed at line 3915 in file frame.c ??) at 0xd011119c
In this case, we have a failed assert called from the frame.c file. An assert is simply a check on a Boolean expression that should always be true; if the expression evaluates to false, a fault is generated, causing the core dump and the executable to fail. Sometimes dbx points to a location in a specific file; sometimes it points to an offending function.
So with just two commands, you've pinpointed the program that faulted as well as the area of programming code that caused the fault. Now you're ready to fix the code. (In this case, as noted, we were running a 32-bit version of Perl on a system that had recently been upgraded to AIX v7.1. When we upgraded to a 64-bit version of Perl, our problems went away.)
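If you want to record these findings rather than read them off the screen, the corefile output is simple enough to parse. Here is a minimal sketch in Python, assuming the output has been captured as text in the "Key: value" layout shown earlier; the function name parse_corefile is hypothetical and the layout is inferred only from this one example.

```python
def parse_corefile(output):
    """Turn 'Key: value' lines from the dbx corefile subcommand into a dict."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that do not have a 'Key: value' shape
        info[key.strip()] = value.strip()
    # Flags are printed separated by '|'; split them into a list.
    if "Flags" in info:
        info["Flags"] = [flag.strip() for flag in info["Flags"].split("|")]
    return info


# Output copied from the corefile example in this article.
sample = """\
Process Name: perl5.10.1
Version: 430
Flags: FULL_CORE | CORE_VERSION_1 | UBLOCK_VALID | USTACK_VALID | LE_VALID
Signal: SEGV
Process Mode: 32 bit
"""

info = parse_corefile(sample)
print(info["Process Name"], info["Signal"], info["Process Mode"])
```

A dictionary like this is easy to append to a log or a ticket, which helps when the same program dumps core repeatedly.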
Most of the time, running corefile and dump is sufficient for starting remediation of the problem that caused the core dump. However, if you work with application developers who require more information to write a patch or an upgrade, you should be aware of several other dbx commands that tell you about the status of the working thread that encountered the error:
(dbx) th -
thread kernel user state flag held tid pri sched state tid pri nice sched state
-----------------------------------------------------------------------
mode scope cancellation join- boosted cursig wchan function pending state able
>$t1 0x2ea02af 112 other run 0x000001 1 58 other running 0x0002 0x000 no kernel system no ed yes 0 0
Entering th - at a dbx prompt tells you a lot about the thread's state at the time of the fault, and proc gives you a full dump of the process's memory usage. Enter coremap and you'll see all of the libraries the process had loaded as well as whether they were text or data segments. Likewise, the word map by itself gives you verbose information on all of the object files in use by your process.
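Once you know which subcommands you want, you can collect their output in one pass instead of typing them interactively. The sketch below, in Python, feeds the subcommands named in this article to dbx on standard input. It assumes dbx is on the PATH and accepts subcommands from a pipe; the function names build_script and run_dbx are hypothetical, and the approach should be tried on a test system first.

```python
import subprocess

# Subcommands named in this article; "quit" is appended to end the session.
DEFAULT_SUBCOMMANDS = ["corefile", "dump", "th -", "proc", "coremap", "map"]


def build_script(subcommands=DEFAULT_SUBCOMMANDS):
    """Return the text to feed dbx on standard input, one subcommand per line."""
    return "\n".join(list(subcommands) + ["quit"]) + "\n"


def run_dbx(core_path, subcommands=DEFAULT_SUBCOMMANDS, dbx="dbx"):
    """Run 'dbx -C <core_path>' and return its combined output as text.

    Assumption: dbx reads its subcommands from standard input when
    it is not attached to a terminal.
    """
    result = subprocess.run(
        [dbx, "-C", core_path],
        input=build_script(subcommands),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit status is returned, not raised
    )
    return result.stdout + result.stderr
```

On an AIX host, calling run_dbx("./core") would return a single block of text containing the output of every subcommand, ready to attach to a problem report.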
Analyzing core dumps with the dbx utility provides a comprehensive picture of their causes as well as clear options for dealing with them. Details about individual core dumps can be packaged in an easily understood form and presented to your vendor, the appropriate support personnel and even to non-technical personnel in your management chain.
Perhaps best of all, by using the dbx utility, in most instances you'll be able to diagnose and remediate the conditions that create core dumps without any outside help. This should lead to more reliable and stable AIX systems.