AIX Development: Core Dumps vs. System Dumps (and Why You Need Them)
ATS Group’s Justin Richard Bleistein opens his AIX Development series, highlighting software development principles and discussing AIX core dumps and system dumps—and how they can help identify and remediate critical issues
Welcome to the “AIX Development” series, a vital exploration of software development on the AIX operating system running on IBM Power servers. While centered on AIX, this series will also delve into concepts that are universally applicable to software development on any computer platform.
In this inaugural article, I will focus on two essential diagnostic tools in AIX: core dumps and system dumps. Understanding these tools is crucial for any developer or system administrator working with AIX. By the end of this article, you will understand what core and system dumps are, how they differ and why they are indispensable in diagnosing and fixing software and system issues.
I intend to equip you with the knowledge to effectively utilize core and system dumps to identify the root causes of crashes and malfunctions. Whether you’re dealing with a failing user application or a critical system failure, knowing how to interpret these dumps can save valuable time and resources. This knowledge will enable you to restore functionality quickly, maintain system stability and enhance the overall reliability of your IT infrastructure.
System Software vs. User Software
To understand software development, we must first understand the difference between system software and user software. System software controls the computer’s hardware and runs whenever the computer is powered on; it is the intermediary between user programs and the computer hardware. The operating system is the primary example of system software. Examples of user programs are the Chrome web browser on personal computers and the Oracle database on business servers. Examples of operating systems are as follows:
- Mainframe computer: z/OS, TPF, z/VM and VSE
- Personal computer (micro): Linux, MS Windows
- Server computer (mini): UNIX and UNIX-like systems (AIX, Solaris, HP-UX and Linux)
- Apple Mac computer: macOS (formerly OS X)
- iPhone smartphone and iPad tablet: iOS and iPadOS
- Android smartphone and tablet: Android (built on the Linux kernel)
When a program is written on any platform, it is written in human-readable text, adhering to the syntax rules of the programming language in which it is being written. However, computers can’t understand human-readable formats. Computers only read binary: 1s and 0s. A compiler program must convert that human-readable text to binary so that the computer, specifically the processor, can understand and execute it. The compiler must generate an executable binary file that matches both the OS and the processor it will run on, which is why different compilers exist for different computer platforms.
There are two primary ways to write programs for computers. One is to write a program that runs directly on the CPU without an operating system underneath it; such programs are typically written in assembly language, which is specific to the CPU being targeted. Assembly is also very labor intensive, because you have to tell the computer exactly what to do and how to do absolutely everything.
The other approach is to write a program to run on a computer’s CPU within the OS environment. These are compiled or interpreted programs, but we’ll focus on compiled programs for this discussion. Most of the programs we work with today run within an operating system, which makes this the practical approach for most software development.
Defining AIX
Now that we’ve established the fundamental concepts of system and user software and the basics of software compilation, it’s time to delve into the specific operating system that will be the focus of this article: AIX (Advanced Interactive eXecutive). Understanding AIX’s role and functionality is crucial, as it forms the backbone for many business-critical applications on IBM POWER servers. Let’s explore what makes AIX unique and how it supports the software development process, particularly in compiling and executing programs on this robust platform.
AIX is IBM’s proprietary version of the UNIX operating system, and it runs on IBM POWER servers, that is, servers built around IBM POWER CPUs. AIX has been the foundation of many Fortune 500 companies throughout the last few decades and is a trusted business asset. When you compile a program for AIX, such as a C program, the result is a file in an executable format known as XCOFF (eXtended Common Object File Format). This format plays the same role as the PE format used by .exe files on Windows or the ELF format on Linux. An executable format structures the file in a defined way: it contains specific metadata along with the binary code the file needs to execute in that OS environment and, ultimately, on that CPU type. In other words, the executable file holds compiled binary code the CPU can execute, laid out in a format the OS runtime understands.
Compilers for many languages are available on AIX, but the one most commonly used is the IBM XL C/C++ compiler. It takes human-readable C or C++ source code and translates it into the XCOFF binary executable format, which IBM POWER processors can execute within the AIX operating system runtime environment.
The following trivial C program is in a human-readable format. Such a program is usually written either in a special program called an IDE (Integrated Development Environment) or in a text editor such as the UNIX vi editor. This C program outputs the text “Hello World!” to the system console, or standard output:
#include <stdio.h>

int main(void) {
    /* Print the greeting, followed by a newline, to standard output. */
    printf("Hello World!\n");
    return 0;
}
Once the C program is written, it is fed into a compiler, which compiles the file into an XCOFF executable format to execute on AIX. This is basically how all programs are developed in AIX. Now, this is admittedly a very simplified example, but this is the crux of it:
- Write the program in human-readable text adhering to the syntax rules of your language.
- Compile the human-readable text into a machine-readable, binary, executable file.
- Test execute the program.
- Document any bugs observed.
- Fix bugs using human-readable code.
- Compile the human-readable text into machine code again.
- Test execute the program again until desired results are achieved.
- Distribute the program to the user base/market.
I wrote an article on relinking Oracle binaries that gives more in-depth information on the program compilation process.
Core vs. System Dumps
With a solid understanding of AIX and the software development process on this platform, we can now look into two critical diagnostic tools: core dumps and system dumps. These tools are invaluable for diagnosing and resolving software and system failures, ensuring the stability and reliability of your AIX environment. In this section, we will explore the differences between core dumps and system dumps, their specific uses, and how they can aid in identifying and remediating issues. By understanding these dumps, you’ll be better equipped to troubleshoot and maintain your systems, ultimately enhancing your ability to support mission-critical applications.
System Dumps
A system dump occurs when the AIX operating system encounters a critical failure and can no longer function. The AIX operating system has a major software component known as the kernel, which always runs in a protected state. The kernel is the core of the operating system, managing hardware communication and controlling the program execution flow. When a failure happens, the kernel “panics” and dumps all the information currently in real memory into a device, such as a logical volume on disk, known as the system dump device. This process captures the system’s state at the time of the crash, allowing for detailed post-mortem analysis.
Once the system dump is complete, the operating system typically restarts automatically to restore service as quickly as possible. The system dump file is then analyzed, usually by the OS vendor—in this case, IBM. The analysis can reveal the cause of the crash, whether it be a device driver issue, hardware failure, or a bug in the OS code. IBM support may provide solutions such as updated drivers, OS settings modifications, or software patches to prevent future occurrences.
Core Dumps
In contrast, a core dump pertains to a user-space program rather than the entire operating system. When a user program crashes, the operating system writes a core dump file for it. This could happen to any user program, even one as simple as the C example earlier in this article if it contained a bug. The file contains the memory contents specific to the crashed program at the time of failure. Unlike the failure behind a system dump, a crash that produces a core dump does not affect the operating system’s overall stability, so the OS continues running other processes.
The core dump file is written to a filesystem location, by default the crashed process’s current working directory. It provides a snapshot of the program’s memory, including variables, call stacks, and other diagnostic information. This dump is crucial for developers to analyze the cause of the program crash. The responsibility for examining core dumps often falls to the in-house development team or third-party software vendors, who can then provide a fix for the user program. Understanding and utilizing core dumps allows for more effective debugging and troubleshooting of application-level issues.
Similarities Between Core Dumps and System Dumps
You may have noticed a similarity between these two types of dumps, the kernel (system) dump and the user program (core) dump. In both cases, we need someone with internal knowledge of the program, and potentially access to its source code, to read the dump, determine the cause, and then devise an action plan to correct it. That is because the kernel is itself just a large program that is always running in system memory.
A user program written in C by an in-house developer or an application vendor is much like the kernel, which is a program written by the OS vendor. The kernel is written in C, C++, and some assembly, while user-space programs are mainly written in high-level languages like C and C++. The difference between the two is that the kernel resides and executes in a protected area of the computer and has higher privilege than user application programs; however, they are both just programs that ultimately need to run on the CPU.
Whether it is a system dump of the OS kernel or a core dump of a user application, the dump is significant in determining the cause of a critical issue. These two types of dumps are one reason companies must have software vendor support in place to make use of these essential troubleshooting tools.
Access to the writers and maintainers of your company’s OS or user application is invaluable. It can significantly reduce downtime, save on labor costs, and ensure rapid resolution of critical issues. Ensuring you have robust support can directly impact your company’s operational efficiency and reliability, especially when dealing with your mission-critical applications.